A Smoldering Gun From Nashville, TN
Guest post by Basil Copeland
The hits just keep on coming. About the same time that Willis Eschenbach revealed “The Smoking Gun at Darwin Zero,” the UK’s Met Office released a “subset” of the HadCRUT3 data set used to monitor global temperatures. I grabbed a copy of “the subset” and then began looking for a location near me (I live in central Arkansas) that had a long and generally complete station record that I could compare to a “homogenized” set of data for the same station from the GISTemp data set. I quickly, and more or less randomly, decided to take a closer look at the data for Nashville, TN. In the HadCRUT3 subset, this is “72730” in the folder “72.” A direct link to the homogenized GISTemp data used is here. After transforming the row data to column data (see the end of the post for a “bleg” about this), the first thing I did was plot the differences between the two series:

The GISTemp homogeneity adjustment looks a little hockey-stickish, and induces an upward trend by reducing older historical temperatures more than recent historical temperatures. This has the effect of turning what is a negative trend in the HadCRUT3 data into a positive trend in the GISTemp version:

So what appears to be a general cooling trend over the past ~130 years at this location in the unadjusted HadCRUT3 data becomes a warming trend when the homogeneity adjustment is applied.
“There is nothing to see here, move along.” I do not buy that. Whether or not the homogeneity adjustment is warranted, it has an effect that calls into question just how much the earth has in fact warmed over the past 120-150 years (the period covered, roughly, by GISTemp and HadCRUT3). There has to be a better, more “robust” way of measuring temperature trends, one that is not so sensitive that it turns negative trends into positive trends (which we’ve seen it do twice now, first with Darwin Zero, and now here with Nashville). I believe there is.
Temperature Data: Pasteurized versus Homogenized
In a recent series of posts, here, here, and with Anthony here, I’ve been promoting a method of analyzing temperature data that reveals the full range of natural climate variability. Metaphorically, this strikes me as making a case for “pasteurizing” the data, rather than “homogenizing” it. In homogenization, the object is to “mix things up” so that the result is “the same throughout.” When milk is homogenized, this prevents the cream from rising to the top, and so prevents us from seeing the “natural variability” that is in milk. But with temperature data, I want very much to see the natural variability in the data. And I cannot see that with linear trends fitted through homogenized data. It may be a hokey analogy, but I want my data pasteurized – as clean as it can be – but not homogenized to the point that I can no longer see the true and full range of natural climate variability.
I believe that the only way to truly do this is by analyzing, or studying, how differences in the temperature data vary over time. And they do not simply vary in a constant direction. As everybody knows, temperatures sometimes trend upwards, and at other times downward. Studying how the differences vary allows us to see this far more clearly than simply fitting trend lines to undifferenced data. In fact, it can prevent us from reaching the wrong conclusion, such as fitting a positive trend when the real trend has been negative. To demonstrate this, here is a plot of monthly seasonal differences for the GISTemp version of the Nashville, TN data set:

Pay close attention as I describe what we’re seeing here. First, “sd” means “seasonal differences” (not “standard deviation”). That is, it is the year-to-year variation in each monthly observation, for example October 2009 compared to October 2008. Next, the “trend” is the result of Hodrick-Prescott smoothing (lambda = 14,400). The type of smoothing here is not as critical as the decision to smooth the seasonal differences. If a reader prefers a different smoothing algorithm, have at it. Just make sure you apply it to the seasonal differences, and that it does not change the overall mean of the series. That is, the mean of the seasonal differences for GISTemp’s Nashville, TN data set is -0.012647, whether smoothed or not. The smoothing simply helps us to see, a little more clearly, the regularity of warming and cooling trends over time. Now note the sign of the mean seasonal difference: it is negative. Even in the GISTemp series, Nashville, TN has spent more time cooling (imagine here the periods where the blue line in the chart above is below zero) than warming over the last ~130 years.
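For readers who want to reproduce this step, here is a minimal sketch of the seasonal differencing and Hodrick-Prescott smoothing in Python (my own reading of the method, not the author’s code). It assumes a monthly pandas Series of station temperatures and uses the HP filter from statsmodels; missing months are simply dropped.

----------------------------------------
import pandas as pd
from statsmodels.tsa.filters.hp_filter import hpfilter

def seasonal_difference_trend(temps: pd.Series, lamb=14400):
    # year-on-year change for each calendar month, e.g. Oct 2009 vs Oct 2008
    sd = temps.diff(12).dropna()
    # Hodrick-Prescott smoothing; lamb=14400 is the value quoted in the post
    cycle, trend = hpfilter(sd, lamb=lamb)
    return sd, trend

# sd, trend = seasonal_difference_trend(nashville_gis)   # hypothetical Series
# print(sd.mean())   # the "mean seasonal difference" discussed below
----------------------------------------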
How can that be? Well, the method of analyzing differences is less sensitive – i.e. more “robust” – than fitting trend lines through the undifferenced data. “Step” type adjustments, such as the homogeneity adjustments we see here, only affect the handful of points in the differenced series that straddle the step (one per calendar month, for a seasonal difference), but they shift every data point before (or after) the step in the undifferenced series. We can see the effect of the GISTemp homogeneity adjustments by comparing the previous figure with the following:

Here, in the HadCRUT3 series, the mean seasonal difference is more negative, -0.014863 versus -0.012647. The GISTemp adjustments increase the average seasonal difference by 0.002216, making it less negative, but not by enough for the result to become positive. In both cases we still come to the conclusion that “on the average” monthly seasonal differences in temperatures in Nashville have been negative over the last ~130 years.
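To illustrate the point about step adjustments with a toy example (mine, not from the post): shifting every value before some date changes the whole undifferenced record, but only the seasonal differences that straddle the step.

----------------------------------------
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
temps = pd.Series(15 + rng.normal(0, 2, 240),
                  index=pd.date_range("1990-01", periods=240, freq="MS"))

adjusted = temps.copy()
adjusted.loc[:"1999-12"] -= 0.5      # a -0.5 C step applied to the early record

# every pre-step value of the undifferenced series moves ...
print((adjusted - temps).abs().gt(0).sum())                   # 120 points changed
# ... but only the 12 seasonal differences spanning the step change
print((adjusted.diff(12) - temps.diff(12)).abs().gt(0).sum()) # 12 points changed
----------------------------------------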
An Important Caveat
So have we actually shown that, at least for Nashville, TN, there has been no net warming over the past ~130 years? No, not necessarily. The average monthly seasonal difference has indeed been negative over the past 130 years. But it may have been becoming “less negative.” Since I have more confidence, at this point, in the integrity of the HadCRUT3 data than in the GISTemp data, I’ll discuss this solely in the context of the HadCRUT3 data. In both the “original data” and in the blue “trend” shown in the above figure, there is a slight upward trend over the past ~130 years:

Here, I’m only showing the fit relative to the smoothed (trend) data. (It is, however, exactly the same as the fit to the original, or unsmoothed, data.) Whereas the average seasonal difference for the HadCRUT3 data here was -0.014863, from the fit through the data it was only -0.007714 at the end of the series (October 2009). Still cooling, but less so, and in that sense one could argue that there has been some “warming.” And overall – i.e. if a similar kind of analysis is applied to all of the stations in the HadCRUT3 data set (or “subset”) – I will not be surprised if there is some evidence of warming. But that has never really been the issue. The issue has always been (a) how much warming, and (b) where has it come from?
I suggest that the above chart showing the fit through the smooth helps define the challenges we face in these issues. First, the light gray line depicts the range of natural climate variability on decadal time scales. This much – and it is very much of the data – is completely natural, and cannot be attributed to any kind of anthropogenic influence, whether UHI, land use/land cover changes, or, heaven forbid, greenhouse gases. If there is any anthropogenic impact here, it is in the blue line, what is in effect a trend in the trend. But even that is far from certain, for before we can conclude that, we have to rule out natural climate variability on centennial time scales. And we simply cannot do that with the instrumental temperature record, because it isn’t long enough. I hate to admit that, because it means either that we accept the depth of our ignorance here, or we look for answers in proxy data. And we’ve seen the mess that has been made of things in trying to rely on proxy data. I think we have to accept the depth of our ignorance, for now, and admit that we do not really have a clue about what might have caused the kind of upward drift we see in the blue trend line in the preceding figure. Of course, that means putting a hold on any radical socioeconomic transformations based on the notion that we know what in truth we do not know.
Some ideas for doing such things in C++ are here
http://arnholm.org/tmp/basil.htm
“Would You Like Your Temperature Data Homogenized, or Pasteurized?”
Actually I prefer my data raw and unadulterated. It’s hard to find it that way though.
JJ (07:47:45) :
Basil,
Turning to the balance of your analysis, you can save yourself considerable effort next time when calculating your ‘mean seasonal difference’ statistic. It is not necessary to create a difference series and tally all of its values to do that. Mathematically, your ‘mean seasonal difference’ statistic simplifies to the following equation:
(Te – Ts)/Y
Where:
Ts= Temperature at the start of the analysis period.
Te= Temperature at the end of the analysis period.
Y = number of years in the analysis period.
It should be much faster for you to calc that statistic next time. This does raise the following questions, however.
There are other reasons for doing what I’m doing. I’ll come back to that. But for now, I’m trying to follow you, and not having much luck.
At the beginning of the period, the GIS temperature was 1.8. At the end of the period, it was 13.8. Now putting aside the fact that I’m actually shy a couple of months of having 128 years, according to you, the average should be (13.8 – 1.8)/128 = 0.09375. But the actual number is negative, not positive, so your shortcut is not even in the right ballpark. I think I know why you are so far off, but I’ll let you work it out. For now, besides starting at 1.8 in January 1881, and ending at 13.8 in October 2009, the average monthly seasonal difference over 1546 months was -0.012647. Have at trying to come up with the latter just from the starting point, the end point, and the number of points.
You have 130 years of data at the Nashville site, which amounts to 1,560 monthly average temperatures. First, you throw out 92% of that data by choosing only to look at October. Then you throw out 98% of that October data, by using a ‘mean seasonal difference’ statistic that is determined by the values of only two of the October monthly averages – the endpoints of the period of analysis.
Why is using only 0.13 % of the data to calculate a temperature trend superior to calculating a temperature trend from a larger portion of the data? Like, say, all of it?
I’m sure you are trying to say something profound, but I’m not getting it. I haven’t thrown out any data. I’ve used it all because I need it all to come up with the HP smooth (the blue wavy line in the charts). And contra your absurd — sorry, the more I think about this, the more in a huff I’m getting over it — notion that I’m ignoring any data, the blue lines are very good representations of how the trend varies across time.
I think you’ve misunderstood something big time, but I don’t know what it is. Did you get confused by the straight lines in the second figure? Those are linear trends, using good old ordinary least squares, through all the data — actual temperatures, not seasonal differences.
I know I’m doing some unusual things with the way I’m analyzing the data and presenting the results, and so I bear the burden of trying to make sure I’ve explained it thoroughly. But readers have a burden, too, of making sure they understand what I’m doing before they go off and say I’m half-cocked.
Looking back at what you’ve said, I’m trying hard to understand you, because you might just be right. But I cannot understand you. Maybe we could take a step back and try to understand each other better if you will clarify what you are trying to say with these statements:
First, you throw out 92% of that data by choosing only to look at October.
What in the world are you saying? If you want to talk about particular months, there are — just to pick one — 128 seasonal differences for April in the -0.012647 average. And 128 seasonal differences for June. And so on.
Then you throw out 98% of that October data, by using a ‘mean seasonal difference’ statistic that is determined by the values of only two of the October monthly averages – the endpoints of the period of analysis.
Again, what are you saying? What “two of the October monthly averages?” In truth, there are 128 Octobers in the -0.012647 average. But only 127 Novembers, or Decembers. E.g., if you multiply 128×10 and 127×2, and add them, you’ll get 1534 observations, as in the following printout of stats for this variable:
Summary Statistics, using the observations 1881:01 - 2009:10
for the variable 'sd_GIS' (1534 valid observations)
Mean -0.012647
Median 0.00000
Minimum -10.900
Maximum 9.5000
Standard deviation 2.6586
Well, enough of this. To the bit of snark “Like, say, all of it?” all I can say is, “I did use all of it.”
Basil
Re: Basil (05:26:33), ScottA (02:14:06), & Others
Converting 12 month rows for each year to column-stacks is a snap in Excel using the “offset” function.
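For those not working in Excel, here is a rough pandas equivalent of the same row-to-column conversion (my sketch, not anyone’s posted code). The file layout, one row per year followed by twelve monthly values, and the missing-value code are assumptions; adjust them to the actual station files.

----------------------------------------
import pandas as pd

def rows_to_column(path, missing=-99.9):
    # assumed layout: year followed by 12 monthly mean temperatures
    cols = ["year"] + [f"m{m:02d}" for m in range(1, 13)]
    df = pd.read_csv(path, sep=r"\s+", names=cols, comment="#")
    long = df.melt(id_vars="year", var_name="month", value_name="temp")
    long["month"] = long["month"].str[1:].astype(int)
    long = long.sort_values(["year", "month"]).reset_index(drop=True)
    # treat the assumed missing-value code as NaN
    long["temp"] = long["temp"].where(long["temp"] != missing)
    return long.set_index(["year", "month"])["temp"]

# hadcrut = rows_to_column("72730.txt")          # HadCRUT3 subset station file
# gistemp = rows_to_column("nashville_gis.txt")  # hypothetical file name
# diff = (gistemp - hadcrut).dropna()            # difference series to plot
----------------------------------------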
Pamela,
Up here in the Great White North ice on the inside of a windshield is a frequent winter hazard. The only recourse that I am familiar with is to use an old credit card to scrape the glass, as it will bend to conform to the shape of the window. Points cards and other useless customer loyalty cards work as well. I use an old Tim Horton’s card. But since I am dealing with a minivan, I often find it hard to reach the furthest recesses, and often fantasize about someone inventing an inside scraper. It’s been frigid here, too; in fact, record-breaking out west.
Re: bill (08:44:20) & bill (07:40:13)
Good heavens bill. Just enter the “offset” function in one cell, copy/paste, & be done.
Having become more than a little disenchanted with the low level of statistical inquiry (and data handling) undertaken by GISS and CRU, I can see no choice other than for individuals to enter the fray, provide disinterested, non-partisan analysis, and make this analysis available for review by their peers.
To this end, I am setting up a Bayesian inference engine (using MCMC) which will decompose monthly or daily temperature records for a single (or multiple locations) into the following:
1) A within year annual cycle – using a Bayesian form of Fourier decomposition taking, say, the first 12 terms – (i.e. 1 year cycle, 1/2 year cycle, 1/3 year cycle, and so on). [Purpose is to systematically account for the strong seasonal signal].
2) A decadal cyclical component – (i.e. time span of data series, time span of data series/2, and so on). [Purpose is to identify influences such as PDO and other long cyclical behaviour – and to place error bounds on that signal].
3) A random walk component – possibly fitted as an AR(n) process.
4) A linear trend (with offset) and error bounds. This is the long-run ‘climatic’ trend component and is, ultimately, the primary statistical object of interest.
5) Measurement noise. All measurement processes are noisy. Some of the noise may be attributed to human-induced variations; other parts may be attributed to short-term weather variations.
Does anyone have any thoughts as to the utility of this endeavour?
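As a rough, non-Bayesian illustration of the decomposition being proposed (this is my own Python stand-in, not Keith G’s planned Mathematica/MCMC code), the design matrix below carries an offset, a linear trend, a few harmonics of the annual cycle, and a few long “decadal” harmonics; the AR(n) component and the Bayesian error bounds are omitted.

----------------------------------------
import numpy as np

def design_matrix(n_months, n_annual=4, n_decadal=4):
    t = np.arange(n_months)
    cols = [np.ones(n_months), t]                 # offset + linear trend
    for k in range(1, n_annual + 1):              # 1 yr, 1/2 yr, 1/3 yr ... cycles
        cols += [np.cos(2 * np.pi * k * t / 12),
                 np.sin(2 * np.pi * k * t / 12)]
    for k in range(1, n_decadal + 1):             # span, span/2, span/3 ... cycles
        cols += [np.cos(2 * np.pi * k * t / n_months),
                 np.sin(2 * np.pi * k * t / n_months)]
    return np.column_stack(cols)

# y = monthly temperature series as a numpy array (hypothetical)
# beta, *_ = np.linalg.lstsq(design_matrix(len(y)), y, rcond=None)
# beta[1] is the linear "climatic" trend per month under this toy model
----------------------------------------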
I was interested in finding some historical weather data for New York City and since I’ve been living here for over fifty years, I’m familiar with details of the locations around where the data is collected.
I was looking for the station data for Central Park in Manhattan. The station is in the middle of the park and the park hasn’t changed radically since it was first built in the 19th century. The park also provides some insulation from the streets surrounding the park. The surrounding streets were first developed in the early 20th century and also haven’t changed radically over time either.
The reason is that this is where the temperature is taken that’s in every newspaper and TV weather report and has been for decades. Since this is the benchmark for weather reporting in New York City, there should be no reason at all for “adjustments”, “corrections”, “homogenization”, etc. Imagine telling people that the temperature they were told on the local TV and newspapers was all wrong for the past hundred years.
What I was looking for was a single station, in the same location, in a place that hasn’t changed very much. The current location, Belvedere Castle, has been in use since 1961. Before that, the temperature was taken at a building called The Arsenal, almost at the southeast corner of the park. My idea was to take a look and see if there was any trend in that one set of data and compare it to GISS, which shows a definite uptrend. My problem is that NCDC wants $70 for the data from 1961 and $200 for the complete record back to 1869. Also, I’m not sure if the historical data is “as reported” or has been “corrected”. I have to think about that.
So I thought I found a way around this by getting the data from the Met Office. And sure enough, they do have a set, but it’s marked “New York/La Guardia” (725030.txt) and goes back to 1822. Not as good, though, because it’s located in Queens, which was pretty rural up until the 1920s and 1930s. Also, it’s right on the water, so that will have an effect on the readings. And temperatures in Queens are generally cooler than in Manhattan (smaller buildings, less concrete, more trees).
But wait! LaGuardia Airport (correct spelling) station didn’t start until 1935. What’s up with that? Hell, the airplane wasn’t invented until 1903. This raises some interesting questions.
Where does the rest of the data going back to 1822 come from?
Were a number of area stations averaged together? Which ones?
Why wouldn’t you just use a perfectly good single continuous set going back to 1869? Perhaps because it provides some opportunities for “adjusting”? Hmmm?
Nick Stokes (5:5:39), and Basil (7:31:56), this needs more analysis.
http://www.gilestro.tk/2009/lots-of-smoke-hardly-any-gun-do-climatologists-falsify-data/
He is a neurobiologist and very confident that the adjustment bias is zero. I was about to put in my two bits’ worth when I scrolled down to a comment by SG that echoed my thoughts. SG said that if the early adjustment is down and the later adjustment is up, then a warming slope is produced from a neutral record.
The point is, are the adjustments random on a time basis? My reading of the comments here (Spain, Darwin) is that they are not. Early years are adjusted down and later years are adjusted up.
If you look at this on a swing (around zero) basis, the plus one and the minus one cancel out, but the time series result is a positive warming trend.
Richard Wakefield:
“I have long speculated that what we are seeing is not a true increase in temps, but a narrowing of variation below the maximum temps, which will tend to increase the average daily temps, but no real physical increase is occurring. That is, what we are seeing are shorter warmer winters, with no change in summer temps, which gives an increase in average temps over time.
Also, I’m speculating that spring can come earlier and fall come later, which one can see with the temp changes in those transition months.
This could be the killer of AGW if this is the case because there is no real increase in temps, just less variation below max temps for each month. The alarmism of a catastrophic future disappears then doesn’t it.”
Amazing that mainstream science hasn’t looked into this possibility as a way of testing its hypothesis. (Actually, it’s not amazing.)
Basil,
“Let me see if I understand. The GISS adjustment is to make Nashville “rural.” And it does this by making it appear that Nashville has warmed more than it actually has?”
Has the GISS homogenization adjustment made Nashville appear warmer? The only way you would know that would be to compare to the non-homogenized data, and you have not done that. See my post above on this topic.
“Shouldn’t the adjustment be doing just the opposite?”
Perhaps it does. You won’t know until you compare to the non-homogenized data.
“If the “unadjusted” Nashville trend was already sloping downward,”
Was it? Have you looked at the non-homogenized data yet?
Mike Fox (23:35:01) : See what I just wrote to JJ about “raw” data. I certainly am not interested in going back to the truly “raw” data, which is daily.
As near as I’ve been able to work it out, the GHCN and USHCN (v2 or 1) are all “cooked” in various ways. That just leaves the online dailies….
If there is some ‘in between’ compilation that stays “uncooked” I’ve not yet found it. (If you know of one, please post a note at my blog. Given the pace of things now and here I can’t keep up with all the threads with followups…)
In deciding where to begin, we have to decide whether we begin with the truly raw data, or the data that was received by CRU (or GISS) from the met agencies.
GISS does not get raw or semi-raw data. They get GHCN from NOAA / NCDC and USHCN or USHCN.v2 from NOAA / NCDC. The GHCN data are horridly “cooked” via thermometer deletion. (I don’t have different ‘eras’ or ‘release levels’ to look at to see if there is a cooking / bias for individual stations over time… but it’s a big “Dig Here”…) USHCN is significantly broken, given that a comparison of 2 different “versions” of what is supposed to be the same “raw data” can show 1/2 C variations in any one yearly average, and one version has about 17 years completely missing. (IIRC it has 1883 in one and starts at 1900 in the other for Orland – yet the “adjusted data” DOES start at the earlier time. How you can have the “adjusted” start before you have the “raw” start is an “interesting question”…)
IMHO, it is this pernicious data cooking by NOAA / NCDC that is the most egregious as it impacts EVERYONE else. Even the Japanese were snookered in that they use GHCN too.
Much of the ruckus over FOI’s has been simply to get the latter. And I think we now have some of that with the “subset” of HadCRUT3 that has been released. I may be mistaken — somebody correct me if I am — but I think this is supposed to be the data “as received” from the met agencies. They describe the data they are releasing thusly:
I don’t think so. Maybe they released something else too? But what I saw on the top page sounds “kind of raw” but when you look into it, the web site says the 1500 records are of the CRUcooked product.
From:
http://chiefio.wordpress.com/2009/12/10/met-office-uea-cru-data-release-polite-deception/
QUOTE:
“The data subset consists of a network of individual land stations that has been designated by the World Meteorological Organization for use in climate monitoring. The data show monthly average temperature values for over 1,500 land stations.”
“The data” “individual land stations” “monthly average temperature values”. It all sounds like they are releasing the temperature data…
END QUOTE
and then further down…
QUOTE:
There is a link near the top of that page that mentions this is a subset of the HadCRUT3 data set… “But I thought HadCRUT3 was a product, not the “raw” data?”
[…]
From:
http://www.metoffice.gov.uk/climatechange/science/monitoring/hadcrut3.html
[…]
“HadCRUT3 is a globally gridded product of near-surface temperatures, consisting of annual differences from 1961-90 normals. It covers the period 1850 to present and is updated monthly.
The data set is based on regular measurements of air temperature at a global network of long-term land stations and on sea-surface temperatures measured from ships and buoys. Global near-surface temperatures may also be reported as the differences from the average values at the beginning of the 20th century.”
So this is the product and not the data. It has the HadCRUT 1850 cutoff in it. It is based on measurements and is not itself a measurement of anything. This is not the temperature data; this is the homogenized, pasteurized, processed data food product.
END QUOTE.
I would love to be told that there was another 1500 data set released and it was the “raw” data, but as near as I can tell, this looks like CRU Crud to me.
nominal (09:58:14)
Thank you for the links. Would this information be useful for a complete global surface temperature reconstruction (being at the sea surface, not in the water), since land stations cover such a small percentage of the globe? Or has this been attempted? It’s funny how everybody ends up just using surface stations. Also, there is an enormous amount of data on paper lying around that was never reported to NOAA, although using it would require more effort than the CRU seems to have put in.
Keith Minto (18:57:36) :
An excellent point. I also find it a little odd that the adjustments would net out to around 0. I would expect them to be a little higher or a little lower if the adjustments were necessary because of some equipment issue. If the net value of the adjustments should be around 0, then why bother adjusting any of it? It seems more likely that you would want to look at each station individually and determine what adjustments were necessary for that station. If that figure came out to be as close to 0 as that distribution indicates, I would be pretty shocked.
This would mean that the flaws in all of the stations developed in such a way that they add up to 0!!! Some stations read too hot, some too cold, which would be pretty incredible, no? It would also mean it is safe to stop adjusting the data until a new station comes online. Does GHCN/CRU look the same with the adjustments removed? That would show whether the assertion made by Gilestro is correct or incorrect.
ScottA (02:14:06) :
Did I miss the bleg for transforming row to column data?
I have not caught up yet to see if you have the rotation answer, but if you are in an Excel spreadsheet, simply highlight the data by dragging, then copy, select a starting cell in a blank area, then “Paste special – transpose” (box at lower right of the window).
Oops, read too fast.
Basil,
“There are other reasons for doing what I’m doing. I’ll come back to that. But for now, I’m trying to follow you, and not having much luck.”
I believe I have figured out our disconnect.
I had understood that you were only using the October data for your ‘mean seasonal difference’, using it as a representative annual figure (much like we often use the annual water-year minimum for certain trend analyses, which coincidentally also uses October values). Under that assumption, my previous post was correct.
I gather from your subsequent comments that you are actually using the data from all months of the year. If that is the case, I amend my previous post as follows (changes in italics):
***
Turning to the balance of your analysis, you can save yourself considerable effort next time when calculating your ‘mean seasonal difference’ statistic. It is not necessary to create a difference series and tally all of its values to do that. Mathematically, your ‘mean seasonal difference’ statistic simplifies to the following equation:
(Te – Ts)/(Y-1)
Where:
Ts= Temperature at the start of the analysis period.
Te= Temperature at the end of the analysis period.
Y = number of years in the analysis period.
Simply apply this equation for each month you wish to include in your ‘mean seasonal difference’, and average the results.
***
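(A quick numerical check of the telescoping identity behind this shortcut, using a made-up series of Octobers; for a complete record, the interior years cancel out of the sum of year-on-year differences.)

----------------------------------------
import numpy as np

octobers = np.array([14.2, 13.9, 15.1, 14.7, 14.0, 15.3])   # Y = 6 made-up Octobers
seasonal_diffs = np.diff(octobers)                           # 5 year-on-year changes

lhs = seasonal_diffs.mean()                                  # mean seasonal difference
rhs = (octobers[-1] - octobers[0]) / (len(octobers) - 1)     # (Te - Ts)/(Y - 1)
print(np.isclose(lhs, rhs))                                  # True
----------------------------------------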
The balance of my previous comments on your method apply, with the minor correction that your method only uses 24 out of the 1,548 temperature measurements available to you to define your ‘mean seasonal difference/Trend’. You are actually using 1.5% of the data to derive your trend, not 0.13% as I had claimed earlier.
The previous questions remain:
Why do you consider a trend that only reflects 1.5% of the data to be superior to a trend calculated from a larger percentage of the data, such as the standard trend line that uses all of the data?
Given that your ‘mean seasonal difference’ statistic only uses 24 datapoints (the monthly endpoints of your seasonal series) it should be apparent that the choice of those 24 points is … pretty important. Just moving one of the endpoints of your analysis period forward or backward by one year could dramatically change the ‘mean seasonal difference’ trend that you calculate.
In fact, on a dataset with essentially zero trend (such as the homogenized GISS dataset for Nashville that shows much less than 0.1C warming over a century) you could completely flop the trend from warming to cooling and back with only tiny moves of the endpoints.
To quantify, I have checked these assertions against the Nashville data.
Applying your method to these data, I arrive at a ‘mean seasonal difference/trend’ (MSD) for 1881-2009 of -0.0126. This matches what you report here.
Applying my simplified method to the same data, I arrive at an MSD for 1881-2009 of -0.0126. The simplified method works.
Moving the start of the analysis period up only five years, the MSD flops sign from cooling to warming. MSD for 1885-2009 = 0.007.
Move the start of the analysis period up one additional year and quit two years sooner, and you get an MSD for 1886-2007 of 0.025. By shifting one endpoint of the analysis by six years and the other by 2 years, your method turns a cooling trend into a warming trend of double the size.
Is this robust?
Incidentally, I calc’d a standard linear regression trend for the GISS homogenized data for 1881-2009. I did it a couple of different ways, and ended up with a cooling trend each time. What method did you use to calculate the trend that shows a warming in the GISS data over that period?
Re: Keith G (17:49:20)
I have some background in that area. Great memories: Metropolis-Hastings algorithm.
I would very strongly caution you that almost any assumptions of randomness will be suspect, aside from your #5. Many will disagree with me and I will, of course, disagree with them (respectfully if collegiality finds a way to be a 2-way street).
The randomness issue isn’t nearly the problem it seems to be if conclusions are qualified &/or presented with necessary context. My first question is always: What assumptions are the conclusions based upon? Too many stats profs fail to impress upon their students the importance of critical thinking about assumptions, but it is (perhaps) easy to understand why, given how much algebra has to be plowed through.
I will be interested to see what insights you can share. Every perspective sheds new light and I’ve not heard much talk of Bayesian stuff around here.
Cheers.
Re: Paul Vaughan (00:29:52) :
Thanks for the note of caution wrt randomness. Correct me if I am wrong, but this caution refers to assumption #3?
Input wrt model assumptions always appreciated.
In any event, it will take me a few days to set up and debug the statistical model: I have to work through the algebra, write the Mathematica code, and debug – all in my spare time. But with Christmas looming, results may not be forthcoming quickly.
I will start with a couple of neighbouring sites – à la Peter (and Dad). If there is any merit in continuing, I will expand to a larger data set.
Insights, if any, will be shared – as will data, assumptions and code.
Jeff (01:31:25) :
I’m curious if the GISTemp data has both raw and homogenized data?
Because if that is the case I have seen several stepladder adjustments from raw to adjusted in Pa alone …
adjustments that can’t be justified by station moves or UHI …
A rather complex question to answer that OUGHT to be simple. But, IMHO, the complexity shows where the “issue” starts…
GIStemp takes in what is called “raw” data. Everyone studiously uses that label. It uses “raw” GHCN and USHCN. But when you go to NOAA / NCDC you find that the “raw” GHCN and USHCN are not raw. There is some ill defined “QA” and “homogenization” applied. AND the GHCN data are heavily biased by deleting cold thermometers from the recent past, but leaving them in the “baseline periods” of the major “value added” (GAK!) temperature series (GIStemp, HadCRUT). So “raw” isn’t “raw” and “unadjusted” is “adjusted”… There is your first clue…
When you find you must keep two sets of books on what “is is”, well, something is smelling. And when someone says “You just don’t have the credentials” something is smelling. And when someone says “trust us”, don’t.
So, back to your question:
The really raw USA data go to NOAA / NCDC. It goes through their sausage grinder and comes out as part of the GHCN data set (in degrees C) or the whole USHCN data set in degrees F. Both of those are available in two forms. Adjusted and “Unadjusted”. Yet the “unadjusted” are in fact adjusted. And a short comparison of the USHCN old version (that ends in 2007) and the USHCN Version 2 shows that even those two versions of “unadjusted data” can vary by at least 1/2 F for any given annual mean. And both are different from the GHCN copy (after converting to the same C or F for comparison).
THEN GIStemp takes this “unadjusted adjusted data” and adjusts it one heck of a lot more with a double dip of homogenizing on top of the homogenizing done to the “unadjusted” cooked data that is taken in (that GISS calls “raw” but isn’t.)
Baffling? I think that was the whole point…
I like to “Keep a tidy mind”. And when something is as untidy to try and think about (and keep straight) as this is, my “Bull In Our Times!” raspberry alarm clock goes off…
So, back to PA, to sort out who did the buggering and stepping you saw, you must take the GIStemp data and compare it with the USHCN or USHCN.v2 data (depending on if you are looking at a recent chart or one from about a month ago). Then, if you find it is USHCN (either version) you would need to go “upstream” to the dailies…
Good Luck..
BTW, you said ” several stepladder adjustments from raw to adjusted in Pa alone”. I would assert you are most likely NOT looking at really raw data. You are most likely looking at the USHCN unadjusted data … and those are not raw even though GIStemp calls them raw… If it’s USHCN vs GIStemp end product, then you are looking at the results of GIStemp STEP1 – homogenization and GIStemp STEP2 – UHI adjustment. Of them, I’d suspect the STEP2 program PApars.f as the most likely suspect.
(Notice what I mean about the pain of trying to keep this mess tidy in a tidy mind… you need to use phrases like “unadjusted adjusted” and “raw adjusted” and … It is just screaming deception from the tortured language needed to track the pea under the shells…)
H.R. (04:57:13) :
JustPassing (02:38:55) :
“I think I’ll send the CRU at East Anglia a nice box of fudge for Xmas.”
Bingo! If your lead was followed… a couple of thousand boxes of fudge arriving at CRU would certainly make a point.
IFF you do that, please ensure the fudge is overcooked and arrives stale… just like the temperature series… and having a few ‘rubber erasers’ as ‘synthetic in-fill’ would be a nice touch too…
Come to think of it “salting a mine” is not that much different from what they were doing… so I’d make sure to put a lot of salt in, but not so much sugar…
Just a thought… (wouldn’t want them to actually enjoy the fudge, would we 😉
Hi Guys,
First, how I converted the CRU downloads to XLS:
Moved all the files into one directory as .txt files
You can download this as a zip file
http://www.akk.me.uk/CRU_data.zip 3.3Mb
Rename all the files to .xls
You then get the data in one column
Then use Data > Text to Columns, delimited by space, to split into columns.
Would appreciate your comments on this anomaly.
I took the Met Office Station data for Oxford.
They give tmax and tmin.
For each year, I added the 12 monthly values and divided by 12 to get the average for the year.
Calculated (tmax-tmin)/2+tmin to get the average temperature for each year.
Took the CRU data for Oxford which only covers 1900-1980.
[ I wonder why when the station data is readily available on the Met Office site ]
Again, added each month together and divided by 12 to get the year’s average
Compared the CRU average with the Met Office average.
There are only minor differences until the last 3 years, when the difference jumps from a maximum of around 0.05 to 0.5.
You can download the spread sheet at
http://www.akk.me.uk/tempdata.zip 71Kb
Let me know if you can spot any errors.
Thanks
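For anyone who wants to repeat this check outside a spreadsheet, here is a rough sketch of the same comparison (mine, not the commenter’s workbook; column names and file layouts are assumptions):

----------------------------------------
import pandas as pd

def annual_mean_from_maxmin(df):
    # (tmax - tmin)/2 + tmin is just the ordinary (tmax + tmin)/2 monthly mean
    monthly_mean = (df["tmax"] + df["tmin"]) / 2
    return monthly_mean.groupby(df["year"]).mean()

def annual_mean_from_monthly(df):
    return df["temp"].groupby(df["year"]).mean()

# oxford_met = ...   # Met Office station file with year, month, tmax, tmin columns
# oxford_cru = ...   # CRU subset station file reshaped to year, month, temp
# diff = annual_mean_from_maxmin(oxford_met) - annual_mean_from_monthly(oxford_cru)
# diff.plot()        # look for the jump reported in the last few years of the record
----------------------------------------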
JJ,
Well, I still do not see how a careful reading would have led to your mistake in the first place: the reference to October is preceded by “for example,” and it is sitting three lines below a chart that shows that I used all months. In any case,
Simply apply this equation for each month you wish to include in your ‘mean seasonal difference’, and average the results.
Yes, this would work.
But I am not just interested in the average. I want to see the patterns in how the average changes over time. These patterns are indicative of “natural climate variation” and need to be quantified and understood before we can begin attributing the sources of climate “change.” As I said in commenting on the last figure, which is the same as the blue line in the preceding figure:
I suggest that the above chart showing the fit through the smooth helps define the challenges we face in these issues. First, the light gray line depicts the range of natural climate variability on decadal time scales. This much – and it is very much of the data – is completely natural, and cannot be attributed to any kind of anthropogenic influence, whether UHI, land use/land cover changes, or, heaven forbid, greenhouse gases.
This very important point is lost in fitting simple linear trends through undifferenced temperature data. When doing the latter, there is a tendency to attribute any rising trend to AGW. But that is a spurious conclusion, because the trend depends on where the start and the end fall in terms of cycles in natural variation. In other words, while there might well be some AGW in a trend of rising temperatures, if you peg the trend calculation to start during a cold period, and to end during a warm period, then the trend will capture a spurious increase due to natural climate variation. In fact, this is exactly what the IPCC did, in Chapter 3 of AR4, by splitting the 20th century at 1950 to argue that warming in the second half of the century was so much greater than in the first half that it must be due to AGW. As for “robust,” and this claim:
Moving the start of the analysis period up only five years, the MSD flops sign from cooling to warming. MSD for 1885-2009 = 0.007.
I think you need to recheck your calculations:
———————————————————
Summary Statistics, using the observations 1885:01 – 2009:10
for the variable ‘sd_GIS’ (1498 valid observations)
Mean -0.0025367
Median 0.00000
Summary Statistics, using the observations 1885:01 – 2009:10
for the variable ‘sd_CRU’ (1498 valid observations)
Mean -0.0064085
Median 0.00000
——————————————————
In any case, the variability of the monthly seasonal difference is so high that moving around the beginning and ending points is not going to make a “statistically significant” difference, no matter how hard you try. And that is an important point — that the volatility is that great.
Incidentally, in your “shortcut” approach, you are the one not using all the data, and as a result of that, you cannot provide a true estimate of volatility (standard deviation) for the data. I can.
Incidentally, I calc’d a standard linear regression trend for the GISS homogenized data for 1881-2009. I did it a couple of different ways, and ended up with a cooling trend each time. What method did you use to calculate the trend that shows a warming in the GISS data over that period?
Here’s what I have:
———————————-
OLS estimates using the 1534 observations 1882:01-2009:10
Dependent variable: sd_GIS
HAC standard errors, bandwidth 8 (Bartlett kernel)
coefficient std. error t-ratio p-value
———————————————————
const -0.0378095 0.162683 -0.2324 0.8162
time 3.22807E-05 0.000172980 0.1866 0.8520
Mean of dependent variable = -0.0126467
Standard deviation of dep. var. = 2.65862
Sum of squared residuals = 10835.3
Standard error of the regression = 2.65945
Unadjusted R-squared = 0.00003
Adjusted R-squared = -0.00062
Degrees of freedom = 1532
Durbin-Watson statistic = 1.7165
First-order autocorrelation coeff. = 0.140591
Log-likelihood = -3676.08
Akaike information criterion (AIC) = 7356.17
Schwarz Bayesian criterion (BIC) = 7366.84
Hannan-Quinn criterion (HQC) = 7360.14
——————————————
What did you do about the missing value in the GIS data set?
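(For reference, a comparable regression with HAC, i.e. Newey-West, standard errors can be sketched in Python as below. This is not the gretl run shown above; sd_gis stands for the seasonal-difference series, and the maxlags choice is only illustrative.)

----------------------------------------
import numpy as np
import statsmodels.api as sm

def trend_with_hac(sd, maxlags=8):
    sd = sd.dropna()
    X = sm.add_constant(np.arange(len(sd)))      # constant + time index
    res = sm.OLS(sd.values, X).fit(cov_type="HAC", cov_kwds={"maxlags": maxlags})
    return res

# res = trend_with_hac(sd_gis)        # sd_gis: hypothetical seasonal-difference Series
# print(res.params, res.bse)          # slope sign indicates warming/cooling
----------------------------------------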
thanks all for your great comments and insights… have you seen this? http://www.dailymail.co.uk/news/article-1235395/SPECIAL-INVESTIGATION-Climate-change-emails-row-deepens–Russians-admit-DID-send-them.html
Seth (22:06:12)
Yeah, I’d say it’s not only useful, but a requirement that this data be used. And you’re right, it is only the sea-surface temperature (SST). I think the CRU used the HadSST2 dataset for the IPCC AR4, which is composed of the:
“…International Comprehensive Ocean-Atmosphere Data Set, ICOADS, from 1850 to 1997 and from the NCEP-GTS from 1998 to present.”
HadSST2 is produced by taking in-situ measurements of SST from ships and buoys…”
http://badc.nerc.ac.uk/data/hadsst2/
list of available marine datasets here: http://www.marineclimatology.net/wiki/index.php?title=Datasets
IMO, the more data used, the more accurate the models… of course, there are the homogenization, “quality control” and gridding manipulation issues that affect accuracy…