On 'denying' Hockey Sticks, USHCN data, and all that – part 1

Part 2 is now online here.

One of the things I am often accused of is “denying” the Mann hockey stick. And, by extension, the Romm Hockey stick that Mann seems to embrace with equal fervor.

While I don’t “deny” these things exist, I do dispute their validity as presented, and I’m not alone in that thinking. As many of you know, Steve McIntyre and Ross McKitrick, plus many others, have extensively debunked the statistics that went into the Mann hockey stick, showing where errors were made, or in some cases were known and simply ignored because it helped “the cause”.

The problem with hockey stick style graphs is that they are visually compelling, eliciting reactions like “whoa, there’s something going on there!” Yet, oftentimes when you look at the methodology behind the compelling visual you’ll find things like “Mike’s Nature Trick“. The devil is always in the details, and you often have to dig very deep to find that devil.

Just a little over a month ago, this blog commented on the hockey stick shape in the USHCN data set which you can see here:

[Figure: 2014 USHCN raw vs. adjusted temperatures]

The graph above was generated by “Steven Goddard” on his blog, and it generated quite a bit of excitement and attention.

At first glance it looks like something really dramatic happened to the data, but again, when you look at those devilish details you find that the visual is simply an artifact of methodology. Different methods clearly give different results, and the “hockey stick” disappears when other methods are used.

[Figure: USHCN adjustments by method and year]

The graph above is courtesy of Zeke Hausfather Who co-wrote that blog entry with me. I should note that Zeke and I are sometimes polar opposites when it comes to the surface temperature record. However, in this case we found a point of agreement. That point was that the methodology gave a false hockey stick.

I wrote then:

While Goddard’s code and plot produced a mathematically correct result, the procedure he chose (#1 The All Absolute Approach) comparing absolute raw USHCN data and absolute finalized USHCN data, was not, and it allowed non-climatic differences between the two datasets, likely caused by missing data (late reports) to create the spike artifact in the first four months of 2014 and somewhat overstated the difference between adjusted and raw temperatures by using absolute temperatures rather than anomalies.
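To make that concrete, here is a minimal toy sketch (hypothetical numbers, not actual USHCN data) of how a single late report can create an apparent “adjustment” spike when you difference absolute final and absolute raw averages:

```python
# Toy illustration only: hypothetical values, not actual USHCN data.
# Two stations, one warm and one cool, with identical raw and final values,
# except that the cool station's latest month has not yet reported in the raw file.
raw = {
    "warm_station": [20.0, 20.5, 21.0],   # deg C, three months, all reported
    "cool_station": [5.0, 5.5, None],     # last month missing (late paper report)
}
final = {
    "warm_station": [20.0, 20.5, 21.0],
    "cool_station": [5.0, 5.5, 6.0],      # the gap is filled with an estimate
}

def absolute_mean(dataset, month):
    """Average every station that has a value for this month (the all-absolute approach)."""
    values = [temps[month] for temps in dataset.values() if temps[month] is not None]
    return sum(values) / len(values)

for month in range(3):
    diff = absolute_mean(final, month) - absolute_mean(raw, month)
    print(f"month {month}: final minus raw = {diff:+.2f} C")

# Months 0 and 1 show a difference of 0.00 C; month 2 jumps by several degrees purely
# because the raw average now contains only the warm station. No temperature was
# adjusted at all; the "spike" comes from comparing averages built from different
# station sets.
```

Using anomalies, or comparing only the station-months present in both files, makes this artificial jump disappear.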

Interestingly, “Goddard” replied in comments with a thank-you for helping to find the reason for this hockey-stick-shaped artifact. He wrote:

stevengoddard says:

http://wattsupwiththat.com/2014/05/10/spiking-temperatures-in-the-ushcn-an-artifact-of-late-data-reporting/#comment-1632952  May 10, 2014 at 7:59 am

Anthony,

Thanks for the explanation of what caused the spike.

The simplest approach of averaging all final minus all raw per year which I took shows the average adjustment per station year. More likely the adjustments should go the other direction due to UHI, which has been measured by the NWS as 8F in Phoenix and 4F in NYC.

Lesson learned. It seemed to me that was the end of the issue. Boy, was I wrong.

A couple of weeks later, Steven Goddard circulated by e-mail a new graph with a hockey stick shape, which you can see below. He wrote this message to me and a few others on the mailing list:

Here is something interesting. Almost half of USHCN data is now completely fake.

[Figure: Goddard’s graph claiming more than 40% of USHCN station data is “fabricated”]

http://stevengoddard.wordpress.com/2014/06/01/more-than-40-of-ushcn-station-data-is-fabricated/

After reading his blog post I realized he had made a critical error, and I wrote back the following e-mail:

This claim: “More than 40% of USHCN final station data is now generated from stations which have no thermometer data.”

Is utterly bogus.

This kind of unsubstantiated claim is why some skeptics get called conspiracy theorists. If you can’t back it up to show that 40% of the USHCN has stopped reporting, then don’t publish it.

What I was objecting to was the claim that 40% of the USHCN network was missing – something I know from my own studies to be false.

He replied with a new graph, a strawman argument, and a new number:

The data is correct.

Since 1990, USHCN has lost about 30% of their stations, but they still report data for all of them. This graph is a count of valid monthly readings in their final and raw data sets.

[Figure: Goddard’s count of valid monthly readings in the USHCN raw and final data sets]

The problem was that I was not disputing the data; I was disputing the claim that 40% of USHCN stations were missing and had “completely fake” data (his words). I knew that to be wrong, so I replied with a suggestion.

On Sun, Jun 1, 2014 at 5:13 PM, Anthony  wrote:

I have to leave for the rest of the day, but again I suggest you take this post down, or at the very least remove the title word “fabricated” and replace it with “loss” or something similar.
Not knowing what your method is exactly, I don’t know how you arrived at this, but I can tell you that what you plotted and the word “fabricated” don’t go together the way you envision.
Again, we’ve been working on USHCN for years, we would have noticed if that many stations were missing.
Anthony

Later when I returned, I noted a change had been made to Goddard’s blog post. The word “fabrication” remained, but a small change had been made to the claim about stations, with no mention of it. Since I still had the page open in another browser window, I had the before and after of that change, which you can see below:

http://wattsupwiththat.files.wordpress.com/2014/06/goddard_before.png

http://wattsupwiththat.files.wordpress.com/2014/06/goddard_after.png

I thought it was rather disingenuous to make that change without noting it, but I started to dig a little deeper and realized that Goddard was doing the same thing he had done before, when we pointed out the false hockey stick artifact in the USHCN: he was performing a simple subtraction of the raw data from the final data.

I then knew for certain that his methodology wouldn’t hold up under scrutiny, but beyond doing some more private e-mail discussion trying to dissuade him from continuing down that path, I made no blog post or other writings about it.

Four days later, over at Lucia’s blog “The Blackboard”, Zeke Hausfather took note of the issue and wrote this post about it: How not to calculate temperature

Zeke writes:

The blogger Steven Goddard has been on a tear recently, castigating NCDC for making up “97% of warming since 1990” by infilling missing data with “fake data”. The reality is much more mundane, and the dramatic findings are nothing other than an artifact of Goddard’s flawed methodology.

Goddard made two major errors in his analysis, which produced results showing a large bias due to infilling that doesn’t really exist. First, he is simply averaging absolute temperatures rather than using anomalies. Absolute temperatures work fine if and only if the composition of the station network remains unchanged over time. If the composition does change, you will often find that stations dropping out will result in climatological biases in the network due to differences in elevation and average temperatures that don’t necessarily reflect any real information on month-to-month or year-to-year variability. Lucia covered this well a few years back with a toy model, so I’d suggest people who are still confused about the subject to consult her spherical cow.

His second error is to not use any form of spatial weighting (e.g. gridding) when combining station records. While the USHCN network is fairly well distributed across the U.S., its not perfectly so, and some areas of the country have considerably more stations than others. Not gridding also can exacerbate the effect of station drop-out when the stations that drop out are not randomly distributed.

The way that NCDC, GISS, Hadley, myself, Nick Stokes, Chad, Tamino, Jeff Id/Roman M, and even Anthony Watts (in Fall et al) all calculate temperatures is by taking station data, translating it into anomalies by subtracting the long-term average for each month from each station (e.g. the 1961-1990 mean), assigning each station to a grid cell, averaging the anomalies of all stations in each gridcell for each month, and averaging all gridcells each month weighted by their respective land area. The details differ a bit between each group/person, but they produce largely the same results.

Now again, I’d like to point out that Zeke and I are often polar opposites when it comes to the surface temperature record, but I had to agree with him on this point: the methodology created the artifact. In order to properly produce a national temperature average, gridding must be employed; using the raw data without gridding will create various artifacts.
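For readers who want to see what the anomaly-plus-gridding recipe Zeke describes looks like in practice, here is a bare-bones sketch. It is illustrative only (made-up annual station values, a crude 5-degree grid, equal cell weights instead of land-area weighting) and is not NCDC’s, GISS’s, or anyone else’s actual code:

```python
import math
from collections import defaultdict

# Illustrative only: toy station records as (lat, lon, {year: temp_C}), not real USHCN data.
stations = [
    (40.1, -88.2, {1961: 11.0, 1990: 11.2, 2013: 11.6}),
    (40.4, -88.9, {1961: 10.5, 1990: 10.8, 2013: 11.1}),
    (33.4, -112.0, {1961: 21.0, 1990: 21.5, 2013: 22.3}),
]

GRID_DEG = 5.0           # size of each grid cell in degrees
BASELINE = (1961, 1990)  # period used to compute each station's long-term mean

def grid_cell(lat, lon):
    return (math.floor(lat / GRID_DEG), math.floor(lon / GRID_DEG))

def to_anomalies(record):
    """Subtract the station's own baseline mean so warm and cool stations are comparable."""
    base = [t for yr, t in record.items() if BASELINE[0] <= yr <= BASELINE[1]]
    base_mean = sum(base) / len(base)
    return {yr: t - base_mean for yr, t in record.items()}

# Step 1: anomalies per station; step 2: average anomalies within each grid cell;
# step 3: average the cells (a real index would weight cells by land area).
cell_anoms = defaultdict(lambda: defaultdict(list))
for lat, lon, record in stations:
    for yr, anom in to_anomalies(record).items():
        cell_anoms[grid_cell(lat, lon)][yr].append(anom)

for yr in sorted({yr for _, _, rec in stations for yr in rec}):
    cell_means = [sum(v[yr]) / len(v[yr]) for v in cell_anoms.values() if yr in v]
    print(yr, round(sum(cell_means) / len(cell_means), 2))
```

Because every station is measured against its own baseline and each grid cell counts once, a station dropping out changes the sampling noise but does not shift the average simply because that station happened to be in a warm or cool location.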

Spatial interpolation (gridding) for a national average temperature is required for a constantly changing dataset such as GHCN/USHCN; there is no doubt, gridding is a must. For a guaranteed-quality dataset, where stations are kept in the same exposure and produce reliable data, such as the US Climate Reference Network (USCRN), you could in fact use the raw data as a national average and plot it. Since such a network is free of the issues that gridding solves, the result would be meaningful as long as the stations all report, don’t move, aren’t encroached upon, and don’t change sensors – i.e. the design and production goals of USCRN.

Anomalies aren’t necessarily required; they are an option depending on what you want to present. For example, NCDC gives an absolute value for the national average temperature in their State of the Climate report each month; they also give a baseline and the departure anomaly from that baseline for both CONUS and global temperature.

Now let me qualify that by saying that I have known for a long time that NCDC uses infilling of data from surrounding stations as part of the process of producing a national temperature average. I don’t necessarily agree that their methodology is perfect, but it is a well-known issue, and what Goddard discovered was simply a back-door way of pointing out that the method exists. It wasn’t news to me or to many others who have followed the issue.

This is why you haven’t seen other prominent people in the climate debate (Spencer, Curry, McIntyre, Michaels, McKitrick), or even myself, make a big deal out of this hockey stick of data difference that Goddard has been pushing. If this were really an important finding, you can bet they and yours truly would be talking about it and providing support and analysis.

It’s also important to note that Goddard’s graph does not represent a complete loss of data from these stations. The differencing method that Goddard is using detects every missing data point from every station in the network. This could be as simple as one day of data missing in an entire month, or a string of days, or even an entire month, which is rare. Almost every station in the USHCN at one time or another is missing some data. One exception might be the station at Mohonk Lake, New York, which has a perfect record due to a dedicated observer, but has other problems related to siting.

If we were to throw out an entire month’s worth of observations because one day out of 31 is missing, chances are we’d have no national temperature average at all. So the method was created to fill in missing data from surrounding stations. In theory and in a perfect world this would be a good method, but as we know the world is a messy place, and so the method introduces some additional uncertainty.
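As a rough illustration of the idea (not NCDC’s FILNET algorithm itself, just a simple inverse-distance-weighted stand-in with made-up numbers), estimating a missing monthly value from neighboring stations might look something like this:

```python
import math

def distance_km(a, b):
    """Approximate great-circle distance between two (lat, lon) points in kilometers."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 6371.0 * 2 * math.asin(math.sqrt(h))

def infill(target_loc, neighbors, max_km=150.0):
    """Estimate a missing monthly anomaly as an inverse-distance-weighted average
    of nearby stations' reported anomalies (illustrative only, not FILNET)."""
    num = den = 0.0
    for loc, anom in neighbors:
        d = distance_km(target_loc, loc)
        if anom is not None and d <= max_km:
            weight = 1.0 / max(d, 1.0)  # guard against division by zero for co-located stations
            num += weight * anom
            den += weight
    return num / den if den else None

# Hypothetical example: three reporting neighbors around a station that missed a month.
neighbors = [((39.5, -121.6), +0.8), ((39.1, -121.6), +1.1), ((39.7, -122.2), +0.6)]
print(round(infill((39.4, -121.9), neighbors), 2))
```

Zeke notes in the comments below that NCDC’s actual infilling adds a station’s long-term climatology to a spatially weighted average of anomalies from nearby stations; the sketch above only captures the basic “borrow from the neighbors” principle.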

The National Cooperative Observer network, a.k.a. the co-op network, is a mishmash of widely different stations and equipment comprising some 8,000 stations around the United States; the USHCN is a subset of that much larger co-op network. Some are stations in observers’ backyards or at their farms, some are at government entities like fire stations and ranger stations, and some are electronic ASOS systems at airports.

The vast majority of stations are poorly sited, as we have documented with the surfacestations project; by our count, 80% of the USHCN consists of poorly sited stations. The real problem is with the micro-site issues of the stations. This is something that is not effectively dealt with in any methodology used by NCDC. We’ll have more on that later, but I wanted to point out that no matter which data set you look at (NCDC, GISS, HadCRUT, BEST), the problem of station siting bias remains and is not dealt with. For those who don’t know, NCDC provides the source data for the other interpretations of the surface temperature record, so they all have it. More on that later, perhaps in another blog post.

When it was first created, the co-op network reporting was done entirely on paper forms called B-91s. The observer would write down the daily high and low temperatures, along with precipitation, for each day of the month, and then at the end of the month mail it in. An example B-91 form from Mohonk Lake, NY is shown below:

[Image: example B-91 form from Mohonk Lake, NY]

Not all forms are so well maintained. Some B-91 forms have missing data, which can be due to the observer missing work, having an illness, or simply being lazy:

[Image: B-91 form from the Marysville fire station with missing data]

The form above is missing weekends because the secretary at the fire station doesn’t work on weekends and the firefighters aren’t required to fill in for her. I know this from having visited this station and interviewed the people involved.

So, in such an imperfect “you get what you pay for” world of volunteer observers, you know from the get-go that you are going to have missing data, and so, in order to be able to use any of these records at all, a method had to be employed to deal with it, and that was infilling of data. This process has been done for years, long before Goddard “discovered” it.

There was no nefarious intent here. NOAA/NCDC isn’t purposely trying to “fabricate” data as Goddard claims; they are simply trying to figure out a way to make use of the data at all. The word “fabrication” is the wrong word to use, as it implies the data is being plucked out of thin air. It isn’t – it is being gathered from nearby stations and used to create a reasonable estimate. Over short ranges one can reasonably expect daily weather (temperature at least, precip not so much) to be similar, assuming the stations are similarly sited and equipped, but that’s where another devil in the details exists.

Back when I started the surfacestations project, I noted that one long-period, well-sited station, Orland, was in a small sea of bad stations, and that its temperature diverged markedly from that of its neighbors, like the horrid Marysville fire station, where the MMTS thermometer was directly next to asphalt:

[Photo: poor siting at the Marysville fire station, with the MMTS thermometer next to asphalt]

Orland is one of those stations that reports on paper at the end of the month. Marysville (shown above) reported daily using the touch-tone weathercoder, so its data was available by the end of each day.

What happens in the first runs of the NCDC CONUS temperature process is that they end up with mostly the airport ASOS stations and the weathercoder stations. The weathercoder-reporting stations tend to be more urban than rural, since a lot of observers don’t want to make long-distance phone calls. And so, in the case of missing station data in early-in-the-month runs, we tend to get a collection of the more poorly sited stations. The FILNET process, designed to “fix” missing data, goes to work and starts infilling data.

A lot of the “good” stations don’t get included in the early runs, because the rural observers often opt for a paper form mailed in rather than the touch-tone weathercoder, and those stations have data infilled from many of the nearby ones, “polluting” the data.

And as we showed back in 2012, those stations have a much lower century-scale trend than the majority of stations in the surface network. In fact, by NOAA’s own siting standards, over 80% of the surface network is producing unacceptable data, and that data gets blended in.

Steve McIntyre noted that even in good stations like Orland, the data gets “polluted” by the process:

http://climateaudit.org/2009/06/29/orland-ca-and-the-new-adjustments/

So, imagine this going on for hundreds of stations, perhaps even thousands early on in the month.

To the uninitiated observer, this “revelation” by Goddard could look like NCDC is in fact “fabricating” data. Given the sorts of scandals that have happened recently with government data, such as the IRS “loss of e-mails”, the padding of jobs and economic reports, and other issues from the current administration, I can see why people would easily embrace the word “fabrication” when looking at NOAA/NCDC data. I get it. Expecting it because much of the rest of the government has issues doesn’t make it true, though.

What is really going on is that the FILNET algorithm, designed to fix a few stations that might be missing some data in the final analysis, is running a wholesale infill on early, incomplete data, which NCDC pushes out to their FTP site. The amount of infilling gets to be less and less as the month goes on and more data comes in.

But over time, observers have been less inclined to produce reports, and attrition in both the USHCN and the co-op network is something that I’ve known about for quite some time, having spoken with hundreds of observers. Many of the observers are older people, and some of the attrition is due to age, infirmity, and death. You can see what I’m speaking of by looking through the quarterly NOAA co-op newsletter seen here: http://www.nws.noaa.gov/om/coop/coop_newsletter.htm

NOAA often has trouble finding new observers to take the place of the ones they have lost, and so it isn’t a surprise that over time we would see the number of missing data points rise. Another factor is technology: many observers I spoke with wonder why they still even do the job when we have computers and electronics that can do it faster. I explained to them that their work is important because automation can never replace the human touch. I always thank them for their work.

The downside is that the USHCN is a very imperfect and heterogeneous network and will remain so; it isn’t “fixable” at an operational level, so statistical fixes are resorted to. That has both good and bad consequences.

The newly commissioned USCRN will solve that with its new data-gathering system; some of its first data is now online for the public.

[Figure: USCRN average temperature, January 2004 – April 2014]

Source: NCDC National Temperature Index time series plotter

Since this is a VERY LONG post, it will be continued…in part 2

In part 2 I’ll talk about things that we disagree on and the things we can find a common ground on.

Part 2 is now online here.

copernicus34

Thanks Mr Watts, important

Interesting post, and I look forward to part 2 in the near future.
I don’t understand a few things and hope you will address it in the future post. First, I don’t understand how the data sets always seem to make adjustments that warm the present and cool the past. In fact, I don’t understand the idea of changing the past (say 1940 or whatever) at all. Unless we have time machines and go back to read the thermometers better, why these changes?
The other big thing I fail to understand is reporting the temperature to hundredths of a degree. I understand that we humans have trouble reading an old time thermometer much better than to the nearest degree and that the electronic gadgets might report to tenths of a degree (accurately?). Am I wrong on that? If I am right, how can you average a bunch of temps that are only good to the nearest degree and get an answer to the nearest hundredth? Did they change measurement rules since I was in school?
I have a few more questions but those two are the ones that baffle me the most.
— Mark

Latitude

you guys realize you’re arguing over a fraction of a degree……you can’t even see it on a thermometer
..and debating “science” on a metric so small…..no one should really care

Steve McIntyre

Anthony, it looks to me like Goddard’s artifact is almost exactly equivalent in methodology to Marcott’s artifact spike – this is a much more exact comparison than Mann. Marcott’s artifact also arose from data drop-out.
However, rather than conceding the criticism, Marcott et al have failed to issue a corrigendum and their result has been widely cited.

Resourceguy

Thanks

Mark Bofill

Wait, what am I looking at here?
Ah! I remember what that’s called now. Integrity.
Don’t see examples of that all that often in the climate blogosphere.
Thanks Anthony.

NikFromNYC

Goddard willfully sponsors a hostile and utterly reason averse and pure tribal culture on his very high traffic skeptical blog where about a dozen political fanatics are cheerled on by a half dozen tag along crackpots who all pile on anybody who offers constructive criticism. His blog alone is responsible for the continuing and very successful negative stereotyping of mainstream skepticism by a highly funded alarmist PR machine. His overpolitization of climate model skepticism results in a great inertia by harshly alienating mostly liberal academic scientists and big city professionals who also lean left but who might otherwise be open to reason. I live two blocks from NASA GISS above Tom’s Diner, just above the extremely liberal Upper West Side and my main hassle in stating facts and showing official data plots is online extremism being pointed out by Al Gore’s activist crowd along with John Cook’s more sophisticated obfuscation crowd. Goddard’s regular conspiracy theory about CIA drug use to brainwash school kids into shooting incidents in order to disarm conservatives in preparation for concentration camps for conservatives is something skeptics should stop ignoring and start actively shunning. His blog is the crack house of skepticism.

Bloke down the pub

Am I being cynical to suggest that if the adjustments were making the temperature trend appear lower rather than higher, that a new method would have been introduced by now?

NikFromNYC

Backwards emphasis typo: “I explained to them that their work is important because the human touch can never replace automation.”
REPLY: Yes, a sign of my fatigue. Thanks – Anthony

Bloke down the pub

Anthony, you mention the uscrn and say it will fix the problem. Isn’t the number of sites on this network going to be limited? While it should give a much better indication of the true temp trend, won’t it be of limited use for infilling data? Until the ushcn is upgraded to uscrn standards surely it’s better to rely on satellite records?
REPLY: As far as I know, it was never intended to be used for data infilling, only to stand on its own merit. – Anthony

John Slayton

…the human touch can never replace automation.
Au contraire…
: > )
REPLY: Refresh, fixed while you were reading – Anthony

Dougmanxx

You can see plenty of stations that have “-9999” data points in the “raw” data. Those “-9999” data points auto-magically go away in the final. This process is called “infilling”, and yes, it’s completely bogus. I know it’s bogus because those “-9999” data points rarely go away in the “raw” data, indicating to me, that “data” was never actually reported for them, but “data” was created via “infilling”. Does some actual “data” come in late? Sure, but not the vast majority. So creating “data” where there is none is acceptable in science? Sorry, but I’ve lost a lot of respect for this blog if you support THAT kind of nonsense.
REPLY: Some actual data does come in quite late, sometimes months later. Remember that the B-91’s are mailed in. Then they have to be transcribed, error checked, and if possible errors recovered from handwriting mistakes (like a 9 looking like a 4) then they run the whole thing through a sanity check and then post the updated data. Some data never gets sent in until months later. I recall one observer I spoke with who told me he only mailed his B-91’s every 6 months to save on postage. When I pointed out how important his station was (He had no idea he was part of USHCN) he vowed to start doing it every month.
You have to watch very carefully to know which data points are latecomers and which are infills. Sometimes, infills in the final get retracted when actual B-91 form data arrives months late.
-Anthony

James Strom

I believe that adjustment and infilling are unavoidable. For example, after a point it would be irresponsible not to introduce new and more accurate measuring technologies, but then you have the problem of calibrating the old and new records for compatibility. But in each case of recalibration the possibility of error increases.
Steve Goddard has been going on about two different things. One is the quantity of “fabricated” data, which you address here, and the other is the comparison of raw and adjusted temperatures, in which the adjusted temperatures always seem to show a stronger warming trend. It would be useful for you to address this second type of claim; in fact, it would be useful to have a systematic audit of the weather services to ensure that their records and adjustment methods are unbiased.

john robertson

Ok so Goddard is over the top.
Good of you to do a sanity check.
I however doubt the concept of his, Goddards, claims being a weapon to discredit all sceptics.
There is no sceptic central is there?
The PR hacks of the C.C.C will do their normal smears no matter what.
It is all they have left.
WUWT’s evenhanded and honest approach to what little empirical data we have is all that is needed.
Now the graph posted does make my mind boggle.
Goddard exaggerates the corruption. Makes a 3 degree F adjustment appear.
But the correction posted shows an adjustment of 0.8 to 1 F. This is odious enough.
Given that the total warming claimed by Team IPCC ™ was less than the error bars of this historical data, are any “adjustments” justified?

talldave2

Sorry Anthony, you’re wrong on a couple points here.
This claim: “More than 40% of USHCN final station data is now generated from stations which have no thermometer data.” Is utterly bogus.
No, this statement is literally correct, if easy to misread. I have downloaded the files myself, it is a fairly trivial exercise in programming and I found 43% of the final (i.e. latest month) station data has the marker indicating the data was missing. (BTW, their data handling is ridiculously antiquated. It’s easy for me to write file-handling routines for hundreds of files, but non-programmers would probably prefer to get the data in Access or something.)
Can that statement be misinterpreted? Sure. And the UK article does, in fact, misinterpret it to mean “almost half the data is missing.” That’s only true for the last month, of course, not for the data set as a whole.
Now, a few caveats:
1) My guess is that this trend exists at least partly because some results trickle in late.
2) Steve tends to present these things in the least favorable light
OTOH, Steve seems to be correct that there is a warming trend in these adjustments and that using anomalies doesn’t work very well because they have essentially cooled the baseline.
Re the first graph: again, this is presenting the evidence in the least favorable light, but it is not wrong. Yes, the fact there is less data in the last point gives it much higher volatility, which should be taken into account, but it’s still interesting.
Obviously you (Anthony) have made tremendous contributions and I respect that you always strive to be very, very clear. At the same time, there’s real smoke here, even if Steve is kind of blowing it in the direction he favors 🙂

talldave2

Some actual data does come in quite late, sometimes months later. Remember that the B-91′s are mailed in
Ah good, I was hoping someone knew this. As I said I suspected it.
For fun I will run the analysis again next month to prove the data is arriving and take a swing at an expected correction rate.

Tom In Indy

Anthony,
Could you also comment on this graph comparing “raw data” to “raw data with 30% of observations randomly removed”?
The implication is that infilling is unnecessary, so why do it in the first place. On the other hand, if the actual missing observations are not random, but have systematic component, then any infilling program should account for the systematic bias. Maybe it does, just asking.
REPLY: See the note about Part 2. I have actual money earning work I must do today -Anthony

Woodshedder

Thanks Anthony. This was exactly what I was looking for.

It’s worth noting that the difference between the impact of adjustments found using Goddard’s method (the red line) and the other three methods in the second figure in the post is the effect that Goddard is incorrectly attributing to infilling.
http://wattsupwiththat.files.wordpress.com/2014/05/ushcn-adjustments-by-method-year1.png
It is actually an artifact introduced due to the fact that decreased station data availability in recent years has changed the underlying sampled climatology; e.g. the average elevation and latitude of stations in the network. NCDC’s infilled data tries to avoid this by infilling through adding the long-term station climatology to the spatially weighted average of anomalies from nearby stations.
Of course, if you use anomalies the underlying climatologies are held constant, and you are only looking at changes over time, so it avoids this problem.
As Steve McIntyre pointed out, this is pretty much the exact same issue that Marcott used to generate their spurious blade, as well as the issue that E.M. Smith used to harp on back in the days of the “march of the thermometers” meme.
In fact, infilling has no real effect on the changes in temperatures over time, as shown in Menne et al 2009: http://i81.photobucket.com/albums/j237/hausfath/ScreenShot2014-06-24at80004PM_zps20cf7fe9.png
Again, no one is questioning that there are adjustments in the USHCN record. These adjustments for things like TOBs and MMTS transitions are responding to real biases, but the way they are implemented is a valid point of discussion. Unfortunately Goddard is just confusing the argument by using an inappropriate method that conflates changes in station composition with adjustments.

Steve McIntyre says: June 25, 2014 at 10:34 am
“Anthony, it looks to me like Goddard’s artifact is almost exactly equivalent in methodology to Marcott’s artifact spike”

Well, here’s something – I think this is a good post, and I think there is a Marcott analogy.
What Goddard does is to try to show the effect of adjustment by averaging a whole lot of adjusted readings and subtracting the average of a lot of raw readings. While many of the adjusted readings correspond to the time/place of raw readings, a lot don’t. The result reflects the extent to which those extra adjusted readings were from places that are naturally warmer/cooler, because of location or season.
It’s like trying to decide the effect of fertilizer by just comparing a treated field with an untreated. The treated field might just be better for other reasons. You have to be careful.
One way of being careful here is to use anomalies. An easy way of showing the problems with Goddard’s methods is to just repeat the arithmetic with longterm raw averages instead of monthly data. If the result reflects the differences between stations in the sets, you’ll get much the same answer. I showed here that you do. That’s not adjustment.
If you use anomalies, then the corresponding longterm averages (of the anomalies) should be near zero. There are no longer big differences in mean station properties to confound the results.
Marcott et al did use anomalies. But because they were dealing with long time periods during which a lot changed, the expected values were no longer near zero near the end of the time. So changing the mix of proxies, as some dropped out, did have a similar effect.
As Anthony says, you can’t always get perfect matching between datasets when you make comparisons. Months will have days missing etc. You have to compromise, carefully, or you’ll have no useful data left. Using anomalies is an important part of that. But sometimes it needs even more.

joelobryan

All this talk of temp data set problems, data quality, and abuse-misuse of methodology is completely lost on all but the expert, it is arcane to the public.
But what will not be arcane is the empirical data when the average family experiences multiple relentlessly cold winters with skyrocketing utility bills to stay warm, just so the Alarmists can say they are saving the world from CAGW. Those will be the real-world data points that folks like Mann, Holdren, and their co-conspirators can't hide, as the Liberals try to shut down ever more carbon-fueled power plants.

Another simple example of why Goddard’s method will lead you astray:
The graph linked below shows raw temperatures for the US from two sources: stations with almost no missing data (< 5 years) and all stations, calculated using Goddard's averaged absolute method and rebaselined to the 1990-2000 period.
http://i81.photobucket.com/albums/j237/hausfath/CompleteandAllStationsUSHCNRaw_zps5074f10e.png
Stations with complete records have higher trends than all stations. This is because the declining temperature of all stations in Goddard's approach isn't due to any real cooling, but is simply an artifact of changing underlying climatologies.

JFD

Interesting exposure of how some mathematical averaging methods can give incorrect answers. Thanks, Anthony. Two points:
1. Goddard’s graph does show creeping upward warming bias in the way USCRN handles the raw temperature measurements. This could also be due to using a faulty method as well. If not, then you and the other climate professionals need to support Goddard and find a way to expose this to the public.
2. You use the words “absolute temperature” to mean measured temperature. Absolute temperature has an exact meaning and is expressed in degrees Rankine or degrees Kelvin. If possible, my suggestion is use the words “measured temperature” or perhaps “actual temperature” to avoid miscomprehension.

Latitude

Zeke, if you’re going to use raw temps from NOAA (USHCN)..”Stations with complete records have higher trends than all stations”..you need to explain this
http://stevengoddard.wordpress.com/2014/06/23/noaanasa-dramatically-altered-us-temperatures-after-the-year-2000/

flyfisher

Thanks Anthony. This clears up a lot with the graphs from Goddard. The problem I still have is that when you consider: a. differences in record-keeping/sending spread among hundreds of different sites/people, b. switches to different types of measuring devices, c. conditions/placement differences of each measuring type, d. gridding to interpolate data…I have a very difficult time believing that ANY of the data generated using these methods is worth a damn. The error bars after all is said and done I’d imagine are so large as to make any type of reasonable analysis of temperature patterns utterly useless. As far as I’m concerned I’d have to think that satellite data is the only thing that should be considered since you have fewer ‘cooks contributing to the overall burgoo’.

Dougmanxx

You have to watch very carefully to know which data points are latecomers and which are infills. Sometimes, infills get retracted when data arrives months late.
-Anthony
Helpfully, in the raw station data, they include an “E” character before the “infilled” data. It’s a simple matter to pull out how many instances of this there are, from the raw data. As Talldave2 says above, there are way too many. Our friend NikFromNY posted an example at Goddard’s site which I thought was comical, trying to show how wrong Steve was. He posted 5 years of data from a random site in New York State. He must not have “quality checked” it, because fully 20% of the monthly data was infilled. In the parlance of GISS: estimated. What I might call: made up. And none of that data was recent, so it wasn’t waiting on any kind of “late report”. The record is chock full of just that: made up data. And it’s becoming more and more prevalent as the record gets longer. The record is awful, I’m not sure why anyone claims otherwise. I’m not sure why you would defend it.

jmrSudbury

Here is one example. In Aug 2005, USH00021514 stopped publishing data save two months (June 2006 and April 2007) that have measurements. Save those two same months, the final tavg file has estimates from Aug 2005 until May 2014. What is the point in estimating values beyond 2007? Why not just drop it from the final data set as it is no longer a reporting station?

Jim S

There are applied sciences/engineering that “fill in data” or interpolate data, but those disciplines rely on continued feedback to make adjustments to their assumptions. They are using data and models to solve falsifiable problems. Data is NOT being used to generate/design the actual MODELS themselves. The models in other professions have been validated and verified independently of any given set of data. This is why it is wrong to “fabricate” data if the goal is to create climate models for use in geoengineering (i.e. limit CO2 outputs to X to affect temperature changes of Y, etc.).
It’s like meteorology. If a meteorologist wants to guess tomorrows temperature, fine. But let’s not confuse meteorology/climate science models with engineering models.

more soylent green!

What else is wrong here – phone, paper, transcribed — seriously, what century are we living in? I guess we know the government can’t build websites, so I shouldn’t be so surprised.

talldave2

NCDC’s infilled data tries to avoid this by infilling through adding the long-term station climatology to the spatially weighted average of anomalies from nearby stations.
Which very conveniently adds a warming trend. I smell some fertilizer, all right 🙂
One way of being careful here is to use anomalies.
One way of avoiding detection by anomaly is to cool the baseline. Oh look, you did that! Interesting you use “careful” in the context of justifying adjustments that are adding a confirmation-bias-friendly warming trend that is helping to drive trillion-dollar climate policy. Normally one would be very “careful” to avoid that sort of thing.
an artifact of changing underlying climatologies.
And thanks to the discovery of these “changing underlying climatologies” we just happened to add warming to the 20th century since 2001.

Latitude,
There is a very simple explanation. NASA GISS used to use raw USHCN data. They switched at some point in the last decade to using homogenized USHCN data.
Again, no one is questioning that there are adjustments in the USHCN record. These adjustments for things like TOBs and MMTS transitions are responding to real biases, but the way they are implemented is a valid point of discussion. Unfortunately Goddard is just confusing the argument by using an inappropriate method that conflates changes in station composition with adjustments.

Alf

Time for the truth. Either I quit following Steve’s blog or he is vindicated and his ideas given more prominence.

Dougmanxx

So why not put this whole silly exercise in futility to bed? It can be done simply and with information that already exists. Goddard claims that past data is being tampered with. Fine. To show how legitimate the calculations are, from now on simply include the “average temperature” used to calculate the “anomalies” for every year. If there’s nothing to what Goddard says, those “average temperatures” will be the same for the past month-after-month-after-month (understanding that many current ones will change a bit due to “late reports”, as Anthony pointed out to me) and we can all say:”Gosh that Goddard guy nearly had me bamboozled!” Why not? Everyone knows the reasons to use “anomalies”, we get the idea behind showing how temps change in a way that is transparent to changes in recording instruments, siting, etcetcetc… So…. why not include that tiny little bit of information, it must exist, in order for an “anomaly” to be calculated? So: do it. Prove Goddard wrong with facts. Do it, so we can look at the temps for every month in 1957 or 1969 or 1928 or 1936 or any other year and see they are the same today, as they will be 2 years from now. Do that. It’s simple, and it puts to rest any of these arguments.

talldave2,
You seem somewhat confused about exactly what anomalies mean. Using anomalies can preclude calculating absolute temperatures (though averaging absolute temperatures, or even spatially interpolating them, generally does not do a good job of calculating “true” absolute temperatures due to poor sampling of elevation fields and similar factors). Using anomalies generally cannot bias a trend. So claiming that anomalies would somehow hide cooling or exaggerate warming is misguided. Using absolutes can bias a trend if the composition of the station network is changing over time.
As I discussed in a recent Blackboard post, using anomalies ironically reduces global land warming by 50% compared to Goddard’s averaged absolute method: http://rankexploits.com/musings/2014/how-not-to-calculate-temperatures-part-2/

talldave2

Stations with complete records have higher trends than all stations
I don’t think anyone would be surprised by that, as everyone knows we are losing more rural data than urban, and McIntyre showed the urban stations have a higher trend. What’s problematic is that you’re smearing the uncorrected UHI across those lost stations.
Unfortunately Goddard is just confusing the argument by using an inappropriate method that conflates changes in station composition with adjustments.
The method seems appropriate if the goal is to find out what effect those changes in station composition are having on the trend.

It’s worth mentioning again that infilling (as done by NCDC) has virtually no effect on the trends in temperatures over time. It only impacts our estimates of absolute temperatures if we calculate them by averaging all the stations together.
Here is the difference between USHCN homogenized data without infilling (TOBs + PHA) and with infilling (TOBs + PHA + FILNET). The differences are miniscule: http://rankexploits.com/musings/wp-content/uploads/2014/06/USHCN-infilled-noninfilled.png

Robert of Ottawa

NikFromNYC, you cannot criticize Goddard for “over-politicizing” global warming. It’s the Warmistas, the Lysenkoists, that have made it a political issue.

talldave2,
Steve McIntyre mentioned just a few posts up that Goddard’s method will result in spurious artifacts when the composition of stations is changing over time. In this particular case, you can avoid most of these artifacts by using spatial gridding, anomalies, or ideally both.

Steve McIntyre

Further to Zeke and Nick Stokes comments above acknowledging the similarity of Goddard’s error to Marcott’s error, there is, of course, a major difference. Marcott’s error occurred in one of the two leading science journals and was not detected by peer reviewers. Even after the error was pointed out, Marcott and associates did not issue a corrigendum or retraction. Worse, because Marcott and associates failed to issue a corrigendum or retraction and because it was accepted just at the IPCC deadline, it was cited on multiple occasions by IPCC AR5 without IPCC reviewers having an opportunity to point out the stupid error.

Latitude

These adjustments for things like TOBs and MMTS transitions are responding to real biases, but the way they are implemented is a valid point of discussion
Zeke, so in your opinion switching from raw data to homogenized data….is justification for changing a cooling trend…into a warming trend
NOAA changed sometime after 2000……1999 was not the dark ages

talldave2

So claiming that anomalies would somehow hide cooling or exaggerate warming is misguided.
This again? http://stevengoddard.wordpress.com/2014/06/24/no-you-dont-want-to-use-anomalies/

Latitude

“Its worth mentioning again that infilling (as done by NCDC) has virtually no effect on the trends in temperatures over time.”
Well of course, and who cares…..they had already cooled the past to show a warming trend that didn’t exist….
…Prior to 2000, they used raw data that showed a cooling trend….
…after 2000, they changed to homogenized data…which instantly showed a past warming trend
After 2000 the past warming trend stopped showing up on the satellite temperature record…..and temps went flat
Take away the change from raw data to homogenized data…..and there’s no warming trend at all

Steve McIntyre

Zeke wrote: “Steve McIntyre mentioned just a few posts up that Goddard’s method will result in spurious artifacts when the composition of stations is changing over time. In this particular case, you can avoid most of these artifacts by using spatial gridding, anomalies, or ideally both.”
Zeke, I commented on Marcott’s method. I didn’t directly comment on Goddard’s method as I haven’t examined it. Based on Anthony’s description, I observed that its artifact spike appeared to arise from a similar phenomenon as Marcott’s. As you observe, there are a variety of ways of allowing for changing station composition. It seems to me that mixed-effects methods deal with the statistical problem more directly than anomalies, but in most cases, I doubt that there is much difference.
In Marcott’s case, because he took anomalies at 6000BP and there were only a few modern series, his results were an artifact – a phenomenon that is all too common in Team climate science.

KNR

One of the things that marks AGW as having more of a religious than a scientific outlook is the endless need to defend anything which has entered its dogma, no matter how poor its quality or even how much it disagrees with reality. The stick and the 97% are classic examples of that; both are poor from top to bottom but are regarded as unquestionable and unchallengeable by the AGW hard core.
In science a challenge is welcome, for that is often how progress is made; remember, it's 'critical review' which should be done, and questions, even if they are dumb ones, are what you expect. It's politics and religion that make claims to 'divine unchangeable truth'; science should always be willing to 'prove it'.

talldave2

Yes, but that’s a bit like saying “Todd’s often drunker than Jim, let’s have Jim drive us home” — that’s not such a great plan if you already know Jim’s passed out in the corner.
When you don’t use the anomalies, it immediately becomes obvious that the baseline has cooled. When you use the anomalies, it looks like nothing much happened.
Except, again… from the published data, we already know something happened, don’t we? So when you show up and say “ignore the temperatures that were actually measured, use the anomaly!” you can forgive us for hearing it as “pay no attention to that man behind the curtain!”

Gunga Din

The graph above is courtesy of Zeke Hausfather Who co-wrote that blog entry with me.

=====================================================
Small typo below the second graph. Shouldn’t that be Dr. Who? 😎

Steve McIntyre says: June 25, 2014 at 12:58 pm
“Further to Zeke and Nick Stokes comments above acknowledging the similarity of Goddard’s error to Marcott’s error, there is, of course, a major difference. Marcott’s error occurred in one of the two leading science journals and was not detected by peer reviewers.”

There’s another major difference. Marcott’s paper was not about temperatures in the twentieth century. It is obvious that poorly resolved proxies are not adding to our already extensive knowledge there. Marcott’s paper was about temperatures in the Holocene, and there it made a major contribution.
I agree that noise at the end of the series should not have been allowed to cause a distraction. But it does not undermine the paper.

JeffC

So what % of the data is made up ? he says 40% … you say he’s wrong but don’t put out your own number … seems like you don’t like the word “fabricate” … so what ? I don’t like the statistics mumbo jumbo you spout here … I put up with it …
you may be getting a little full of your nitpicking abilities …
How about getting something published so that we can push back against this AGW nonsense ?
How about you refute Cook with your own study ?
Otherwise WUWT is turning into a niche corner of very intelligent posts that refute study after study but in the end never do a damn thing to stop the AGW crowd … you are preaching to the choir and getting stomped in the press and the publications …
{You could say the same exact thing about “Steve Goddard”. Anthony has in fact published some papers. There was also an inspector general review of NCDC due to his surface stations project. Can you cite anything like that from “Goddard”? -mod}

talldave2

BTW here is McIntyre’s UHI paper, quite elegant in its simplicity. Major league sports doesn’t cause UHI, but is strongly associated with more urban areas.
http://climateaudit.org/2007/08/04/1859/
Hansen 2010 (iirc) tried to rebut this using Google Nightlights, but a quick check revealed it was not precise enough for that use.
And as I recall Tony also later found a warming trend.
Lance makes a great point above — there are undoubtedly real biases that introduce spurious cooling, but there seems to be a much greater incentive to remove those as opposed to those that induce spurious warming — so much so that the warming trend in the past keeps increasing. And in a signal as noisy as this, it’s way too easy to find the trend you want.