On 'denying' Hockey Sticks, USHCN data, and all that – part 1

Part 2 is now online here.

One of the things I am often accused of is “denying” the Mann hockey stick. And, by extension, the Romm Hockey stick that Mann seems to embrace with equal fervor.

While I don’t “deny” these things exist, I do dispute their validity as presented, and I’m not alone in that thinking. As many of you know, Steve McIntyre, Ross McKitrick, and many others have extensively debunked the statistics that went into the Mann hockey stick, showing where errors were made or, in some cases, were known and simply ignored because it helped “the cause”.

The problem with hockey stick style graphs is that they are visually compelling, eliciting reactions like “whoa, there’s something going on there!” Yet, oftentimes when you look at the methodology behind the compelling visual you’ll find things like “Mike’s Nature Trick“. The devil is always in the details, and you often have to dig very deep to find that devil.

Just a little over a month ago, this blog commented on the hockey stick shape in the USHCN dataset, which you can see here:

[Figure: 2014_USHCN_raw-vs-adjusted]

The graph above was generated by “Steven Goddard” on his blog, and it generated quite a bit of excitement and attention.

At first glance it looks like something really dramatic happened to the data, but again, when you look at those devilish details, you find that the visual is simply an artifact of methodology. Different methods clearly give different results, and the “hockey stick” disappears when other methods are used.

[Figure: USHCN-Adjustments-by-Method-Year]

The graph above is courtesy of Zeke Hausfather, who co-wrote that blog entry with me. I should note that Zeke and I are sometimes polar opposites when it comes to the surface temperature record. However, in this case we found a point of agreement: the methodology gave a false hockey stick.

I wrote then:

While Goddard’s code and plot produced a mathematically correct result, the procedure he chose (#1, The All Absolute Approach), comparing absolute raw USHCN data and absolute finalized USHCN data, was not. It allowed non-climatic differences between the two datasets, likely caused by missing data (late reports), to create the spike artifact in the first four months of 2014, and it somewhat overstated the difference between adjusted and raw temperatures by using absolute temperatures rather than anomalies.
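To make the mechanism concrete, here is a small synthetic sketch. It is not Goddard’s or Zeke’s actual code; the station count, the flat 0.3 °C adjustment, and the choice of which stations report late are all invented, purely to show how the “all absolute” approach manufactures a spike when raw reports are missing but final values are infilled.

```python
# Synthetic sketch (invented numbers): differencing the average of all absolute
# "final" values against the average of all absolute "raw" values spikes when
# some raw reports are missing but the final dataset carries infilled values.
import numpy as np

rng = np.random.default_rng(0)
n_stations = 100
years = np.arange(2000, 2015)

# Each station has its own climatology; there is no trend in the synthetic data.
climatology = rng.uniform(5, 20, n_stations)                        # deg C
raw = climatology[:, None] + rng.normal(0, 1, (n_stations, len(years)))
final = raw + 0.3                                                   # a constant 0.3 C adjustment

# In the last year, suppose the 50 warmest stations have not yet reported raw
# data (late paper forms), while the final dataset still has infilled values.
late_reporters = np.argsort(climatology)[-50:]
raw_present = np.ones_like(raw, dtype=bool)
raw_present[late_reporters, -1] = False

# Method 1 ("all absolute"): average every available value in each dataset, then subtract.
raw_masked = np.where(raw_present, raw, np.nan)
diff_all_absolute = final.mean(axis=0) - np.nanmean(raw_masked, axis=0)

# Method 2: difference only station-years present in BOTH datasets (matched pairs).
diff_matched = np.array([(final[raw_present[:, i], i] - raw[raw_present[:, i], i]).mean()
                         for i in range(len(years))])

print(round(diff_all_absolute[-1], 2))  # departs sharply from the true 0.3 C adjustment
print(round(diff_matched[-1], 2))       # stays at ~0.3 C
```

The matched-pair difference recovers the true adjustment; the all-absolute difference mixes in the climatological difference between the stations that reported and those that did not.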

Interestingly, “Goddard” replied in comments with a thank-you for helping to find the reason for this hockey-stick-shaped artifact. He wrote:

stevengoddard says:

http://wattsupwiththat.com/2014/05/10/spiking-temperatures-in-the-ushcn-an-artifact-of-late-data-reporting/#comment-1632952  May 10, 2014 at 7:59 am

Anthony,

Thanks for the explanation of what caused the spike.

The simplest approach of averaging all final minus all raw per year which I took shows the average adjustment per station year. More likely the adjustments should go the other direction due to UHI, which has been measured by the NWS as 8F in Phoenix and 4F in NYC.

Lesson learned. It seemed to me that was the end of the issue. Boy, was I wrong.

A couple of weeks later, Steven Goddard circulated by e-mail a new graph with a hockey stick shape, which you can see below. He wrote this message to me and a few others on the mailing list:

Here is something interesting. Almost half of USHCN data is now completely fake.

[Figure: Goddard_screenhunter_236-jun-01-15-54]

http://stevengoddard.wordpress.com/2014/06/01/more-than-40-of-ushcn-station-data-is-fabricated/

After reading his blog post I realized he had made a critical error, and I wrote back the following e-mail:

This claim: “More than 40% of USHCN final station data is now generated from stations which have no thermometer data.”

Is utterly bogus.

This kind of unsubstantiated claim is why some skeptics get called conspiracy theorists. If you can’t back it up to show that 40% of the USHCN has stopped reporting, then don’t publish it.

What I was objecting to was the claim that 40% of the USHCN network was missing – something I know from my own studies to be a false claim.

He replied with a new graph, a strawman argument, and a new number:

The data is correct.

Since 1990, USHCN has lost about 30% of their stations, but they still report data for all of them. This graph is a count of valid monthly readings in their final and raw data sets.

[Figure: Goddard_screenhunter_237-jun-01-16-10]

The problem was, I was not disputing the data; I was disputing the claim that 40% of USHCN stations were missing and had “completely fake” data (his words). I knew that to be wrong, so I replied with a suggestion.

On Sun, Jun 1, 2014 at 5:13 PM, Anthony  wrote:

I have to leave for the rest of the day, but again I suggest you take this post down, or at the very least remove the title word “fabricated” and replace it with “loss” or something similar.
Not knowing what your method is exactly, I don’t know how you arrived at this, but I can tell you that what you plotted and the word “fabricated” don’t go together the way you envision.
Again, we’ve been working on USHCN for years, we would have noticed if that many stations were missing.
Anthony

Later, when I returned, I noticed a change had been made to Goddard’s blog post. The word “fabrication” remained, but a small change had been made to the claim about stations, with no mention of it. Since I had opened a new browser window, I had the before and after of that change, which you can see below:

http://wattsupwiththat.files.wordpress.com/2014/06/goddard_before.png

http://wattsupwiththat.files.wordpress.com/2014/06/goddard_after.png

I thought it was rather disingenuous to make that change without noting it, but I started to dig a little deeper and realized that Goddard was doing the same thing he had done before, when we pointed out the false hockey stick artifact in the USHCN: he was performing a subtraction of raw versus final data.

I then knew for certain that his methodology wouldn’t hold up under scrutiny, but beyond doing some more private e-mail discussion trying to dissuade him from continuing down that path, I made no blog post or other writings about it.

Four days later, over at Lucia’s blog “The Blackboard”, Zeke Hausfather took note of the issue and wrote this post about it: How not to calculate temperature

Zeke writes:

The blogger Steven Goddard has been on a tear recently, castigating NCDC for making up “97% of warming since 1990″ by infilling missing data with “fake data”. The reality is much more mundane, and the dramatic findings are nothing other than an artifact of Goddard’s flawed methodology.

Goddard made two major errors in his analysis, which produced results showing a large bias due to infilling that doesn’t really exist. First, he is simply averaging absolute temperatures rather than using anomalies. Absolute temperatures work fine if and only if the composition of the station network remains unchanged over time. If the composition does change, you will often find that stations dropping out will result in climatological biases in the network due to differences in elevation and average temperatures that don’t necessarily reflect any real information on month-to-month or year-to-year variability. Lucia covered this well a few years back with a toy model, so I’d suggest people who are still confused about the subject to consult her spherical cow.

His second error is to not use any form of spatial weighting (e.g. gridding) when combining station records. While the USHCN network is fairly well distributed across the U.S., its not perfectly so, and some areas of the country have considerably more stations than others. Not gridding also can exacerbate the effect of station drop-out when the stations that drop out are not randomly distributed.

The way that NCDC, GISS, Hadley, myself, Nick Stokes, Chad, Tamino, Jeff Id/Roman M, and even Anthony Watts (in Fall et al) all calculate temperatures is by taking station data, translating it into anomalies by subtracting the long-term average for each month from each station (e.g. the 1961-1990 mean), assigning each station to a grid cell, averaging the anomalies of all stations in each gridcell for each month, and averaging all gridcells each month weighted by their respective land area. The details differ a bit between each group/person, but they produce largely the same results.
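Lucia’s “spherical cow” point that Zeke refers to can be boiled down to a toy example like the one below. This is my own illustration of the idea, not her code; the two stations, their temperatures, and the dropout year are all invented.

```python
# Toy example: two stations share the same weather but sit at different
# elevations. When the cold, high-elevation station drops out, the absolute
# average jumps, while the anomaly average does not.
import numpy as np

years = np.arange(2000, 2011)
weather = np.sin(years / 3.0)              # shared year-to-year wiggle, no trend
valley = 15.0 + weather                    # warm low-elevation station (deg C)
mountain = 5.0 + weather                   # cold high-elevation station (deg C)

# The mountain station stops reporting after 2005.
reporting = years <= 2005

absolute_avg = np.where(reporting, (valley + mountain) / 2, valley)

# Anomaly method: subtract each station's own long-term mean first.
valley_anom = valley - valley[reporting].mean()
mountain_anom = mountain - mountain[reporting].mean()
anomaly_avg = np.where(reporting, (valley_anom + mountain_anom) / 2, valley_anom)

print(absolute_avg.round(2))   # jumps ~5 deg C when the mountain station drops out
print(anomaly_avg.round(2))    # no artificial jump; only the shared wiggle remains
```

The jump in the absolute series comes entirely from the change in network composition, not from any change in the weather, which is exactly the bias Zeke describes.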

Now, again, I’d like to point out that Zeke and I are often polar opposites when it comes to the surface temperature record, but I had to agree with him on this point: the methodology created the artifact. In order to properly produce a national temperature, gridding must be employed; using the raw data without gridding will create various artifacts.

Spatial interpolation (gridding) for a national average temperature is required for a constantly changing dataset such as GHCN/USHCN; there is no doubt that gridding is a must. For a guaranteed-quality dataset, where stations are kept in the same exposure and produce reliable data, such as the US Climate Reference Network (USCRN), you could in fact use the raw data as a national average and plot it. Since it is free of the issues that gridding solves, it would be meaningful as long as the stations all report, don’t move, aren’t encroached upon, and don’t change sensors – i.e., the design and production goals of USCRN.
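To make the gridding step concrete, here is a condensed, hypothetical sketch of the generic gridded-anomaly procedure Zeke outlines above (anomalies, grid cells, area-weighted average). It is not NCDC’s, GISS’s, or anyone else’s production code; the `stations` structure, the 2.5° cell size, and the cosine-latitude weighting are assumptions made for illustration.

```python
# Generic gridded-anomaly average for one month (a sketch, not production code).
# `stations` is assumed to be {id: {"lat": .., "lon": .., "temps": {(year, month): deg C}}}.
import numpy as np
from collections import defaultdict

def gridded_anomaly(stations, year, month, base=(1961, 1990), cell_deg=2.5):
    cells = defaultdict(list)
    for st in stations.values():
        # 1. Station anomaly: subtract that station's own base-period mean for this calendar month.
        base_vals = [t for (y, m), t in st["temps"].items()
                     if base[0] <= y <= base[1] and m == month]
        if not base_vals or (year, month) not in st["temps"]:
            continue
        anomaly = st["temps"][(year, month)] - np.mean(base_vals)
        # 2. Assign the station to a lat/lon grid cell.
        cell = (int(st["lat"] // cell_deg), int(st["lon"] // cell_deg))
        cells[cell].append(anomaly)
    # 3. Average anomalies within each cell, then weight cells by cos(latitude)
    #    as a crude stand-in for cell area.
    total = weight = 0.0
    for (lat_idx, _), anomalies in cells.items():
        w = np.cos(np.radians((lat_idx + 0.5) * cell_deg))
        total += w * float(np.mean(anomalies))
        weight += w
    return total / weight if weight else float("nan")
```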

Anomalies aren’t necessarily required; they are an option, depending on what you want to present. For example, NCDC gives an absolute value for the national average temperature in their State of the Climate report each month; they also give a baseline and the departure anomaly from that baseline for both CONUS and global temperature.

Now let me qualify that by saying that I have known for a long time that NCDC uses infilling of data from surrounding stations as part of the process of producing a national temperature average. I don’t necessarily agree that their methodology is perfect, but it is a well-known issue, and what Goddard discovered was simply a back-door way of pointing out that the method exists. It wasn’t news to me or to many others who have followed the issue.

This is why you haven’t seen other prominent people in the climate debate (Spencer, Curry, McIntyre, Michaels, McKitrick), and even myself, make a big deal out of this hockey stick of data difference that Goddard has been pushing. If this were really an important finding, you can bet they, and yours truly, would be talking about it and providing support and analysis.

It’s also important to note that Goddard’s graph does not represent a complete loss of data from these stations. The differencing method that Goddard is using detects every missing data point from every station in the network. This could be as simple as one day of data missing in an entire month, or a string of days, or even an entire month, which is rare. Almost every station in the USHCN is missing some data at one time or another. One exception might be the station at Mohonk Lake, New York, which has a perfect record due to a dedicated observer, but it has other problems related to siting.

If we were to throw out an entire month’s worth of observations because one day out of 31 is missing, chances are we’d have no national temperature average at all. So the method was created to fill in missing data from surrounding stations. In theory and in a perfect world this would be a good method, but as we know the world is a messy place, and so the method introduces some additional uncertainty.
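As a rough illustration of that trade-off, consider a rule like the one sketched below. The five-day threshold is an arbitrary choice for illustration, not NCDC’s actual completeness rule: a month missing a few days still yields a mean, while a sparser month is flagged for infilling rather than thrown away.

```python
# Sketch of the basic completeness problem, with an arbitrary threshold:
# accept a monthly mean if only a few daily values are missing, otherwise
# flag the month for infilling instead of discarding it entirely.
def monthly_mean_or_flag(daily_values, max_missing=5):
    present = [t for t in daily_values if t is not None]
    n_missing = len(daily_values) - len(present)
    if n_missing > max_missing:
        return None, "needs infill"            # too sparse: estimate from neighbors
    return sum(present) / len(present), "ok"   # tolerate a handful of missing days

# A form missing one day still yields a mean; a form with every weekend
# missing (8-10 days absent) lands on the "needs infill" side.
print(monthly_mean_or_flag([20.0] * 30 + [None]))       # (20.0, 'ok')
print(monthly_mean_or_flag([20.0] * 22 + [None] * 9))   # (None, 'needs infill')
```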

The National Cooperative Observer network, a.k.a. the co-op network, is a mishmash of widely different stations and equipment, and it is a much larger set of stations than the USHCN; the USHCN is a subset of the larger co-op network, which comprises some 8,000 stations around the United States. Some are stations in observers’ backyards or at their farms, some are at government entities like fire stations and ranger stations, and some are electronic ASOS systems at airports. The vast majority of stations are poorly sited, as we have documented using the surfacestations project; by our count, 80% of the USHCN consists of poorly sited stations.

The real problem is with the micro-site issues of the stations. This is something that is not effectively dealt with in any methodology used by NCDC. We’ll have more on that later, but I wanted to point out that no matter which dataset you look at (NCDC, GISS, HadCRUT, BEST), the problem of station siting bias remains and is not dealt with. For those who don’t know, NCDC provides the source data for the other interpretations of the surface temperature record, so they all have it. More on that later, perhaps in another blog post.

When it was first created, the co-op network was run entirely on paper forms called B-91s. The observer would write down the daily high and low temperatures, along with precipitation, for each day of the month, and then at the end of the month mail the form in. An example B-91 form from Mohonk Lake, NY is shown below:

[Figure: mohonk_lake_b91_image]

Not all forms are so well maintained. Some B-91 forms have missing data, which can be due to the observer missing work, having an illness, or simply being lazy:

[Figure: Marysville_B91]

The form above is missing weekends because the secretary at the fire station doesn’t work on weekends and the firefighters aren’t required to fill in for her. I know this from having visited the station and interviewed the people involved.

So, in such an imperfect, “you get what you pay for” world of volunteer observers, you know from the get-go that you are going to have missing data. In order to be able to use any of these records at all, a method had to be employed to deal with it, and that was infilling of data. This has been done for years, long before Goddard “discovered” it.

There was no nefarious intent here; NOAA/NCDC isn’t purposely trying to “fabricate” data, as Goddard claims. They are simply trying to figure out a way to make use of the data at all. The word “fabrication” is the wrong word to use, as it implies the data is being plucked out of thin air. It isn’t; it is being gathered from nearby stations and used to create a reasonable estimate. Over short ranges one can reasonably expect daily weather (temperature at least; precipitation not so much) to be similar, assuming the stations are similarly sited and equipped, but that’s where another devil in the details exists.
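To show what “a reasonable estimate gathered from nearby stations” can look like in principle, here is a generic inverse-distance-weighting sketch. It is emphatically not NCDC’s FILNET or pairwise homogenization code; the data structures and the crude distance measure are assumptions made purely for illustration.

```python
# Generic illustration: estimate a missing station value from nearby stations'
# anomalies using inverse-distance weighting (not NCDC's actual algorithm).
import math

def estimate_from_neighbors(target, neighbors):
    """target: dict with 'lat', 'lon', 'monthly_normal' (deg C).
    neighbors: list of dicts with 'lat', 'lon', 'anomaly' (deg C vs each neighbor's own normal)."""
    if not neighbors:
        return None
    num = den = 0.0
    for nb in neighbors:
        # Crude flat-earth distance in degrees; adequate for nearby stations in a sketch.
        d = math.hypot(nb["lat"] - target["lat"], nb["lon"] - target["lon"]) or 1e-6
        w = 1.0 / d                      # closer neighbors get more weight
        num += w * nb["anomaly"]
        den += w
    # The neighbors' shared anomaly, added back onto the target's own climatology.
    return target["monthly_normal"] + num / den
```

Because the estimate is built from neighboring stations’ departures from their own normals, it inherits whatever siting biases those neighbors carry, which is exactly the “pollution” problem discussed below.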

Back when I started the surfacestations project, I noted that one long-period, well-sited station, Orland, was in a small sea of bad stations, and that its temperature diverged markedly from its neighbors’, like the horrid Marysville fire station, where the MMTS thermometer was directly next to asphalt:

[Figure: marysville_badsiting]

Orland is one of those stations that reports on paper at the end of the month. Marysville (shown above) reported daily using the touch-tone weathercoder, so its data was available by the end of each day.

What happens in the first runs of the NCDC CONUS temperature process is that they end up with mostly the airport ASOS stations and the weathercoder stations. The weathercoder-reporting stations tend to be more urban than rural, since a lot of observers don’t want to make long-distance phone calls. And so, in the case of missing station data on early-in-the-month runs, we tend to get a collection of the poorer-sited stations. The FILNET process, designed to “fix” missing data, goes to work and starts infilling data.

A lot of the “good” stations don’t get included in the early runs, because the rural observers often opt for a paper form mailed in rather than the touch-tone weathercoder, and those stations have data infilled from many of the nearby ones, “polluting” the data.

And as we showed back in 2012, those stations have a much lower century-scale trend than the majority of stations in the surface network. In fact, by NOAA’s own siting standards, over 80% of the surface network is producing unacceptable data, and that data gets blended in.

Steve McIntyre noted that even in good stations like Orland, the data gets “polluted” by the process:

http://climateaudit.org/2009/06/29/orland-ca-and-the-new-adjustments/

So, imagine this going on for hundreds of stations, perhaps even thousands early on in the month.

To the uninitiated observer, this “revelation” by Goddard could look like NCDC is in fact “fabricating” data. Given the sorts of scandals that have happened recently with government data, such as the IRS “loss of e-mails”, the padding of jobs and economic reports, and other issues from the current administration, I can see why people would easily embrace the word “fabrication” when looking at NOAA/NCDC data. I get it. But expecting it because much of the rest of the government has issues doesn’t make it true.

What is really going on is that the FILNET algorithm, designed to fix a few stations that might be missing some data in the final analysis, is running a wholesale infill on early, incomplete data, which NCDC pushes out to their FTP site. The amount of infilling gets less and less as the month goes on and more data comes in.
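One way to see that effect would be to track the share of estimated values across successive published runs of the same month. A hypothetical sketch, assuming the files have already been parsed into (station, flag) pairs and using the “E” flag that marks an estimated value:

```python
# Sketch: fraction of values carrying the "E" (estimated) flag in successive
# snapshots of the published files; the share should shrink as late paper
# (B-91) reports arrive and replace infilled values.
def percent_estimated(records):
    """records: list of (station_id, flag) pairs already parsed from a published run."""
    flagged = sum(1 for _, flag in records if flag == "E")
    return 100.0 * flagged / len(records) if records else 0.0

def track_infill(snapshots):
    """snapshots: {run_date: records}, e.g. one entry per downloaded FTP run of the same month."""
    return {run_date: round(percent_estimated(recs), 1)
            for run_date, recs in sorted(snapshots.items())}
```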

But over time, observers have been less inclined to produce reports, and attrition in both the USHCN and the co-op network is something that I’ve known about for quite some time, having spoken with hundreds of observers. Many of the observers are older people, and some of the attrition is due to age, infirmity, and death. You can see what I’m speaking of by looking through the quarterly NOAA co-op newsletter here: http://www.nws.noaa.gov/om/coop/coop_newsletter.htm

NOAA often has trouble finding new observers to take the place of the ones they have lost, and so it isn’t a surprise that over time we would see the number of missing data points rise. Another factor is technology: many observers I spoke with wonder why they still even do the job when we have computers and electronics that can do it faster. I explained to them that their work is important because automation can never replace the human touch. I always thank them for their work.

The downside is that the USHCN is a very imperfect and heterogeneous network and will remain so; it isn’t “fixable” at an operational level, so statistical fixes are resorted to. That has both good and bad influences.

The newly commissioned USCRN will solve that with its new data-gathering system; some of its first data is now online for the public.

[Figure: USCRN_avg_temp_Jan2004-April2014]

Source: NCDC National Temperature Index time series plotter

Since this is a VERY LONG post, it will be continued…in part 2

In part 2 I’ll talk about the things that we disagree on and the things we can find common ground on.

Part 2 is now online here.


Comments

David Walton
June 26, 2014 3:43 am

Thank you Anthony.

Nick Stokes
June 26, 2014 3:48 am

charles nelson says: June 26, 2014 at 3:16 am
“1. 2. or 3. ?”

3.

charles nelson
June 26, 2014 4:19 am

Nick Stokes.
Thank you.
How about one paragraph which explains why?

NikFromNYC
June 26, 2014 4:25 am

[snip . . think about it . . mod]

Frank de Jong
June 26, 2014 4:36 am

Anthony,
I can see how the infilling makes sense to have “at least something” to publish, even when many stations haven’t reported in. However, it gives an overly confident representation of the truth. I would say that the proper way of handling missing data is to present a result based on the available (raw) data plus an accompanying (large) error bar.
I’ve made this point before, but I think it’s time that (climate) scientists start to consistently give error bars whenever they are presenting results. Not only does it give important background on how to interpret results, but it also forces anyone wishing to present a result, to start thinking about how errors in their methodology propagate.
In the present case, grid averages could still be calculated, but the “incomplete” months should simply have a larger error bar. It’s probably not too hard to find a first rough estimate for the error bar by comparing historical “early results” with “final results”. If I understand your article correctly, the error bar will probably be larger on the bottom than on the top, as the values usually get lower as more data comes in.
Frank

NikFromNYC
June 26, 2014 4:36 am

“Your comment is awaiting moderation.”
…always…delayed.
Almost as fun as Homeland Security.
Go to the big city, guys.
That’s your homework.
Stop hating some bogeyman. Such posturing jest only makes you foolish and impotent, more so than usual.
I understand Steve Mosher better now. He hates you because you are simply more hateable than nice fossil hippies who throw money at him. Really though, what alternative do you offer? Child rapists and utter crackpots? Yup!
Here today do you do that. You defend the crack house in you own hood, shamefully, mostly unaware, but loudly so.

June 26, 2014 4:37 am

Mods: As my comment that is being held in moderation (I hope, I don’t even see it now) is very similar to charles nelson’s comments I would like to know what got it moderated in the first place. It seems I can make several comments and then all of a sudden they start going to moderation … and just sit there. Why? What words are we not allowed to type here? Thanks for any guidance.
charles nelson I posted a comment comparable with your concerns. There is no way that we should change the past. Regardless of what the warmest troll says about your #3, there is no justification to cool the past over and over and over and over again. It is pure fraud to do so.
[your comment contained the word “fraud” which is why it got caught in moderation. Scam and most of its brethren will do the same, as will NAZI etc. It is an automated process and requires one of us to read it and approve it. Also any post containing “Anthony” will be treated like that as the assumption by the software is that you are specifically addressing our host and so wish a response from him. 99% of the time that isn’t the case but it does slow you down particularly when mods are thin on the ground like now. . .mod]

NikFromNYC
June 26, 2014 4:49 am

[snip . . Nik, calm down. You use swear words, NAZI and a lot of intemperate language which is why you are in moderation. Think a bit about how you are saying things because the content is not an issue. Read the comment rules and try and stick to them, please. . . mod]

Latitude
June 26, 2014 4:53 am

The preceding public service announcement was brought to you by AstraZeneca…
..makers of Seroquel

Latitude
June 26, 2014 6:00 am

The global warming scare did not start in the year 2000….
We can all agree that the temp history prior to the year 2000…was adjusted down
We can all agree that after the year 2000….temperatures stopped rising
…now, can we all agree that all of global warming is based on a temp history that was adjusted down around the year 2000?
Prior to the year 2000, we were told the temp history was accurate, they based their claim on global warming on that temp history…….can we now say global warming was based on a temp history that needed to be adjusted much later?

ferdberple
June 26, 2014 6:23 am

The implication is that infilling is unnecessary
===========
I also have a problem with infilling. gridding is supposed to resolve the problem with too many or too few stations. so if stations are missing, there should be no reason to infill. gridding should take care of this.
the problem I see with gridding and infilling is whether it is 2-D or 3-D. If you are only considering lat and long of the stations, without considering the elevation, you will introduce bias due to the lapse rate.

JustAnotherPoster
June 26, 2014 6:27 am

@NickStokes
“No record has been destroyed. Unadjusted records are published along with adjusted.”
When GISS or NASA publish headlines, “May Hottest May Ever”
http://www.slate.com/blogs/future_tense/2014/06/18/nasa_may_2014_was_the_hottest_may_in_recorded_history.html
Followed by…
“Please note May 2014 GISTEMP LOTI numbers preliminary due to a glitch with Chinese CLIMAT data. Update to follow”
https://twitter.com/ClimateOfGavin
There is no P.S. saying these are adjusted or estimated temperatures.
It’s so disingenuous it’s unbelievable.

Dave
June 26, 2014 6:33 am

I have been a loyal WUWT reader for years but frankly I have a few requirements that WUWT is no longer meeting. Data is measured values, period. Anthony is straying further and further from this absolute requirement of science.
I further find it depressing how Anthony can produce a long boring response like this and completely miss the point. The point is that all adjustments produce warming, to the point of turning cooling trends into warming trends.
Anthony and Zeke are treating Steve like he is the enemy. Anthony has the most highly visited site in the climate field but has to go around to other blogs to character assassinate a fellow skeptic.
I am very disappointed in WUWT and my visit counts will henceforth show it.
REPLY: Maybe you missed the title and the end note about this long response?
It is unfortunate that you are making a knee-jerk decision before reading part 2 coming up today. In that you’ll see why I’m taking the position I have, what areas we can agree on, and how to solve the issue.
If you don’t want to wait for part 2, then I’d say as a reader you aren’t meeting my requirements. Cheers – Anthony

June 26, 2014 6:37 am

Nick Stokes, Zeke,
Why does the old data keep changing from one day to the next?
http://sunshinehours.wordpress.com/2014/06/22/ushcn-2-5-omg-the-old-data-changes-every-day/
Jan 1998 VA NORFOLK INTL AP USH00446139
The station has no flags for Jan 1998.
On Jun 21 USHCNH said tavg Final was 8.18C
On Jun 22 USHCN said tavg Final was 8.33C
Why … from one day to the next did the data go up .15C? I wonder what it is today?
Why do most of the adjustments go up?

ferdberple
June 26, 2014 6:39 am

2. The graphs are real and they are evidence of scientific malpractice…or
================
there is another alternative:
4. The graphs are real and they are evidence of a mathematical mistake, likely resulting from confirmation bias and the lack of experimental controls.
Human beings are notoriously bad at detecting errors that they create. Especially when the error is in the direction that confirms your subconscious beliefs. Thus, the need for double blind controls in experiments. There is little if any evidence that temperature data has been subject to any such controls.
One of the simplest ways to test for bias in the adjustments is to look for a trend. The adjustments should not introduce a trend. And they should have set off alarm bells with those doing the adjustments if they did.
The simple fact that the adjustments are showing a trend tells me there is a problem.

ferdberple
June 26, 2014 6:46 am

Why do most of the adjustments go up?
=================
that in a nutshell is the problem. if the adjustments are introducing a trend (cooling the past, warming the present), this is a problem.
Adjustments should be neutral (random), with no long term trend. If there is a trend, then this trend should be mathematically eliminated by apportioning the trend back into the adjustments. The trend in the adjustments should be mathematically zero, unless you have a bloody good reason why not. Not simply one that is plausible. It better be set in stone.
Otherwise, there is no way to know if the final result shows a real trend, or a trend due to adjustments.

ferdberple
June 26, 2014 6:55 am

Different methods clearly give different results and the “hockey stick” disappears when other methods are used.
=================
Yet, all 4 methods show a net warming trend in the adjustments over time. this makes no sense mathematically. adjustment errors should be random because the errors you are correcting should be random, so the adjustments should show no trend.
the fact that all 4 methods show a very similar trend indicates the trend is real and that the adjustments are mathematically faulty.

ferdberple
June 26, 2014 7:05 am

Anthony, your graph USHCN Adjustments by Method, Yearly is very clear.
While it shows that the 4 methods give somewhat different results, all 4 methods show a warming trend in the adjustments. Mathematically this should not be. The adjustments should be neutral over time, due to the random distribution of the errors.
It is the trend in the adjustments that is important. Not the differences between what Steve Goddard calculates and what you calculate, because both of you are showing that the trend in the adjustments exists and it is consistently biased.
I would counsel that both parties concentrate on this point. That the adjustments should not be showing a trend over time. The fact that one method shows a greater trend than the other is really not a big deal, when all methods are showing that the trend lies in the same direction.

Dougmanxx
June 26, 2014 7:05 am

sunshinehours1 says:
June 26, 2014 at 6:37 am
To answer your question using the June 26th “data”:
On Jun 26 USHCN said tavg Final was 8.34C for VA NORFOLK INTL AP USH00446139
So January 1998 at that station just got .01 degrees warmer in the last 4 days. But there’s nothing wrong with the data….
What a complete farce.

Eliza
June 26, 2014 7:09 am

Wow it appears that in the main, apart from Stokes etc., that most of the postings above are in fact SUPPORTING Goddard, not WUWT. Take note. When you look at the WHOLE picture it’s obvious that Goddard is correct just look at all the “other” adjustments everywhere BOM, NZ, GISS graphs it’s all over the place this is just ONE quibble (this whole post is just about that ONE quibble). Anyway I back Goddard 100% his contributions outnumber any other I’ve seen in the current AGW debate. So far all the current climate data is supporting his contention (no Warming, AGW is Bull####) take note. Take Five LOL.

Owen
June 26, 2014 7:13 am

Owen, if I dare:
(A) Every word that comes from Obama’s mouth regarding the climate is a lie.
You falsely attribute intellectual omnipotence to a mere affirmative action promoted community organizer who is following the policy statements of whole scientific bodies, responsibly.
Nik,
‘ following the policy statements of whole scientific bodies ‘ AHAHAHAHA.
Thanks for the laughs man. Al Gore is your mentor, isn’t he.

Alexej Buergin
June 26, 2014 7:25 am

Nick Stokes says:
June 26, 2014 at 3:48 am
charles nelson says: June 26, 2014 at 3:16 am
“1. 2. or 3. ?”
3.
So what was the average temperature in Reykjavik in 1940?
(And why do you think that the meteorologists from Iceland are very stupid?)

James Strom
June 26, 2014 7:27 am

Steve McIntyre says:
June 25, 2014 at 1:05 pm
I didn’t directly comment on Goddard’s method as I haven’t examined it.
____
Who would be better to examine all this stuff and put the dispute to rest?

June 26, 2014 8:07 am

In my Jun 22 2014 copy of USHCN monthly there were 1218 files so there should be 14,616 monthly values.
For 2013:
11,568 had a non-blank value. – 79%
9,384 did not have an E (for estimated) flag.- 64%
7,374 had no flag at all – 50%
For 1998:
14,208 had a non-blank value.
12,316 did not have an E (for estimated) flag.
6,702 had no flag at all

Eliza
June 26, 2014 8:18 am

From Goddard’s site, with the raw data!
“The graph is quite accurate. For example in 2013 there were 14,613 final USHCN monthly temperature readings
ftp://ftp.ncdc.noaa.gov/pub/data/ushcn/v2.5/ushcn.tavg.latest.FLs.52i.tar.gz, which were derived from 10,868 raw monthly temperature readings
ftp://ftp.ncdc.noaa.gov/pub/data/ushcn/v2.5/ushcn.tavg.latest.raw.tar.gz
which means there were 34% more final readings than actual underlying data. This year about 40% of the monthly final data is being reported for stations with no corresponding raw data – i.e. fabricated”
What is it you don’t get.
Sincerely