Approximately 92% (or 99%) of USHCN surface temperature data consists of estimated values

An analysis of the U.S. Historical Climatology Network (USHCN) shows that only about 8% to 1% (depending on the stage of processing) of the data survives in the climate record as unaltered, unestimated values.

Guest essay by John Goetz

A previous post showed that the adjustment models applied to the GHCN data produce estimated values for approximately 66% of the information supplied to consumers of the data, such as GISS. Because the US data is a relatively large contributor to the volume of GHCN data, this post looks at the effects of the adjustment models on the USHCN data. The charts in this post use the data set downloaded at approximately 2:00 PM on 9/25/2015 from the USHCN FTP site.

According to the USHCN V2.5 readme file: “USHCN version 2.5 is now produced using the same processing system used for GHCN-Monthly version 3. This reprocessing consists of a construction process that assembles the USHCN version 2.5 monthly data in a specific source priority order (one that favors monthly data calculated directly from the latest version of GHCN-Daily), quality controls the data, identifies inhomogeneities and performs adjustments where possible.”

There are three important differences from the GHCN process. First, the USHCN process produces unique output that shows the time-of-observation (TOBs) estimate for each station. Second, USHCN will attempt to estimate values for missing data, a process referred to as infilling. Infilled data, however, is not used by GHCN. The third difference is that the homogenized data for the US stations produced by USHCN differs from the adjusted data for the same US stations produced by GHCN. My conjecture is that this is because the homogenization models for GHCN bring in data across national boundaries whereas those for USHCN do not. This requires further investigation.

Contribution of USHCN to GHCN

In the comments section of the previously referenced post, Tim Ball pointed out that USHCN contributes a disproportionate amount of data to the GHCN data set. The first chart below shows this contribution over time. Note that the US land area (including Alaska and Hawaii) is 6.62% of the total land area on Earth.

[Chart: Percentage of Reporting GHCN Stations that are USHCN]

How Much of the Data is Modeled?

The following chart shows the amount of data that is available in the USHCN record for every month from January 1880 to the present. The y-axis is the number of stations reporting data, so any point on the blue curve represents the number of measurements reported in the given month. In the chart, the red curve represents the number of months in which the monthly average was calculated from incomplete daily temperature records. USHCN will calculate a monthly average with up to nine days missing from the daily record, and flags the month with a lower-case letter, from “a” (one day missing) to “i” (nine days missing). As can be seen from the curve, approximately 25% of the monthly values were calculated with some daily values missing. The apparently seasonal behavior of the red curve warrants further investigation.
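For readers who want to experiment with the flagging scheme, below is a minimal Python sketch of the idea. It is my own illustration, not the USHCN processing code, and the record layout and function name are invented: it averages an incomplete month of daily values and derives the lower-case completeness flag described above.

import string

def monthly_mean_with_flag(daily_temps, days_in_month):
    # Average daily values, tolerating up to nine missing days, and return
    # the USHCN-style completeness flag ('a' = 1 day missing ... 'i' = 9).
    # daily_temps is a list of floats with None marking a missing day.
    present = [t for t in daily_temps if t is not None]
    missing = days_in_month - len(present)
    if missing > 9:
        return None, None                     # too incomplete; no monthly value
    flag = string.ascii_lowercase[missing - 1] if missing > 0 else ""
    return sum(present) / len(present), flag

# Example: a 30-day month with two daily observations missing.
june = [20.0 + 0.1 * d for d in range(30)]
june[4] = june[17] = None
mean, flag = monthly_mean_with_flag(june, 30)
print(round(mean, 2), flag)                   # flag is 'b' (two days missing)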

[Chart: Reporting USHCN Stations]

The third chart shows the extent to which the adjustment models affect the USHCN data. The blue curve again shows the amount of data that is available in the USHCN record for every month. The purple curve shows the number of measurements each month that are estimated due to TOBs. Approximately 91% of the USHCN record has a TOBs estimate. The green curve shows the number of measurements each month that are estimated due to homogenization. This amounts to approximately 99% of the record. As mentioned earlier, the GHCN and USHCN estimates for US data differ. In the case of GHCN, approximately 92% of the US record is estimated.

The red curve is the amount of data that is discarded by a combination of homogenization and GHCN. Occasionally homogenization discards the original data outright and replaces it with an invalid temperature (-9999). More often it discards the data and replaces it with a value computed from surrounding stations. When that happens, the homogenized data is flagged with an “E”. GHCN does not use values flagged in this manner, which is why they are included in the red curve as discarded.
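As a rough illustration of how a downstream consumer might honor those flags, here is a small Python sketch. It is my own simplification, not NOAA's code, and the record format and field names are invented: it drops the -9999 sentinel and skips monthly values carrying an “E” source flag, mirroring the GHCN behavior described above.

MISSING = -9999

def usable_homogenized_values(records):
    # records is an iterable of (value, source_flag) pairs for one station.
    # Values equal to -9999 were discarded outright by homogenization;
    # values flagged 'E' were replaced entirely from surrounding stations
    # and are not used by GHCN, so both are counted as discarded here.
    kept, discarded = [], 0
    for value, source_flag in records:
        if value == MISSING or source_flag == "E":
            discarded += 1
        else:
            kept.append(value)
    return kept, discarded

# Hypothetical monthly records: (value, source flag)
sample = [(1234, ""), (MISSING, ""), (987, "E"), (1502, "")]
kept, discarded = usable_homogenized_values(sample)
print(len(kept), discarded)   # 2 kept, 2 discarded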

[Chart: Reporting USHCN Stations and Extent of Estimates]

The next chart shows the three sets of data (TOBs, homogenized, discarded) as a percentage of total data reported.

[Chart: Extent of USHCN Estimates as a Percentage of Reporting Stations]

The Effect of the Models

The fifth chart shows the average change to the raw value due to the TOBs adjustment model replacing it with an estimated value. The curve includes all estimates, including the 9% of cases where the TOBs value is equal to the raw data value.

[Chart: Change to Raw USHCN Value after TOB Estimate]

The sixth chart shows the average change to the raw value due to the homogenization model. The curve includes all estimates, including the 1% of cases where the homogenized value is equal to the raw data value.

[Chart: Change to Raw USHCN Value after Homogenization Estimate]

Incomplete Months

As described earlier, USHCN will calculate a monthly average if up to nine days’ worth of data are missing. The following chart shows the percentage of months in the record that are incomplete (red curve) and the percentage of months that are retained after the adjustment models are applied (black curve). It is apparent that incomplete months are not often discarded.

[Chart: Number of USHCN Monthly Averages Calculated with Incomplete Daily Records]

The next chart shows the average number of days that were missing when the month’s daily record was incomplete. After some volatility prior to 1900, the average incomplete month is missing approximately two days of data (6.5%).

[Chart: Average Number of Days Missing from Incomplete USHCN Monthly Averages]

A Word on Infilling

The USHCN models will produce estimates for some months that are missing, and occasionally replace a month entirely with an estimate if there are too many inhomogeneities. The last chart shows how frequently this occurred in the USHCN record. The blue curve shows the number of non-existent measurements that are estimated by the infilling process. The purple line shows the number of existing measurements that are discarded and replaced by the infilling process. Prior to 1920, the estimation of missing data was a frequent occurrence. Since then, the replacement of existing data has occurred more frequently than estimation of missing data.

Infilled data is not present in the GHCN adjustment estimates.

[Chart: Amount of USHCN Infilling of Missing Data]

Conclusion

The US accounts for 6.62% of the land area on Earth, but accounts for 39% of the data in the GHCN network. Overall, from 1880 to the present, approximately 99% of the temperature data in the USHCN homogenized output has been estimated (differs from the original raw data). Approximately 92% of the temperature data in the USHCN TOB output has been estimated. The GHCN adjustment models estimate approximately 92% of the US temperatures, but those estimates do not match either the USHCN TOB or homogenized estimates.

The homogenization estimate introduces a positive temperature trend of approximately 0.34 C per century relative to the USHCN raw data. The TOBs estimate introduces a positive temperature trend of approximately 0.16 C per century. These are not additive. The homogenization trend already accounts for the TOBs trend.
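For anyone who wants to reproduce this kind of trend comparison, here is a minimal Python sketch of the calculation involved. The raw and adjusted series below are synthetic stand-ins, not USHCN data: the adjustment-introduced trend is simply the least-squares trend of the difference between the adjusted and raw series.

import numpy as np

def trend_per_century(values, years):
    # Least-squares linear trend of a series, expressed per century.
    slope_per_year = np.polyfit(years, values, 1)[0]
    return slope_per_year * 100.0

# Synthetic matched annual means, 1880-2014, in degrees C.
years = np.arange(1880, 2015)
rng = np.random.default_rng(0)
raw = 10.0 + 0.005 * (years - 1880) + rng.normal(0, 0.3, years.size)
adjusted = raw + 0.0034 * (years - 1880)      # adds ~0.34 C per century

# Trend introduced by the adjustment = trend of (adjusted - raw).
print(round(trend_per_century(adjusted - raw, years), 2))   # 0.34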


Note: A couple of minutes after publication, the subtitle was edited to be more accurate, reflecting a range of percentages in the data.

It should also be noted that the U.S. Climate Reference Network, designed from the start to be free of the need for ANY adjustment of data, does not show any trend, as I highlighted in June 2015 in this article: Despite attempts to erase it globally, “the pause” still exists in pristine US surface temperature data

Here is the data plotted from that network:

Of course, Tom Karl and Tom Peterson of NOAA/NCDC (now NCEI) never let this USCRN data see the light of day in a public press release or a State of the Climate report for media consumption; it is relegated to a back room of their website and never mentioned. When it comes to claims about the hottest year/month/day ever, it is instead the highly adjusted, highly uncertain USHCN/GHCN data that the public sees in these regular communications.

One wonders why NOAA NCDC/NCEI spent millions of dollars to create a state of the art climate network for the United States, and then never uses it to inform the public. Perhaps it might be because it doesn’t give the result they want? – Anthony Watts

192 Comments
Wayne Findley
September 27, 2015 6:31 pm

With my ERP background, I still cannot for the life of me understand why a transactional approach to temperature readings, stored on proper databases, would not be a worthy research objective. That is: for each measurement, each day, each site, the adjustments are added as separate transactions to the originating ob. This then allows standard database query techniques to be used to see just exactly how the final value was arrived at. A made-up example:
DateTime            SiteID  Type   Value  Process and Comment
20150615 08:15:00   704367  RAW     12.6  Obs ex site
20150615 08:15:00   704367  TOBS     0.6  TOBs adjustment V3.09
20150615 08:15:00   704367  HOM1    -0.3  Homogenization V8.45.a, correct for site UHI
20150616 12:00:00   704367  RAW    99999  Missing obs ex site
20150616 12:00:00   704367  INF1    11.9  Infill ex V5.7.12 code, average of nearest 5 sites
And so on. Database engines are made for this sort of storage and query capability.
Use ’em!
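A minimal sketch of the kind of ledger described above, using Python's built-in sqlite3 module; the table and column names are invented for illustration, and the adjustment rows are stored as deltas that sum to the final value.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE obs_transactions (
        obs_time TEXT,    -- timestamp of the original observation
        site_id  TEXT,    -- station identifier
        step     TEXT,    -- RAW, TOBS, HOM1, INF1, ...
        value    REAL,    -- raw value, or the adjustment delta
        comment  TEXT     -- which model/version produced this row
    )
""")

rows = [
    ("2015-06-15 08:15:00", "704367", "RAW",  12.6, "Obs ex site"),
    ("2015-06-15 08:15:00", "704367", "TOBS",  0.6, "TOBs adjustment V3.09"),
    ("2015-06-15 08:15:00", "704367", "HOM1", -0.3, "Homogenization V8.45.a, site UHI"),
]
conn.executemany("INSERT INTO obs_transactions VALUES (?, ?, ?, ?, ?)", rows)

# Reconstruct the final value and show exactly how it was arrived at.
final = conn.execute("""
    SELECT site_id, obs_time, ROUND(SUM(value), 2) AS final_value, COUNT(*) AS steps
    FROM obs_transactions
    GROUP BY site_id, obs_time
""").fetchone()
print(final)   # ('704367', '2015-06-15 08:15:00', 12.9, 3)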

Steven Mosher
Reply to  Wayne Findley
September 27, 2015 6:45 pm

we use mySQL.
Until you actually try to program this stuff, your ideas are, well, just ideas.
And even when you do document stuff, people don’t look at it:
http://berkeleyearth.lbl.gov/stations/173102

Patrick
Reply to  Steven Mosher
September 28, 2015 1:13 am

Then you should be using proper, robust database engines like MS SQL Server or even IBM DB2. People still use the word “SEQUEL” to describe SQL; SEQUEL was IBM’s original name for the language, which had to be dropped because of a Hawker Siddeley trademark.

Walt D.
September 27, 2015 8:25 pm

So Anthropological Global Warming does in fact exist. However, it is caused by people manipulating the data and not by CO2 emissions from burning fossil fuels as was previously postulated. Meanwhile actual temperatures, particularly of the oceans, which comprise two-thirds of the Earth’s surface, have changed very little.

Duster
Reply to  Walt D.
September 27, 2015 9:10 pm

UHI is anthropogenic warming. The effect of agriculture on mesoscale “climate” is anthropogenic. The question is not whether it exists but how important it is, and whether CO2 has any influence to speak of. The probable answers are “not very” and “very little” based on geological data.

Reply to  Duster
September 27, 2015 9:22 pm

I think a lot of people agree with you, Duster. We have a huge impact on local and regional climate. The construction of a dam affects the micro-climate for many kilometres around the reservoir. Logging, farming, ranching, cities, transportation, electricity generation and transmission, solar farms, wind farms – lots of small effects. But the world is 70% water, and just a small amount of land is actually inhabited. So the question is how much are humans affecting climate? I suspect not much, but I have been wrong before.

Brett Keane
Reply to  Duster
September 28, 2015 12:01 am

Think insects, think microbes, if you seek effects….

Walt D.
Reply to  Duster
September 28, 2015 5:07 am

Duster: I agree. Also, the fundamental question of how the CO2 emissions from burning fossil fuels affects the total CO2 in the atmosphere is not well understood. What is known is that regardless of the effect of CO2 on temperature, more CO2 is beneficial.

September 27, 2015 8:57 pm

Approximately 92% of the temperature data in the USHCN TOB output has been estimated.

How much of the TOB data itself has been “estimated”? And has anyone done any analysis on the data? For example if a station reports a reading time of exactly 5.00pm every day for a month then I would think it likely this is not the actual reading time.
Maybe it was generally say 4.50pm through 5.10pm but then again maybe it was more or less randomly spread between morning and evening depending on what was happening on that day for the person responsible for reading it.
I am highly sceptical of the TOBs adjustment, not based on the maths or even the nature of the adjustment but rather because people were doing the readings and people have other priorities than being available at the right time for a daily min-max reading. People, however, don’t necessarily like to document their own inadequacies such as reading at a time that is not according to “policy”.

Mervyn
September 27, 2015 9:59 pm

If almost all temperature data is being adjusted, the U.S. Historical Climatological Network (USHCN) must be relying on inappropriately sited Stevenson Screens used by NOAA’s National Weather Service which breach the Climate Reference Network Site Information Handbook developed by NOAA’s National Climatic Data Centre.
It means that, for starters, the temperature data is inaccurate (just an estimate), and then it is adjusted by further estimates. Yet we know that an estimate applied to an estimate and then subjected to another estimate, simply cannot provide an accurate answer.

Steven Mosher
Reply to  Mervyn
September 27, 2015 10:48 pm

wrong.

knr
Reply to  Steven Mosher
September 28, 2015 1:55 am

Really, so in the same way two wrongs can make a right, is that how it works?
First guess plus second guess + MAGIC = unquestionable truth. Is that really the algorithm for climate models?

skeohane
Reply to  Steven Mosher
September 28, 2015 7:10 am

KNR, you nailed it!

Leonard Lane
Reply to  Mervyn
September 27, 2015 11:07 pm

Bingo!

Admad
September 28, 2015 1:29 am

Shouldn’t that be “97%”?

richard verney
September 28, 2015 1:43 am

Whenever the data is presented, the unadjusted raw data set should be displayed (plotted) alongside the adjusted/homogenised data set; this would then provide some insight into the error bounds of the adjusted/homogenised data set.

Reply to  richard verney
September 28, 2015 11:46 am

I’ve been working with what is supposedly raw daily data that has been converted from F to C, but with flags indicating the data is suspect. Missing data (never logged) is indicated with -9999.
The flags indicating suspect data usually make sense. When you see a summertime temperature of -44°F in Arizona you know something ain’t right. But when you cull both missing and suspect data, you have a lot of missing days.
Flagged data has one of these flags:
A = failed accumulation total check
D = failed duplicate check
G = failed gap check
I = failed internal consistency check
K = failed streak/frequent-value check
M = failed megaconsistency check
N = failed naught check
O = failed climatological outlier check
R = failed lagged range check
S = failed spatial consistency check
T = failed temporal consistency check
W = temperature too warm for snow
X = failed bounds check
http://www1.ncdc.noaa.gov/pub/data/ghcn/daily/papers/durre-menne-etal2010.pdf
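A minimal Python sketch of the culling described above; this is my own illustration, where the flag letters are the failure codes listed but the record format is simplified and hypothetical.

MISSING = -9999
QUALITY_FLAGS = set("ADGIKMNORSTWX")   # the failure codes listed above

def clean_daily_values(records):
    # records is an iterable of (value, quality_flag) pairs, with the flag
    # empty ('') when the value passed every check. Missing (-9999) and
    # quality-flagged values are both dropped.
    return [value for value, qflag in records
            if value != MISSING and qflag not in QUALITY_FLAGS]

# Hypothetical daily values: the -44.0 carries an outlier flag ('O').
sample = [(24.5, ""), (MISSING, ""), (-44.0, "O"), (25.1, "")]
print(clean_daily_values(sample))   # [24.5, 25.1]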

Lino D'Ischia
Reply to  verdeviewer
September 29, 2015 9:33 am

Isn’t the whole point of the MMTS to avoid “missing” days and “suspect data”? If you get a reading, as per the example you gave, of -44 degrees F, then something is wrong with the transmission system or the actual measuring device itself. Why, then, trust the other days? What if on the other days the temperature is only slightly off, and falls outside of any correction algorithm? What then? Humans would actually be better than the instruments on those occasions.

richard verney
September 28, 2015 1:45 am

GWPF (and others) have suggested that an enquiry is being undertaken into the various temperature data sets and how these are put together and the adjustments made. The recent articles on WUWT should be presented to them for their consideration since these contain much insight.

sergeiMK
Reply to  richard verney
September 28, 2015 4:57 am

richard verney says:
September 28, 2015 at 1:45 am
GWPF (and others) have suggested that an enquiry is being undertaken into the various temperature data sets and how these are put together and the adjustments made. The recent articles on WUWT should be presented to them for their consideration since these contain much insight.
———————————
doesn’t seem any point sending them stuff – seems they may have destroyed their own credibility:
http://moyhu.blogspot.co.uk/2015/09/gwpf-wimps-out.html
I don’t think any credible scientist would object to scrutiny of data sets and the adjustments made, providing it was done by independent scientists who understand the physics.

Editor
September 28, 2015 3:27 am

And no mention of UHI?

Srga
September 28, 2015 3:42 am

Physics students in schools during the 1980s were taught that mercury-in-glass thermometers would be precalibrated against a constant-volume gas thermometer prior to use in a critical situation, for example, a weather station. Such thermometers came with a higher price, but not too much compared to the total price of the station. Any change in a thermometer’s characteristics over time due to the supercooled-liquid nature of glass could be accounted for by careful study. From first principles I would say that the ageing would cause the thermometers to under-indicate the temperature due to the bulb expanding over time. I was always taught that the constant-volume gas thermometer is used to calibrate all the others. The difficulty for me in understanding the adjustments from historical data is the direction of the adjustment and the lack of trust in the scientists of the time who made the readings, mostly with vernier scales to obtain an extra decimal point.

Groty
September 28, 2015 6:44 am

John Hinderaker at Powerline just referenced this post. A thread currently exists at “memeorandum” for both Hinderaker’s post and this post. Hinderaker used a line in his post similar to the one below that you used in this post:
>> Note that the US land area (including Alaska and Hawaii) is 6.62% of the total land area on Earth.<<
Is this true? The world is 71% water and 29% land. I assume the U.S. is 6.62% of the Earth's total surface area (including oceans), not just its land area. Since this post is getting much attention I hope that data is accurate.

richard verney
Reply to  Groty
September 28, 2015 7:44 am

Since we are dealing with the land thermometer record, I took the 6.62% figure to mean 6.2% of the land surface area of the globe, not 6.2% of the total surface area of the globe.
A very quick internet search suggests that in sq kms, the Earth’s surface area is about 510,072,000 sq. km, the land surface area to be about 148,940,000 sq.km, and the surface area of the US (which I assume includes the Great Lakes) to be about 9,147,400 sq. km.
So a figure of 6.2% appears to be a reference to the land surface area of the US in relation to the land surface area of the globe.

Groty
Reply to  richard verney
September 28, 2015 8:16 am

Yes, thanks. I wish I had Googled before posting my comment. After I posted the comment and did a couple of other things it was bothering me so I decided to dig into it. What I found is similar to what you found. I came back to clarify. Thanks again.

September 28, 2015 8:10 am

The collection and averaging of local temperature “data” by central governments, or at a global level, appears to be a complete waste of taxpayers’ money.
Average temperature is a meaningless statistic.
If there was evidence of significant harm to humans, animals or plants from climate change, and there is none so far, it MIGHT be useful to compile average temperature statistics. I’m not sure why.
And while I’m in a good mood, any Pope who encourages MORE poverty by opposing capitalism, opposing the use of cheap high density sources of energy, which poor people desperately need, and thinks CO2 is a satanic gas, in spite of the fact that it greens the Earth … must hate poor people.

James at 48
September 28, 2015 9:42 am

I estimate that is was really, really, really cold, back in the good old days, after all, I and everyone else used to walk to school 10 miles in a blizzard. Meanwhile, we know it is burning, scorching, crazy hot now. After all, Dakota James said so. As we can see, whereas, in the good old days, people wore nice neat 3 piece suits, long coats and hats, now, all we see are puke rock playing, underwear as outerclothes wearing waifs, who are mere moments away from becoming climate refugees. It will happen, and did happen, in 1997. Lake Michigan is now dried up, and there are now horse races in Antarctica. Oh jees, I need to adjust my meds! / sarc.

James at 48
September 28, 2015 9:42 am

Stupid massive thumbs I have for fingers … I estimate that iT was really, really cold …

September 28, 2015 11:29 am

The US accounts for 6.62% of the land area on Earth, but accounts for 39% of the data in the GHCN network.

Of the 96,191 stations in the GHCND-Stations list as of 1/2015, 53% are in CONUS and Alaska.
61% are in CONUS, Canada, and Mexico.
I’m wondering how it can be that 61% of the data comes from 47% of the stations. Is this due primarily to the long duration of European stations?

Reply to  verdeviewer
September 28, 2015 11:30 am

61% are in CONUS, Alaska, Canada, and Mexico.

HonoredCitizen
September 28, 2015 11:39 am

Even if… and that’s a mighty big IF… that so-called “global warming” turns out to be true, and all those Ph.D.-level scientists turn out to be right instead of deliberately putting on a hoax for the fun of it… even if they’re right and global climate change really is happening… the way nice white people live has absolutely nothing to do with it. Absolutely nothing! So let’s all keep driving alone to the apocalypse. Our grandchildren will curse us for being idiots, but what the hey! At least we tried our darndest to protect the lil baby fetus!

Lino D'Ischia
September 28, 2015 12:28 pm

Looking around and reading some here and there, here’s what I see:
When they changed the TOBs from the 1940s onward, the raw data doesn’t show much of a glitch. If this is supposed to introduce a “cooling” effect, then why don’t the raw data reflect this? Makes no sense. But maybe there really wasn’t a “cooling” effect. Maybe that’s the real answer.
The raw data doesn’t show much of a glitch throughout the entire 20th century. But here are two very curious observations:
(1) when stations were switched to MMTS from LiG, the number of months where data wasn’t collected increased. How can that be? This is supposed to be automatic, not human. Why the missing data?
(2) the amount of “warming” that has occurred over the last century is of the same measure as the “cooling” correction of MMTS over LiG, around 0.5 degrees. IOW, when you start trying to include such “correction” via a computer, you’re playing with a figure that is equal to the variable you’re trying to gauge itself. One has to be extremely careful.
Finally, related to point #1: how, exactly, is the MMTS temperature “recorded”? Is there some sort of electric wire that connects all of this up to some computer? Well, what about this wire? Is it being taken into consideration?
First, is the lack of connectivity the reason for all the missing days? If so, then one wonders what you’re really dealing with.
Second: temperature affects the conductivity of metals. Colder days should produce ‘less’ of a signal, and warmer days ‘more’ of a signal: IOW, colder temperatures on cold days, and warmer temperatures on warm days. If you have more stations in warmer areas of the country, then this will skew your temperatures.
From the data I’ve seen, once the MMTS went into effect, all hell broke loose. This needs scrutiny. Otherwise: “junk in; junk out.”

Reply to  Lino D'Ischia
September 28, 2015 6:01 pm

There is a limit to how long the cord is allowed to be, which has resulted in MMTS stations being closer to buildings than the Stevenson Screens they replaced.
For one thing.

Psalmon
September 28, 2015 5:34 pm

When Tony Heller exposed this same information, Anthony Watts trashed him in the media. Just when it first hit the light of day and Drudge picked it up, Anthony became the poster child for the climate lunatics to discredit the premise. At that point the momentum collapsed.
Anthony you planning to come out and fess up that you were wrong about that, that Tony was right? It would be the honest and manly thing to do.

kenny moore
September 30, 2015 4:35 am

Professing themselves to be Wise, they became Fools. Romans, KJV, Apostle Paul > well said, Pastor Paul 😊😁😃😉👆👀!!!

runefardal
October 1, 2015 1:18 am

Climate has turned into a religion… because of the money. Hard scientific facts like those in this article are facts, not beliefs.
It is of no scientific interest how many scientists support a theory. A theory leads for as long as it best explains the relevant observations, and one experiment is enough to disprove it. Basic skepticism about the current theories is thus a major driving force in the development of science.

October 3, 2015 12:46 pm

Reblogged this on Climate Collections and commented:
Further analysis of the extent of USHCN data adjustment.
Executive Summary:
The US accounts for 6.62% of the land area on Earth, but accounts for 39% of the data in the GHCN network. Overall, from 1880 to the present, approximately 99% of the temperature data in the USHCN homogenized output has been estimated (differs from the original raw data). Approximately 92% of the temperature data in the USHCN TOB output has been estimated. The GHCN adjustment models estimate approximately 92% of the US temperatures, but those estimates do not match either the USHCN TOB or homogenized estimates.
The homogenization estimate introduces a positive temperature trend of approximately 0.34 C per century relative to the USHCN raw data. The TOBs estimate introduces a positive temperature trend of approximately 0.16 C per century. These are not additive. The homogenization trend already accounts for the TOBs trend.
