Approximately 66% of global surface temperature data consists of estimated values

Summary of GHCN Adjustment-Model Effects on Temperature Data

Guest essay by John Goetz

As the debate over whether or not this year will be the hottest year ever burns on, it is worth revisiting a large part of the data used to make this determination: GHCN v3.

The charts in this post use the dataset downloaded at approximately 2:00 PM on 9/23/2015 from the GHCN FTP Site.

The monthly GHCN V3 temperature record used by GISS undergoes an adjustment process after quality-control checks are done. The adjustments are described at a high level here.

The adjustments are somewhat controversial because they take presumably raw and accurate data, run it through one or more mathematical models, and produce an estimate of what the temperature might have been under a given set of conditions. For example, the time of observation (TOB) adjustment takes a raw data point at, say, 7 AM and produces an estimate of what the temperature might have been at midnight. The skill of that model is nearly impossible to determine on a monthly basis, but it is unlikely to consistently produce a result accurate to the 1/100th of a degree that is stored in the record.

A simple case in point. The Berlin-Tempel station (GHCN ID 61710384000) began reporting temperatures in January, 1701 and continues to report them today. Through December, 1705 it was the only station in the GHCN record reporting temperatures. Forty-eight of the sixty possible months during that time period reported an unflagged (passed quality-control checks) raw average temperature, and the remaining 12 months reported no temperature. Every one of those 48 months was estimated downward by the adjustment models exactly 0.14 C. In January, 1706 a second station was added to the network – De Bilt (GHCN ID 63306260000). For the next 37 years it reported a valid temperature every month and in most of those months it was the only GHCN station reporting a temperature. The temperature for each one of those months was estimated downward by exactly 0.03 C.

Is it possible that the models skillfully estimated the “correct” temperature at those two stations over the course of forty plus years using just two constants? Anything is possible, but it is highly unlikely.
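One way to check an observation like this is to tabulate the distinct offsets between the adjusted and raw values for a single station. The sketch below is only an illustration. It assumes the raw and adjusted monthly values have already been read into two aligned lists of integers in hundredths of a degree C, with -9999 marking an invalid value. The function name and the sample numbers are made up and are not taken from the GHCN files.

```python
from collections import Counter

MISSING = -9999  # invalid-value marker described in the article


def adjustment_counts(raw, adjusted):
    """Count each distinct (adjusted - raw) offset, in hundredths of a degree C.

    Months where either value is invalid are skipped.
    """
    offsets = Counter()
    for r, a in zip(raw, adjusted):
        if r == MISSING or a == MISSING:
            continue
        offsets[a - r] += 1
    return offsets


# Hypothetical station: every valid month is shifted down by 0.14 C.
raw = [-120, 35, MISSING, 410, 880]
adjusted = [-134, 21, MISSING, 396, 866]
print(adjustment_counts(raw, adjusted))
```

A station whose result contains a single key has been shifted by one constant for every valid month, which is the pattern described above for Berlin-Tempel and De Bilt.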

How Much Raw Data is Available?

The following chart shows the amount of data that is available in the GHCN record for every month from January, 1700 to the present. The y-axis is the number of stations reporting data, so any point on the curve represents the number of measurements reported in the given month. In the chart, the green curve represents the number of raw, unflagged measurements and the purple curve represents the number of estimated measurements. The difference between the green and purple curves represents the number of raw measurements that are not changed by the adjustment models, meaning the difference between the estimated value and raw value is zero. The blue curve at the bottom represents the measurements where an unflagged raw value was discarded by the adjustment models and replaced with an invalid value (represented by -9999). The count of discarded raw data (blue curve) is not included in the total count represented by the green curve.

Number of Monthly Raw and Estimated GHCN Temperatures 1700 - Present
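The categories behind the three curves can be written down as a small classification rule. This is a sketch under stated assumptions, not the code used to build the charts: it assumes each measurement is a (raw, adjusted) pair of integers in hundredths of a degree C, that flagged raw values have already been mapped to -9999, and the function and key names are hypothetical.

```python
MISSING = -9999  # invalid-value marker described in the article


def classify(raw, adjusted):
    """Assign one station-month to a category."""
    if raw == MISSING:
        return "missing"      # no usable raw value
    if adjusted == MISSING:
        return "discarded"    # valid raw value replaced with -9999 (blue curve)
    if adjusted != raw:
        return "estimated"    # raw value replaced by an estimate (purple curve)
    return "unchanged"        # estimate equals the raw value


def tally_month(pairs):
    """Count the categories for all stations reporting in one month."""
    counts = {"missing": 0, "discarded": 0, "estimated": 0, "unchanged": 0}
    for raw, adjusted in pairs:
        counts[classify(raw, adjusted)] += 1
    # Green curve: raw measurements, not counting the discarded ones.
    counts["raw"] = counts["estimated"] + counts["unchanged"]
    return counts


# Made-up values for one month at five hypothetical stations.
month = [(150, 136), (220, 220), (305, MISSING), (MISSING, MISSING), (-40, -43)]
print(tally_month(month))
```

With these definitions, the gap between the green and purple curves is the "unchanged" count.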

The second chart shows the same data as the first, but the start date is set to January 1, 1880. This is the start date for GISS analysis.

Number of Monthly Raw and Estimated GHCN Temperatures 1880 - Present

How Much of the Data is Modeled?

In the remainder of this post, “raw data” refers to data that passed the quality-control tests (unflagged). Flagged data is discarded by the models and replaced with an invalid value (-9999).

In the next chart the purple curve represents the percentage of measurements that are estimated (estimated / raw). The blue curve represents the percentage of discarded measurements relative to the raw measurements that were not discarded (discarded / raw). Prior to 1935, approximately 80% of the raw data was changed to an estimate, and from 1935 to 1990 there was a steady decline to about 40% of the data being estimated. In 1990 there was an upward spike to about 55%, followed by a steady decline to the present 30%. The blue curve at the bottom shows that approximately 7% to 8% of the raw data was discarded by the adjustment models, with the exception of a recent spike to 20%. (Yes, the two curves combine oddly enough to look like a silhouette of Homer Simpson on his back snoring.)

Percent Raw GHCN Data Replaced with Estimate or Discarded
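Both percentages in this chart use the same denominator, the raw measurements that were not discarded. A minimal sketch of that arithmetic follows. The function name and the counts are hypothetical and chosen only to reproduce figures of the size quoted above.

```python
def percent_of_raw(count, raw_retained):
    """Express a count as a percentage of the retained raw measurements.

    Returns None when there are no retained raw measurements in the month.
    """
    if raw_retained == 0:
        return None
    return 100.0 * count / raw_retained


# Hypothetical counts for a single month.
raw_retained = 1000   # unflagged raw values that were not discarded
estimated = 800       # raw values replaced by a different estimate
discarded = 75        # raw values replaced with -9999

print(percent_of_raw(estimated, raw_retained))  # estimated / raw
print(percent_of_raw(discarded, raw_retained))  # discarded / raw
```

Because the discarded values are excluded from the denominator, the two percentages do not have to sum to 100 or less.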

The next chart shows the estimate percentages broken out by rural and non-rural (suburban and urban) stations. For most of the record, non-rural stations were estimated more frequently than rural stations. However, over the past 18 years they have had temperatures estimated at approximately the same rate.

Percent Rural and Urban (non-Rural) Raw GHCN Data Replaced with Estimate

The fifth chart shows the average change to the raw value when the models replace it with an estimated value. There are two curves in the chart. The red curve is the average change over only those measurements where the estimated value differs from the raw value. It is possible, however, that the adjustment models legitimately produce an estimate equal to the raw value, that is, a change of zero. The blue curve allows for this possibility and represents all measurements, including those with no difference between the raw and estimated values. The trend lines for both are shown in the plot, and it is interesting to note that their slopes are nearly identical.

Average Change in Degrees C * 100 When Estimate Replaces Raw Data
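The difference between the red and blue curves comes down to whether zero changes are counted in the average. The sketch below illustrates the two averages under the same assumptions as before: aligned lists of integers in hundredths of a degree C, -9999 for invalid values, and a hypothetical function name with made-up sample data.

```python
MISSING = -9999  # invalid-value marker described in the article


def average_changes(raw, adjusted):
    """Return (mean over changed values only, mean over all valid values).

    Each mean is None when there is nothing to average.
    """
    diffs = [a - r for r, a in zip(raw, adjusted)
             if r != MISSING and a != MISSING]
    changed = [d for d in diffs if d != 0]
    mean_changed = sum(changed) / len(changed) if changed else None
    mean_all = sum(diffs) / len(diffs) if diffs else None
    return mean_changed, mean_all


# Made-up month: two of four valid values are changed.
raw = [100, 200, 300, 400, MISSING]
adjusted = [90, 200, 320, 400, MISSING]
red, blue = average_changes(raw, adjusted)
print(red, blue)
```

Including the zeros pulls the average toward zero in any month with unchanged values, so the blue curve sits closer to zero than the red one whenever some measurements are left alone.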

What About the Discarded Data?

Recall that the first two charts showed the number of raw measurements that were removed by the adjustment models (blue curve on both charts). No flags were present in the estimated data to indicate why the raw data were removed. The purple curve in the following chart shows the anomaly of the removed data in degrees C * 100 (1951 – 1980 baseline period). There is a slight upward trend from 1880 through 1948, a large jump upward from 1949 through 1950, and a moderate downward trend from 1951 to present. The blue curve is the number of measurements that were discarded by the models. Care should be taken not to over-analyze this particular chart, because no gridding was done in calculating the anomaly, and prior to 1892 the data represent only a handful of measurements.

Average Anomaly in Degrees C * 100 of Discarded GHCN Data
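For readers unfamiliar with anomalies, the sketch below shows one simple, ungridded way to compute them against a 1951 – 1980 baseline. It is an assumption about the general approach, not a description of how the chart was produced: it takes one station's values as a dictionary keyed by (year, month), builds a separate baseline mean for each calendar month, and subtracts it. All names and numbers are hypothetical.

```python
MISSING = -9999  # invalid-value marker described in the article


def monthly_baseline(series, start=1951, end=1980):
    """Mean value per calendar month over the baseline years.

    series maps (year, month) to a value in hundredths of a degree C.
    """
    totals = {}
    for (year, month), value in series.items():
        if start <= year <= end and value != MISSING:
            total, n = totals.get(month, (0, 0))
            totals[month] = (total + value, n + 1)
    return {month: total / n for month, (total, n) in totals.items()}


def anomaly(series, year, month, baseline):
    """Value minus the baseline mean for that calendar month, or None."""
    value = series.get((year, month), MISSING)
    if value == MISSING or month not in baseline:
        return None
    return value - baseline[month]


# Made-up January values for one hypothetical station.
series = {(1900, 1): 150, (1951, 1): 100, (1980, 1): 300, (1960, 1): MISSING}
baseline = monthly_baseline(series)
print(anomaly(series, 1900, 1, baseline))
```

Averaging such anomalies across stations without gridding weights regions by station density, which is one reason for the caution noted above.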

Conclusion

Overall, from 1880 to the present, approximately 66% of the values in the adjusted GHCN temperature data are estimates produced by adjustment models, while 34% are raw values retained from direct measurements. The rural split is 60% estimated, 40% retained. The non-rural split is 68% estimated, 32% retained. Total non-rural measurements outnumber rural measurements by a factor of 3.

The estimates produced by NOAA for the GHCN data introduce a warming trend of approximately a quarter degree C per century. Those estimates are produced at a slightly higher rate for non-rural stations than rural stations over most of the record. During the first 60 years of the record, measurements were estimated at a rate of about 75%, with the rate gradually dropping to 40% in the early 1990s, followed by a brief spike in the rate before resuming the drop to its present level.

Approximately 7% of the raw data is discarded. If this data were included as-is in the final record it would likely introduce a warming component from 1880 to 1950, followed by a cooling component from 1951 to the present.

Epilogue

The amount of estimation and its effects change over time. This is due to the addition of newer data that lengthens time series used as input to the adjustment models. The following chart shows the percentage of measurements that are estimated (purple curves) and percentage of discarded measurements. The darker curves are generated from the data set as of 9/23/2015 (data is complete through 8/2015). The lighter curves are generated from the data set as of 6/27/2014 (data is complete through 5/2014). Clearly, fewer measurements were estimated in the current data set than in the past data set. However, more measurements from the early part of the record were discarded in the current data set.

Percent Raw GHCN Data Replaced with Estimate or Discarded 8/2015 versus 5/2014

A chart of the average change to the raw data is not shown, because an overlay of the two data sets is virtually indistinguishable. However, the slope of the estimated data trend produced by the current data set is slightly greater than that of the past data set (0.0204 versus 0.0195). The reason that the slope of 0.0204 differs from the slope in the fifth chart above (blue curve) is that the comparison end month is May, 2014, whereas the chart above ends with August, 2015.
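The slopes quoted above can be related to the per-century figure in the conclusion with a short calculation. The sketch below fits an ordinary least-squares line to a monthly series and converts the slope. The unit is an inference, not something stated in the post: it assumes the slope is in hundredths of a degree C per month, which is the only reading under which 0.0204 works out to roughly a quarter degree C per century. The function names and the test series are hypothetical.

```python
def trend_slope(values):
    """Ordinary least-squares slope of values against their index 0, 1, 2, ..."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two points to fit a trend")
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    numerator = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values))
    denominator = sum((x - x_mean) ** 2 for x in range(n))
    return numerator / denominator


def degrees_per_century(slope_hundredths_per_month):
    """Convert hundredths of a degree C per month to degrees C per century."""
    per_year = slope_hundredths_per_month * 12   # months to years
    per_century = per_year * 100                 # years to centuries
    return per_century / 100                     # hundredths to whole degrees


print(trend_slope([0, 2, 4, 6]))      # made-up series rising 2 per step
print(degrees_per_century(0.0204))    # current data set, under the assumed unit
print(degrees_per_century(0.0195))    # past data set, under the assumed unit
```

Under that assumed unit, the two slopes correspond to roughly 0.24 and 0.23 degrees C per century.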

Note: the title was changed to better reflect the thrust of the article; the original title is now a sub-headline. The guest essay line was also added shortly after publication, along with a featured image, as the guest author did not provide these normal elements of publication at WUWT. – Anthony

238 Comments
September 25, 2015 9:27 am

“The next chart shows the estimate percentages broken out by rural and non-rural (suburban and urban) stations. For most of the record, non-rural stations were estimated more frequently than rural stations. However, over the past 18 years they have had temperatures estimated at approximately the same rate.”
If you are using the GHCN metadata for urban and rural STOP!
that data is
A) old
B) wrong
That is why no one uses it.

angech2014
Reply to  Steven Mosher
September 25, 2015 8:24 pm

“That data is old”
You mean real data?
Thank god.
No you mean old modified data which has not got your new adjustments in it.
Oh, well. Was hoping
“Wrong?”
Wrong to use old data?
All data is old, the older the better usually
except for modified homogenised rubbish.
One moment you argue for it the next you dismiss it when you have changed it
What fantastic logic.

Reply to  Steven Mosher
September 26, 2015 8:20 pm

You are the CAGW equivalent of Willis, ardent and yet un-schooled….

Reply to  Michael Moon
September 26, 2015 8:24 pm

Harsh, yet justified.
Although actually, Willis might be a little more schooled than Steven, however equally ardent.
In both cases, I’m reminded of the Restoration comedy Puritan character, Zeal of the Land.

September 25, 2015 11:11 am

“Svante Callendar” September 24, 2015 at 6:39 pm:
Back in your box, rover. I wonder sometimes about blogs that use attack dogs to disrupt discussions.
On another thread (June 4, 2015 at 8:29 pm) “harrytwinotter” wrote:
dbstealey,
Back in your box. Don’t you ever tire of the “attack dog” role?

There are several other comments by “harrytwinotter” that post the same “attack dog” and “back in your box” comments. But “Svante Callendar” never replied to my observation.
So how about it, “Svante”? Are you a sockpuppet?

September 25, 2015 2:22 pm

Why is average temperature a meaningful statistic?
If it is meaningful, would a warming of +0.5 degrees C. matter to ordinary people (not climate gamers or politicians)?
If +0.5 degrees C. did matter, would it be good news, or bad news?
If it was bad news, would humans be able to reverse +0.5 degree of warming?
Until these questions are answered, the collection of average temperature data appears to be mainly a waste of taxpayers’ money.
Debates over the temperature “adjustments” bog down “deniers” in climate minutiae, where they will have little influence on the climate change “scam”.
From my own point of view, based on evidence and logic, and speaking on behalf of humans, animals and green plants:
– Slight warming since 1880 is good news.
– More CO2 in the air since 1880 is good news,
– Even more warming in the future would be better news, and
– Even more CO2 in the air in the future would be better news.
Average temperature is not a measurement.
It is a statistic that can be compiled in hundreds of different ways.
No one on Earth lives in the average temperature.
Therefore, no one on Earth should care about the average temperature.
Average temperature is mainly a propaganda tool used by leftists to scare people, with the ultimate goal of gaining political power.
This effort is 99% politics and 1% science.

Walt D.
Reply to  Richard Greene
September 25, 2015 7:07 pm

“Senator Iselin, you need to pick one number and stick to it”.
This effort is 97% politics and 3% science.

prjindigo
September 25, 2015 3:11 pm

Lemme just remind you something important. If you increase the resolution of the model by a factor of 2, then 99% of the data is faked.

Rico L
September 26, 2015 9:47 pm

Vic Reeves: “88.2% of statistics are made up on the spot”. Never a better view on statistics.

October 3, 2015 11:30 am

Reblogged this on Climate Collections and commented:
Outstanding review of GHCN treatment of historical data.
Executive Summary: Overall, from 1880 to the present, approximately 66% of the temperature data in the adjusted GHCN temperature data consists of estimated values produced by adjustment models, while 34% of the data are raw values retained from direct measurements. The rural split is 60% estimated, 40% retained. The non-rural split is 68% estimated, 32% retained. Total non-rural measurements outpace rural measurements by a factor of 3x.
The estimates produced by NOAA for the GHCN data introduce a warming trend of approximately a quarter degree C per century. Those estimates are produced at a slightly higher rate for non-rural stations than rural stations over most of the record. During the first 60 years of the record measurements were estimated at a rate of about 75%, with the rate gradually dropping to 40% in the early 1990s, followed by a brief spike in the rate before resuming the drop to its present level.
Approximately 7% of the raw data is discarded. If this data were included as-is in the final record it would likely introduce a warming component from 1880 to 1950, followed by a cooling component from 1951 to the present.