Metadata fail: 230 GHCN land stations actually in the water

Why is this important? Well, if you are calculating UHI for stations by looking at satellite images of nightlights, like GISS does (see my post on it at CA), you'll find that there are generally no city lights in the water, leading you to think you've got no urbanization around the station. Using only 10 lines of code, Steve Mosher finds 230 errors in NCDC's Global Historical Climatology Network (GHCN) data that place stations over water when they should be on land. Does this affect the calculation of Earth's surface temperature? Steve Mosher investigates. – Anthony

Wetbulb Temperature

by Steven Mosher

click to enlarge

This Google Maps display shows just one of 230 GHCN stations that are located in the water. After finding instances of this phenomenon over and over, it seemed an easy thing to find and analyze all such cases in GHCN. The issue matters for two reasons:

  1. In my temperature analysis program I use a land/water mask to isolate land temperatures from sea temperatures and to weight the temperatures by the land area. For a station sitting in the ocean, that land area would of course be zero (see the sketch after this list).
  2. Hansen2010 uses nightlights based on station location, and in most cases the lights at a coastal location are brighter than those offshore. That said, I have seen “blooming” even in radiance-calibrated lights, such that “water pixels” do on occasion have lights on them.
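To make reason 1 concrete, here is a minimal sketch of land-area weighting, assuming a fraction-of-land raster landMask (1 = all land, 0 = all water) and a temperature raster temps on the same grid. The names are illustrative, not taken from my actual program:

library(raster)
cellArea <- area(temps) # km^2 per grid cell (cells shrink toward the poles)
landWeight <- cellArea * landMask # land area per cell; zero in open ocean
landMean <- cellStats(temps * landWeight, "sum") / cellStats(landWeight, "sum")

A station whose coordinates put it in the ocean picks up a zero weight here, which is exactly why the mislocations matter.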

The process of finding “wet stations” is trivial with the “raster” package in R. All that is needed is a high-resolution land/sea mask. In my previous work I used a ¼ degree base map; ¼ degree is roughly 25 km at the equator. I was able to find a 1 km land mask used by satellites. That data is read in one line of code, and then it is a simple matter to determine which stations are “wet”. Since NCDC is updating the GHCN V3 inventory, I have alerted them to the problem and will, of course, provide the code. I have yet to write NASA GISS. Since H2010 is already in the publishing process, I’m unsure of the correct path forward.

Looking through the 230 cases is not that difficult, just time consuming. We can identify several types of case: atolls, islands, and coastal locations. It’s also possible to put in the correct locations for some stations by referencing either WMO publications or other inventories that have better accuracy than either GHCN or GISS. We can also note that in some cases the “mislocation” may not matter to nightlights: these are cases where you see no lights whatsoever within the 1/2 degree grid that I show. In the Google Maps images presented below, I’ll show a sampling of all 230. The blue cross shows the GHCN station location, and the contour lines show the contours of the nightlights raster. Pitch-black locations have no contour.
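For the curious, here is a hedged sketch of how one of those panels can be drawn, using the Nlight raster and wetBulb inventory built in the code block below (the Lon, Lat and Name column names are assumptions about the inventory layout):

library(raster)
showStation <- function(i, win = 0.25) {
  lon <- wetBulb$Lon[i]; lat <- wetBulb$Lat[i]
  e <- extent(lon - win, lon + win, lat - win, lat + win) # a 1/2 degree window
  patch <- crop(Nlight, e)
  plot(patch, main = wetBulb$Name[i]) # raster image of the lights
  if (cellStats(patch, "max") > 0) contour(patch, add = TRUE) # pitch-black patches get no contour
  points(lon, lat, pch = 3, col = "blue") # blue cross at the GHCN coordinates
}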

I will also update this with a newer version of Nightlights. A Google Earth tour is available for folks who want it. The code is trivial and I can cover it if folks find it interesting. With the exception of the graphing, it is as simple as this:

library(raster) # the raster package does all the grid work

Ghcn <- readV2Inv() # read in the GHCN v2 inventory
lonLat <- data.frame(Ghcn$Lon, Ghcn$Lat) # station coordinates as points

Nlight <- raster(hiResNightlights)
extent(Nlight) <- c(-180, 180, -90, 90) # fix the metadata error in nightlights
Ghcn <- cbind(Ghcn, Lights = extract(Nlight, lonLat)) # extract the lights at the station points

distCoast <- raster(coastDistanceFile, varname = "dst") # get the special land mask
Ghcn <- cbind(Ghcn, CoastDistance = extract(distCoast, lonLat))
# For this mask, water pixels are coded by their distance from land; all land pixels are 0.
# Make an inventory of just those land stations that appear in the water.
wetBulb <- Ghcn[which(Ghcn$CoastDistance > 0), ]
writeKml(wetBulb, outfile = "wetBulb", tourname = "Wetstations")

Some shots from the gallery. The 1 km land/water mask is very accurate; you might notice one or two stations actually on land. Nightlights is less accurate, something H2010 does not recognize: its pixels can be over 1 km off true position. The small sample below should show the various cases. No attempt is made to ascertain whether this causes an issue for the identification of rural/urban categories. As it stands, the inaccuracies in Nightlights and station locations suggest more work is needed before that effort is taken up.
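One rough way to gauge how much that positional uncertainty could matter (a sketch of mine, not part of the analysis above): extract the brightest pixel within a small buffer around each station and count how many stations would change value.

# 2 km radius; for lon/lat data the raster package takes the buffer in metres
buffered <- extract(Nlight, lonLat, buffer = 2000, fun = max)
sum(buffered != Ghcn$Lights, na.rm = TRUE) # stations whose lights value would change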

Click to enlarge images:

Roger Knights

Let me say it first!
“All Wet!”

kim

Can we say they are lazy? Maybe they were processing their data in a dark alley.
==============

Pingo

They are all at sea!

a jones

Mr. Mosher, you progress sir, you progress. In time we shall learn much, I think, from your worthy efforts. Still, clearly there is much more to do, but it is a start.
The only thing that puzzles me is how the professionals, if you can call them that, did not appear to be aware of this. It may of course be that they are, but believe that their statistical analysis copes with these problems. Perhaps, but if so why have they never even referred to this problem? If they knew it existed one might expect they would explain their methods of dealing with it.
Is that too much to ask? They have been doing this kind of thing for decades, yet it takes one amateur (I hope you do not take exception to that description, Mr. Mosher; Natural Philosophy has a long and honourable tradition of relying on the well-informed amateur, and where indeed would our astronomical colleagues be without their army of amateur stargazers?), just one such amateur working by himself, to discover these flaws in a relatively short time.
Does that not strike you as strange? Perhaps not in the topsy-turvy world of climatology, where it seems it is the amateurs who are doing the serious work that the professionals have omitted to do.
As for the answers you will get from the experts Mr. Mosher I keenly await them, but somehow suspect we may have to wait for a very long time indeed.
Kindest Regards

Nightlights over water can be fishing boats or aerosols dispersing city light vertically. The instrument is sensitive enough to detect both. Or indeed, they can be calibration and/or registration errors, or blooming/bleeding.

juanslayton

Perhaps I may be forgiven for repeating a comment I left earlier over at Moshtemp:
Mr Mosher,
I had a great vacation last year prowling the west and documenting USHCN stations. Occurred to me belatedly that I was driving right past GHCN stations without getting them. So I started to include them just in case Anthony ever makes good on his threat to extend the Surface Stations gallery. I only have a couple so far, but I’ll be glad to share the on-site numbers now and in the future if it would be useful.
Oregon’s Sexton Summit shows typical creativity in coordinates. GISS has the station about a mile and a half to the north, MMS is much closer, maybe 300 feet to the east, and my Garmin (at 42.600131, -123.365567) is about right on the satellite photo.
GISS site descriptions can be just as creative as their coordinates. I got a chuckle this afternoon to see that they put El Centro, California, in an area of ‘highland shrub.’ Locals know that El Centro is in the Salton Sink, 39 feet BELOW SEA LEVEL. Ah, well….

A trillion pounds is being spent based on data which so lacks any quality control that it makes infant school artwork look “precise”.
And then they have the gall to suggest that being a sceptic is anti-scientific!

chu

It’s worse than we thought, undetected massive sea level rises.

orkneygal

230 Stations.
Is that a lot?

rc

My respect for Hansen grows and grows.
Oops I used a trick there to hide the decline.

These are good results. Improvements in the surface measurements are needed. While metadata mining is not my specialty, I am glad someone works that side of it. I think there is a benefit from the surface data, but it is limited in scope and has pesky problems like the UHI and the even simpler fact that there are thousands of separate thermometers in different locations. Calibration and location are just two issues.
A better standard for what the temperature is will greatly benefit everyone. My workaround for using a single set of temperature data is to blend 4 sources of data together. This blended data has some nice advantages. It gives coverage of the entire instrumental period, but also includes the superior satellite data. I even have some AGW folk agree that this is a good method.
Cleaning up the modern data is good, but the real battle is with CO2. This is where I really focus my effort. My current bit of trouble is to discredit climate sensitivity calculations based on CO2 changes.
We will win as the actual science is on our side, but a solid understanding of the science will be needed.
John Kehr
The Inconvenient Skeptic

John Marshall

This may all be meaningless because, as some physicists say, temperature can only be taken if the system is at equilibrium. The atmosphere is never at equilibrium so any temperature taken is meaningless.
What we are arguing about is a few tenths of a degree divergence from what someone has calculated as an average, giving the anomaly that the graphs show; but this so-called average may not be correct because it is calculated over too short a period of time.
Climates change, get used to it.

Roger Carr

chu says: (November 8, 2010 at 2:20 am) It’s worse than we thought, undetected massive sea level rises.
Beautiful, Chu! A classic.

Patrick Davis

“orkneygal says:
November 8, 2010 at 2:24 am”
It’s a lot of bad data, that’s for sure. But they thought they could get away with sloppy/shonky work, and to a large extent, they have. In no other industry do I know of such sloppiness being tolerated. If my engineering work hadn’t turned out within the +/- 2 micron specifications, I’d have been sacked.

husten

I suppose it is not clear in all cases what datum the coordinates in the metadata are referenced to. NASA GISS et al. might not have cared much about that. Google Earth uses WGS84, which is also the default setting on many hobby GPS devices. I have no idea if there is a worldwide standard for stations; possibly for airports???
This issue can account for a few thousand feet in extreme cases.
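For scale, a minimal haversine sketch in R (the 0.002 degree shift is an assumed illustrative offset, not a measured datum difference):

haversine <- function(lat1, lon1, lat2, lon2, R = 6371000) {
  toRad <- pi / 180
  dlat <- (lat2 - lat1) * toRad; dlon <- (lon2 - lon1) * toRad
  a <- sin(dlat / 2)^2 + cos(lat1 * toRad) * cos(lat2 * toRad) * sin(dlon / 2)^2
  2 * R * asin(sqrt(a)) # great-circle distance in metres
}
haversine(42.600, -123.366, 42.602, -123.366) / 0.3048 # roughly 730 feet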

Latimer Alder

The more I learn about the way that temperature data is collected, the less I am convinced that AGW theory is built on a sound experimental footing at all.
Somebody please correct my broad brush understanding if it is wrong.
1. The whole subject relies 100% on the observation of daily temperatures around the world and the computation of an average temperature based on just the maxima and minima of the daily readings.
Q: This seems like a very kindergarten way of computing a meaningful number. I can understand the arithmetic, ((a+b)/2), but is it the best statistical technique available? (See the toy example at the end of this comment.) And does the mean so computed actually have any meaning for plant growth, sea levels, migration of birds etc.? Has there been any experimental work to show that it does?
2. There is no universally accepted and adhered to way of observing these temperatures. Stations are built to different standards, placed in non-comparable sites, move, disappear, are modified, use different equipment, use different recalibration routines (if any) and are not subject to any regular systematic check of their validity or accuracy. The data is a complete hodge podge and the histories are largely unknown or unrecorded.
3. The purpose of the measurements is to record the daily maxima and minima as noted above. It is easy to imagine circumstances where the recorded temperature will be artificially high…our world is full of portable heat sources…but very difficult to imagine those that make it artificially low. Our world is not full of portable cooling fans. To my mind there is an inherent bias for the recordings to be higher overall than the truth.
4. Many have written eloquently as well about the heat island effect where the progress of urbanisation over the last hundred years or so has meant that previously undisturbed instruments have become progressively more subject to extraneous heat sources ..and so show higher readings. As urbanisation proceeds, the number of undisturbed stations decreases and the heat affected ones increase…once more biasing the record towards higher recordings over time.
5. Once all the data has been collected in the inconsistent and error prone manner shown above, it is sent to, for example, our friends at CRU in Norwich. They then apply ‘adjustments’ to the data based on their ‘skill and knowledge’. The methodology for the adjustments is not published… and may not be kept at all because of the lack of any professional data archiving mechanisms at this place.
The director of CRU does not believe that the UHI effect is of any significance, because a colleague in China produced some data sometime that showed that in China .. in the middle of great political upheavals.. it was only small. And cannot now produce that data once again…maybe he lost it in an office move.
He is (was) also a highly active and influential member of the community whose careers have been made by ‘finding’ global warming and actively proselytising its supposedly damaging effects. Any guesses as to which directions his ‘adjustments’ are likely to be in?
CRU is institutionally averse to any outside scrutiny. Resisting such examination by any and all means is in its DNA. Their self-declared philosophy is to destroy data rather than reveal it.
(Personal note – As an IT manager, losing any data is one of the greatest sins that I could commit…actively destroying it is a mortal sin, and under FOI may well be legally criminal as well)
6. The data so processed is then released to the world as the definitive record of the temperature series. And used as the basis for all sorts of predictions, models, scare stories, cap’n’trade bills and all that malarkey. Hmmmmmm!
Forgive me if I think that this overall process has not been ‘designed’ to get at the absolute truth. The data is collected in an inconsistent manner, and there is no reliable record that describes the exact circumstances of that collection. It is subject to many outside influences that tend to increase the readings over time, not to decrease them.
Once collected it is subject to adjustments according to no agreed or public methodology. There is a clear potential conflict of interest in the institutions making the adjustments.
I’d set this as a task for first-year science undergraduates: take this model of raw data collection and processing to the homogenised state, and suggest how it could be better designed to get at the truth of the temperatures. Do not feel shy about writing at length. There is plenty of scope.
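A toy check of the min/max question in R (a synthetic profile with an assumed shape, purely for illustration): for an asymmetric daily cycle the min/max mean and the hourly mean disagree.

h <- 0:23 # hours of a synthetic day
temp <- 15 + 8 * sin((h - 9) * pi / 12) + 2 * exp(-((h - 15) / 2)^2) # skewed afternoon peak
c(minMax = (min(temp) + max(temp)) / 2, hourly = mean(temp)) # the two means differ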

H.R.

Hmm… so those 230 stations should be reclassified as wet bulb readings, eh?

Robin Guenier

O/T (slightly). In an article in American Thinker (here), Fred Singer says:

The global climate … warmed between 1910 and 1940 but due to natural causes, and at a time when the level of atmospheric greenhouse gases was relatively low. There is little dispute about the reality of this rise in temperature and about the subsequent cooling from 1940 to 1975, which was also seen in proxy records (such as ice cores, tree rings, etc.) independent of thermometers. The … IPCC, then reports a sudden climate jump around 1977-1978, followed by a steady increase in temperature until at least 1997. It is this steady increase that is in doubt; it cannot be seen in the proxy records.

Is that claim (re proxy records) correct? Has the evidence for it been published?
He goes on to say:

Even more important, weather satellite data, which furnish the best global temperature data for the atmosphere, show essentially no warming between 1979 and 1997.

Again, is that correct?

R. de Haan

Is there no way to hold these people accountable for their ill performed work?

jimmi

“Even more important, weather satellite data, which furnish the best global temperature data for the atmosphere, show essentially no warming between 1979 and 1997.
Again, is that correct?”
Well, the RSS data was up in a post a few days ago, so no, it is not correct.

John S

Every one of these stations gets serviced at least occasionally. How hard would it be to visit every single site over the course of, say, a year, get a GPS-based location, and update the database? It could be done with little or no additional cost above the basic maintenance.

John Peter

Robin Guenier at November 8, 2010 at 4:11 am should look at
http://www.drroyspencer.com/latest-global-temperatures/
There he can see that global satellite temperatures are essentially “flat” until 1997 and then rise as a result of the 1998 El Nino. We then have another elevated “flat” period from 1998 to the present, with another peak in summer 2010 but declining temperatures again due to La Nina. I have put “flat” in quotes as there are natural variations from year to year. As can be seen, the recent peak did not exceed 1998. Since 1998, CO2 has increased by more than 20 ppm without essentially sending global temperatures upwards. Sea ice N/S is within natural variation, and global sea levels are heading down again.

Amazing: a (65 N) Canada arctic automatic weather station still hourly records temperatures warmer than the (45 N) NYC JFK station:
http://www.ogimet.com/cgi-bin/gsynres?ind=71981&ano=2010&mes=11&day=8&hora=6&min=0&ndays=30
http://www.ogimet.com/cgi-bin/gsynres?ind=74486&ano=2010&mes=11&day=8&hora=6&min=0&ndays=30
This amazing station is also consistently 10-30 degrees C hotter than all its surrounding Canada arctic stations. Too hot to be water effects; it must sit atop an erupting volcano.

redneck

Latimer Alder: November 8, 2010 at 3:33 am
Latimer you make many good points.
Your statement:
“Q: This seems like a very kindergarten way of computing a meaningful number. I can understand the arithmetic, ((a+b)/2), but is it the best statistical technique available?”
Well it got me thinking about a quote I read years ago:
“Given enough data with statistics you can prove anything.”
Sorry I don’t know the source.
Now I’m no statistician, but from my reading of events it was a matter of choosing the best statistic, the one that gives the answer you want: some unusual variant of PCA that may yet get Mann in some hot water.

What are these people doing while under the employ of my tax money? If it is your job to make sure temperature data is accurate, why aren’t you catching these outrageous errors? The only thing I can figure is that the data gives the output desired, and they are using their time figuring out how to keep their easy income, such as going on personal crusades decrying how bad global warming is.
Why do we keep calling people who are too lazy to do required quality control “scientists”? They are more like fat cats.

Dave L

Quick. Somebody notify John Abraham and the AGU. Those pesky skeptics are at it again. (See the preceding article on WUWT.)

@Robin
1) Since 1900, there are warming, cooling and warming periods visible in proxy data, here in glaciers:
http://www.ncdc.noaa.gov/paleo/pubs/oerlemans2005/fig3a.jpg
2) Considering the TLT record, Singer is correct.
http://climexp.knmi.nl/data/itlt_gl_1978:1997a.png
The HadCRUT/GISS/global SST records show warming over the same period, which may be in some part attributable to UHI.

Latimer Alder

Every one of these stations gets serviced at least occasionally. How hard would it be to visit every single site over the course of, say, a year, get a GPS-based location, and update the database? It could be done with little or no additional cost above the basic maintenance.

Actually getting the raw data would be the easy bit..and your proposal is a fine one.
But the real problem would be to construct the technical, procedural and cultural ‘infrastructure’ to make use of the data so collected.
We have seen in Harry_read_me that CRU for example are absolutely clueless about data archiving and retrieval, that they have no consistent process for handling their ‘adjustments’, that they most definitely do not want any form of outside scrutiny of their work in exchange for their grant money and that their standards of Information Technology disciplines fall far short of those expected of even a talented amateur in the field.
But the most revealing (and worrying) thing that Harry inadvertently revealed is that they are unashamed by this! Charged with keeping one of the three global datasets that may hold the key to ‘the most important problem facing the world’, they are content to bumble along in their shambolic way..occasionally wiping the fag ash and cobwebs off a pile of old Chinese papers just to make sure they don’t let others see them.
They seemingly have never even thought to visit other institutions whose mission is to keep data secure and with meaning. No concept crosses their collective wisdom that others have faced and solved similar problems and that perhaps there are lessons that could be learnt. Nor that their ‘mission’ is sufficiently important (or so some believe) that they have a professional and social duty to use the highest standards that have been developed…not the lowest.
It will take years, even with a complete change of personnel in such an institution, to get the data they do have into a state where your most helpful suggestion can be fully exploited (which doesn’t mean that a start should not be made).
The changes needed are primarily cultural….to imbue the whole field with the importance of consistent accurate and verifiable data collection. With consistent accurate and verifiable ‘adjustments’ if these prove necessary. With a relentless focus on the data as the only actual truth…not on modelling predictions.
There is a long, long, long way to go. But until we arrive at somewhere much nearer that ideal, everything else that has been done is just castles in the air.

Tenuc

Robin Guenier says:
November 8, 2010 at 4:11 am
“O/T (slightly). In an article in American Thinker (here), Fred Singer says:
Even more important, weather satellite data, which furnish the best global temperature data for the atmosphere, show essentially no warming between 1979 and 1997.
Again, is that correct?”

Yes. RSS satellite data shows no statistically significant global warming 1979-97.
Here’s the data, you can see for yourself.
http://woodfortrees.org/plot/rss/from:1979/to:1997/plot/rss/from:1979/to:1997/trend
Woodfortrees is a great site for checking out what you read on climate issues as you can easily compile your own plots of climate data metrics, link here:-
http://woodfortrees.org/plot/

simpleseekeraftertruth

The fact that 230 stations were found to have coordinates in the sea proves that at least 230 stations are incorrectly assessed for UHI. 230 is the minimum number that are wrong, as the method used (land/water masking) only detects those that have this characteristic.

Dave

It’s worse than we thought! Look at the evidence right here of extensive recent sea level rises! All those points were classified as land previously, so if they’re not now, it must be due to rising sea level.

juanslayton

John S:
As they switched from relying on interpolating coordinates from maps to direct GPS measurements, they have been updating the database. One complication: rather than correct the existing reported coordinates, they enter the new, corrected coordinates as a station location change, leaving the old coordinates as an apparent previous location. Took me about a year and a half to figure this out; meanwhile I wasted a lot of time and effort trying to document locations where there had never been a station.
Juan S.

LearDog

Wow. Just wow. Fantastic work Mosh.
And I liked that funny bit about ‘it seemed an easy thing to find and analyze all such cases in GHCN’. Ha ha ha! Cracked me up. I laughed, really did.
Finding and analyzing all cases was so easy, in fact, that the folks in charge of the database hadn’t corrected it? I doubt that. You clearly have a skill set that THEY do not possess.

Viv Evans

Now I understand why these are called ‘wet bulb readings’ ….
😉

Steve Fitzpatrick

Hi Mosh,
Nice work; certainly very interesting.
I have a question: if there is frequent inaccuracy, as your work clearly shows, might we not expect stations nowhere near the water to also suffer significant location errors? That is, doesn’t any random inaccuracy in station location almost automatically imply an understatement of night-light based UHI adjustment? Urban areas are generally rather small, so on average any random inaccuracy would (I think) tend to locate the station further away from the brightest regions; the direction of the inaccuracy should not matter. Placement of stations over water makes the error obvious, but I wonder how many other significant errors exist where the obvious clue of water vs. land is not available.
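A toy Monte Carlo makes the intuition concrete (all numbers assumed purely for illustration): near a compact bright spot, a random 2-D position error usually moves a station away from the peak, so the extracted brightness is biased low.

set.seed(1)
sigma <- 3 # km, assumed station location error
d0 <- 1 # km, true distance from the bright centre
ex <- rnorm(1e5, 0, sigma); ey <- rnorm(1e5, 0, sigma)
dNew <- sqrt((d0 + ex)^2 + ey^2) # distance after a random 2-D error
brightness <- function(d) 60 * exp(-(d / 2)^2) # toy city-light profile
mean(brightness(dNew)) < brightness(d0) # TRUE: lights biased low on average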

juanslayton

Don’t like salt water? Go to Northport, Washington. Both GISS and the MMS will put you into the Columbia. For real location check the Surfacestations.org gallery : > )

chu says: “It’s worse than we thought, undetected massive sea level rises.”
Lovely!

Latimer Alder says:
The changes needed are primarily cultural….to imbue the whole field with the importance of consistent accurate and verifiable data collection. With consistent accurate and verifiable ‘adjustments’ if these prove necessary.
I would second that!
Irrespective of which side of the “fence” anyone sits, it ought to be common ground that the quality of these measurements should be first class. When they seem to have nothing else to do, what are we paying for except world class data handling?
Anyone who has ever installed a quality system knows the principle: you have as much concern for the smallest and simplest problems, such as the position of stations, because if you’re failing with the simple, basic things then it is almost certain you are failing with the more complex issues.
We know the present system has no credibility because we know the quality is totally abysmal. That should not be a partisan issue — even if you believe the world is warming due to mankind, you still want to have good accurate data on which to base action.
One of the clearest indicators that global warming is not a serious problem is that we can see that none of the establishments are at all concerned about the abysmal quality of the present compilation of temperature data … poor quality, poor data handling, and a group of “professionals” who seem to spend more time editing wikipedia and real climate than correcting the many and obvious errors in the temperature record! If they don’t care about the temperature record, then why on earth should anyone else?

jaypan

Sure, Abraham’s “Climate Rapid Response Team” was built to work out such failures.

Bernie

Steven:
What kind of response did you get from NCDC? Were they appreciative of your work? Have they asked for more technical details?

LearDog

Am traveling and don’t have access to my data to address the question: is this merely an issue of precision (x.xx vs x.xxxxxx) in the database or are these systematic / siting problems?

I’m confused. Shouldn’t the headline read: “Stations that are shown on water are actually on land”? Are they actually, in real life, on water or land? I’m guessing the coordinates are wrong. It would be nice to know the ramifications of these errors as well. Once they fix the coordinates (if those are what’s wrong), what would be the change in the data?

Gary

And this just audits errors in the current sitings. What about previous site locations which may have had much different temperature measurement issues?

Dave says:
November 8, 2010 at 6:14 am
It’s worse than we thought! Look at the evidence right here of extensive recent sea level rises! All those points were classified as land previously, so if they’re not now, it must be due to rising sea level.

Not necessarily, Dave. You are making the “warmist” assumption.
It could easily mean that in those areas, the land has been sinking.
LOL

Martin Brumby

A bit O/T but also relevant. Read on….
The Royal Society is apparently having a bunfight to discuss / promote “Geo-engineering” solutions to the awful Irritable Climate Syndrome shock-horror-disaster.
The BBC invited some tame “boffin” onto their resolutely alarmist Today “news” programme this morning. After briefly mentioning some of the dopey solutions that had been suggested to stop us frying (even as we shiver), he was asked what was his personal favourite “Geo-engineering” wheeze. Interestingly, this “solution” didn’t involve mirrors, white paint or artificial volcanoes. Instead he suggested getting farmers to plant crops with shinier leaves. (It is to be hoped that these wouldn’t be products of GM technology…Aaaargh…the horror…).
The interesting point (and the one relevant to this thread) followed, with his estimate that the “shiney” leaf crops could make a difference of one degree to Global Temperatures!
Hmmmmmmmm.
Perhaps that’s what they mean about ‘green jobs’. Well someone has to take a damp cloth and polish those leaves.
But how much of a temperature increase have we seen so far since the start of the Industrial Revolution?
I wondered what allowance the “models” already include for changes in the ‘shineyness’ of vegetation in that period? Would this allowance be more or less than the allowances for UHI effect, dodgy records, the march of the thermometers, CRU tweaks, the end of the Little Ice Age, the effects of oceanic current fluctuations, moose dung next to Yamal trees and all those other exciting little things that we have learned about on here?
Definitively, worse than we thought.

“Meta-Data”, beyond data, beyond the “twilight zone” 🙂

1DandyTroll

No wonder those hippies are all screaming about the looming doom of sea level rise.
Bwaaap, haha, see what I did there?

j ferguson

Steven,
These 230 stations are part of what number of stations in current use by GISS?

Jeff

I would expect many locations on land are mislocated as well and may be improperly classified as rural or urban depending on the error in location …
I think it is safe to say that nobody has good temperature data, including location … and I mean nobody …
We need to go tabula rasa on this … start over …