Virginia is for warmers? Data says no.

A new tool for climate data visualization is showcased

David Walton writes to tell us about a satirical piece by Mark Steyn titled “Virginia is for Warmers”:

I was blessedly out of the country on Tuesday, so I’m belatedly catching up on post-election analysis. But I see that self-proclaimed Nobel laureate Michael Mann is hailing Virginia’s gubernatorial race as a referendum on climate science. To scroll his Twitter feed, you’d get the impression “climate change” was the No. 1 electoral issue in the state. Which would tend to suggest that the subject is political speech. Which is protected by the First Amendment, isn’t it?

Dr. Mann is a rare bird: a political activist whose politics require insulation from the Constitution.

Around the same time, WUWT reader “James” tipped us to this story about a data analyst who decided to run software called SAP HANA and a visualization tool called Lumira on climate data for Virginia, in response to a reader challenge on their forum. Both tools are built for the business community, where getting it right has big rewards and getting it wrong has financial consequences.

He’s using data from NOAA’s Integrated Surface Database (ISH), which extends back to 1901. It has good station coverage over the last 40 years, so it should be able to detect a global warming signal during that period. Apparently there is none in this data, which is absolute temperature, not anomaly data. Below is a screencap of the Virginia average-temperature time series from the Lumira program:

[Figure: Virginia ISH average temperature time series, plotted in Lumira]

You can read all about the ISH data in this PDF document here. Note that this is hourly data, mainly from airport sources. It doesn’t have the same sort of adjustments applied to it that daily Tmax/Tmin and Tmean data have in favorite surface climate indices such as HadCRUT and GISS. For example, since it is hourly data, there is no need for a TOBs (Time of Observation) correction. Readers may recall that Dr. Roy Spencer has used ISH data for some of his analyses in the past.
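For readers who want to see concretely why hourly data sidesteps the TOBs issue, here is a minimal Python sketch that computes a true daily mean straight from hourly ISH records. The fixed-width field positions are taken from the published ISD format documentation (air temperature in characters 88-92, in tenths of a degree C, with +9999 flagging a missing value); treat them as assumptions and verify against the PDF linked above.

```python
# Minimal sketch: a true daily mean straight from hourly ISH records.
# Field positions follow the ISD format document; verify against the PDF.
from collections import defaultdict

def daily_means(lines):
    """Average all valid hourly readings for each calendar day."""
    buckets = defaultdict(list)
    for line in lines:
        date = line[15:23]          # observation date, YYYYMMDD
        raw = line[87:92]           # signed air temperature, tenths of deg C
        if raw != "+9999":          # +9999 flags a missing reading
            buckets[date].append(int(raw) / 10.0)
    return {d: sum(v) / len(v) for d, v in buckets.items()}
```

Because every hour contributes to the average, there is no min/max reset time to bias the result, and hence no TOBs correction to apply.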

Part 1, where he converts and ingests the data, is linked in Part 2 of the story below, and a video of the entire analysis process follows:

=============================================================

Big Data Geek – Is it getting warmer in Virginia – NOAA Hourly Climate Data – Part 2

By John Appleby

So I discussed loading the data from NOAA’s Hourly Climate Data FTP archive into SAP HANA and SAP Lumira in Big Data Geek – Finding and Loading NOAA Hourly Climate Data – Part 1. Since then, a few days have passed and the rest of the data has been downloaded.
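For anyone following along without Part 1 handy, a rough sketch of the kind of fetch it describes might look like this in Python. This is not the author’s actual code: the archive layout (ftp.ncdc.noaa.gov/pub/data/noaa/&lt;year&gt;/, with gzipped files named USAF-WBAN-year.gz) reflects NOAA’s public ISH FTP structure, and the station IDs in the usage comment are placeholders.

```python
# Rough sketch of pulling one station-year of hourly ISH records from
# NOAA's public FTP archive. Path layout is an assumption; check the site.
import ftplib
import gzip
import io

def fetch_station_year(usaf, wban, year, host="ftp.ncdc.noaa.gov"):
    """Download one station-year of hourly ISH data; return text lines."""
    buf = io.BytesIO()
    with ftplib.FTP(host) as ftp:
        ftp.login()  # anonymous access
        ftp.cwd(f"/pub/data/noaa/{year}")
        ftp.retrbinary(f"RETR {usaf}-{wban}-{year}.gz", buf.write)
    return gzip.decompress(buf.getvalue()).decode("latin-1").splitlines()

# e.g. lines = fetch_station_year("724060", "93721", 2012)  # hypothetical IDs
```

Multiplied across roughly 31,000 stations and 112 years, a loop like this is where the 500 GB figure below comes from.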

Here are the facts!

– 500,000 uncompressed sensor files, totaling 500 GB

– 335 GB of CSV files, once processed

– 2.5bn sensor readings since 1901

– 82 GB of HANA data

– 31,000 sensor locations in 288 countries

Wow. Well, Tammy Powlas asked me about global warming, and so I used SAP Lumira to find out whether temperatures have been increasing in Virginia, where she lives, since 1901. You will see in this video just how fast SAP HANA is at answering complex questions. Here are a few facts about the data model:

– We aggregate all information on the fly. There are no caches, no indexes, no pre-built aggregates, and no cheating. The video you see is all live data [edit: yes, all 2.5bn sensor readings are loaded!].

– I haven’t done any data cleansing. You can see this early on because we have to do a bit of cleansing in Lumira. This is real-world, dirty data.

– HANA has a very clever time hierarchy, which means we can easily turn timestamps into aggregated date parts like Year, Month, and Hour (see the sketch after this list).

– SAP Lumira has clever geographic enrichments, which means we can load Country and Region hierarchies from SAP HANA really easily and quickly.
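As a rough illustration of what the time hierarchy and on-the-fly aggregation amount to, here is a minimal pandas stand-in (not HANA, and nowhere near as fast): derive Year/Month/Hour columns from raw timestamps and aggregate with no precomputed summaries. All the values are made up for the example.

```python
# Toy pandas stand-in for the HANA time hierarchy plus live aggregation.
import pandas as pd

# Hypothetical readings; timestamps and temperatures are invented
readings = pd.DataFrame({
    "ts": pd.to_datetime(["1901-05-01 06:00", "1901-05-01 18:00",
                          "2012-07-04 06:00", "2012-07-04 18:00"]),
    "temp_c": [8.1, 16.4, 21.0, 29.3],
})

# The "time hierarchy": Year/Month/Hour derived on the fly from the timestamp
readings["year"] = readings["ts"].dt.year
readings["month"] = readings["ts"].dt.month
readings["hour"] = readings["ts"].dt.hour

# Live aggregation, no caches or precomputed aggregates: group and average
print(readings.groupby("year")["temp_c"].mean())
```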

I was going to do this as a set of screenshots, but David Hull told me that it was much more powerful as a video, because you can see just how blazingly fast SAP HANA is with Lumira. I hope you enjoy it!

Let me know in the comments what you would like to see in Part 3.

Update: between the various tables, I have pretty good latitude and longitude data for the NOAA weather stations. However, NOAA did a really poor job of enriching this data: it has Country (FIPS) and US States only. There are 31k total stations, and I’d love to enrich these with global Country/Region/City information. Does anyone know of an efficient and free way of doing this? Please comment below! Thanks!

Update: in a conversation with Oliver Rogers, we discussed using HANA XS to enrich the latitude and longitude data with Country/Region/City from the Google Reverse Geocoding API. This has a limit of 15k requests a day, so we would have to throttle XS whilst it updates the most popular geocodings directly. This could be neat and reusable code for any HANA scenario!
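To make the throttling idea concrete, here is a hedged Python sketch (HANA XS would be the real home for this): it walks stations from most to least popular and spreads requests under the daily cap. The endpoint is Google’s public reverse-geocoding API; the 15k/day budget is taken from the discussion above, and the API key is a placeholder.

```python
# Sketch of throttled reverse geocoding under a fixed daily request budget.
import time
import requests

API_KEY = "YOUR_KEY"                       # placeholder, supply your own
URL = "https://maps.googleapis.com/maps/api/geocode/json"
DAILY_BUDGET = 15_000                      # limit cited in the post above
DELAY = 86_400 / DAILY_BUDGET              # spread requests across the day

def reverse_geocode(lat, lon):
    """Return Google's first formatted address for a station coordinate."""
    r = requests.get(URL, params={"latlng": f"{lat},{lon}", "key": API_KEY},
                     timeout=10)
    results = r.json().get("results", [])
    return results[0]["formatted_address"] if results else None

def enrich(stations):
    # stations: list of (station_id, lat, lon), most popular first
    for sid, lat, lon in stations[:DAILY_BUDGET]:
        yield sid, reverse_geocode(lat, lon)
        time.sleep(DELAY)                  # crude throttle under the daily cap
```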

=============================================================

Here is the fun part: the data and software are available for free.

The global ISH data, converted into a usable format (from Part 1 above), is here.

The Lumira visualization tool is available free here.

Happy data sleuthing, folks.

53 Comments
November 9, 2013 10:01 am

But I see that self-proclaimed Nobel laureate Michael Mann is hailing Virginia’s gubernatorial race as a referendum on climate science.

================================================================
And if his guy had lost, then it would have been due to the Koch Bros.’ “Big Oil” money.
A “referendum on climate science”? I thought only those with pal-rev….er…peer-reviewed papers on the subject were qualified to have a valid opinion. Do all the Dem voters in Va. have such papers?

November 9, 2013 10:03 am

Visualization of data might be fun, but actually experiencing it may not be, as I just returned to London from the warmer Mediterranean. Nevertheless, the early indications are that this English winter may be a bit less cold than the last, and more in line with the 20-year average:
http://www.vukcevic.talktalk.net/CET-dMm.htm

Blokj
November 9, 2013 10:08 am

SAP HANA is a new type of database that uses a column store to organize data. In addition, the database holds all data in the memory of the PC/server, which means that for this dataset a machine would need at least 82 GB of RAM.
The speed benefit comes from parallelism across the CPU cores, which can be used for searching the columns (each column is cut into pieces and each core handles its assigned part), and from the fact that all data is in memory.
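A tiny Python sketch can make the commenter’s point concrete. This is not HANA internals, just the two ideas in miniature: each attribute stored as its own contiguous in-memory array, and a scan of one column split into pieces that run in parallel (NumPy releases the GIL during the sums, so the threads genuinely overlap).

```python
# Miniature column-store scan: one contiguous column, chunked across workers.
from concurrent.futures import ThreadPoolExecutor

import numpy as np

n = 10_000_000
# Column store: each attribute is its own contiguous array, so a scan over
# temperature never touches station IDs, timestamps, etc.
temps = np.random.normal(12.0, 8.0, n).astype(np.float32)  # fake readings, C

def chunk_sum(lo, hi):
    # Each worker scans only its assigned piece of the column
    return temps[lo:hi].sum()

cores = 4
bounds = np.linspace(0, n, cores + 1, dtype=int)
with ThreadPoolExecutor(max_workers=cores) as pool:
    partials = pool.map(chunk_sum, bounds[:-1], bounds[1:])

print(f"mean temperature: {sum(partials) / n:.2f} C")
```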

Bloke down the pub
November 9, 2013 10:11 am

Any updates on the USCRN?

November 9, 2013 10:17 am

Ha! As an employee of SAP, this brings a smile to my face. HANA is transforming many industries around the globe through ultra-fast, real-time data and transaction processing and analytics. Great to see it used here!

Pippen Kool
November 9, 2013 10:48 am

This is a useful post from wuwt a few days ago:
http://temperaturetrends.org/state.php?state=VA
REPLY: Not particularly useful, unless of course you are a phantom who wants to push an agenda.
The difference is that the NOAA ISH data is not adjusted, and has less need for adjustment because it is hourly data. The data you prefer is statistically derived from the daily Tmax/Tmin average and is highly adjusted. The TOBs adjustment alone is a big factor.

Note that the total adjustments for both major datasets have increased recently:

Of course I’m sure you prefer adjusted data, since it fits your expectations – Anthony

November 9, 2013 11:02 am

Mann is so full of it. I live in Virginia Beach. There was not one word mentioned about global warming during the campaign that I heard, and I paid pretty close attention.

dbstealey
November 9, 2013 11:22 am

Pippen Kool,
What say you? Anthony has provided real-world measurements vs. adjusted measurements — which are not real measurements at all.
What do you prefer? Empirical facts? Or “adjusted” “facts”?

Nick Stokes
November 9, 2013 11:46 am

“It has good station coverage in the last 40 years…”
But your plot goes for 75 years. I don’t see how you have handled the variable inclusion of stations. It’s a very big problem with absolute temperatures – that is why anomalies are used. Without anomalies, when a mountain station, say, enters the dataset, it really pulls down the average. You end up with a time sequence that reflects the varying composition of the station set rather than the climate sequence.
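A toy example makes Stokes’ point easy to check. The numbers below are invented: a flat 15 °C valley station reports for all ten years, and a flat 5 °C mountain station enters the set halfway through. The absolute average shows a 5 °C “cooling” step that is pure composition change; anomalies (each station minus its own mean) remove it.

```python
# Toy illustration: changing station mix fakes a trend in absolute averages.
import numpy as np

valley = np.full(10, 15.0)     # flat 15 C every year, 2000-2009
mountain = np.full(10, 5.0)    # flat 5 C every year
mountain[:5] = np.nan          # mountain station only reports from 2005 on

absolute = np.nanmean(np.vstack([valley, mountain]), axis=0)
print(absolute)   # steps from 15.0 down to 10.0 in 2005: spurious "cooling"

# Anomalies: subtract each station's own mean over the years it reports,
# so a new station enters at ~0 instead of dragging the average around
def anomaly(series):
    return series - np.nanmean(series)

anomalies = np.nanmean(np.vstack([anomaly(valley), anomaly(mountain)]), axis=0)
print(anomalies)  # flat zeros: composition change no longer looks like climate
```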

SasjaL
November 9, 2013 11:49 am

Dr. Mann is a rare bird …
Metaphorically speaking, within climate science he is a very young Cuculus canorus …

Jquip
November 9, 2013 11:55 am

dbstealey: “What do you prefer? Empirical facts? Or “adjusted” “facts”?”
Well, it’s either about an Instrument or about an Instrumagic. Assuming that Poppen Kollar, and all scientists, prefer instruments then they prefer empirical facts. But if we know that the adjustment to the instrument is empirically based, then we know there is a provably better instrument that we are comparing it to. And as the adjustments continue to increase as time goes forward, then we are provably, and steadily, losing the ability to produce the provably better instruments from the late 1800s.
The shocking conclusion is that we have lost, or are steadily losing, any ability to produce useful scientific instruments. And so I’m sure that you, me, and Poppen Kollar all agree that the government should institute policy to produce the same superior thermometers that existed at about the time of the Civil War.

Greg
November 9, 2013 11:56 am

The two graphs in Anthony’s reply here are very informative:
http://wattsupwiththat.com/2013/11/09/virginia-is-for-warmers-data-says-no/#comment-1470319
Just what is going on? HadCRUT for land areas is basically CRUTEM(4?), the “Had” being SST.
So it seems the land data gets a step jump up at the same time as the SST data gets the post-war drop.

November 9, 2013 12:02 pm

I’m a bit surprised that there is no discernible trend or more cyclic character to the data. Just shows perceptions and memory are not the best climate-trending tools. I was not surprised that there was no real warming in the past decade. I’ve always been suspicious of the adjusted data showing lots of warming.

dbstealey
November 9, 2013 12:06 pm

Jquip,
I would agree — if the original, unadjusted data were included with the adjusted ‘data’. But most of the time it is not.
Also, there is nothing wrong with using the original thermometer data, if the same instrument is used for all the data. For example, when the CET is used from about the 1700s until now, the trend appears. That trend is ≈0.35°C/century, and it has not accelerated. Thus, the recent rise in CO2 has not had the predicted effect. That fact pretty much debunks the conjecture that a rise in CO2 will lead to runaway global warming.
Real world facts trump adjustments. But mainstream climateers insist on using ‘adjusted’ numbers. The reason is clear: if they showed the actual data, their grant money would start to dry up.

rogerknights
November 9, 2013 12:49 pm

Note that this is hourly data, mainly from airport sources. It doesn’t have the same sort of adjustments applied to it that daily Tmax/Tmin and Tmean data have in favorite surface climate indices such as HadCRUT and GISS. For example, since it is hourly data, there is no need for a TOBs (Time of Observation) correction. Readers may recall that Dr. Roy Spencer has used ISH data for some of his analyses in the past.

Say . . . here’s an idea for a paper. Find locations where hourly temperatures have been collected and compare their temperature record to that from the same (or almost overlapping) locations where TOBS adjustments have been made. From this (and also perhaps from comparisons to locations in the high quality network) perhaps one could back out the TOBS biasing. Perhaps this could help whittle down the error bars in the Anthony Watts et al. 2012 paper.
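A back-of-the-envelope version of that paper idea is easy to sketch. The code below is purely illustrative (a synthetic hourly series with invented persistence and noise, no real station data): it compares the true hourly mean against the (Tmax+Tmin)/2 value you would record with the min/max thermometer reset at different observation hours, which is exactly the quantity a TOBS adjustment tries to estimate.

```python
# Illustrative TOBS experiment on a synthetic hourly temperature series.
import numpy as np

rng = np.random.default_rng(0)
n = 24 * 365
# Diurnal cycle plus persistent "weather" (AR(1), so warm spells span days;
# that persistence is the ingredient that actually creates TOBS bias)
weather = np.zeros(n)
for i in range(1, n):
    weather[i] = 0.98 * weather[i - 1] + rng.normal(0.0, 0.5)
temps = 12 + 8 * np.sin(2 * np.pi * (np.arange(n) % 24 - 9) / 24) + weather

def minmax_mean(reset_hour):
    """Annual mean of (Tmax+Tmin)/2 with the thermometer reset at reset_hour."""
    windows = np.roll(temps, -reset_hour).reshape(-1, 24)
    return ((windows.max(axis=1) + windows.min(axis=1)) / 2).mean()

true_mean = temps.mean()  # what an hourly ISH-style average gives directly
for h in (0, 7, 17):      # midnight, morning, and late-afternoon resets
    # Biases differ with reset hour; afternoon resets typically read warm
    print(f"reset at {h:02d}:00 -> bias {minmax_mean(h) - true_mean:+.2f} C")
```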

Jquip
November 9, 2013 12:59 pm

dbstealey: “But mainstream climateers insist on using ‘adjusted’ numbers.”
Exactly right on all counts. But no reason not to give them options. It’s either first class fakery of Instrumagic, or modern man cannot construct the sort of superior instruments produced by slave-owners from over a century ago.
Properly, of course, if the instruments we’re using are the instruments we have — then that’s what all scientific measurement and prediction is required to use. Adjusting historical values for presentation purposes is a judgement call at that point. But in no case is present Instrumagic of present instruments justified. There used to be an ‘effort’ argument in the quill-pen days about rescribbling old data for new instruments, but that was never valid or sound, and is wholly unjustified in the modern digital age.

Bill Illis
November 9, 2013 1:12 pm

This is great stuff, John, but I don’t think any of us are going to be able to pull off what you did.
Keep it up and come back with all the data: US and global, etc.
You might be the first person who has been able to do so.

Nick Stokes
November 9, 2013 1:26 pm

Greg says: November 9, 2013 at 11:56 am
“The two graphs in Anthony’s reply here are very informative:”

Well, read with care. The first actual graph, from NOAA, correctly describes the adjustments in the earlier version of USHCN. But the commentary underneath treats the °F numbers as if they were °C. This error was pointed out when the figure first appeared in 2007, and has been pointed out when used since (see here, here). But it just continues.
The TOBS adjustment in USHCN is due to a particular, documented circumstance: COOP observer times were advised by the NWS, but not mandated. Observers could and did ask to change them, and many did over the years – mostly from the originally advised evening to morning, when rain gauges were read. There is ample diurnal information available to calculate the effect of the change.

Gunga Din
November 9, 2013 2:06 pm

http://wattsupwiththat.com/2013/11/09/virginia-is-for-warmers-data-says-no/#comment-1470319

Of course I’m sure you prefer adjusted data, since it fits your expectations – Anthony

=====================================================================
I’m not sure what alphabet soup this falls under (I am a layman), but, Pippin, try this.
http://wattsupwiththat.com/2013/11/06/politicized-congressional-temperature-trends/#comment-1467925
http://wattsupwiththat.com/2013/11/06/politicized-congressional-temperature-trends/#comment-1468600
PS I looked at some of the “adjustments” made with Time of Observation Bias in mind. It seems to this layman that if a record high of, say, 91°F on a particular day was adjusted due to TOB, then the record high for the day before or the day after shouldn’t be less than 91°F. The same holds true for the record lows I looked at.
Granted, I only looked at a few of the dates. But I looked at enough to discount TOB as a valid explanation for the changes made to past records.
Tell us what you find for your little spot on the globe.

Bill Illis
November 9, 2013 2:27 pm

Nick Stokes’ comments assume that weather observers did not know that you get different temperatures at different times of the day. I mean, really, at what age do humans figure this out, 3 years old?
The first mention of correcting for it was in 1854, and the head of the Weather Bureau later published the first time-of-observation corrections for the US, covering 1873 to 1905.
http://catalog.hathitrust.org/Search/Home?lookfor=%22Bigelow, Frank H. 1851-1924.%22&type=author&inst=
But the NCDC continues to adjust for this every month, even today, even correcting 2003 temperatures.

Alvin
November 9, 2013 2:32 pm

You can get a Dell server that supports 128 GB of RAM and 8 cores for under $6k now. There is no excuse why you can’t perform in-house modeling.

Jquip
November 9, 2013 2:41 pm

Bill Illis: “But the NCDC continues to adjust for this on a continuing basis, every month, even today, correcting 2003 temperatures even.”
Right, this is another of the quill-pen Instrumagic issues. We’d been well aware even by the Civil War that calculus is a pretty useful way to get the area under a curve, and that the midpoint of two arbitrary points on a curve is not, no matter how well you argue in favor of which arbitrary points to use. It’s never been valid nor sound, and again the justifications of ‘cheap,’ ‘lazy,’ or ‘effort’ don’t even rise to the level of being humored in the digital age.
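For the curious, the difference Jquip is pointing at is easy to demonstrate numerically. The diurnal curve below is synthetic (an asymmetric cycle chosen only to make the effect visible), but the comparison, area under the hourly curve versus the (Tmax+Tmin)/2 midpoint, is the real one.

```python
# Integrated daily mean vs. the min/max-thermometer midpoint, synthetic curve.
import numpy as np

hours = np.arange(25)  # 00:00 through 24:00
# Asymmetric diurnal cycle: the second harmonic skews the shape so the
# midpoint of the extremes no longer matches the area under the curve
temps = (12 + 5 * np.sin(2 * np.pi * (hours - 9) / 24)
            + 2 * np.cos(4 * np.pi * (hours - 9) / 24))

integrated = np.trapz(temps, hours) / 24        # area under the hourly curve
midpoint = (temps.max() + temps.min()) / 2      # min/max thermometer method

print(f"integrated mean: {integrated:.2f} C")
print(f"(Tmax+Tmin)/2:   {midpoint:.2f} C")
print(f"difference:      {integrated - midpoint:+.2f} C")  # ~1.7 C here
```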

Nick Stokes
November 9, 2013 2:41 pm

Bill Illis says: November 9, 2013 at 2:27 pm
“Nick Stokes’ comments assume that weather observers did not know that you get different temperatures at different times of the day. I mean really, at what age do humans figure this out, 3 years old?”

No, it assumes they faithfully recorded the observations of the min-max thermometers at the stated times. I don’t expect that they made their own adjustments on the fly.
Gunga Din says: November 9, 2013 at 2:06 pm
“It seems to this layman that if a record high of, say, 91°F on a particular day was adjusted due to TOB, then the record high for the day before or the day after shouldn’t be less than 91°F.”

Adjustments are made for the purpose of removing bias when computing a long-term average. Despite common belief here, it is not an alteration to the historic record. It would be inappropriate to derive a record high from adjusted data, and I very much doubt that it has been done.

November 9, 2013 2:48 pm

Nick Stokes says:
November 9, 2013 at 2:41 pm
Adjustments are made for the purpose of removing bias when computing a long-term average.
============================================================
Do you see the irony in that the adjustments create a bias in the long-term average (always up)?
