A new tool for climate data visualization is showcased
David Walton writes to tell us about a satirical piece by Mark Steyn – titled Virginia is for Warmers
I was blessedly out of the country on Tuesday, so I’m belatedly catching up on post-election analysis. But I see that self-proclaimed Nobel laureate Michael Mann is hailing Virginia’s gubernatorial race as a referendum on climate science. To scroll his Twitter feed, you’d get the impression “climate change” was the No. 1 electoral issue in the state. Which would tend to suggest that the subject is political speech. Which is protected by the First Amendment, isn’t it?
Dr. Mann is a rare bird: a political activist whose politics require insulation from the Constitution.
About the same time, WUWT reader “James” tips us to this story about a data analyst who decided to run some software called SAP HANA and a visualization tool called Lumira on climate data in Virginia, in response to a reader challenge on the SAP forum. Both of these tools are built for the business community, where getting it right has big rewards and getting it wrong has financial consequences.
He’s using data from NOAA’s Integrated Surface Database (ISD), which has records back to 1901. It has good station coverage over the last 40 years, so it should be able to detect a global warming signal during that period. Apparently, there is none in this data, which is absolute temperature, not anomaly data. Below is a screencap of the Virginia time series of average temperature from the Lumira program:
You can read all about the ISH data in this PDF document here. Note that this is hourly data, mainly from airport sources. It doesn’t have the same sorts of adjustments applied to it that daily Tmax/Tmin and Tmean data have in the favorite surface climate indices such as HadCRUT and GISS. For example, since it is hourly data, there is no need for a TOBs (Time of Observation) correction. Readers may recall that Dr. Roy Spencer has used ISH data for some of his analyses in the past.
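For readers who want to poke at the raw ISH/ISD files themselves, the mandatory section of each record is fixed-width text. The field offsets in this minimal Python sketch are my reading of the ISD format documentation, so treat them as an assumption and verify against the PDF linked above before relying on them:

```python
def parse_isd_line(line):
    """Parse one fixed-width ISD/ISH mandatory-section record.

    Field offsets (0-indexed slices) assumed from the ISD format document:
      chars 15-23  observation date (YYYYMMDD)
      chars 23-27  observation time (HHMM, UTC)
      chars 87-92  air temperature, signed, tenths of deg C (+9999 = missing)
    """
    date = line[15:23]
    time = line[23:27]
    raw_temp = line[87:92]
    temp_c = None if raw_temp == "+9999" else int(raw_temp) / 10.0
    return date, time, temp_c

# Example with a synthetic record padded to the mandatory-section width:
rec = "0" * 15 + "20131101" + "1200" + "0" * 60 + "+0123" + "1"
print(parse_isd_line(rec))  # ('20131101', '1200', 12.3)
```

This is also where the lack of a TOBs correction shows up in practice: every reading carries its own UTC timestamp, so there is no daily observation window to shift.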
Part 1, where he converts and ingests the data, is linked in Part 2 of the story below, and a video of the entire analysis process follows:
By John Appleby
So I discussed loading the data from NOAA’s Hourly Climate Data FTP archive into SAP HANA and SAP Lumira in Big Data Geek – Finding and Loading NOAA Hourly Climate Data – Part 1. Since then, a few days have passed and the rest of the data has been downloaded.
Here are the facts!
- 500,000 uncompressed sensor files, totaling 500GB
- 335GB of CSV files, once processed
- 2.5bn sensor readings since 1901
- 82GB of Hana Data
- 31,000 sensor locations in 288 countries
Wow. Well, Tammy Powlas asked me about Global Warming, so I used SAP Lumira to find out whether temperatures have been increasing in Virginia, where she lives, since 1901. You will see in this video just how fast SAP HANA is at answering complex questions. Here are a few facts about the data model:
- We aggregate all information on the fly. There are no caches, indexes, or aggregates, and there is no cheating. The video you see is all live data [edit: yes, all 2.5bn sensor readings are loaded!].
- I haven’t done any data cleansing. You can see this early on because we have to do a bit of cleansing in Lumira. This is real-world, dirty data.
- HANA has a very clever time hierarchy, which means we can easily turn timestamps into aggregated dates like Year, Month, and Hour.
- SAP Lumira has clever geographic enrichments, which means we can load Country and Region hierarchies from SAP HANA really easily and quickly.
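For readers without a HANA system, the flavor of that on-the-fly time rollup can be approximated in plain Python. This is a stand-in sketch, not HANA’s actual time hierarchy, and the sample readings are invented for illustration:

```python
from collections import defaultdict
from statistics import mean

# Invented sample readings: (timestamp 'YYYYMMDDHHMM', temperature deg C)
readings = [
    ("190107011300", 24.0),
    ("190107021300", 26.0),
    ("200107011300", 25.0),
    ("200107021300", 27.0),
]

def rollup(rows, level=4):
    """Group readings by a timestamp prefix (4 = year, 6 = year+month)
    and average the temperatures, which is the kind of aggregation the
    HANA time hierarchy performs on the fly with no precomputed tables."""
    buckets = defaultdict(list)
    for ts, temp in rows:
        buckets[ts[:level]].append(temp)
    return {k: mean(v) for k, v in sorted(buckets.items())}

print(rollup(readings))           # yearly averages
print(rollup(readings, level=6))  # monthly averages
```

The point of the analogy is only that nothing is precomputed: every query re-aggregates the raw rows, just at a very different scale.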
I was going to do this as a set of screenshots, but David Hull told me that it was much more powerful as a video, because you can see just how blazingly fast SAP HANA is with Lumira. I hope you enjoy it!
Let me know in the comments what you would like to see in Part 3.
Update: between the various tables, I have pretty good latitude and longitude data for the NOAA weather stations. However, NOAA did a really bad job of enriching this data; it has Country (FIPS) and US States only. There are 31k total stations, and I’d love to enrich these with global Country/Region/City information. Does anyone know of an efficient and free way of doing this? Please comment below! Thanks!
Update: in a conversation with Oliver Rogers, we discussed using HANA XS to enrich latitude and longitude data with Country/Region/City from the Google Reverse Geocoding API. This has a limit of 15k requests a day so we would have to throttle XS whilst it updates the most popular geocodings directly. This could be neat and reusable code for any HANA scenario!
Here is the fun part: the data and software are available for free.
The global ISH data, converted into a usable format (from Part 1 above), is here
The Lumira visualization tool is available free here
Happy data sleuthing, folks.