Crowdsourcing An Opensource Temperature Data Monitoring Methodology and Spreadsheet

Walter Dnes – Click the pic to view at source

Image Credit: Walter Dnes

By Walter Dnes – Edited by WUWT Regular Just The Facts

I have developed a methodology and spreadsheet to capture and chart HadCRUT3, HadCRUT4, GISS, UAH, RSS, and NOAA monthly global temperature anomaly data. Calculations are also done to determine the slope of the anomaly data from any given month to the most recent month of data. Your help is needed to help validate the methodology I’ve created and help us to leverage the expertise and resources of WUWT keep this data reasonably up to date.

In order to dispense any potential legal/copyright questions:

1) I, Walter Dnes, hereby declare that the end-user programming in the spreadsheet and this article are entirely my work product.

2) I, Walter Dnes, grant you the royalty-free, perpetual, irrevocable, non-exclusive, transferable license to use, reproduce, modify, adapt, publish, translate, create derivative works from, distribute, perform, and display the aforementioned end-user programming and spreadsheet (in whole or part) worldwide and/or to incorporate it in other works in any form, media, or technology now known or later developed.

The spreadsheet is on Google Docs at this URL:

https://docs.google.com/spreadsheet/ccc?key=0AnTohu4oFUbcdEgzTkpEYTAwN1BiXzJXMXZ5RVJiOUE&usp=sharing

Some of the graphs can take several seconds to render, because of the complexity and sheer amount of data. I acknowledge that implementing a spreadsheet via web interface is an amazing feat. I do not wish to detract from that, or complain about it. However, there are some limitations that require workarounds. I will note them as necessary.

I’ve sized all graphs to a 1920×1080 screen. My apologies to users with smaller screens. The details of the graph would be difficult to see if the graph was reduced.

The monthly date convention used in this spreadsheet is to refer to monthly data by the end of the month in question. So January 2008 would be 2008.083 (year 2008, offset by 1/12th of a year, i.e. 1 month). This keeps going through November 2008 (2008.917) and December 2008 (2009.000). This may seem a bit weird, but in computing, we often start at zero rather than 1. It works out better in many cases. Here is a sample set of dates to familiarize you with the idea…

2007/12 == 2008.000 where .000 = Dec data for previous year (2007)

2008/01 == 2008.083 where .083 = Jan data for current year (2008)

2008/02 == 2008.167 where .167 = Feb data for current year (2008)

2008/03 == 2008.250 where .250 = Mar data for current year (2008)

2008/04 == 2008.333 where .333 = Apr data for current year (2008)

2008/05 == 2008.417 where .417 = May data for current year (2008)

2008/06 == 2008.500 where .500 = Jun data for current year (2008)

2008/07 == 2008.583 where .583 = Jul data for current year (2008)

2008/08 == 2008.667 where .667 = Aug data for current year (2008)

2008/09 == 2008.750 where .750 = Sep data for current year (2008)

2008/10 == 2008.833 where .833 = Oct data for current year (2008)

2008/11 == 2008.917 where .917 = Nov data for current year (2008)

2008/12 == 2009.000 where .000 = Dec data for previous year (2008)

And now for an overview of the spreadsheet…

Tab “temp_data”:

Anomaly data

Column A is date in decimal years in the manner noted above.

Column B is HadCRUT3 anomaly data

Column C is HadCRUT4 anomaly data

Column D is GISS anomaly data

Column E is UAH anomaly data

Column F is RSS anomaly data

Column G is NOAA anomaly data

____________________________________________________________________

Slope data

Column I has the slope for each corresponding cell in column B (HadCRUT3) from that cell’s date (Column A) to the most recent month with data for that dataset.

Column J slope data for Column C (HadCRUT4)

Column K slope data for Column D (GISS)

Column L slope data for Column E (UAH)

Column M slope data for Column F (RSS)

Column N slope data for Column G (NOAA)

For columns I through N, the earliest cell with a negative value indicates how far back one can go in a temperature series, with a negative slope. The slope data is plotted in the tabs with the names of the datasets. This allows one to see where the slope value crosses zero.

____________________________________________________________________

12 month running means

Column P is HadCRUT3 12-month running mean anomaly.

Column Q is HadCRUT4 12-month running mean anomaly.

Column R is GISS 12-month running mean anomaly.

Column S is UAH 12-month running mean anomaly.

Column T is RSS 12-month running mean anomaly.

Column U is NOAA 12-month running mean anomaly.

Column V is left blank for data-import when updating data.

A couple of notes about limitations of Google’s online spreadsheet

1) You can not enter text manually in graph legends. The spreadsheet can, however, use text from the first row of the series, i.e. the “header row”. Cells P11 through U11 have the series’ names in them, for use in the legend.

2) Scatter graphs will not work properly with nulls/blanks in a series. In order to get the series of varying length to plot properly, dummy values have to be inserted to fill in shorter series. I use -9 as the filler value.

Tab “HadCRUT3”:

Is a graph of slope values for each month for the HadCRUT3 series. The slope is from the month of the cell (given in Column A) to the most recent month of data. It uses data from Column I. Note that due to complexity limits in the Google spreadsheet, the values are only calculated for part of the data series.

Tab “HadCRUT4”:

Is a graph of slope values for each month for the HadCRUT4 series, using data from Column J.

Tab “GISS”:

Is a graph of slope values for each month for the GISS series, using data from Column K.

Tab “UAH”:

Is a graph of slope values for each month for the UAH series, using data from Column L.

Tab “RSS”:

Is a graph of slope values for each month for the RSS series, using data from Column M.

Tab “NOAA”:

Is a graph of slope values for each month for the NOAA series, using data from Column N.

Tab 12mo1850:

Is a graph of 12-month running means of anomalies from January 1850 to present.

Tab 12mo1979:

Is a graph of 12-month running means of anomalies from 1979 to present. This covers the satellite data era.

Navigating Through The Spreadsheet:

Spreadsheet navigation is similar to Excel, with the most major difference being that pressing the {END} key immediately takes you to the far right-hand side of the page. Similarly, pressing the {HOME} key immediately takes you to the far left-hand side of the page. The equivalant to {END}{UP}, {END}{DOWN}, {END}{LEFTARROW}, and {END}{RIGHTARROW} combinations is to hold down the {CTRL} key while pressing the arrow in the direction you wish to jump.

Interpreting The Slope Graphs:

The slope graphs in the tabs “HadCRUT3”, “HadCRUT4”, “GISS”, “UAH”, “RSS”, and “NOAA” represent the slope from a given month to the latest available data. Note that the graphs are a guide to narrow down the earliest month with a negative slope. The authoritative numbers are in columns I through N of tab “temp_data”, which list the slopes. Two examples follow. ***IMPORTANT*** as additional months of data come in, the slope numbers and graphs will change each month. The numbers and graphs used in these examples were generated in mid-May 2013, using data to the end of April 2013. Do not expect to see the same numbers that you see in the screenshots. To find the longest period of negative slope, find the leftmost (i.e. earliest) point in graphs HadCRUT3/HadCRUT4/GISS/UAH/RSS/NOAA which has a value below zero.

The easy example is shown by image rssmag.png, which is a zoom of part of the graph in tab “RSS”. It’s obvious that 1997.0 (i.e. December 1996 is negative). The screenshot rssdata1.png confirms that 1997.0 has a negative value in column M, which contains RSS slopes.

UAH is a more difficult case. Data including April 2013 shows that the first negative slope value is sometime in 2008. See the zoomed image uahmag.png. The graph at least narrows down the month to somewhere in 2008. Image uahdata1.png shows that the first negative value in column L (UAH) is for 2008.583, i.e. July 2008.

“Coming Soon” Part 2: Instructions for Updating the Global Temperature Records in Google Docs:

This spreadsheet is intended as a proof-of-concept and a starting point for people who may want to extend it further. A followup post will deal with updating your own copy of this spreadsheet on Google Docs, or downloading and maintaing a local copy on your home machine.

Please let us know if you see any issues or errors within the methodology or spreadsheet. Also, please let us know your thoughts and recommendations on how we can best keep this spreadsheet reasonably up to date. Ideally we would like to automate this process or spread the work among a few WUWT’s readers. We also need to figure out how to make the resultant data readily available on WUWT, which could be accomplished through monthly/quarterly WUWT threads, a WUWT Reference Page or other communication method. Please let us know your thoughts below.

0 0 votes

Article Rating

49 Comments

crosspatch

May 23, 2013 3:42 pm

No CRN data?

Walter Dnes

Editor

May 23, 2013 3:52 pm

> crosspatch says: May 23, 2013 at 3:42 pm
> No CRN data?
I’m concentrating on global data sets in this spreadsheet. In a followup post, I’ll explain how to download the spreadsheet to your PC, or copy to your Google account. Once you have your own copy, you can customize it to your heart’s content. Note that the Google online spreadsheet is already rather slow. So I hesitate to add more data sets to this spreadsheets.

Zeke Hausfather

May 23, 2013 3:53 pm

You could also just use this…
http://www.woodfortrees.org/

walterdnes

Editor

May 23, 2013 4:08 pm

> Zeke Hausfather says: May 23, 2013 at 3:53 pm
> You could also just use this… http://www.woodfortrees.org/
It’s already listed on the WUWT sidebar. The spreadsheet in this article allows people to “roll their own”, add additional data sets, go into greater detail, and whatever other customizations they want. This goes beyond what you can do on WFT.

Werner Brozek

May 23, 2013 4:09 pm

Zeke Hausfather says:
May 23, 2013 at 3:53 pm
You could also just use this…
I really like WFT, however GISS and HadCRUT3 have not been updated since November so I had to rely on Walter and other sources for the latest times where the slope was 0.

Lance Wallace

May 23, 2013 4:20 pm

Which 30-year period are you using for a base?

walterdnes

Editor

May 23, 2013 4:28 pm

> Lance Wallace says: May 23, 2013 at 4:20 pm
> Which 30-year period are you using for a base?
I download the data from the respective websites, so it depends on what base period they use. For purposes of calculating slope, it doesn’t matter. We’re looking at the relative change over a period of time.

Nick Stokes

May 23, 2013 5:12 pm

It seems to me that Javascript with HTML 5 is more flexible, and also quicker to download. You can also do more interactive graphics (example here). Or here is an example with graph mobility and ability to add regression curves (and add your own data, and output numerics).
On updating, I’ve been using wget and cURL to automatically download data files each month – I run a script every night. They allow you to mirror – ie just download when there are updates. I keep the updated data here. It also automatically updates the graphs.

May 23, 2013 5:16 pm

I read only half-way down but it seems as if you’re trying to reproduce something that could be done quite easily using JavaScript or – I suspect – WordPress’ own plotting widget.
You could even have a processing tool for regression fits (linear, pwr law, polynomial etc.), convolution/FFT (+processing), even moving FFT or FWT, etc.
But the short answer is that you need time and people to do it. Even I could write the code in C (library of a few 1000 lines of code would do all the maths) to do all above but you’re need some type of php to interface although I doubt WordPress would like this very much (security risk that exposes too much native code). Thing is you need to sit down with someone at WUWT and work through aims, needs => design (maintainability) => output.
I’ll ask around and get a few experts to read the post and see what they suggest.

walterdnes

Editor

May 23, 2013 5:29 pm

> Nick Stokes says: May 23, 2013 at 5:12 pm
> On updating, I’ve been using wget and cURL to automatically download
> data files each month – I run a script every night. They allow you to
> mirror – ie just download when there are updates. I keep the updated
> data here. It also automatically updates the graphs.
One advantage of doing it manually is that you find out right away when the version number is bumped, and the source URL changes. E.g. HadCRUT4 is now at http://www.metoffice.gov.uk/hadobs/hadcrut4/data/current/time_series/HadCRUT.4.2.0.0.monthly_ns_avg.txt
http colon slash slash http://www.metoffice.gov.uk/hadobs/hadcrut4/data/current/time_series/HadCRUT.4.2.0.0.monthly_ns_avg.txt
The whole point of this excercise is to allow a competent spreadsheet user to do it all on their home PC. A followup post in the next few days will get into the details of doing monthly updates.

Nick Stokes

May 23, 2013 5:46 pm

Walter,
“HadCRUT4 is now at…”
Yes, that was a nuisance. But there’s a way. On this page, which doesn’t change its URL, just locate the link in the HTML.

walterdnes

Editor

May 23, 2013 5:47 pm

> cd says: May 23, 2013 at 5:16 pm
> But the short answer is that you need time and people to do it. Even
> I could write the code in C (library of a few 1000 lines of code would
> do all the maths) to do all above but you’re need some type of php
> to interface although I doubt WordPress would like this very much
> (security risk that exposes too much native code). Thing is you need
> to sit down with someone at WUWT and work through aims,
> needs => design (maintainability) => output.
That’s exactly what I’m trying to avoid. A followup post will discuss downloading the spreadsheet and fixing up the graphs (because the translation of graphs during the download has problems). I want something that a competent spreadsheet user can update and customize on their own.

Snowlover123

May 23, 2013 5:49 pm

Is it possible to graph the JMA temperatures with the data above?
http://wattsupwiththat.com/2013/01/31/japans-cool-hand-luke-moment-for-surface-temperature/
http://wattsupwiththat.files.wordpress.com/2013/01/surfacetemps_japan.png?w=640&h=507

Greg Goodman

May 23, 2013 5:53 pm

“This goes beyond what you can do on WFT.”
Indeed, WTF is too limited.
Echoing some comment above , spend time defining the objectives before doing too much work.
On what you have so far.
If you are plotting slopes you are interested in rate of change, why not plot rate of change directly (ie the monthly incremental changes, = “first difference”)
It’s easier to visualise what you are actually interested in. Not sure why the cumulative slope. This has changing changing sensitivity and frequency response as it progresses. Not sure this would be informative.
Argh, don’t use running averages. Here’s why not and a better option:
http://climategrog.wordpress.com/2013/05/19/triple-running-mean-filters/
December2008=2008 is crazy confusing and pointless. I would suggest logging data that is a monthly average at the middle of the month : Jan = 1/24 ; Feb =3/24 etc.
That way you are not introducing a phase shift into the data and not calling 2007 data 2008.
Hope that helps.

walterdnes

Editor

May 23, 2013 6:21 pm

> Snowlover123 says: May 23, 2013 at 5:49 pm
> Is it possible to graph the JMA temperatures with the data above?
Any monthly data. Wait for my next post about downloading the spreadsheet and updating/customizing it.

justthefactswuwt

Author

May 23, 2013 6:34 pm

Lance Wallace says: May 23, 2013 at 4:20 pm
Which 30-year period are you using for a base?
Good question. We should be using 1981–2010 as our base period;

in order to comply with a recommended World Meteorological Organization (WMO) Policy, which suggests using the latest decade for the 30-year average.
http://www.ncdc.noaa.gov/cmb-faq/anomalies.php

However, there is significant variation across the data sets, i.e.:
UAH

The global, hemispheric, and tropical LT anomalies from the 30-year (1981-2010) average
http://www.drroyspencer.com/2013/04/uah-global-temperature-update-for-march-2013-0-18-deg-c-again/

GISS

Anomalies are relative to the 1951-80 base period means
http://data.giss.nasa.gov/gistemp/tabledata_v3/GLB.Ts+dSST.txt

HadCRUT4

Time series are presented as temperature anomalies (deg C) relative to 1961-1990

http://www.metoffice.gov.uk/hadobs/hadcrut4/data/current/download.html
RSS

Anomalies are computed by subtracting the mean monthly value (averaged from 1979 through 1998 for each channel) from the average brightness temperature for each month.
http://www.ssmi.com/msu/msu_data_description.html#rss_msu_data_analysis
Beginning in December 2010, all lower troposphere, middle troposphere, and lower stratosphere satellite data are reported here with respect to the 1981–2010 base period. Prior to December 2010, data were reported with respect to the 1979–1998 base period. Remote Sensing Systems continues to provide data to NCDC with respect to the 1979–1998 base period; however, NCDC readjusts the data to the 1981–2010 base period so that the satellite measurements are comparable. http://www.ncdc.noaa.gov/temp-and-precip/msu/

NOAA NCDC

“The global and hemispheric anomalies are provided with respect to the period 1901-2000, the 20th century average.”
“Why do some of the products use different reference periods?
The maps show temperature anomalies relative to the 1981–2010 base period. This period is used in order to comply with a recommended World Meteorological Organization (WMO) Policy, which suggests using the latest decade for the 30-year average. For the global-scale averages (global land and ocean, land-only, ocean-only, and hemispheric time series), the reference period is adjusted to the 20th Century average for conceptual simplicity (the period is more familiar to more people, and establishes a longer-term average). The adjustment does not change the shape of the time series or affect the trends within it.
http://www.ncdc.noaa.gov/cmb-faq/anomalies.php

NOAA CPC

NOAA’s Climate Prediction Center has already changed their Normals to the 1981 – 2010 base period? Why are those Normals not available?
Many organizations, including NOAA’s Climate Prediction Center (CPC), develop their own averages and change base periods for internal use. However, NCDC’s climate Normals are the official United States Normals as recognized by the World Meteorological Organization and the main Normals made available for a variety of variables. Below is a brief summary of changes to http://www.ncdc.noaa.gov/cmb-faq/anomalies.phpthe CPC products due to the change in climate base period from 1971 – 2000 to 1981 – 2010:

NOAA Climate Prediction Center’s CAMS station temperature anomaly dataset. “CAMS” is an acronym for the “Climate Anomaly Monitoring System” in use at the Climate Prediction Center (CPC).

CAMS station surface air temperature anomalies for the globe with respect to the 1971-2000 climatological base period.
http://iridl.ldeo.columbia.edu/maproom/Global/Atm_Temp/Monthly_stn_anom.html

Please post base periods and source links to any data sets I missed.

JMI

May 23, 2013 6:37 pm

You guys certainly have a good handle on things crowd-wise!
Though, I’ve also come across some new research that you may find interesting in this regard…
It’s called “The Theory of Crowd Capital” and you can download it here if you’re interested: http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2193115
Enjoy!

walterdnes

Editor

May 23, 2013 6:40 pm

> Greg Goodman says: May 23, 2013 at 5:53 pm
> If you are plotting slopes you are interested in rate of change,
> why not plot rate of change directly (ie the monthly incremental
> changes, = “first difference”)
> It’s easier to visualise what you are actually interested in. Not
> sure why the cumulative slope. This has changing changing
> sensitivity and frequency response as it progresses. Not sure
> this would be informative.
Like it or not, one of the major arguments right now is “no global warming for X years”. The CAGW crowd has drawn the line at 17 years as proof that global warming has stopped. That’s why people are fixated on how far back we can go with a negative slope.
> Argh, don’t use running averages. Here’s why not and a better option:
> http://climategrog.wordpress.com/2013/05/19/triple-running-mean-filters/
Again, we’re debating the CAGW crowd. The “annual temperature anomaly” is what the public is fixated on. This is the 12 month running mean from January through December. My plot includes all 12 points in the year. Can you persuade GISS etal to use the result of a triple running mean filter, rather than a straight 12 month average?

William Astley

May 23, 2013 6:40 pm

In reply to:
Zeke Hausfather says:
May 23, 2013 at 3:53 pm
You could also just use this…
http://www.woodfortrees.org/
Woods for trees has for some unexplained reason dropped UAH global temperature anomaly.
It is interesting that the satellite UAH global temperature anomaly is less than HADCRUT4 and GISS in the 1980s, while HADCRUT4 and GISS are almost 0.4 higher than UAH in 2013.
It appears the science is not settled in the measurement and the manipulation of planetary temperature.
https://docs.google.com/spreadsheet/ccc?key=0AnTohu4oFUbcdEgzTkpEYTAwN1BiXzJXMXZ5RVJiOUE#gid=15

justthefactswuwt

Author

May 23, 2013 6:42 pm

Nick Stokes says: May 23, 2013 at 5:12 pm
I keep the updated data here. It also automatically updates the graphs.
Nick, why do you chose to use the base period of 1979 – 2000 for the plots on your site, i.e.:

Temperatures anomalies in the tables are as stated by the providers, with different anomaly bases. They have been converted to the same base (1979-2000) for plotting.
http://moyhu.blogspot.com.au/p/latest-ice-and-temperature-data.html

Nick Stokes

May 23, 2013 6:52 pm

JTF,
I start in 1979 because that is when satellites start. I actually started collecting the data in 2010, so couldn’t use that decade, though I maybe should have gone to 2009.
But it’s just a matter of subtracting off an average – a single number. It doesn’t matter very much which period, as long as it’s the same for each.. My main concern was to have them all on the same base.

May 23, 2013 7:08 pm

Your numbering convention (2000.000 = Dec 1999) is different than most similar implementations. Not wrong, but not standard and counterintuitive.
Your slope appears to be a two point calculation. Much less useful than the standard method of calcing slope of temp series, which would be the slope of the linear regression of all of the data between the endpoints…

walterdnes

Editor

May 23, 2013 7:20 pm

> JJ says: May 23, 2013 at 7:08 pm
> Your slope appears to be a two point calculation. Much less useful than
> the standard method of calcing slope of temp series, which would be the
> slope of the linear regression of all of the data between the endpoints…
??? The slope uses the spreadsheet “slope” function, which uses all the data points. E.g. cell M1764 is
=slope(F1764:F$1960,$A1764:$A$1960)

justthefactswuwt

Author

May 23, 2013 7:57 pm

JMI says: May 23, 2013 at 6:37 pm
You guys certainly have a good handle on things crowd-wise!
Though, I’ve also come across some new research that you may find interesting in this regard…
It’s called “The Theory of Crowd Capital” and you can download it here if you’re interested: http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2193115
Enjoy!
Interesting. They definitely seem to understand the process and value, i.e.:

“Crowdsourcing, for example, is well-known as a distributed problem-solving and production model, where problems are broadcast through web-based IS to an unknown group of solvers, in the form of an open call for solutions [7]. In the case of Crowdsourcing, the structuring of heterogeneous knowledge resources is undertaken by dispersed individuals, mediated through web-based IS, “where a huge amount of individual contributions build solid and structured sources of data” [46].”
“We espouse the potential of collective intelligence – some knowledge that is more accurate when it consists of inputs from a distributed population [33]. However, we further reason that gatekeepers, the monitoring and coordination of the flow and assimilation of data, information, and knowledge, into organizations, through the Crowd Capability process, are a must for Crowd Capital to accrue. Therefore, we have strived to amalgamate the dispersed discourses on these important phenomena among scholars, and outlined a model for inculcating Crowd Capital in any organization.”

In the current context, their theory is very organizationally focused, i.e.;

We began our investigation in this work, with the observation that more and more organizations are undertaking activities to engage dispersed populations through IS. In this paper, we present theory that explains this observed phenomenon, as an attempt by organizations to engage the dispersed knowledge of individuals. Furthermore, we theorize that organizations can create a heterogeneous capability stemming from their extant resources, called Crowd Capability that will engage dispersed knowledge through particular structure, content, and processes. And that doing so, will generate a heterogeneous knowledge resource for the organization known as Crowd Capital.

whereas WUWT is much less structured, more of an organism with multitude of heads and hands.

TRBixler

May 23, 2013 8:07 pm

So we accurately “know” the temperature to .3000 between datasets. While the trends diverge.

1 2 Next »

wpDiscuz

Share this:

Related Posts

UAH v6.1 Global Temperature Update for May, 2026: +0.53 deg. C

UAH v6.1 Global Temperature Update for April, 2026: +0.39 deg. C

UAH v6.1 Global Temperature Update for March, 2026: +0.38 deg. C

UAH v6.1 Global Temperature Update for February, 2026: +0.39 deg. C