Crowdsourcing an Open-Source Temperature Data Monitoring Methodology and Spreadsheet

Image Credit: Walter Dnes

By Walter Dnes – Edited by WUWT Regular Just The Facts

I have developed a methodology and spreadsheet to capture and chart HadCRUT3, HadCRUT4, GISS, UAH, RSS, and NOAA monthly global temperature anomaly data. Calculations are also done to determine the slope of the anomaly data from any given month to the most recent month of data. Your help is needed to validate the methodology I’ve created and to leverage the expertise and resources of WUWT readers to keep this data reasonably up to date.

In order to dispense with any potential legal/copyright questions:

1) I, Walter Dnes, hereby declare that the end-user programming in the spreadsheet and this article are entirely my work product.

2) I, Walter Dnes, grant you the royalty-free, perpetual, irrevocable, non-exclusive, transferable license to use, reproduce, modify, adapt, publish, translate, create derivative works from, distribute, perform, and display the aforementioned end-user programming and spreadsheet (in whole or part) worldwide and/or to incorporate it in other works in any form, media, or technology now known or later developed.

The spreadsheet is on Google Docs at this URL:

https://docs.google.com/spreadsheet/ccc?key=0AnTohu4oFUbcdEgzTkpEYTAwN1BiXzJXMXZ5RVJiOUE&usp=sharing

Some of the graphs can take several seconds to render because of the complexity and sheer amount of data. I acknowledge that implementing a spreadsheet via a web interface is an amazing feat, and I do not wish to detract from that or complain about it. However, there are some limitations that require workarounds. I will note them as necessary.

I’ve sized all graphs to a 1920×1080 screen. My apologies to users with smaller screens; the details of the graphs would be difficult to see if they were reduced.

The monthly date convention used in this spreadsheet is to label monthly data by the end of the month in question. So January 2008 becomes 2008.083 (year 2008, offset by 1/12th of a year, i.e. one month). This continues through November 2008 (2008.917) and December 2008 (2009.000). This may seem a bit odd, but in computing we often start at zero rather than one, and it works out better in many cases. Here is a sample set of dates to familiarize you with the idea (a short code sketch of the conversion follows the list)…

2007/12 == 2008.000 where .000 = Dec data for previous year (2007)

2008/01 == 2008.083 where .083 = Jan data for current year (2008)

2008/02 == 2008.167 where .167 = Feb data for current year (2008)

2008/03 == 2008.250 where .250 = Mar data for current year (2008)

2008/04 == 2008.333 where .333 = Apr data for current year (2008)

2008/05 == 2008.417 where .417 = May data for current year (2008)

2008/06 == 2008.500 where .500 = Jun data for current year (2008)

2008/07 == 2008.583 where .583 = Jul data for current year (2008)

2008/08 == 2008.667 where .667 = Aug data for current year (2008)

2008/09 == 2008.750 where .750 = Sep data for current year (2008)

2008/10 == 2008.833 where .833 = Oct data for current year (2008)

2008/11 == 2008.917 where .917 = Nov data for current year (2008)

2008/12 == 2009.000 where .000 = Dec data for previous year (2008)
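For anyone who wants to reproduce this convention outside the spreadsheet, here is a minimal Python sketch of the conversion (an illustration only, not something the spreadsheet itself uses):

def to_decimal_year(year, month):
    # Label monthly data by the END of the month, so December 2008 -> 2009.000
    return round(year + month / 12.0, 3)

print(to_decimal_year(2008, 1))   # 2008.083
print(to_decimal_year(2008, 12))  # 2009.0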

And now for an overview of the spreadsheet…

Tab “temp_data”:

Anomaly data

Column A is date in decimal years in the manner noted above.

Column B is HadCRUT3 anomaly data

Column C is HadCRUT4 anomaly data

Column D is GISS anomaly data

Column E is UAH anomaly data

Column F is RSS anomaly data

Column G is NOAA anomaly data

____________________________________________________________________

Slope data

Column I has the slope for each corresponding cell in column B (HadCRUT3) from that cell’s date (Column A) to the most recent month with data for that dataset.

Column J slope data for Column C (HadCRUT4)

Column K slope data for Column D (GISS)

Column L slope data for Column E (UAH)

Column M slope data for Column F (RSS)

Column N slope data for Column G (NOAA)

For columns I through N, the earliest cell with a negative value indicates how far back one can go in a temperature series and still have a negative slope. The slope data is plotted in the tabs named after the datasets, which allows one to see where the slope value crosses zero.
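To spell out what those columns compute: each value is an ordinary least-squares trend fitted from that month through the latest available month. A rough Python equivalent (illustrative only; the spreadsheet itself uses its built-in slope() function), where dates and anoms are assumed to be lists of decimal years and anomalies for one dataset:

import numpy as np

def trailing_slopes(dates, anoms):
    # Least-squares slope from each month through the most recent month
    dates = np.asarray(dates, dtype=float)
    anoms = np.asarray(anoms, dtype=float)
    return [np.polyfit(dates[i:], anoms[i:], 1)[0] for i in range(len(dates) - 1)]

def earliest_negative(dates, slopes):
    # Earliest month whose slope-to-present is negative (None if there isn't one)
    for d, s in zip(dates, slopes):
        if s < 0:
            return d
    return None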

____________________________________________________________________

12 month running means

Column P is HadCRUT3 12-month running mean anomaly.

Column Q is HadCRUT4 12-month running mean anomaly.

Column R is GISS 12-month running mean anomaly.

Column S is UAH 12-month running mean anomaly.

Column T is RSS 12-month running mean anomaly.

Column U is NOAA 12-month running mean anomaly.

Column V is left blank for data-import when updating data.
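For reference, the running means in columns P through U are plain trailing averages of the latest 12 monthly anomalies, so each December value matches the calendar-year average. A minimal sketch of the calculation (my own illustration, assuming anoms is a list of monthly anomalies in date order):

def running_mean_12(anoms):
    # Trailing 12-month mean; None until a full year of data is available
    return [None if i < 11 else sum(anoms[i-11:i+1]) / 12.0
            for i in range(len(anoms))]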

A couple of notes about limitations of Google’s online spreadsheet

1) You cannot enter text manually in graph legends. The spreadsheet can, however, use text from the first row of the series, i.e. the “header row”. Cells P11 through U11 have the series’ names in them, for use in the legends.

2) Scatter graphs will not work properly with nulls/blanks in a series. In order to get series of varying lengths to plot properly, dummy values have to be inserted to fill in the shorter series. I use -9 as the filler value.
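If you rebuild the series yourself, the same workaround is easy to apply before plotting; a trivial sketch (pad_series is a made-up helper name, not part of the spreadsheet):

def pad_series(series, target_len, filler=-9):
    # Pad a shorter series with the dummy filler so all series have equal length
    return series + [filler] * (target_len - len(series))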

Tab “HadCRUT3”:

Is a graph of slope values for each month for the HadCRUT3 series. The slope is from the month of the cell (given in Column A) to the most recent month of data. It uses data from Column I. Note that due to complexity limits in the Google spreadsheet, the values are only calculated for part of the data series.

Tab “HadCRUT4”:

Is a graph of slope values for each month for the HadCRUT4 series, using data from Column J.

Tab “GISS”:

Is a graph of slope values for each month for the GISS series, using data from Column K.

Tab “UAH”:

Is a graph of slope values for each month for the UAH series, using data from Column L.

Tab “RSS”:

Is a graph of slope values for each month for the RSS series, using data from Column M.

Tab “NOAA”:

Is a graph of slope values for each month for the NOAA series, using data from Column N.

Tab “12mo1850”:

Is a graph of 12-month running means of anomalies from January 1850 to present.

Tab “12mo1979”:

Is a graph of 12-month running means of anomalies from 1979 to present. This covers the satellite data era.

Navigating Through The Spreadsheet:

Spreadsheet navigation is similar to Excel, the biggest difference being that pressing the {END} key immediately takes you to the far right-hand side of the page. Similarly, pressing the {HOME} key immediately takes you to the far left-hand side of the page. The equivalent of the {END}{UP}, {END}{DOWN}, {END}{LEFTARROW}, and {END}{RIGHTARROW} combinations is to hold down the {CTRL} key while pressing the arrow in the direction you wish to jump.

Interpreting The Slope Graphs:

The slope graphs in the tabs “HadCRUT3”, “HadCRUT4”, “GISS”, “UAH”, “RSS”, and “NOAA” represent the slope from a given month to the latest available data. Note that the graphs are a guide to narrow down the earliest month with a negative slope. The authoritative numbers are in columns I through N of tab “temp_data”, which list the slopes. Two examples follow. ***IMPORTANT*** as additional months of data come in, the slope numbers and graphs will change each month. The numbers and graphs used in these examples were generated in mid-May 2013, using data to the end of April 2013. Do not expect to see the same numbers that you see in the screenshots. To find the longest period of negative slope, find the leftmost (i.e. earliest) point in graphs HadCRUT3/HadCRUT4/GISS/UAH/RSS/NOAA which has a value below zero.

The easy example is shown by image rssmag.png, which is a zoom of part of the graph in tab “RSS”. It’s obvious that 1997.0 (i.e. December 1996) is negative. The screenshot rssdata1.png confirms that 1997.0 has a negative value in column M, which contains the RSS slopes.

(Images rssmag.png and rssdata1.png – credit: Walter Dnes)

UAH is a more difficult case. Data including April 2013 shows that the first negative slope value is sometime in 2008. See the zoomed image uahmag.png. The graph at least narrows down the month to somewhere in 2008. Image uahdata1.png shows that the first negative value in column L (UAH) is for 2008.583, i.e. July 2008.

(Images uahmag.png and uahdata1.png – credit: Walter Dnes)

“Coming Soon” Part 2: Instructions for Updating the Global Temperature Records in Google Docs:

This spreadsheet is intended as a proof-of-concept and a starting point for people who may want to extend it further. A followup post will deal with updating your own copy of this spreadsheet on Google Docs, or downloading and maintaining a local copy on your home machine.

Please let us know if you see any issues or errors in the methodology or spreadsheet. Also, please let us know your thoughts and recommendations on how we can best keep this spreadsheet reasonably up to date. Ideally we would like to automate this process or spread the work among a few WUWT readers. We also need to figure out how to make the resultant data readily available on WUWT, which could be accomplished through monthly/quarterly WUWT threads, a WUWT Reference Page, or another communication method. Please let us know your thoughts below.

crosspatch
May 23, 2013 3:42 pm

No CRN data?

Editor
May 23, 2013 3:52 pm

> crosspatch says: May 23, 2013 at 3:42 pm
> No CRN data?
I’m concentrating on global data sets in this spreadsheet. In a followup post, I’ll explain how to download the spreadsheet to your PC, or copy it to your Google account. Once you have your own copy, you can customize it to your heart’s content. Note that the Google online spreadsheet is already rather slow, so I hesitate to add more data sets to this spreadsheet.

May 23, 2013 3:53 pm

You could also just use this…
http://www.woodfortrees.org/

Editor
May 23, 2013 4:08 pm

> Zeke Hausfather says: May 23, 2013 at 3:53 pm
> You could also just use this… http://www.woodfortrees.org/
It’s already listed on the WUWT sidebar. The spreadsheet in this article allows people to “roll their own”, add additional data sets, go into greater detail, and whatever other customizations they want. This goes beyond what you can do on WFT.

Werner Brozek
May 23, 2013 4:09 pm

Zeke Hausfather says:
May 23, 2013 at 3:53 pm
You could also just use this…
I really like WFT; however, GISS and HadCRUT3 have not been updated since November, so I had to rely on Walter and other sources for the latest times where the slope was 0.

Lance Wallace
May 23, 2013 4:20 pm

Which 30-year period are you using for a base?

Editor
May 23, 2013 4:28 pm

> Lance Wallace says: May 23, 2013 at 4:20 pm
> Which 30-year period are you using for a base?
I download the data from the respective websites, so it depends on what base period they use. For purposes of calculating slope, it doesn’t matter. We’re looking at the relative change over a period of time.

Nick Stokes
May 23, 2013 5:12 pm

It seems to me that Javascript with HTML 5 is more flexible, and also quicker to download. You can also do more interactive graphics (example here). Or here is an example with graph mobility and ability to add regression curves (and add your own data, and output numerics).
On updating, I’ve been using wget and cURL to automatically download data files each month – I run a script every night. They allow you to mirror – ie just download when there are updates. I keep the updated data here. It also automatically updates the graphs.

cd
May 23, 2013 5:16 pm

I read only half-way down but it seems as if you’re trying to reproduce something that could be done quite easily using JavaScript or – I suspect – WordPress’ own plotting widget.
You could even have a processing tool for regression fits (linear, pwr law, polynomial etc.), convolution/FFT (+processing), even moving FFT or FWT, etc.
But the short answer is that you need time and people to do it. Even I could write the code in C (library of a few 1000 lines of code would do all the maths) to do all above but you’re need some type of php to interface although I doubt WordPress would like this very much (security risk that exposes too much native code). Thing is you need to sit down with someone at WUWT and work through aims, needs => design (maintainability) => output.
I’ll ask around and get a few experts to read the post and see what they suggest.

Editor
May 23, 2013 5:29 pm

> Nick Stokes says: May 23, 2013 at 5:12 pm
> On updating, I’ve been using wget and cURL to automatically download
> data files each month – I run a script every night. They allow you to
> mirror – ie just download when there are updates. I keep the updated
> data here. It also automatically updates the graphs.
One advantage of doing it manually is that you find out right away when the version number is bumped, and the source URL changes. E.g. HadCRUT4 is now at http://www.metoffice.gov.uk/hadobs/hadcrut4/data/current/time_series/HadCRUT.4.2.0.0.monthly_ns_avg.txt
The whole point of this exercise is to allow a competent spreadsheet user to do it all on their home PC. A followup post in the next few days will get into the details of doing monthly updates.
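(For anyone scripting the download rather than doing it by hand, here is a rough Python sketch; it is illustrative only, and the assumption that the first two whitespace-separated columns are the YYYY/MM date and the anomaly should be checked against the current file.)

import urllib.request

URL = ("http://www.metoffice.gov.uk/hadobs/hadcrut4/data/current/"
       "time_series/HadCRUT.4.2.0.0.monthly_ns_avg.txt")

with urllib.request.urlopen(URL) as resp:
    for line in resp.read().decode("ascii", "replace").splitlines():
        parts = line.split()
        if len(parts) >= 2:
            # Assumed layout: date in the first column, anomaly in the second
            print(parts[0], parts[1])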

Nick Stokes
May 23, 2013 5:46 pm

Walter,
“HadCRUT4 is now at…”
Yes, that was a nuisance. But there’s a way. On this page, which doesn’t change its URL, just locate the link in the HTML.

Editor
May 23, 2013 5:47 pm

> cd says: May 23, 2013 at 5:16 pm
> But the short answer is that you need time and people to do it. Even
> I could write the code in C (library of a few 1000 lines of code would
> do all the maths) to do all above but you’re need some type of php
> to interface although I doubt WordPress would like this very much
> (security risk that exposes too much native code). Thing is you need
> to sit down with someone at WUWT and work through aims,
> needs => design (maintainability) => output.
That’s exactly what I’m trying to avoid. A followup post will discuss downloading the spreadsheet and fixing up the graphs (because the translation of graphs during the download has problems). I want something that a competent spreadsheet user can update and customize on their own.

Greg Goodman
May 23, 2013 5:53 pm

“This goes beyond what you can do on WFT.”
Indeed, WTF is too limited.
Echoing some comment above , spend time defining the objectives before doing too much work.
On what you have so far.
If you are plotting slopes, you are interested in rate of change, so why not plot rate of change directly (i.e. the monthly incremental changes, the “first difference”)?
It’s easier to visualise what you are actually interested in. Not sure why the cumulative slope. This has changing sensitivity and frequency response as it progresses. Not sure this would be informative.
Argh, don’t use running averages. Here’s why not and a better option:
http://climategrog.wordpress.com/2013/05/19/triple-running-mean-filters/
December 2007 = 2008.000 is crazy confusing and pointless. I would suggest logging data that is a monthly average at the middle of the month: Jan = 1/24; Feb = 3/24 etc.
That way you are not introducing a phase shift into the data and not calling 2007 data 2008.
Hope that helps.

Editor
May 23, 2013 6:21 pm

> Snowlover123 says: May 23, 2013 at 5:49 pm
> Is it possible to graph the JMA temperatures with the data above?
Any monthly data. Wait for my next post about downloading the spreadsheet and updating/customizing it.

JMI
May 23, 2013 6:37 pm

You guys certainly have a good handle on things crowd-wise!
Though, I’ve also come across some new research that you may find interesting in this regard…
It’s called “The Theory of Crowd Capital” and you can download it here if you’re interested: http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2193115
Enjoy!

Editor
May 23, 2013 6:40 pm

> Greg Goodman says: May 23, 2013 at 5:53 pm
> If you are plotting slopes you are interested in rate of change,
> why not plot rate of change directly (ie the monthly incremental
> changes, = “first difference”)
> It’s easier to visualise what you are actually interested in. Not
> sure why the cumulative slope. This has changing changing
> sensitivity and frequency response as it progresses. Not sure
> this would be informative.
Like it or not, one of the major arguments right now is “no global warming for X years”. The CAGW crowd has drawn the line at 17 years as proof that global warming has stopped. That’s why people are fixated on how far back we can go with a negative slope.
> Argh, don’t use running averages. Here’s why not and a better option:
> http://climategrog.wordpress.com/2013/05/19/triple-running-mean-filters/
Again, we’re debating the CAGW crowd. The “annual temperature anomaly” is what the public is fixated on. This is the 12 month running mean from January through December. My plot includes all 12 points in the year. Can you persuade GISS et al. to use the result of a triple running mean filter, rather than a straight 12 month average?

William Astley
May 23, 2013 6:40 pm

In reply to:
Zeke Hausfather says:
May 23, 2013 at 3:53 pm
You could also just use this…
http://www.woodfortrees.org/
Woods for trees has for some unexplained reason dropped UAH global temperature anomaly.
It is interesting that the satellite UAH global temperature anomaly is less than HADCRUT4 and GISS in the 1980s, while HADCRUT4 and GISS are almost 0.4 higher than UAH in 2013.
It appears the science is not settled in the measurement and the manipulation of planetary temperature.
https://docs.google.com/spreadsheet/ccc?key=0AnTohu4oFUbcdEgzTkpEYTAwN1BiXzJXMXZ5RVJiOUE#gid=15

Nick Stokes
May 23, 2013 6:52 pm

JTF,
I start in 1979 because that is when satellites start. I actually started collecting the data in 2010, so couldn’t use that decade, though I maybe should have gone to 2009.
But it’s just a matter of subtracting off an average – a single number. It doesn’t matter very much which period, as long as it’s the same for each. My main concern was to have them all on the same base.

JJ
May 23, 2013 7:08 pm

Your numbering convention (2000.000 = Dec 1999) is different from most similar implementations. Not wrong, but nonstandard and counterintuitive.
Your slope appears to be a two point calculation. Much less useful than the standard method of calcing slope of temp series, which would be the slope of the linear regression of all of the data between the endpoints…

Editor
May 23, 2013 7:20 pm

> JJ says: May 23, 2013 at 7:08 pm
> Your slope appears to be a two point calculation. Much less useful than
> the standard method of calcing slope of temp series, which would be the
> slope of the linear regression of all of the data between the endpoints…
??? The slope uses the spreadsheet “slope” function, which uses all the data points. E.g. cell M1764 is
=slope(F1764:F$1960,$A1764:$A$1960)
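(For anyone wanting to sanity-check that outside the spreadsheet, an ordinary least-squares fit should give the same number. A Python illustration with made-up sample values standing in for A1764:A1960 and F1764:F1960:)

import numpy as np

dates = np.array([1997.000, 1997.083, 1997.167, 1997.250])   # stand-in for column A values
anoms = np.array([0.10, 0.15, 0.08, 0.20])                   # stand-in for column F values
print(np.polyfit(dates, anoms, 1)[0])   # least-squares slope, same idea as =slope(y, x)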

TRBixler
May 23, 2013 8:07 pm

So we accurately “know” the temperature to .3000 between datasets. While the trends diverge.

Werner Brozek
May 23, 2013 8:20 pm

William Astley says:
May 23, 2013 at 6:40 pm
Woods for trees has for some unexplained reason dropped UAH global temperature anomaly.
At the present time, of the 7 sets of data that I tried to give, only UAH and RSS are up to date. See:
http://www.woodfortrees.org/plot/wti/from:2012.5/plot/gistemp/from:2012.5/plot/uah/from:2012.5/plot/rss/from:2012.5/plot/hadsst2gl/from:2012.5/plot/hadcrut4gl/from:2012.5/plot/hadcrut3gl/from:2012.5

James from Arding
May 23, 2013 8:25 pm

This could be useful. I look forward to seeing how you read and process the data.
I would like to see the CET data on the same graphs.
Thanks.

Editor
May 23, 2013 9:07 pm

> James from Arding says: May 23, 2013 at 8:25 pm
> This could be useful. I look forward to seeing how you read and
> process the data. I would like to see the CET data on the same graphs.
It’s a regional data set, like USCRN. It goes back to 1659 and has a negative slope from July 1987. Unfortunately, I think it would blow up on Google spreadsheets’ limits to add it to this sheet. First, every data set would have to be extended back to 1659 (even if it’s nulls), and Google spreadsheets have cell count limits. Secondly, it would add a lot more cells-with-formulas than a regular series, because the values reported are actual temps rather than anomalies. On my home PC, I have to average the 30 Januarys from 1961-1990, the 30 Februarys, …, through to the 30 Decembers, to create the 1961-1990 normals. Then I need to add another column which has the result of each month minus its normal value, i.e. the anomaly. Then I can actually analyze and plot the anomaly values.
It would probably work as a stand-alone spreadsheet, but I have my doubts about adding it to the current one.
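(For anyone who wants to try that outside a spreadsheet, the normals-and-anomalies step is straightforward in code; a rough sketch, assuming temps is a dictionary of monthly mean temperatures keyed by (year, month):)

def monthly_anomalies(temps, base_start=1961, base_end=1990):
    # Normals: average of each calendar month over the base period
    normals = {}
    for m in range(1, 13):
        vals = [t for (y, mm), t in temps.items()
                if mm == m and base_start <= y <= base_end]
        normals[m] = sum(vals) / len(vals)
    # Anomaly: each month's value minus that month's normal
    return {(y, m): t - normals[m] for (y, m), t in temps.items()}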

JJ
May 23, 2013 9:27 pm

walterdnes says:
??? The slope uses the spreadsheet “slope” function, which uses all the data points. …

Excellent!

Editor
May 23, 2013 9:31 pm

An idea just occurred to me. Rather than doing the plotting in a Google spreadsheet, maybe we could simply have a “data central” spreadsheet with just monthly data from various series in it. WUWT’s task would be to ensure that the data is up to date.
It seems that graphs simply do not translate well between Google Docs, Excel, Gnumeric, OpenOffice, etc. Even different versions of Excel have problems passing graphs to each other. But passing numeric data and formulas seems to work flawlessly.
People could download from “data central” and analyze/plot as they wished on their home PCs. Another possibility is to use linked spreadsheets on Google Docs, that would generate plots off “data central”. This would be a more complex approach, but might solve the problem of trying to crowd every analysis into one spreadsheet.
Thoughts? Comments?

Steve McIntyre
May 23, 2013 9:46 pm

If anyone is interested in doing statistical analysis, I recommend that you take the few minutes to learn R. You can retrieve the data and do the analysis with simple scripts. An extra advantage is that third parties can readily verify your scripts.

Ivan
May 23, 2013 10:05 pm

this is maybe off-topic but I am curious: what happened to Watts et al 2012 paper released last July? Why waste our time with the data most of which is obvious garbage produced by successive series of massaging, adjusting and readjusting in order to increase the trend?

Editor
May 23, 2013 10:13 pm

justthefactswuwt says: May 23, 2013 at 9:55 pm
> Eventually we will want to develop something more involved,
> with advanced plotting and analysis capabilities
I like Steve McIntyre’s suggestion about using R. It seems to be scriptable, runnable from a command line if you wish, and available free for Windows/Apple/Linux/etc. And there’s a large community behind it.
That’s it for tonight. It’s after 1:00 AM here in Toronto. I’ll be back in a few hours.

Nick Stokes
May 23, 2013 10:35 pm

justthefactswuwt says: May 23, 2013 at 9:33 pm
“Christy noted the same about RSS last year,…”

Yes, but in 2010 when I started, both RSS and UAH were using 1979-1998. I’m reminded that that was the main reason for my choice.
“using brief and arbitrary base periods can lead to biases …”
This mainly comes in where there are missing data, and you are compiling an average. When you subtract a mean, it has uncertainty, which adds to the overall uncertainty. But if there is no missing data, that is just an overall offset, and the zero point is accepted to be arbitrary. The biases come in making an average which shifts when component data series drop out.
But here I am not making an average, just plotting. And there isn’t missing data, except where the series start.

James from Arding
May 23, 2013 11:25 pm

Thanks for your comments walterdnes.
Just FYI, I pulled your spreadsheet into Excel 2013 and it worked fine, graphs and all. A few twitches and I had years on the X labels :-). I have a subscription to Office 365 and Excel works very nicely in a web browser on Skydrive! Not suggesting everyone should do that.
I understand some of the differences with local data sets and I appreciate your efforts in trying to get a simple method for anyone to get access to these data and display them in meaningful ways.
I am interested in somehow trying to automate the data downloading and processing of Australian met data. I would be very interested in comparing over time the official forecasts with the actual data.
Steve McIntyre: I have started down the route using R but haven’t had enough time to get very far yet. It would be nice if there were some example scripts somewhere – (thinks… I guess I should go look for some :-). It does seem to be the best tool in the long run for this.

A Crooks
May 24, 2013 1:30 am

Incidentally, I like your 12 month running mean graph. If you just concentrate on one data set – say the purple – it shows up very clearly the 7.5 year cycle of a small trough followed by a much deeper trough that sits under the broad rising trend.
deep troughs at 1986, 1993.5, 2001, 2008.5
Next one will be at 2016
Cheers

Greg Goodman
May 24, 2013 3:57 am

Steve McIntyre says: If anyone is interested in doing statistical analysis, I recommend that you take the few minutes to learn R.
LOL. I have a lot of respect for your competence, tenacity and integrity but “a few minutes” ? You jest!
I think Willis’ description of a learning curve so steep it gives you nose bleeds would be more realistic.
R language is so cryptic and arcane that just working out the required syntax to do a linear regression is like decrypting a fragment of the Dead Sea scrolls.
Maybe to someone with a statistics/econometrics background it would be more accessible but to my computing/engineering/science background it is a cryptic nightmare.
To others, YMMV, of course.
I use gnuplot for similar reasons that Steve suggests using R. It still requires considerably more than “a few minutes” to get to fitting and plotting complicated graphs, and it does not include the advanced statistical features of R.

Greg Goodman
May 24, 2013 4:09 am

Walter: Like it or not, one of the major arguments right now is “no global warming for X years”. The CAGW crowd has drawn the line at 17 years as proof that global warming has stopped. That’s why people are fixated on how far back we can go with a negative slope.
In what way is a rate of change of zero less clear than trying to guess by eye how flat a wiggly line is?
“Again, we’re debating the CAGW crowd. The “annual temperature anomaly” is what the public is fixated on. This is the 12 month running mean from January through December. My plot includes all 12 points in the year. Can you persuade GISS etal to use the result of a triple running mean filter, rather than a straight 12 month average?”
The public would not know what an “anomaly” is except that is sounds bad ‘cos it’s abnormal.
The anomaly is not “the 12 month running mean from January through December”, so I guess you don’t understand what it is either.
An average and a running average are not the same thing. You are so far from understanding what you are looking at, and more interested in rebutting ideas than learning, that I’ll just wish you good luck and not waste my time.
JTF had been receptive to suggestions previously, that was why I posted.

Paul Evans
May 24, 2013 6:13 am

It would be nice to see JMA on there as well.

Richard M
May 24, 2013 6:26 am

If the charts are going to be placed in a reference page, I think a nice addition would be to add a line representing raw numbers, no adjustments whatsoever, if it exists. It would be nice to have one place to point to when trying to explain the effect of all the adjustments.

Eric Ellison
May 24, 2013 8:55 am

Walter
Thanks for this contribution! I am a non-programmer, non-mathematician fascinated by this venue of great contributors. I spend about 4 hours here on WUWT every day! I am a ham radio operator, and for quite a number of years I have been contributing to, and following, an amateur effort in developing software defined radio (SDR), both hardware and software. It is an ongoing worldwide contributor effort.
http://openhpsdr.org/
It would not have been possible, or at least not easy, without (crowdsourced) versioning software, in this case Tortoise SVN.
http://tortoisesvn.net/features.html
Obviously your ‘crowd sourcing’ is programming related and there are many ‘candidates’: R, C, Java to name a few, each could be handled as ‘branches’ in SVN supported here on WUWT, along with the constantly updated data. The best part is that we non-experienced ‘beta-testers’ can fiddle with the program locally in a Windows (or other OS) environment without committing to any changes.
On another note, I would gladly donate a couple hundred dollars of my Social Security check towards providing a web based license for a program like Mathematica for example. There are probably many other expensive statistical, graphing, or simulation tools which would qualify. That way Steve, Willis, Anthony etc, could have access to licenses for software which are beyond personal justification of purchase. I’m sure that many of the visitors to WUWT would contribute to purchase of these tools!
Eric

Editor
May 24, 2013 8:58 am

> Greg Goodman says: May 24, 2013 at 4:09 am
> In what way is a rate of change of zero less clear
> than trying to guess by eye how flat a wiggly line is?
The spreadsheet slope() function is a mathematical construct that has solid theory behind it, and it is more respected than a “guess by eye”.
> The anomaly is not ” the 12 month running mean from January
> through December” so I guess you don’t understand what it is either.
The point I was trying to make is that once every 12 months, the 12 month running mean of the monthly anomaly coincides with the annual anomaly that GISS/Hadley etc splash in their news releases. It is what it is. I’m not attributing any magical properties to it.

Janice Moore
May 24, 2013 10:48 am

THANK YOU, Walter Dnes and Just the Facts!
WOW. Your generosity and hard work (and conscientious accuracy) deserve far more praise than you have received, so far. Not to say that all the constructive criticism is not helpful and good, but, more often than not, it is offered with little or no accompanying warmth and collegiality. You deserve better.
I cannot help at ALL, but am so grateful to you for providing this highly useful product.
Applause! Applause! Applause! (even if only from the peanut section — #[:)])

E.M.Smith
Editor
May 24, 2013 11:34 am

Um, I’ve not gone through the spreadsheet yet, but just an “FYI” kind of note:
What the “GISS data” is changes from run to run. GHCN has updates that “trickle in” over time. So you may well find that “this month” has a run where, say, Bahamas from 3 months back is ‘missing’ so gets ‘in filled’ with created stuff. Then a couple of months later that datum shows up, and the result changes…
In addition to that “constantly mutating” most recent “few months”… every so often folks go back and “adjust” (fudge?), fill in, recreate, or just ‘discover’ new data items in the past. Sometimes the far distant past…. So the GHCN is a ‘moving river’ and “You can never cross the same river twice”…
Then there are the major revisions that result in new versions. V1 vs V2 vs V3. These, too, “change”, though they change many more data items.
In each case, since each run of GIStemp is “dynamic” in terms of what it ‘fills in’, and thus what it averages, the entire body of “GIStemp data” changes. No two runs are the same as the input changes…. the past is a ‘moving river’ too…
http://chiefio.wordpress.com/2012/06/20/summary-report-on-v1-vs-v3-ghcn/
The particular stations used to create infill values depend on what station data are available. It chooses via a ‘closest and longest’ method. So as the distance of stations with missing data and the length of those segments change, the stations used for ‘in fill’ change, so the in fill values change, so the end product changes…. There is a selection that says ~”must have FOO years of data”, so a station that does NOTHING in one run, being one month short of the test, can suddenly become THE most used for adjusting / infilling in another once it gets one more month of data…
No, I’ve not measured how much this impacts the results. I can only state that it happens. (Or, rather, happened… GIStemp no longer is publishing their current software: http://chiefio.wordpress.com/2013/05/19/gistemp-no-new-source-code/ so what GIStemp is doing today is an unknown / black box.)
The upshot of all this is that for GISS (and potentially to some extent for anyone using GHCN with similar shared infill methods) their entire historical “data” set slowly changes and mutates. With every single month of temperature data updates.
So “good luck” figuring out what “data” to load for them, or getting any sort of “historical trend”, since the historical trend this year is quite different from what it was 2 years ago or a decade or so back (even if the ‘historical period’ was, say, 1880-1970… Yes, “history” constantly mutates with these codes. Typically to an increasing warming “trend”… even in what ought to be static “history”…)
I don’t have a solution for it. At this point the “raw” data isn’t even raw. Every source is now “adjusted” in various ways, sometimes called “Quality Controlled”… (one QC method fills in any data item that does not conform enough to surrounding ASOS locations with an AVERAGE of those ASOS stations… ignoring that an average always reduces range… and ignoring that ASOS stations are typically at airports that are Tarmac Heaven and grow over time…) So even just figuring out what the temperature measured in, say, Chicago in 1910 might have been, is a challenge…
Oh, and next month it may change anyway… especially in GIStemp…