Crowdsourcing An Opensource Temperature Data Monitoring Methodology and Spreadsheet

Image Credit: Walter Dnes

By Walter Dnes – Edited by WUWT Regular Just The Facts

I have developed a methodology and spreadsheet to capture and chart HadCRUT3, HadCRUT4, GISS, UAH, RSS, and NOAA monthly global temperature anomaly data. Calculations also determine the slope of the anomaly data from any given month to the most recent month of data. Your help is needed to validate the methodology I’ve created and to leverage the expertise and resources of WUWT readers to keep this data reasonably up to date.

In order to dispense with any potential legal/copyright questions:
1) I, Walter Dnes, hereby declare that the end-user programming in the spreadsheet and this article are entirely my work product.
2) I, Walter Dnes, grant you the royalty-free, perpetual, irrevocable, non-exclusive, transferable license to use, reproduce, modify, adapt, publish, translate, create derivative works from, distribute, perform, and display the aforementioned end-user programming and spreadsheet (in whole or part) worldwide and/or to incorporate it in other works in any form, media, or technology now known or later developed.

The spreadsheet is on Google Docs at this URL:
https://docs.google.com/spreadsheet/ccc?key=0AnTohu4oFUbcdEgzTkpEYTAwN1BiXzJXMXZ5RVJiOUE&usp=sharing

Some of the graphs can take several seconds to render, because of the complexity and sheer amount of data. I acknowledge that implementing a spreadsheet via web interface is an amazing feat. I do not wish to detract from that, or complain about it. However, there are some limitations that require workarounds. I will note them as necessary.

I’ve sized all graphs for a 1920×1080 screen. My apologies to users with smaller screens; the details of the graphs would be difficult to see if they were reduced.

The monthly date convention used in this spreadsheet refers to monthly data by the end of the month in question. So January 2008 is 2008.083 (year 2008, offset by 1/12th of a year, i.e. one month). This continues through November 2008 (2008.917) and December 2008 (2009.000). This may seem a bit weird, but in computing we often start at zero rather than one, and it works out better in many cases. Here is a sample set of dates to familiarize you with the idea…

2007/12 == 2008.000 where .000 = Dec data for previous year (2007)
2008/01 == 2008.083 where .083 = Jan data for current year (2008)
2008/02 == 2008.167 where .167 = Feb data for current year (2008)
2008/03 == 2008.250 where .250 = Mar data for current year (2008)
2008/04 == 2008.333 where .333 = Apr data for current year (2008)
2008/05 == 2008.417 where .417 = May data for current year (2008)
2008/06 == 2008.500 where .500 = Jun data for current year (2008)
2008/07 == 2008.583 where .583 = Jul data for current year (2008)
2008/08 == 2008.667 where .667 = Aug data for current year (2008)
2008/09 == 2008.750 where .750 = Sep data for current year (2008)
2008/10 == 2008.833 where .833 = Oct data for current year (2008)
2008/11 == 2008.917 where .917 = Nov data for current year (2008)
2008/12 == 2009.000 where .000 = Dec data for previous year (2008)
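For anyone reproducing this convention outside the spreadsheet, the mapping can be sketched in Python (the function name is mine, not something in the spreadsheet):

```python
def to_decimal_date(year: int, month: int) -> float:
    """End-of-month decimal-year convention used in this spreadsheet:
    January 2008 -> 2008.083, November 2008 -> 2008.917,
    December 2008 -> 2009.000."""
    return round(year + month / 12.0, 3)
```

For example, `to_decimal_date(2007, 12)` returns 2008.0, matching the first row of the table above.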

And now for an overview of the spreadsheet…

Tab “temp_data”:
Anomaly data
Column A is date in decimal years in the manner noted above.
Column B is HadCRUT3 anomaly data
Column C is HadCRUT4 anomaly data
Column D is GISS anomaly data
Column E is UAH anomaly data
Column F is RSS anomaly data
Column G is NOAA anomaly data
____________________________________________________________________
Slope data
Column I has the slope for each corresponding cell in column B (HadCRUT3) from that cell’s date (Column A) to the most recent month with data for that dataset.
Column J slope data for Column C (HadCRUT4)
Column K slope data for Column D (GISS)
Column L slope data for Column E (UAH)
Column M slope data for Column F (RSS)
Column N slope data for Column G (NOAA)

For columns I through N, the earliest cell with a negative value indicates how far back one can go in a temperature series with a negative slope. The slope data is plotted in the tabs named after the datasets, which allows one to see where the slope value crosses zero.
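The per-month trend columns can be sketched in Python. This mirrors what the spreadsheet’s SLOPE() function does: an ordinary least-squares fit over all points from a given starting month through the most recent month (the function names here are illustrative):

```python
def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs, equivalent to the
    spreadsheet SLOPE() function (anomaly change per year when xs are
    decimal years)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den

def slopes_to_latest(dates, anoms):
    """One slope per starting month, each fitted from that month through
    the latest month -- the values in columns I through N."""
    return [ols_slope(dates[i:], anoms[i:]) for i in range(len(dates) - 1)]
```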
____________________________________________________________________
12 month running means
Column P is HadCRUT3 12-month running mean anomaly.
Column Q is HadCRUT4 12-month running mean anomaly.
Column R is GISS 12-month running mean anomaly.
Column S is UAH 12-month running mean anomaly.
Column T is RSS 12-month running mean anomaly.
Column U is NOAA 12-month running mean anomaly.

Column V is left blank for data-import when updating data.
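The running-mean columns can be sketched as follows, assuming a trailing 12-month window ending at each month (the function name is mine):

```python
def running_mean_12(values):
    """Trailing 12-month mean, mirroring columns P through U; None is
    emitted until a full 12 months are available."""
    out = []
    for i in range(len(values)):
        if i < 11:
            out.append(None)
        else:
            out.append(sum(values[i - 11:i + 1]) / 12.0)
    return out
```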

A couple of notes about limitations of Google’s online spreadsheet

1) You cannot enter text manually in graph legends. The spreadsheet can, however, use text from the first row of a series, i.e. the “header row”. Cells P11 through U11 contain the series names, for use in the legends.

2) Scatter graphs will not work properly with nulls/blanks in a series. To get series of varying length to plot properly, dummy values must be inserted to fill in the shorter series. I use -9 as the filler value.
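The padding workaround can be sketched as follows (the helper name and example values are mine; -9 is the filler value the spreadsheet actually uses):

```python
FILLER = -9  # dummy value used to pad shorter series for scatter charts

def pad_series(series, target_length, filler=FILLER):
    """Extend a series to target_length with the filler value, so that
    series of different lengths can share one scatter chart."""
    return series + [filler] * (target_length - len(series))
```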

Tab “HadCRUT3”:
Is a graph of slope values for each month for the HadCRUT3 series. The slope is from the month of the cell (given in Column A) to the most recent month of data. It uses data from Column I. Note that due to complexity limits in the Google spreadsheet, the values are only calculated for part of the data series.

Tab “HadCRUT4”:
Is a graph of slope values for each month for the HadCRUT4 series, using data from Column J.

Tab “GISS”:
Is a graph of slope values for each month for the GISS series, using data from Column K.

Tab “UAH”:
Is a graph of slope values for each month for the UAH series, using data from Column L.

Tab “RSS”:
Is a graph of slope values for each month for the RSS series, using data from Column M.

Tab “NOAA”:
Is a graph of slope values for each month for the NOAA series, using data from Column N.

Tab 12mo1850:
Is a graph of 12-month running means of anomalies from January 1850 to present.

Tab 12mo1979:
Is a graph of 12-month running means of anomalies from 1979 to present. This covers the satellite data era.

Navigating Through The Spreadsheet:
Spreadsheet navigation is similar to Excel, the biggest difference being that pressing the {END} key immediately takes you to the far right-hand side of the page. Similarly, pressing the {HOME} key immediately takes you to the far left-hand side of the page. The equivalent of the {END}{UP}, {END}{DOWN}, {END}{LEFTARROW}, and {END}{RIGHTARROW} combinations is to hold down the {CTRL} key while pressing the arrow in the direction you wish to jump.

Interpreting The Slope Graphs:
The slope graphs in the tabs “HadCRUT3”, “HadCRUT4”, “GISS”, “UAH”, “RSS”, and “NOAA” represent the slope from a given month to the latest available data. The graphs are a guide to narrow down the earliest month with a negative slope; the authoritative numbers are the slopes listed in columns I through N of tab “temp_data”. Two examples follow. ***IMPORTANT*** As additional months of data come in, the slope numbers and graphs will change each month. The numbers and graphs used in these examples were generated in mid-May 2013, using data to the end of April 2013, so do not expect to see the same numbers that you see in the screenshots. To find the longest period of negative slope, find the leftmost (i.e. earliest) point in the HadCRUT3/HadCRUT4/GISS/UAH/RSS/NOAA graphs which has a value below zero.
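The lookup described above (scanning a slope column for the earliest negative value) can be sketched as follows (the helper name is mine):

```python
def earliest_negative(dates, slopes):
    """Return the earliest date whose slope-to-present is negative,
    i.e. how far back the negative-slope period extends; None if the
    column has no negative values."""
    for date, s in zip(dates, slopes):
        if s is not None and s < 0:
            return date
    return None
```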

The easy example is shown by image rssmag.png, which is a zoom of part of the graph in tab “RSS”. It’s obvious that 1997.0 (i.e. December 1996) is negative. The screenshot rssdata1.png confirms that 1997.0 has a negative value in column M, which contains RSS slopes.

[Screenshots rssmag.png and rssdata1.png. Image credit: Walter Dnes]

UAH is a more difficult case. Data through April 2013 shows that the first negative slope value is sometime in 2008. See the zoomed image uahmag.png; the graph at least narrows the month down to somewhere in 2008. Image uahdata1.png shows that the first negative value in column L (UAH) is 2008.583, i.e. July 2008.

[Screenshots uahmag.png and uahdata1.png. Image credit: Walter Dnes]

“Coming Soon” Part 2: Instructions for Updating the Global Temperature Records in Google Docs:


This spreadsheet is intended as a proof-of-concept and a starting point for people who may want to extend it further. A followup post will deal with updating your own copy of this spreadsheet on Google Docs, or downloading and maintaining a local copy on your home machine.

Please let us know if you see any issues or errors in the methodology or spreadsheet. Also, please let us know your thoughts and recommendations on how we can best keep this spreadsheet reasonably up to date. Ideally we would like to automate this process or spread the work among a few WUWT readers. We also need to figure out how to make the resultant data readily available on WUWT, which could be accomplished through monthly/quarterly WUWT threads, a WUWT Reference Page, or another communication method. Please let us know your thoughts below.

This entry was posted in Lower Troposphere Temperature, measurement, Temperature. Bookmark the permalink.

49 Responses to Crowdsourcing An Opensource Temperature Data Monitoring Methodology and Spreadsheet

  1. Walter Dnes says:

    > crosspatch says: May 23, 2013 at 3:42 pm

    > No CRN data?

    I’m concentrating on global data sets in this spreadsheet. In a followup post, I’ll explain how to download the spreadsheet to your PC, or copy it to your Google account. Once you have your own copy, you can customize it to your heart’s content. Note that the Google online spreadsheet is already rather slow, so I hesitate to add more data sets to this spreadsheet.

  2. walterdnes says:

    > Zeke Hausfather says: May 23, 2013 at 3:53 pm

    > You could also just use this… http://www.woodfortrees.org/

    It’s already listed on the WUWT sidebar. The spreadsheet in this article allows people to “roll their own”, add additional data sets, go into greater detail, and whatever other customizations they want. This goes beyond what you can do on WFT.

  3. Werner Brozek says:

    Zeke Hausfather says:
    May 23, 2013 at 3:53 pm
    You could also just use this…
    I really like WFT, however GISS and HadCRUT3 have not been updated since November so I had to rely on Walter and other sources for the latest times where the slope was 0.

  4. Lance Wallace says:

    Which 30-year period are you using for a base?

  5. walterdnes says:

    > Lance Wallace says: May 23, 2013 at 4:20 pm

    > Which 30-year period are you using for a base?

    I download the data from the respective websites, so it depends on what base period they use. For purposes of calculating slope, it doesn’t matter. We’re looking at the relative change over a period of time.

  6. Nick Stokes says:

    It seems to me that Javascript with HTML 5 is more flexible, and also quicker to download. You can also do more interactive graphics (example here). Or here is an example with graph mobility and ability to add regression curves (and add your own data, and output numerics).

    On updating, I’ve been using wget and cURL to automatically download data files each month – I run a script every night. They allow you to mirror – ie just download when there are updates. I keep the updated data here. It also automatically updates the graphs.

  7. cd says:

    I read only half-way down but it seems as if you’re trying to reproduce something that could be done quite easily using JavaScript or – I suspect – WordPress’ own plotting widget.

    You could even have a processing tool for regression fits (linear, pwr law, polynomial etc.), convolution/FFT (+processing), even moving FFT or FWT, etc.

    But the short answer is that you need time and people to do it. Even I could write the code in C (a library of a few 1000 lines of code would do all the maths) to do all the above, but you’d need some type of PHP to interface, although I doubt WordPress would like this very much (a security risk that exposes too much native code). Thing is, you need to sit down with someone at WUWT and work through aims, needs => design (maintainability) => output.

    I’ll ask around and get a few experts to read the post and see what they suggest.

  8. walterdnes says:

    > Nick Stokes says: May 23, 2013 at 5:12 pm

    > On updating, I’ve been using wget and cURL to automatically download
    > data files each month – I run a script every night. They allow you to
    > mirror – ie just download when there are updates. I keep the updated
    > data here. It also automatically updates the graphs.

    One advantage of doing it manually is that you find out right away when the version number is bumped, and the source URL changes. E.g. HadCRUT4 is now at http://www.metoffice.gov.uk/hadobs/hadcrut4/data/current/time_series/HadCRUT.4.2.0.0.monthly_ns_avg.txt


    The whole point of this exercise is to allow a competent spreadsheet user to do it all on their home PC. A followup post in the next few days will get into the details of doing monthly updates.

  9. Nick Stokes says:

    Walter,
    “HadCRUT4 is now at…”
    Yes, that was a nuisance. But there’s a way. On this page, which doesn’t change its URL, just locate the link in the HTML.
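Nick’s approach could be sketched in Python. The regular expression below is an assumption about the download page’s markup (a link ending in monthly_ns_avg.txt), so verify it against the actual HTML before relying on it:

```python
import re

def find_monthly_link(html, pattern=r'href="([^"]*monthly_ns_avg\.txt)"'):
    """Extract the current data-file link from the download page's HTML,
    so filename version bumps are picked up automatically."""
    match = re.search(pattern, html)
    return match.group(1) if match else None
```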

  10. walterdnes says:

    > cd says: May 23, 2013 at 5:16 pm

    > But the short answer is that you need time and people to do it. Even
    > I could write the code in C (library of a few 1000 lines of code would
    > do all the maths) to do all above but you’re need some type of php
    > to interface although I doubt WordPress would like this very much
    > (security risk that exposes too much native code). Thing is you need
    > to sit down with someone at WUWT and work through aims,
    > needs => design (maintainability) => output.

    That’s exactly what I’m trying to avoid. A followup post will discuss downloading the spreadsheet and fixing up the graphs (because the translation of graphs during the download has problems). I want something that a competent spreadsheet user can update and customize on their own.

  11. Greg Goodman says:

    “This goes beyond what you can do on WFT.”

    Indeed, WTF is too limited.

    Echoing some comments above, spend time defining the objectives before doing too much work.

    On what you have so far.

    If you are plotting slopes, you are interested in rate of change, so why not plot the rate of change directly (i.e. the monthly incremental changes, the “first difference”)?

    It’s easier to visualise what you are actually interested in. Not sure why the cumulative slope; this has changing sensitivity and frequency response as it progresses. Not sure this would be informative.

    Argh, don’t use running averages. Here’s why not and a better option:

    http://climategrog.wordpress.com/2013/05/19/triple-running-mean-filters/

    December 2007 = 2008 is crazy confusing and pointless. I would suggest logging data that is a monthly average at the middle of the month: Jan = 1/24; Feb = 3/24, etc.

    That way you are not introducing a phase shift into the data and not calling 2007 data 2008.

    Hope that helps.

  12. walterdnes says:

    > Snowlover123 says: May 23, 2013 at 5:49 pm

    > Is it possible to graph the JMA temperatures with the data above?

    Any monthly data. Wait for my next post about downloading the spreadsheet and updating/customizing it.

  13. Lance Wallace says: May 23, 2013 at 4:20 pm

    Which 30-year period are you using for a base?

    Good question. We should be using 1981–2010 as our base period, in order to comply with a recommended World Meteorological Organization (WMO) Policy, which suggests using the latest decade for the 30-year average.

    http://www.ncdc.noaa.gov/cmb-faq/anomalies.php

    However, there is significant variation across the data sets, i.e.:

    UAH

    The global, hemispheric, and tropical LT anomalies from the 30-year (1981-2010) average

    http://www.drroyspencer.com/2013/04/uah-global-temperature-update-for-march-2013-0-18-deg-c-again/

    GISS

    Anomalies are relative to the 1951-80 base period means

    http://data.giss.nasa.gov/gistemp/tabledata_v3/GLB.Ts+dSST.txt

    HadCRUT4

    Time series are presented as temperature anomalies (deg C) relative to 1961-1990

    http://www.metoffice.gov.uk/hadobs/hadcrut4/data/current/download.html

    RSS

    Anomalies are computed by subtracting the mean monthly value (averaged from 1979 through 1998 for each channel) from the average brightness temperature for each month.

    http://www.ssmi.com/msu/msu_data_description.html#rss_msu_data_analysis

    Beginning in December 2010, all lower troposphere, middle troposphere, and lower stratosphere satellite data are reported here with respect to the 1981–2010 base period. Prior to December 2010, data were reported with respect to the 1979–1998 base period. Remote Sensing Systems continues to provide data to NCDC with respect to the 1979–1998 base period; however, NCDC readjusts the data to the 1981–2010 base period so that the satellite measurements are comparable. http://www.ncdc.noaa.gov/temp-and-precip/msu/

    NOAA NCDC

    “The global and hemispheric anomalies are provided with respect to the period 1901-2000, the 20th century average.”

    “Why do some of the products use different reference periods?

    The maps show temperature anomalies relative to the 1981–2010 base period. This period is used in order to comply with a recommended World Meteorological Organization (WMO) Policy, which suggests using the latest decade for the 30-year average. For the global-scale averages (global land and ocean, land-only, ocean-only, and hemispheric time series), the reference period is adjusted to the 20th Century average for conceptual simplicity (the period is more familiar to more people, and establishes a longer-term average). The adjustment does not change the shape of the time series or affect the trends within it.

    http://www.ncdc.noaa.gov/cmb-faq/anomalies.php

    NOAA CPC

    NOAA’s Climate Prediction Center has already changed their Normals to the 1981 – 2010 base period? Why are those Normals not available?

    Many organizations, including NOAA’s Climate Prediction Center (CPC), develop their own averages and change base periods for internal use. However, NCDC’s climate Normals are the official United States Normals as recognized by the World Meteorological Organization and the main Normals made available for a variety of variables. Below is a brief summary of changes to the CPC products due to the change in climate base period from 1971 – 2000 to 1981 – 2010:

    NOAA Climate Prediction Center’s CAMS station temperature anomaly dataset. “CAMS” is an acronym for the “Climate Anomaly Monitoring System” in use at the Climate Prediction Center (CPC).

    CAMS station surface air temperature anomalies for the globe with respect to the 1971-2000 climatological base period.

    http://iridl.ldeo.columbia.edu/maproom/Global/Atm_Temp/Monthly_stn_anom.html

    Please post base periods and source links to any data sets I missed.

  14. JMI says:

    You guys certainly have a good handle on things crowd-wise!
    Though, I’ve also come across some new research that you may find interesting in this regard…
    It’s called “The Theory of Crowd Capital” and you can download it here if you’re interested: http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2193115

    Enjoy!

  15. walterdnes says:

    > Greg Goodman says: May 23, 2013 at 5:53 pm

    > If you are plotting slopes you are interested in rate of change,
    > why not plot rate of change directly (ie the monthly incremental
    > changes, = “first difference”)

    > It’s easier to visualise what you are actually interested in. Not
    > sure why the cumulative slope. This has changing changing
    > sensitivity and frequency response as it progresses. Not sure
    > this would be informative.

    Like it or not, one of the major arguments right now is “no global warming for X years”. The CAGW crowd has drawn the line at 17 years as proof that global warming has stopped. That’s why people are fixated on how far back we can go with a negative slope.

    > Argh, don’t use running averages. Here’s why not and a better option:
    > http://climategrog.wordpress.com/2013/05/19/triple-running-mean-filters/

    Again, we’re debating the CAGW crowd. The “annual temperature anomaly” is what the public is fixated on. This is the 12-month running mean from January through December. My plot includes all 12 points in the year. Can you persuade GISS et al. to use the result of a triple running mean filter, rather than a straight 12-month average?

  16. William Astley says:

    In reply to:

    Zeke Hausfather says:
    May 23, 2013 at 3:53 pm

    You could also just use this…

    http://www.woodfortrees.org/

    Woods for trees has for some unexplained reason dropped UAH global temperature anomaly.

    It is interesting that the satellite UAH global temperature anomaly is less than HADCRUT4 and GISS in the 1980s, while HADCRUT4 and GISS are almost 0.4 higher than UAH in 2013.

    It appears the science is not settled in the measurement and the manipulation of planetary temperature.

    https://docs.google.com/spreadsheet/ccc?key=0AnTohu4oFUbcdEgzTkpEYTAwN1BiXzJXMXZ5RVJiOUE#gid=15

  17. Nick Stokes says: May 23, 2013 at 5:12 pm

    I keep the updated data here. It also automatically updates the graphs.

    Nick, why do you chose to use the base period of 1979 – 2000 for the plots on your site, i.e.:

    Temperatures anomalies in the tables are as stated by the providers, with different anomaly bases. They have been converted to the same base (1979-2000) for plotting.

    http://moyhu.blogspot.com.au/p/latest-ice-and-temperature-data.html

  18. Nick Stokes says:

    JTF,
    I start in 1979 because that is when satellites start. I actually started collecting the data in 2010, so couldn’t use that decade, though I maybe should have gone to 2009.

    But it’s just a matter of subtracting off an average – a single number. It doesn’t matter very much which period, as long as it’s the same for each. My main concern was to have them all on the same base.

  19. JJ says:

    Your numbering convention (2000.000 = Dec 1999) is different than most similar implementations. Not wrong, but not standard and counterintuitive.

    Your slope appears to be a two point calculation. Much less useful than the standard method of calcing slope of temp series, which would be the slope of the linear regression of all of the data between the endpoints…

  20. walterdnes says:

    > JJ says: May 23, 2013 at 7:08 pm

    > Your slope appears to be a two point calculation. Much less useful than
    > the standard method of calcing slope of temp series, which would be the
    > slope of the linear regression of all of the data between the endpoints…

    ??? The slope uses the spreadsheet “slope” function, which uses all the data points. E.g. cell M1764 is

    =slope(F1764:F$1960,$A1764:$A$1960)

  21. JMI says: May 23, 2013 at 6:37 pm

    You guys certainly have a good handle on things crowd-wise!
    Though, I’ve also come across some new research that you may find interesting in this regard…
    It’s called “The Theory of Crowd Capital” and you can download it here if you’re interested: http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2193115

    Enjoy!

    Interesting. They definitely seem to understand the process and value, i.e.:

    “Crowdsourcing, for example, is well-known as a distributed problem-solving and production model, where problems are broadcast through web-based IS to an unknown group of solvers, in the form of an open call for solutions [7]. In the case of Crowdsourcing, the structuring of heterogeneous knowledge resources is undertaken by dispersed individuals, mediated through web-based IS, “where a huge amount of individual contributions build solid and structured sources of data” [46].”

    “We espouse the potential of collective intelligence – some knowledge that is more accurate when it consists of inputs from a distributed population [33]. However, we further reason that gatekeepers, the monitoring and coordination of the flow and assimilation of data, information, and knowledge, into organizations, through the Crowd Capability process, are a must for Crowd Capital to accrue. Therefore, we have strived to amalgamate the dispersed discourses on these important phenomena among scholars, and outlined a model for inculcating Crowd Capital in any organization.”

    In the current context, their theory is very organizationally focused, i.e.;

    We began our investigation in this work, with the observation that more and more organizations are undertaking activities to engage dispersed populations through IS. In this paper, we present theory that explains this observed phenomenon, as an attempt by organizations to engage the dispersed knowledge of individuals. Furthermore, we theorize that organizations can create a heterogeneous capability stemming from their extant resources, called Crowd Capability that will engage dispersed knowledge through particular structure, content, and processes. And that doing so, will generate a heterogeneous knowledge resource for the organization known as Crowd Capital.

    whereas WUWT is much less structured, more of an organism with multitude of heads and hands.

  22. TRBixler says:

    So we accurately “know” the temperature to .3000 between datasets. While the trends diverge.

  23. Werner Brozek says:

    William Astley says:
    May 23, 2013 at 6:40 pm
    Woods for trees has for some unexplained reason dropped UAH global temperature anomaly.

    ??At the present time, of the 7 sets of data that I tried to give, only UAH and RSS are up to date. See:

    http://www.woodfortrees.org/plot/wti/from:2012.5/plot/gistemp/from:2012.5/plot/uah/from:2012.5/plot/rss/from:2012.5/plot/hadsst2gl/from:2012.5/plot/hadcrut4gl/from:2012.5/plot/hadcrut3gl/from:2012.5

  24. James from Arding says:

    This could be useful. I look forward to seeing how you read and process the data.
    I would like to see the CET data on the same graphs.

    Thanks.

  25. walterdnes says:

    > James from Arding says: May 23, 2013 at 8:25 pm

    > This could be useful. I look forward to seeing how you read and
    > process the data. I would like to see the CET data on the same graphs.

    It’s a regional data set, like USCRN. It goes back to 1659 and has a negative slope from July 1987. Unfortunately, I think adding it to this sheet would blow past Google spreadsheets’ limits. First, every data set would have to be extended back to 1659 (even if with nulls), and Google spreadsheets have cell-count limits. Secondly, it would add many more cells-with-formulas than a regular series, because the values reported are actual temperatures rather than anomalies. On my home PC, I have to average the 30 Januarys from 1961-1990, the 30 Februarys, and so on through the 30 Decembers, to create the 1961-1990 normals. Then I need another column with the result of each month minus its normal value, i.e. the anomaly. Only then can I analyze and plot the anomaly values.
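The normals-and-anomalies procedure Walter describes can be sketched as follows (the function names and toy base-period handling are mine):

```python
def monthly_normals(years, months, temps, base=(1961, 1990)):
    """Average of the 30 Januaries, 30 Februaries, etc. within the base
    period -- the 1961-1990 'normal' for each calendar month."""
    sums = {m: 0.0 for m in range(1, 13)}
    counts = {m: 0 for m in range(1, 13)}
    for y, m, t in zip(years, months, temps):
        if base[0] <= y <= base[1]:
            sums[m] += t
            counts[m] += 1
    return {m: sums[m] / counts[m] for m in range(1, 13) if counts[m]}

def anomalies(years, months, temps, normals):
    """Each month's actual temperature minus its monthly normal."""
    return [t - normals[m] for y, m, t in zip(years, months, temps)]
```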

    It would probably work as a stand-alone spreadsheet, but I have my doubts about adding it to the current one.

  26. JJ says:

    walterdnes says:

    ??? The slope uses the spreadsheet “slope” function, which uses all the data points. …

    Excellent!

  27. walterdnes says:

    An idea just occurred to me. Rather than doing the plotting in a Google spreadsheet, maybe we could simply have a “data central” spreadsheet with just monthly data from various series in it. WUWT’s task would be to ensure that the data is up to date.

    It seems that graphs simply do not translate well between Google Docs, Excel, Gnumeric, OpenOffice, etc. Even different versions of Excel have problems passing graphs to each other. But passing numeric data and formulas seems to work flawlessly.

    People could download from “data central” and analyze/plot as they wished on their home PCs. Another possibility is to use linked spreadsheets on Google Docs, that would generate plots off “data central”. This would be a more complex approach, but might solve the problem of trying to crowd every analysis into one spreadsheet.

    Thoughts? Comments?

  28. Nick Stokes says: May 23, 2013 at 6:52 pm

    I start in 1979 because that is when satellites start. I actually started collecting the data in 2010, so couldn’t use that decade, though I maybe should have gone to 2009.

    John Christy noted the same about RSS last year, i.e. “RSS only uses 1979-1998 (20 years) while UAH uses the WMO standard of 1981-2010.”

    http://wattsupwiththat.com/2012/11/10/a-big-picture-look-at-earths-temperature-extreme-weather-update/

    But it’s just a matter of subtracting off an average – a single number. It doesn’t matter very much which period, as long as it’s the same for each..

    My main concern was to have them all on the same base

    Understood, but having so many plots out there with so different base periods could cause confusion and, depending on the type of analysis, using brief and arbitrary base periods can lead to biases, e.g., ANOMALY CONSTRUCTION IN CLIMATE DATA: ISSUES AND CHALLENGES, KAWALE et al.:

    “In this paper, we evaluate different measures for base computation and show how an arbitrary choice of base can skew the results and lead to a favorable outcome which might not necessarily be true. We perform a detailed study of different base selection criterion and base periods to highlight that the outcome of data mining can be sensitive to choice of the base. We present a case study of the dipole in the Sahel region to highlight the bias creeping into the results due to the choice of the base. Finally, we propose a generalized model for base selection which uses Monte-Carlo based methods to minimize the expected variance in the anomaly time-series of the underlying datasets. Our research can be instructive for climate scientists and researchers in temporal domain to enable them to choose the right base which would not bias the outcome of the results.”

    “We further show the bias in results introduced due to choosing a short reference interval and show the difference in conclusions and results using a case study of the Sahel dipole. It is important to handle the bias introduced due to a short base as subsequent conclusions derived from it get affected. We further propose a generalized algorithm to handle the issue of a bias-free base.”

    Interestingly, they conclude that;

    “Using our algorithm, we get the optimal base period to be 55 years.”

    http://climatechange.cs.umn.edu/docs/sahel.pdf

    Their logic seems sounder than Hansen’s, i.e.:

    “We calculate seasonal-mean temperature anomalies relative to average temperature in the base period 1951-1980. This is an appropriate base period because global temperature was relatively stable and still within the Holocene range to which humanity and other planetary life are adapted (note 1).”

    “1 In contrast, we infer that current global temperature is above the Holocene range, as evidenced by the fact that the ice sheets in both hemispheres are now rapidly shedding mass (Rignot et al., 2011) and sea level is rising (Nerem et al., 2006) at a rate (more than 3 mm/year or 3 m/millennium) that is much higher than the rate of sea level change during the past several millennia”

    http://www.giss.nasa.gov/research/briefs/hansen_17/

  29. Steve McIntyre says:

    If anyone is interested in doing statistical analysis, I recommend that you take the few minutes to learn R. You can retrieve the data and do the analysis with simple scripts. An extra advantage is that third parties can readily verify your scripts.

  30. walterdnes says: May 23, 2013 at 9:31 pm

    An idea just occurred to me. Rather than doing the plotting in a Google spreadsheet, maybe we could simply have a “data central” spreadsheet with just the monthly data from the various series in it. WUWT’s task would be to ensure that the data is up to date.

    It seems that graphs simply do not translate well between Google Docs, Excel, Gnumeric, OpenOffice, etc. Even different versions of Excel have problems passing graphs to each other. But passing numeric data and formulas seems to work flawlessly.

    People could download from “data central” and analyze/plot as they wished on their home PCs. Another possibility is to use linked spreadsheets on Google Docs, that would generate plots off “data central”. This would be a more complex approach, but might solve the problem of trying to crowd every analysis into one spreadsheet.

    Thoughts? Comments?
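
    A minimal sketch of the “data central” idea in Python. The CSV layout, column names, and numbers here are hypothetical placeholders, not real anomaly data:

```python
# Sketch of the "data central" idea: one shared CSV of monthly anomaly data
# that anyone can download and analyze/plot locally. The column names and
# values below are illustrative placeholders only.
import csv
import io

# In practice this text would come from urllib.request.urlopen() on the
# shared spreadsheet's CSV-export URL.
sample_csv = """\
date,hadcrut4,giss,uah
2008.083,0.05,0.31,-0.04
2008.167,0.19,0.26,0.02
2008.250,0.43,0.67,0.09
"""

def load_series(text, column):
    """Return (date, value) pairs for one named series."""
    reader = csv.DictReader(io.StringIO(text))
    return [(float(row["date"]), float(row[column])) for row in reader]

giss = load_series(sample_csv, "giss")
```

    Anyone could then feed the (date, value) pairs into whatever plotting or regression tool they prefer on their home PC.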

    I like your thought to allow people to “roll their own”. Eventually we will want to develop something more involved, with advanced plotting and analysis capabilities, but a good start is to make all of the data readily accessible in a standardized format. I would try to minimize the complexity of the spreadsheet and maximize the number of data sources we make available. Once we have sufficient data sources readily available, hopefully some enterprising minds will figure out how to provide an analysis and reporting capability like WFT’s “analyse time series” tool, which is freeware and downloadable here:

    http://www.woodfortrees.org/software

  31. Ivan says:

    This is maybe off-topic, but I am curious: what happened to the Watts et al 2012 paper released last July? Why waste our time with data, most of which is obvious garbage produced by successive rounds of massaging, adjusting and readjusting in order to increase the trend?

  32. walterdnes says:

    justthefactswuwt says: May 23, 2013 at 9:55 pm

    > Eventually we will want to develop something more involved,
    > with advanced plotting and analysis capabilities

    I like Steve McIntyre’s suggestion about using R. It seems to be scriptable, runnable from a command line if you wish, and available free for Windows/Apple/Linux/etc. And there’s a large community behind it.

    That’s it for tonight. It’s after 1:00 AM here in Toronto. I’ll be back in a few hours.

  33. Steve McIntyre

    Hello, I sent you an email on a tangential subject. Please check your mailbox, as the email is long and thus might end up in your spam folder.

  34. Nick Stokes says:

    justthefactswuwt says: May 23, 2013 at 9:33 pm
    “Christy noted the same about RSS last year,…”

    Yes, but in 2010 when I started, both RSS and UAH were using 1979-1998. I’m reminded that that was the main reason for my choice.

    “using brief and arbitrary base periods can lead to biases …”
    This mainly comes in where there are missing data, and you are compiling an average. When you subtract a mean, it has uncertainty, which adds to the overall uncertainty. But if there is no missing data, that is just an overall offset, and the zero point is accepted to be arbitrary. The biases come in making an average which shifts when component data series drop out.

    But here I am not making an average, just plotting. And there isn’t missing data, except where the series start.
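
    (A quick numerical check of the offset point above, using a toy linear series rather than real anomaly data:)

```python
# Re-baselining an anomaly series subtracts a constant (the new base-period
# mean) from every value, so the least-squares trend slope is unchanged.
# Toy linear series, not real temperature data.

def slope(xs, ys):
    """Ordinary least-squares slope, the same quantity as spreadsheet SLOPE()."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

months = [2008 + m / 12 for m in range(1, 61)]   # 5 years of monthly dates
anoms = [0.3 + 0.01 * m for m in range(1, 61)]   # toy 0.01/month trend
rebased = [a - 0.25 for a in anoms]              # shift to a "warmer" base period

s_original = slope(months, anoms)
s_rebased = slope(months, rebased)
# The two slopes are identical: the constant offset cancels out.
```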

  35. James from Arding says:

    Thanks for your comments, walterdnes.

    Just FYI, I pulled your spreadsheet into Excel 2013 and it worked fine, graphs and all. A few tweaks and I had years on the X labels :-). I have a subscription to Office 365, and Excel works very nicely in a web browser on SkyDrive! Not suggesting everyone should do that.

    I understand some of the differences with local data sets and I appreciate your efforts in trying to get a simple method for anyone to get access to these data and display them in meaningful ways.

    I am interested in somehow trying to automate the data downloading and processing of Australian met data. I would be very interested in comparing over time the official forecasts with the actual data.

    Steve McIntyre: I have started down the route using R but haven’t had enough time to get very far yet. It would be nice if there were some example scripts somewhere – (thinks… I guess I should go look for some :-). It does seem to be the best tool in the long run for this.

  36. A Crooks says:

    Incidentally, I like your 12 month running mean graph. If you just concentrate on one data set – say the purple – it shows very clearly the 7.5 year cycle of a small trough followed by a much deeper trough that sits under the broad rising trend:
    deep troughs at 1986, 1993.5, 2001, 2008.5

    Next one will be at 2016
    Cheers

  37. Greg Goodman says:

    Steve McIntyre says: If anyone is interested in doing statistical analysis, I recommend that you take the few minutes to learn R.

    LOL. I have a lot of respect for your competence, tenacity and integrity, but “a few minutes”? You jest!

    I think Willis’ description of a learning curve so steep it gives you nose bleeds would be more realistic.

    R language is so cryptic and arcane that just working out the required syntax to do a linear regression is like decrypting a fragment of the Dead Sea scrolls.

    Maybe to someone with a statistics/econometrics background it would be more accessible but to my computing/engineering/science background it is a cryptic nightmare.

    To others, YMMV, of course.

    I use gnuplot for reasons similar to those Steve gives for R. It still requires considerably more than “a few minutes” to get to fitting and plotting complicated graphs, and it does not include the advanced statistical features of R.

  38. Greg Goodman says:

    Walter: Like it or not, one of the major arguments right now is “no global warming for X years”. The CAGW crowd has drawn the line at 17 years as proof that global warming has stopped. That’s why people are fixated on how far back we can go with a negative slope.

    In what way is a rate of change of zero less clear than trying to guess by eye how flat a wiggly line is?

    “Again, we’re debating the CAGW crowd. The “annual temperature anomaly” is what the public is fixated on. This is the 12 month running mean from January through December. My plot includes all 12 points in the year. Can you persuade GISS etal to use the result of a triple running mean filter, rather than a straight 12 month average?”

    The public would not know what an “anomaly” is, except that it sounds bad ’cos it’s abnormal.
    The anomaly is not “the 12 month running mean from January through December”, so I guess you don’t understand what it is either.

    An average and a running average are not the same thing. You are so far from understanding what you are looking at, and more interested in rebutting ideas than learning, that I’ll just wish you good luck and not waste my time.

    JTF had been receptive to suggestions previously, that was why I posted.

  39. Paul Evans says:

    It would be nice to see JMA on there as well.

  40. Richard M says:

    If the charts are going to be placed in a reference page, I think a nice addition would be a line representing the raw numbers, with no adjustments whatsoever, if it exists. It would be nice to have one place to point to when trying to explain the effect of all the adjustments.

  41. Nick Stokes says: May 23, 2013 at 10:35 pm

    Yes, but in 2010 when I started, both RSS and UAH were using 1979-1998. I’m reminded that that was the main reason for my choice.

    Yep, it wasn’t until January 3rd, 2011 that UAH announced “NEW 30-YEAR BASE PERIOD IMPLEMENTED!”:

    http://www.drroyspencer.com/2011/01/dec-2010-uah-global-temperature-update-0-18-deg-c/

    “using brief and arbitrary base periods can lead to biases …”
    This mainly comes in where there are missing data, and you are compiling an average. When you subtract a mean, it has uncertainty, which adds to the overall uncertainty. But if there is no missing data, that is just an overall offset, and the zero point is accepted to be arbitrary. The biases come in making an average which shifts when component data series drop out.

    But here I am not making an average, just plotting. And there isn’t missing data, except where the series start.

    Understood, though there is still merit in using the WMO standard, as it helps present the anomalies in a consistent manner, eliminating the potential for confusion when comparing anomaly magnitudes across different providers’ plots.

    However, the larger concern in the context of Walter’s spreadsheet is ensuring that the difference in base periods is readily apparent to those leveraging the data and comparing across the data sets:

    Choosing a Base Period

    Another reason that the three records differ relates to the “base period” that each group uses to calculate global temperature changes. It is not possible to calculate absolute global average surface temperatures for the GISS analysis because weather stations aren’t spread evenly enough across the globe to offer meaningful measurements. Scientists instead calculate a relative measure called a “temperature anomaly” to track whether global temperatures are changing.

    To calculate temperature anomalies scientists compare average temperatures over any given time period — a month or year, for example — to a long-term average, or base period. The base period serves as a point of reference against which climate change can be tracked.

    All three groups use this same approach, but they do not all use the same base period. GISS uses a base period of 1951 to 1980. The Met Office uses 1961 to 1990. And NCDC uses the entire 20th century. Average temperatures during the GISS and NCDC base periods are about the same, but the base period the Met Office uses is slightly warmer than the period the other two groups use.

    This means that numerical values of the temperature anomalies differ for the three analyses. However, the choice of base period should have no effect on the ranking of different years or on the magnitude of global warming over the past century.

    http://www.nasa.gov/topics/earth/features/2010-climate-records.html

    Eventually we may want to offer a version of the spreadsheet, or a tab within it, with all of the data sets normalized to a 1981 to 2010 base period.
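
    A sketch of what that normalization tab would do, in Python (illustrative numbers, not actual anomaly data):

```python
# Normalizing a series to a common 1981-2010 base period: subtract the mean
# of the values whose dates fall inside that window. Illustrative data only.

def rebase(dates, values, start=1981.0, end=2011.0):
    """Shift a series so its mean over [start, end) is zero."""
    in_base = [v for d, v in zip(dates, values) if start <= d < end]
    base_mean = sum(in_base) / len(in_base)
    return [v - base_mean for v in values]

# One made-up value per decade
dates = [1975.0, 1985.0, 1995.0, 2005.0, 2015.0]
values = [0.1, 0.2, 0.3, 0.4, 0.5]
rebased = rebase(dates, values)
```

    Applying the same rebase to every data set would put all of the anomalies on a common zero point, making their magnitudes directly comparable.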

  42. Richard M says: May 24, 2013 at 6:26 am

    If the charts are going to be placed in a reference page, I think a nice addition would be a line representing the raw numbers, with no adjustments whatsoever, if it exists. It would be nice to have one place to point to when trying to explain the effect of all the adjustments.

    That’s actually what I emailed Steve on, as I have a draft post forthcoming this weekend on the adjustments made to ICOADS to develop HadSST3, HADISST and ERSST.v3b.

    Bob Tisdale produced this helpful plot:

    Bob Tisdale – bobtisdale.wordpress.com – Click the pic to view at source

    Buckle up for The Bucket Model…

  43. Eric Ellison says:

    Walter

    Thanks for this contribution! I am a non-programmer, non-mathematician fascinated by this venue of great contributors. I spend about 4 hours here on WUWT every day! I am a ham radio operator, and for quite a number of years I have been contributing to, and following, an amateur effort in developing software-defined radio (SDR), both hardware and software. It is an ongoing worldwide contributor effort.

    http://openhpsdr.org/

    It would not have been possible, or at least not easy, without (crowdsourced) versioning software, in this case TortoiseSVN.

    http://tortoisesvn.net/features.html

    Obviously your ‘crowdsourcing’ is programming related and there are many ‘candidates’: R, C, Java, to name a few. Each could be handled as a ‘branch’ in SVN supported here on WUWT, along with the constantly updated data. The best part is that we non-experienced ‘beta-testers’ can fiddle with the program locally in a Windows (or other OS) environment without committing any changes.
    On another note, I would gladly donate a couple hundred dollars of my Social Security check towards providing a web based license for a program like Mathematica for example. There are probably many other expensive statistical, graphing, or simulation tools which would qualify. That way Steve, Willis, Anthony etc, could have access to licenses for software which are beyond personal justification of purchase. I’m sure that many of the visitors to WUWT would contribute to purchase of these tools!
    Eric

  44. Walter Dnes says:

    > Greg Goodman says: May 24, 2013 at 4:09 am

    > In what way is a rate of change of zero less clear
    > than trying to guess by eye how flat a wiggly line is?

    The spreadsheet slope() function is a standard least-squares calculation; it has solid mathematical theory behind it, and is more respected than a “guess by eye”.

    > The anomaly is not ” the 12 month running mean from January
    > through December” so I guess you don’t understand what it is either.

    The point I was trying to make is that once every 12 months, the 12 month running mean of the monthly anomaly coincides with the annual anomaly that GISS/Hadley etc splash in their news releases. It is what it is. I’m not attributing any magical properties to it.
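
    That coincidence is easy to show with a toy series: the trailing 12-month running mean, evaluated at December, covers exactly January through December, so it equals the calendar-year average (made-up numbers below):

```python
# The trailing 12-month running mean, evaluated at December, averages exactly
# the 12 months January through December, so it equals the calendar-year
# average anomaly. Made-up monthly values for illustration.

def running_mean_12(values):
    """Trailing 12-month running mean; the first point needs 12 months of data."""
    return [sum(values[i - 11:i + 1]) / 12 for i in range(11, len(values))]

monthly = [0.10, 0.20, 0.15, 0.30, 0.25, 0.20,
           0.35, 0.40, 0.30, 0.25, 0.20, 0.30]   # one calendar year
annual = sum(monthly) / 12
december_point = running_mean_12(monthly)[-1]
# december_point equals annual: same 12 values, same average.
```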

  45. Janice Moore says:

    THANK YOU, Walter Dnes and Just the Facts!

    WOW. Your generosity and hard work (and conscientious accuracy) deserve far more praise than you have received, so far. Not to say that all the constructive criticism is not helpful and good, but, more often than not, it is offered with little or no accompanying warmth and collegiality. You deserve better.

    I cannot help at ALL, but am so grateful to you for providing this highly useful product.

    Applause! Applause! Applause! (even if only from the peanut section — #[:)])

  46. E.M.Smith says:

    Um, I’ve not gone through the spread sheet yet, but just an “FYI” kind of note:

    What is in the “GISS data” changes from run to run. GHCN has updates that “trickle in” over time. So you may well find that “this month” has a run where, say, the Bahamas from 3 months back is ‘missing’ so gets ‘in-filled’ with created stuff. Then a couple of months later that datum shows up, and the result changes…

    In addition to that “constantly mutating” most recent “few months”… every so often folks go back and “adjust” (fudge?), fill in, recreate, or just ‘discover’ new data items in the past. Sometimes the far distant past…. So the GHCN is a ‘moving river’ and “You can never cross the same river twice”…

    Then there are the major revisions that result in new versions. V1 vs V2 vs V3. These, too, “change”, though they change many more data items.

    In each case, since each run of GIStemp is “dynamic” in terms of what it ‘fills in’, and thus what it averages, the entire body of “GIStemp data” changes. No two runs are the same as the input changes…. the past is a ‘moving river’ too…

    http://chiefio.wordpress.com/2012/06/20/summary-report-on-v1-vs-v3-ghcn/

    The particular stations used to create infill values depend on what station data are available. It chooses via a ‘closest and longest’ method. So as the distance to stations with missing data and the length of those segments change, the stations used for ‘infill’ change, so the infill values change, so the end product changes…. There is a selection that says ~”must have FOO years of data”, so a station that does NOTHING in one run, being one month short of the test, can suddenly become THE most used for adjusting / infilling in another, once it gets one more month of data…
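
    (For illustration only, a toy version of a ‘closest and longest’ selection rule as described above. This is a guess at the described behavior, not actual GIStemp code; the threshold and tie-break details are assumptions:)

```python
# Toy sketch of a "closest and longest" infill-selection rule: among candidate
# stations meeting a minimum record length, pick the nearest, breaking distance
# ties by record length. The threshold and rule details are assumptions, not
# taken from the (unpublished) GIStemp source.

MIN_YEARS = 20  # hypothetical "must have FOO years of data" threshold

def pick_infill_station(candidates):
    """candidates: list of (name, distance_km, years_of_data) tuples."""
    eligible = [c for c in candidates if c[2] >= MIN_YEARS]
    if not eligible:
        return None
    # nearest first; a longer record wins a distance tie
    return min(eligible, key=lambda c: (c[1], -c[2]))[0]

stations = [("A", 50.0, 19),   # nearest, but one year short of the threshold
            ("B", 80.0, 40),
            ("C", 80.0, 25)]
chosen = pick_infill_station(stations)
```

    Note how one more month of data for station “A” would flip the selection, which is exactly the run-to-run sensitivity being described.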

    No, I’ve not measured how much this impacts the results. I can only state that it happens. (Or, rather, happened… GIStemp no longer is publishing their current software: http://chiefio.wordpress.com/2013/05/19/gistemp-no-new-source-code/ so what GIStemp is doing today is an unknown / black box.)

    The upshot of all this is that for GISS (and potentially to some extent for anyone using GHCN with similar shared infill methods) their entire historical “data” set slowly changes and mutates. With every single month of temperature data updates.

    So “good luck” figuring out what “data” to load for them, or getting any sort of “historical trend”, since the historical trend this year is quite different from what it was 2 years ago or a decade or so back (even if the ‘historical period’ was, say, 1880-1970… Yes, “history” constantly mutates with these codes. Typically to increasing warming “trend”… even in what ought to be static “history”…)

    I don’t have a solution for it. At this point the “raw” data isn’t even raw. Every source is now “adjusted” in various ways, sometimes called “Quality Controlled”… (one QC method fills in any data item that does not conform enough to surrounding ASOS locations with an AVERAGE of those ASOS stations… ignoring that an average always reduces range… and ignoring that ASOS stations are typically at airports that are Tarmac Heaven and grow over time…) So even just figuring out what the temperature measured in, say, Chicago in 1910 might have been, is a challenge…

    Oh, and next month it may change anyway… especially in GIStemp…
