Berkeley Earth releases new version of the BEST dataset

Berkeley Earth Surface Temperature Project (logo)

Berkeley Earth has just released a new version of the Berkeley Earth dataset, which is more comprehensive than the version released in October 2011, and fixes some bugs in the initial release.  You can access the new dataset here: www.BerkeleyEarth.org/data.
The new dataset includes:

  • Additional data not included in the first release of the dataset (e.g. early data from South America, data through 2011, etc.)
  • TMIN and TMAX (in addition to TAVG)
  • Intermediate versions of the data (including multi-valued, single valued, with and without seasonality removed, with and without quality control)
  • Source data in a common format, as well as links to the original sources

All files are in text format, but if there is enough interest we can also provide them in Matlab format. Steven Mosher has independently put together an R package to import the Berkeley Earth data, which is available here:

http://cran.r-project.org/web/packages/BerkeleyEarth/index.html
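Because the files are plain text, they can be inspected and trimmed without special tooling. As a minimal sketch (the four-column layout below — station id, year, month, value — is an assumption for illustration, not the documented Berkeley Earth format; check the headers of the actual files), here is one way to keep only rows from a given year onward in Python:

```python
import io

# Stand-in for an actual Berkeley Earth text file; the column layout
# (station id, year, month, value) is hypothetical.
sample = io.StringIO(
    "1001 2009 12 4.1\n"
    "1001 2010 01 -2.3\n"
    "1001 2011 06 15.8\n"
)

def rows_since(fh, first_year):
    """Yield (station, year, month, value) tuples from first_year onward."""
    for line in fh:
        parts = line.split()
        if not parts or parts[0].startswith("%"):  # skip blank/comment lines
            continue
        station = parts[0]
        year, month = int(parts[1]), int(parts[2])
        value = float(parts[3])
        if year >= first_year:
            yield station, year, month, value

recent = list(rows_since(sample, 2010))
```

The same streaming approach works on the full-size files, since only one line is held in memory at a time.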

In making these data accessible to professional and amateur exploration we hope to encourage further analysis. If you have questions or reflections on this work, please contact info@berkeleyearth.org. We will attempt to address as many inquiries as possible, and look forward to hearing from you.

Best regards,
Elizabeth Muller
Founder and Executive Director
Berkeley Earth Surface Temperature

www.berkeleyearth.org


61 thoughts on “Berkeley Earth releases new version of the BEST dataset”

  1. In making these data accessible to professional and amateur exploration we hope to encourage further analysis. If you have questions or reflections on this work, please contact info@berkeleyearth.org. We will attempt to address as many inquiries as possible, and look forward to hearing from you.
    Best regards,
    Elizabeth Muller
    Founder and Executive Director
    Berkeley Earth Surface Temperature

    Elizabeth, thank you.
    Anthony, appreciate your venue for providing a vehicle to discuss it.
    John

  2. Bad to have to say this but I want to compare this data with the previous release to make sure no “corrections” were added in.

  3. ok, guys, just uploaded a new version of the package to allow those with limited memory to play.
    I suggest 4GB at least, but I tried to add support for smaller machines. It's a beast.
    If you have any questions, just write me… help, I need testers.

  4. Okidoki, so what’s the moral here? First release substandard data with an even worse conclusion, then just release a new “fixed” dataset and call it a day, like oh, they’re not a complete set of incompetent activists by default?

  5. The first release was clearly to be read as a first approach. If it takes ten revisions to iron the bugs out, then that is merely ‘scientific democracy’ at work, plus the contributions of people adding considerable knowledge usually free of charge.

  6. Thanks Geoff.
    The original release is still there, but I don't think people get the process.
    At the end of the day, all of the source data will be posted. Code taking folks step by step to the final data will all be posted. If I survive, all the steps in Matlab will be redone in R. But it's a huge amount of work and I'm just one guy.

  7. oh, if folks want source to version 1.1 of the package they can just write me. It's been posted to CRAN, in line for building… sometimes that happens in a day or two.

  8. Berkeley Earth releases new version of the BEST dataset
    Now with only half the errors! Still has the same great taste and smell you enjoy!
    Recommended by 4 out of 5 climatologists*!
    (*among climatologists who expressed a preference)

  9. First, my thanks to Steven Mosher for putting out the R code for the import, and to Anthony for the notification.
    Next, Geoff Sherrington says:
    February 17, 2012 at 3:20 pm

    The first release was clearly to be read as a first approach.

    Nope. The first release was clearly a publicity seeking device, devoid of backup, underlying data, or code. It was a shabby thing, designed to get people’s attention but devoid of any scientific value … as can be seen by the fact that no one tried to do any science with it.
    w.

  10. steven mosher says:
    February 17, 2012 at 3:39 pm

    Thanks Geoff.
    The original release is still there, but I don't think people get the process.
    At the end of the day, all of the source data will be posted.

    Mosh, we understand the process quite well.
    First put out all of your claims, gain all the glory, make the interview circuit, get the tame journalists to do puff pieces, have Mosher explain how it’s all quite logical. Do not include data or code with this one, or it might bite you.
    Second, months later when you have garnered all the laurels, then and only then release the data … without code, of course, but have Mosher give a promise that the data will be followed, “at the end of the day”, by the code.
    Yeah, Mosh, that’s the ticket, keep trying to convince us that’s how science is done—Congressional appearance one day, press release next day, interview next day … and then data in six months, code in a year, science at its finest…
    If I tried that kind of bull here on this website people would crucify me, Mosh, INCLUDING YOURSELF.
    And you actually have the nerve to stand there and tell us we don’t understand the process??
    Oh, we do, Mosh, we do … but it’s clear that someone here doesn’t.
    w.

  11. Thanks Mr. Mosher, Mr. Watts.
    This ain’t going to work on my pesky ZX81! What’s the best way (if any) to edit down the data set to the last n years? [Sorry if stupid question.]

  12. Willis,
    You are just jealous that you have not been made a part of the Muller family. Mosher-to-Muller is not a big name change. You got way too many conflicting letters.

  13. Science is rarely enough. Shall we talk heliocentric vs. geocentric? Cholera? Flat vs round earth? antiseptics? Come on. Be real. Half of you are convinced that global warming is coming; half of you aren’t. Obviously, the science isn’t enough.

  14. While agreeing with the general skepticism about Dr. Muller and his motives, Mr. Mosher (who has a reputation of not being easily satisfied) has worked on it and it IS data. So let’s have a fair look at it.
    PS: Managed to get most data through before the fire started. 🙂

  15. For me the only things that matter are the error bars! Just how wide are they? Have you responded to informed critiques from folks such as Briggs and JeffID?

  16. “Berkeley Earth has just released a new version of the Berkeley Earth dataset, which is more comprehensive than the version released in October 2011, and fixes some bugs in the initial release. ”
    ==================
    Have they deigned to explain why it is more comprehensive, or released the code that fixes the bugs ?
    With bated breath, I await an explanation.

  17. doug@4:17 – quite the contrary, the ice is coming. How much longer do you think the present inter-glacial will last? Beyond our lifetimes almost certainly, but history has shown us that the ice is much more common than the heat (and that there is more life and diversity of life when it is hotter).

  18. “Yeah, Mosh, that’s the ticket, keep trying to convince us that’s how science is done—Congressional appearance one day, press release next day, interview next day … and then data in six months, code in a year, science at its finest… ”
    But that *is* how science is done, Willis. By some antique sorts of merit anyways. Normally we’re out of sorts to ever see the data or code unless a law about computer access is broken.

  19. “steven mosher says:
    February 17, 2012 at 2:53 pm
    ok, guys, just uploaded a new version of the package to allow those with limited memory to play.
    I suggest 4GB at least, but I tried to add support for smaller machines. It's a beast.
    If you have any questions, just write me… help, I need testers”
    Link you swine, LINK!

  20. The promise of a readily accessible database will not be met until BEST provides access to all versions of records at any individual user-chosen station, similar to what is done by GISS. Having to download the entire database in its various versions is a wasteful exercise, inasmuch as only a small fraction of station records carries signal information suitable for scientific purposes.

  21. Climate Watcher says:
    February 17, 2012 at 5:24 pm
    New version? Do we call it ‘Second BEST’?
    _________________________
    We’re all stealing that…

  22. Still looking for an actual calibration of a single gridcell for the purpose of determining actual, physical error limits.
    Not a correlation study. Or a self-referential examination of internal consistency.

  23. Ok, I don’t mean to sound like an idiot, but what exactly is Steven’s package supposed to do besides download the data?

  24. cui bono says:
    February 17, 2012 at 4:08 pm
    Thanks Mr. Mosher, Mr. Watts.
    This ain’t going to work on my pesky ZX81! What’s the best way (if any) to edit down the data set to the last n years? [Sorry if stupid question.]

    =========================================================================
    My Commodore 64 is eagerly awaiting 22nd BEST. It can handle “42.” What are you crying about? ;o)

  25. BEST from Berkeley and Muller already blew any chance I would have any interest in anything they have to say. They have aptly demonstrated their bias and their incompetence and have no chance.

  26. Well, folks, GIGO. This crap just makes me tired.
    This is the same nasty data we have ALWAYS had access to, more or less. We have a dodgy series, generally of one minimum and one maximum temperature value per day, from a series of stations more or less [dis-]continuously. Too many of these stations have problems with their siting, calibration, station moves, instrument changes, adjustments of unknown derivation, missing values, etc., but still claim orders of magnitude LESS error/variance than the simple limit of instrument observational error alone. I have a news flash for all you purported climate “scientists”:
    ===>> when the manufacturer of a mercury-in-glass thermometer says the instrument has a limit of observation of plus or minus 0.5 degrees C, you cannot get the same “uncertainty” of 0.0278 C from poorly or non-calibrated instruments dating from 1895 through 19-whatever.
    As far as I recall from basic stats, a sample size of ONE from a population with unknown distribution has infinite/unknown variance, and this is the actual standard error of each data point in these sets. It is virtually certain, imo, that the purported 0.8 degrees C temperature increase in the last 100 years falls within the signal noise.
    How about we START collecting random samples from random locations with enough sample replication (n) for REAL accuracy and precision, for a long enough period to get ACTUAL RELIABLE DATASETS? Then climate science will have joined the rest of real science rather than snake-oil science.

  27. DocWat said:
    February 17, 2012 at 2:34 pm
    Off topic>>
    Go to Fox news and see the video of tornadoes on the surface of the sun…
    ———————————————
    I didn’t know there were trailer parks on the sun!

  28. Climate Watcher: February 17, 2012 at 5:24 pm
    says: “….New version? Do we call it ‘Second BEST’?….”
    Ha ha – that’s good! … and it’ll stick I think!

  29. wmsc,
    Currently the package is focused on just reading the various files into R-friendly data structures.
    Given the large number of files, that's going to be a big task that requires a bunch of testing. You have to start somewhere. After that, I'm sure people will want to understand every step between the various files. So that will be added to the package. Next, people will want to understand how the scalpel works, so that's on the table. Of course a Matlab version will also be made available, and I will continue to emulate the entire system in R. It's a huge task to put this code into a FREE format.
    For additional functionality I've coded it to work with my other packages. So you can read in BEST data and run Nick Stokes's method, or CRU's method, maybe GISTEMP if I get a break, Tamino's method, JeffId/Roman's method.
    So, baby steps. Now my time is consumed with trying to make this beast fit in small memory.
    I could say screw you all, go buy a bigger machine, but I figured that some guys might appreciate the effort, plus I get to learn some new R packages. That makes me happy and peaceful.
    V1.0 is posted, V1.1 is in the build queue, and 1.2 is going to be done tonight.
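Making a large text dataset "fit in small memory", as described above, usually comes down to streaming: processing the files line by line and keeping only small running aggregates. This is a generic sketch of that idea in Python, not the package's actual internals, and the four-column file layout is a hypothetical stand-in:

```python
from collections import defaultdict

def station_means(lines):
    """Accumulate per-station running means without holding the file in memory.

    `lines` is any iterable of text lines in a hypothetical
    "station year month value" layout.
    """
    total = defaultdict(float)
    count = defaultdict(int)
    for line in lines:
        parts = line.split()
        if len(parts) != 4:  # skip malformed or header lines
            continue
        station, value = parts[0], float(parts[3])
        total[station] += value
        count[station] += 1
    return {s: total[s] / count[s] for s in total}

# In practice `lines` would be an open file handle, so memory use stays
# constant regardless of file size.
means = station_means([
    "1001 2010 01 -2.0",
    "1001 2010 02 4.0",
    "1002 2010 01 10.0",
])
```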

  30. Biobob says: ‘Too many of these stations have problems with their site, calibration, station moves, instrument changes, adjustments of unknown derivation, missing values, etc etc’
    I thought that’s what set the BEST project apart. It’s transparent and provides an open database. Regardless… that stuff is all accounted for, is it not? The silence in here is deafening (apart from a lot of diversionary stuff about the solar tornado). It’s so cold in NZ at the moment, btw (Feb is usually the hottest month)… it’s playing tricks with my brain!

  31. At face value this BEST exercise looks like an excellent resource. They have several different categories of data such as “Quality Controlled”, and “Seasonality Removed”.
    To be fair, we sceptics cannot moan about both rogue data (we yell, “hah! Faulty instruments skew the data – GIGO!”) and massaged data (we yell, “They’ve been fiddling the figures! Warm bias! Just give us the pure unadulterated source data!”). This immense dataset would appear to make the whole lot available. I say such transparency is to be welcomed.
    Now, who among us has the stamina to wade through it all and find the fallacies behind the warmist claim that the Arctic is roasting?

  32. Second BEST? Trailer Parks on the Sun?
    Coffee, meet keyboard!! Thanks for the grins, folks!
    That said, I too have been wondering what the BEST status is with regard to peer review. Has first BEST even managed to get through review yet? How about second BEST? Or are both of them just hanging in the wind?

  33. This is the same nasty data we have ALWAYS had access to more or less. We have a dodgy series generally of one minimum and one maximum temperature value per day
    AFAIK it is exclusively one minimum and one maximum temperature value per day.
    I’d like to see someone compare the BEST min/max derived average to the average derived from hourly measurements available for quite a number of sites.
    This would show the warming bias in the min/max methodology. Similar comparisons show that the min/max methodology gives a warming bias of between 20% and 50% of the claimed warming.
    Such an analysis would hopefully get peoples attention after the publicity and hype surrounding BEST.
    I did try to do this analysis but the hourly data is in thousands of files organized by year and it was beyond my limited downloading skills.
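The comparison described above — (min+max)/2 versus a true time-weighted daily mean — is easy to sketch once hourly data are in hand. The profile below is synthetic (a toy stand-in for real hourly station observations), chosen only to show that an asymmetric diurnal cycle makes the two estimates diverge:

```python
# Toy diurnal profile: a long cool night/morning and a short warm
# afternoon. Real hourly station observations would replace this list.
hourly = [10.0] * 18 + [18.0] * 6  # 24 hourly readings, deg C

true_mean = sum(hourly) / len(hourly)          # time-weighted daily mean
minmax_mean = (min(hourly) + max(hourly)) / 2  # the TAVG-style estimate
bias = minmax_mean - true_mean                 # positive for this profile
```

For this toy profile the min/max estimate overstates the daily mean by 2 °C; the sign and size of the bias for real stations depend on the actual shape of the diurnal cycle, which is exactly what a comparison against hourly data would reveal.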

  34. What I would like to know: Where are the “freak trees” in the BEST data? Where does most of the warming come from: Increase in night minimum temperatures in winter? Poorly covered areas? Special type of device? Adjustments?
    I hope the computers are running hot analyzing the data…

  35. Maybe some commentators here have never tried to reconstruct and analyse temperature/time series. Just for Australia alone, with about 1,00 stations, Simon Torok noted in his 1996 PhD thesis that –
    “Station history documentation was investigated for each long-term temperature station. The files typically contained 250-300 items of correspondence per volume, and each station had one to three volumes. The earliest correspondence in the Station history files was generally from 1908, when the colonial meteorological responsibilities were passed over to the Federal Meteorological Bureau, although earlier information was available from ROs. However, most files commenced correspondence in the 1920s. Station history documentation for WA was very detailed in the early sections of the files (prior to 1930) and inspectors were particularly meticulous.
    A summary of notes made regarding of changes that may have affected the temperature record is given in Appendix Al. The summary focuses on changes in conditions rather than ongoing details. Notes such as site location, painting of screens, changes of observer were recorded but are not listed in the Appendix. Further summarised details relating to temperature measurement may be obtained in hand-written summary form on request from the author. Full documentation is available in NCC. The date of the first correspondence archived at the BoM in Melbourne is listed, however earlier information was often found through other sources, such as ROs and State archives”.
    It should be obvious that there is a huge task to bring data up to date. There are thousands of pages of data. They were never collected in the early days with the knowledge that they would be subjected to this scrutiny for this purpose.
    ………………………
    Here, in a light-hearted vein, are a few of the problems that Torok found. Why don’t you put yourself in the shoes of Steven Mosher or the BEST team and tell us how you’d deal with them, quantitatively?
    Thermometer exposure.
    Hanging under gum tree facing west under galvanised iron verandah against mud or stone walls.
    Thermometers originally placed in screens on a suitable site were, at whim of observer, taken on to a balcony twenty feet above the ground.
    Also brought close to or inside the house, so as to be more convenient for the reading.
    Attached thermometer on the barometer inside the house read by one observer.
    Screen’s exposure and condition.
    Beer case used temporarily
    Stevenson Screen painted cream. Or brown, or green, or silver, or not
    Football found inside screen
    Birds enter screen, to drink from the wet-bulb thermometer well.
    Pumpkins growing all over yard, including beneath screen.
    Cows, goats and other stock gather around screen and other instruments.
    Observer skill and attitude.
    Observers hard to replace as they were sent to gaol or to the war.
    Prisoners make observations in gaol.
    Observer lacking in height asked to make the observations while standing on a box, to avoid parallax error.
    Observer not able to make observations at the correct time due to work commitments, so helpfully estimated the temperatures later in the day.
    Observer described as good, “provided his enthusiasm does not wane on cold, wet, or windy days.”
    Observers known to send in their monthly climate returns early, temporarily forgetting the number of days in a particular month.
    Observer sent an entire month of entries to the BoM before they had been made, in order to have a month’s holiday.
    Site closed by BoM as “the readings were taken by girls.”
    Observations of temperature during a heat wave suspected to be exaggerated, as employees in area paid more when the mercury topped 100°F (37.8°C) .
    During an outback feud, regarding who was to have the privilege of making the climate measurements, the telegraph lines were cut to prevent the efficient provision of observations.
    Other problems.
    • Postal employee caught willfully smashing the thermometers.
    • Irate wife, perhaps tired of being woken every morning as her husband made the 3 am observations, took to the valuable screen with an axe, turning it into a pile of firewood.
    • Eagle destroyed Stevenson Screen by flying into the side of it.
    • Horses known to knock down Stevenson Screens and, more recently, cars and trucks have taken on this task.
    • Termites wreak havoc on wooden screens.
    • A dingo once stole a thermometer which had been read following the slaughter of farm animals (the observer was advised to wash his hands in future).
    • Thermometer broken on nose of observer’s dog.
    • Laundry torn when it caught on the Stevenson Screen after being hung above it to dry.
    Remediation suggestions can be sent to me, but a civil response should not be assumed.

  36. To add to Geoff Sherrington’s list, there is the perhaps apocryphal story of an Australian shire clerk whose responsibility it was to record daily min/max temperatures on a form and post them to the state met office every month.
    As he was going away for his annual 2 weeks holiday he filled in the form in advance and left it with his friend the postmaster with strict instructions not to post it until the end of the month.
    The postmaster’s assistant saw it and put in the daily mail collection and the form arrived 2 weeks early at the state met office.
    The story doesn’t say whether the data was used or not, but it illustrates the unreliability of data collected in remote locations in the pre-internet era.

  37. Geoff Sherrington says: February 18, 2012 at 4:14 am
    Thanks for that Geoff, nothing is more humorous than the intersection of human intent and life. Some of those are hysterical.

  38. “In making these data accessible to professional and amateur exploration we hope to encourage further analysis. If you have questions or reflections on this work, please contact info@berkeleyearth.org. We will attempt to address as many inquiries as possible, and look forward to hearing from you.”
    Haven’t answered mine. It has only been a few months though.

  39. Philip Bradley says:
    February 18, 2012 at 1:45 am
    This is the same nasty data we have ALWAYS had access to more or less. We have a dodgy series generally of one minimum and one maximum temperature value per day
    AFAIK it is exclusively one minimum and one maximum temperature value per day.
    ##############
    wrong

  40. Gee, couldn’t the BEST team afford at least one competent computer scientist on the project?
    Max and Min are data (of unknown quality, for what it’s worth) but AVG is NOT data; it is a derived quantity, and its inclusion serves no purpose other than to bloat the database by 50% without adding value. Or was that inclusion for the benefit of “Excel-challenged” climate scientologists?
    In fact, if AVG is computed as (Max + Min)/2, then it isn’t even a value with any physical definition or validity.

  41. Thank you for the explanation Steven, if I have time this week I’ll throw my extra clock cycles your way if that would be useful. I’ll have to figure out what I need for R packages, but hopefully that’s not going to require a lot of thought to get running 🙂

  42. Is there any data on the history of urbanisation at, and local to, each of the stations providing data?
    A lot of urban expansion happened in the 1976-2000 period.
    Until this historical urban-change data is available to properly account for UHI effects in the data sets, they are pretty much useless for calculating real global temperature trends.

  43. AndyG55 says:
    February 18, 2012 at 12:57 pm
    Is there any data on the history of urbanisation at, and local to, each of the stations providing data?
    A lot of urban expansion happened in the 1976-2000 period.
    Until this historical urban-change data is available to properly account for UHI effects in the data sets, they are pretty much useless for calculating real global temperature trends.
    #############################
    1. Yes, there is some historical data, both on population and on the percentage of land that is urban.
    2. We can also just select stations that have no “built” area surrounding them. There are roughly 14,000 such stations. They have no built area and zero to small population.
    Looking at rural only, you see that the answer doesn’t change much. It’s warmer. The real question is why.

  44. Philip Bradley says:
    February 18, 2012 at 1:45 am
    This is the same nasty data we have ALWAYS had access to more or less. We have a dodgy series generally of one minimum and one maximum temperature value per day
    AFAIK it is exclusively one minimum and one maximum temperature value per day.
    I’d like to see someone compare the BEST min/max derived average to the average derived from hourly measurements available for quite a number of sites.
    ###############
    There is literature on this. Read that before you suggest that someone else do busy work.

  45. steven mosher says:
    February 18, 2012 at 7:31 am
    Philip Bradley says:
    February 18, 2012 at 1:45 am
    AFAIK it is exclusively one minimum and one maximum temperature value per day.
    ##############
    wrong

    How about telling us what is correct and providing us with a link.

  46. Smacks of amateurism if the first issue was substandard. This reminds me of everything in the discussion of possible AGW – comments and conclusions, including the big ones to introduce carbon taxes and subsidies, are based on premature publication. Everyone is going off half-cocked for some reason – is it the glory of saving the world, or of proving the world does not have to be saved, that causes this? As an aging engineer I just want true facts (all facts should be true!) and a good explanation of cause and effect. Everything else is unnecessary and offensive! At the moment egos and profile rule. In time all those driven by ego and self-serving profile will look like fools.

  47. Until you get past the drudgery of getting the raw numbers in the best form possible, you can use exquisitely sophisticated mathematics to derive little more than a big argument.
    Please, stop the argument and turn your minds to materially assist with data quality validation. Major errors reside in some countries. Please help fix them.

Comments are closed.