Can 'big data' make sense of the big sticky mess of 'climate change'?

This press release via EurekAlert reads more like an advertisement than it does serious science. But then, we are dealing with a science that in some cases has lost all sense of seriousness, such as the bonkers claim that “climate change” will start killing off Felis catus en masse in just a few years.

Within a mere nine years, global warming could produce temperature spikes so elevated as to generate massive cat mortality? The idea is so ludicrous that I hardly know where to begin.

Source: Geocurrents. Eco-Authoritarian Catastrophism: The Dismal and Deluded Vision of Naomi Oreskes and Erik M. Conway

h/t to Bishop Hill for that one. The abstract of the sales-pitch paper they are citing starts out like this:

Global climate change and its impact on human life has become one of our era’s greatest challenges. Despite the urgency, data science has had little impact on furthering our understanding of our planet in spite of the abundance of climate data. This is a stark contrast from other fields such as advertising or electronic commerce where big data has been a great success story.

As a result, big data–induced progress within climate science has been slower compared with big data’s success in other fields such as biology or advertising. The slow progress has been vexing given that climate science has become one of the most data-rich domains in terms of data volume, velocity, and variety.

Of course they are assuming the climate data are all valid, just as so many people assume Mann’s interpretations of tree-ring data are valid.

So please excuse me if I think that “big data” analysis might only lead to big, ludicrous Oreskian-style claims, especially when it is packaged as a sales pitch like this one.


New Rochelle, October 14, 2014 – Big Data analytics are helping to provide answers to many complex problems in science and society, but they have not contributed to a better understanding of climate science, despite an abundance of climate data. When it comes to analyzing the climate system, Big Data methods alone are not enough, and sound scientific theory must guide data modeling techniques and results interpretation, according to an insightful article in Big Data, the highly innovative, peer-reviewed journal from Mary Ann Liebert, Inc., publishers. The article is available free on the Big Data website.

In “A Big Data Guide to Understanding Climate Change: The Case for Theory-Guided Data Science,” James Faghmous, PhD and Vipin Kumar, PhD, The University of Minnesota–Twin Cities, explore the challenges and opportunities for mining large climate datasets and the subtle differences that are needed compared to traditional Big Data methods if accurate conclusions are to be drawn. The authors discuss the importance of combining scientific theory and First Principles with Big Data analytics and use examples from existing research to illustrate their novel approach.

“This paper is a great example of leveraging the abundance of climate data with powerful analytical methods, scientific theory, and solid data engineering to explain and predict important climate change phenomena,” says Big Data Editor-in-Chief Vasant Dhar, Co-Director, Center for Business Analytics, Stern School of Business, New York University.


About the Journal

Big Data, published quarterly in print and online, facilitates and supports the efforts of researchers, analysts, statisticians, business leaders, and policymakers to improve operations, profitability, and communications within their organizations. Spanning a broad array of disciplines focusing on novel big data technologies, policies, and innovations, the Journal brings together the community to address the challenges and discover new breakthroughs and trends living within this information. Complete tables of contents and a sample issue may be viewed on the Big Data website.


57 thoughts on “Can 'big data' make sense of the big sticky mess of 'climate change'?”

  1. “if accurate conclusions are to be drawn.”
    Define ‘accurate’: that is the key. As we have seen time and again within the world of climate ‘science’, ‘accurate’ is often defined as getting the ‘right’ results for whatever you’re trying to promote, not the right results given the data and an honest review.

  2. Whoa up! Let’s have ‘Big Data’ first tackle something small as a demonstration, perhaps the stock market. Then we’ll let them have a go at climate.

    • I concur HR. In fact let them fund themselves through big data analysis of the stock market, and take a break from raiding my back pocket.

  3. “This paper is a great example of leveraging the abundance of climate data with powerful analytical methods, scientific theory, and solid data engineering to explain and predict important climate change phenomena,”

    This is exactly what I intend to do with my research on the Urban Heat Island effect, using these very principles and data, big data, on the construction of the surface structures, human-built and otherwise.

  4. “If they can get you asking the wrong questions, they don’t have to worry about answers.” ― Thomas Pynchon, Gravity’s Rainbow
    Until the “science” stops blaming everything (yes, even restless leg syndrome) on the magic molecule CO2, there is not much chance of “big data” helping much.

  5. Actually, I think they have plenty to fear from Big Data. For those unaware, think of how many times you find the terms “samples,” “estimates,” and “cherry picking” in the course of normal (if there is such a thing) climate dialogue. Traditional analytic methods involve sampling data. So if you were looking for a trend or a trait, and you had records of 100,000,000 people who wore glasses, and you wanted to know how many had blue eyes, you’d randomly pick a subset, maybe 1,000 or 10,000, and then simply extrapolate. This was normal, because no one (other than the NSA) had the compute power or time available to analyze all 100 million records.
    Big Data does just that. So there would be no “estimate” or extrapolation. Instead you would know instantly how many blue eyes existed in the entire dataset. No guessing.
    It would also expose the wide variations and divergence of various data sets, which are easily hidden now.
    You still fall victim to the “Ooga in, Chucka out” syndrome, but that becomes a bit more obvious as well.
    The ability to query/analyze ALL instances as opposed to sampling/guessing/dowsing is at least a step in the right direction.

    • “Ooga in, Chucka out” syndrome.
      I’m familiar with the song, and I can kinda guess the meaning; is there something deeper?

    • It would be snapshot. “So if you were looking for a trend or a trait…” Looking for a trait, maybe. Using this kind of analysis we might determine that Phoenix has a higher temperature trait than Minneapolis. Trends, not so much. “Big data” is just a set of tools to do what we are already doing with smaller data sets on larger data sets. It won’t help us project trends when we don’t know what the drivers are.

    • Jimmaine has a very good assessment here. I have over 35 years of IT experience with a background in math and stats. The relatively recent technology breakthroughs in storing and processing (analyzing) large quantities of data are very real and somewhat exciting. Regarding the climate debate, the ability of the new technologies to process all of the data in a reasonable period of time allows for the elimination of much of the sampling bias that has been identified in current models, the same models that have been used to promote the myth and have been proven invalid. What the new technologies do not do is ensure that proper analytical techniques are followed. That is still the job and responsibility of the analyst. Accountability is still a requirement.
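The sampling-versus-exhaustive-count distinction discussed in this thread can be sketched in a few lines of Python. This is a toy illustration with fabricated data (the 30% blue-eye rate and the record layout are invented), not anyone's actual pipeline:

```python
import random

random.seed(0)

# Toy "population": 100,000 records with an eye-colour field
# (the 30% blue-eye rate is made up for illustration).
population = ["blue" if random.random() < 0.3 else "other"
              for _ in range(100_000)]

# Traditional approach: sample a subset and extrapolate.
sample = random.sample(population, 1_000)
estimated_blue = sample.count("blue") / len(sample) * len(population)

# Exhaustive approach: scan every record for an exact answer.
exact_blue = population.count("blue")

print(f"estimate: {estimated_blue:.0f}, exact: {exact_blue}")
```

The sample-based figure carries sampling error that the exhaustive count does not, which is the commenter's point; neither approach, of course, fixes errors already baked into the records themselves.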

  6. Seeing as the feral millions of moggies in Aus kill more wildlife than climate ever has or probably could! They survive in deserts and snows, among snakes, scorpions, and all the rest of our Aussie nasties.
    If only heat would knock ’em off: 50 °C inland in the deserts and they’re still breeding up.

  7. The problem with AIs, and pure data analysis, is they don’t always produce the result you want.
    There’s a hilarious story from the early days of AI research, that the US Army wanted an AI which could automatically analyse aerial photographs, to spot tanks and other vehicles. So they spent their budget, trained an AI, tested it, got it ready for a demonstration, then – horror – in the demonstration it flopped completely.
    In the postmortem they realised they had made a horrible mistake: all the training photos of tanks were taken on cloudy days, and all the training photos which didn’t contain vehicles were taken on clear days, so the AI had learned to distinguish cloudy days from clear days.
    There’s a silver lining to the effort – the story goes that they sold the AI to the weather service.

  8. “…facilitates and supports the efforts of researchers, analysts, statisticians, business leaders, and policymakers to improve operations, *profitability*, and communications within their organizations.”
    Follow the money. Just another vulture looking to pick over the carcass.
    “Big Data methods alone are not enough and sound scientific theory must guide data modeling techniques and results interpretation, according to an insightful article in Big Data…”
    Indeed, must leave room for adjustments… Run, do not walk, from any sales pitch disguised as an article and called “insightful.”

  9. This sounds more like someone let their feline walk on the keyboard for a couple of minutes, and here is what you get.

  10. Speaking as an engineer who has worked on “big data”, it can pretty much tell you anything you want to hear. Most of the people working around on these projects (generally government) have no background in statistics or science, and the people pushing for big data want it because it justifies large budgets for them. And when you talk to the cranky old analyst, his assessment sounds like, “well, X is a good data source. The rest of them are huge but low quality, and we could do fine with X which is just 1% of our data.”
    If you want to be the gadfly at any big data discussion, start asking them about data provenance. That’s the methodology by which you establish that the sources meant something, that every derivation had a logical meaning, and that all the derivations are done in such a way that you can establish the logical chain linking them back to the sources. The whole project really has to be built around it, but it’s a large expense for little visible gain that is impossible to explain to project managers who don’t have the background to understand it.
    What happens in real life (this is from repeated experience on the job) is that you get a DVD with AGENCY A DATA scrawled on it containing a backup file from a database. You spend days loading it into the system using software that does things like truncate fields that don’t fit or throw out certain characters because they have other meanings. Once loaded, people move right on to making the system perform faster, which generally involves an engineer changing code with no documentation of the changes or of how they affect the queries you’re running. And no one has any idea why the results are at all meaningful, or even what they mean. “But look at all this data! And the queries that get us more data! It’s super impressive!”
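The provenance point above can be made concrete with a short, hypothetical Python sketch. The `ingest` function and the `AGENCY_A_DVD` source name are inventions for illustration; the idea is that every silent transformation (here, field truncation) gets logged, and each loaded row carries a record of where it came from:

```python
import hashlib
import json
from datetime import datetime, timezone

def ingest(records, source_name, max_len=10):
    """Load string records, logging every transformation instead of hiding it."""
    loaded, log = [], []
    for i, value in enumerate(records):
        cleaned = value
        if len(cleaned) > max_len:
            # The kind of truncation that usually happens silently.
            cleaned = cleaned[:max_len]
            log.append({"row": i, "action": "truncated",
                        "original_len": len(value)})
        loaded.append({
            "value": cleaned,
            "provenance": {
                "source": source_name,
                "ingested_at": datetime.now(timezone.utc).isoformat(),
                "source_checksum": hashlib.sha256(value.encode()).hexdigest(),
            },
        })
    return loaded, log

rows, changes = ingest(["short", "a value that is far too long"], "AGENCY_A_DVD")
print(json.dumps(changes, indent=2))
```

A real provenance system would persist these records and chain them through every later derivation, which is exactly the hard-to-justify expense the commenter describes.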

  11. Speaking as an Oracle DBA, I have to point out that big data is technically just another application of statistics, so they are already using it in the climate discussions. The hockey stick was just another (crappy) exercise in using big data.

  12. Big data refers to tying together data from many different sources (not just relational databases) to develop new views, better understand trends, find ways to make more money, etc.
    Let’s look at how awesome Big Data has worked for national security. Sure, with all of their data they missed identifying people trying to bring explosives onto a plane in their sneakers or underwear, and entirely missed the formation of the ISIS organization that is overrunning Iraq in record time and knocking on the doors of Baghdad, but… did I mention big data was awesome?
    With that track record, why wouldn’t you apply it to predicting climate decades and centuries into the future, along with its impact on biology, geology, zoology, and, well, everything? It’s a natural fit.

    • One of the most concise and biting posts in a long time.
      Yeah, I’ve done lots of work with “big data”, and I’ve learned that the suits just use this particular new phrase as yet another tool to fool the investors. Yep, “big data” tells us this, so invest in our company.
      I have a rapidly decreasing amount of respect for what IT has become. It’s degraded to high level scripts, touch tablets, pretty UX, and intentionally fabricated results. Shame on them all.

      • Well, at least it provides a top-down authoritative way of managing everything and unquestionable outputs from complicated black boxes. And uber precision, accuracy be damned.

  13. My cat says bring on the heat; I’ve been cold since the heat was turned off last spring.

  14. Actually climate science as it is today has all the big data it needs. It rides on the back of its new partner: the advertising industry. They have all the data that matters for the new propaganda machine calling itself science.

  15. Big Data technology is not suitable for classic climate modelling. Climate models are extremely compute intensive, but big data is centred on massive data sets (using a huge amount of storage I/O) plus some significant (but not intensive) compute resources. In a climate model you need hundreds or even thousands of CPUs or GPUs. In Big Data you need maybe a hundred or so CPUs with lots of motherboards (servers) with heaps of I/O to disk, RAM and networks. So unless your climate model is radically different from normal models, then Big Data is no help.

  16. Something I heard back in the 80s: “To err is human. If you *really* want to screw things up you need a computer.”

  17. In terms of climate data, Big Data really means Big Pile of Modelled Crap. Computers do not create data. They consume data.

  18. Using any computerized technique ASSUMES that what you are attempting to describe can be reduced to mathematical equations. What if “climate” is like, say, the human mind or the mind of God? What if it CAN’T be reduced to mathematics? Oh, I forgot. God doesn’t exist and the universe is a hologram. Stupid me. Yes, you may find patterns that can be quantified, but does that mean that “climate” can be? “Natural variability” doesn’t have to be quantifiable.

  19. “As a result, big data–induced progress within climate science has been slower compared with big data’s success in other fields such as biology…”

    The difference between biology and climate science is that, in biology, we understand what these data mean. Most of “big data” in biology is DNA sequence information, from which we can deduce RNA and protein sequences with a reasonable degree of accuracy. We can then compare, group and predict function, and finally perform experiments to test those predictions.
    Climate science has not reached this level of understanding.

  20. Funny how big climate data struggles to “explain” the 18-year halt in warming. They have to invent new data for that, like the warming they “know” is there in the deep oceans. They can predict all sorts of things that “could” happen. So can a mystic with a Ouija board or some tarot cards.

  21. Leveraging big data! First, this is from a business school! Second, for a real scientific theory, the hypothesis has to be strong enough to have been suggested with little data. Don’t forget, the “theory” we are ‘stuck with’ in climate science was promulgated at the beginning of all this in the 1980s. Now we have 30 years of Big Fiddled Data that came into being to shore up the original ‘theory’. So the idea is, if we manipulate the sea of ‘data’ we will tease out what we need to support the theory. If it doesn’t look right, we have an arsenal of novel statistical tools to fix it. 97% of climate scientists won’t accept that the little-data scenario (even squeezed and twisted as it is beyond recognition) has permitted the CO2 control knob to be falsified. Big data won’t help. It’s already been bled dry.

  22. “This paper is a great example of leveraging the abundance of climate data with powerful analytical methods, scientific theory, and solid data engineering to explain and predict important climate change phenomena,”

    After billions of dollars have been spent on climate research, supercomputers, etc., the IPCC comes out with embarrassing failed projections. You could quadruple spending and data and you would still be wrong. Climate sensitivity does not care about data.

  23. “Despite the urgency, data science has had little impact on furthering our understanding of our planet in spite of the abundance of climate data.”
    “Little impact”!? It produced ‘settled science’. What more could you want?

  24. “Big data” is only a tool, not an end to itself. The tools are powerful if used responsibly and intelligently. They are only as good or bad as the people applying them. They must be applied by people with a deep understanding of the underlying science and the statistics that are inherent to the methods. Unfortunately, very few climate scientists understand statistics and even fewer people who understand statistics know the first thing about the atmosphere. I fear that the method will be poorly used to “prove” catastrophic global warming and then discredited and discarded. That would be unfortunate because they have proven powerful for making short-range forecasts of extreme events.

  25. Instead of starting with theory-guided data, how about letting more of the statisticians who understand big data process the raw climate data to see what real statistics discovers? Don’t start by massaging the data to fit the theories.

  26. “Big Data” is a buzzword. The platform for it, 98% of the time, is Hadoop. And it’s just that, a data platform, that lets you (whether you’re an internet property, a financial institution, a climate scientist, a manufacturer, or whatever) store, process, and analyze a decent, large, or huge amount of data more cheaply and quickly. The underlying approach was developed at Google, and after the company published two papers describing it in 2003 and 2004, it was re-implemented as open-source software by Doug Cutting and Mike Cafarella.
    It’s the platform that is the backbone of Facebook, Twitter, LinkedIn, Yahoo, and pretty much every good-sized web property you can think of. In the last 5 or 6 years it has been adopted by lots of enterprises and three-letter government agencies because of its lower cost and better performance.
    Lots of tools that ran in legacy environments, like SAS, R, etc., now run on top of Hadoop, and lots of statistical analyses are now run on this platform. That isn’t to say it does your stats for you, or anything like that; it just provides a cheaper and faster platform for large-scale data storage and processing.
    Last, a decent-sized cluster has hundreds to thousands of cores, several terabytes of RAM, and room for several petabytes of data co-located with compute, so it is generally a great (in terms of economics and speed) platform to run climate models on, with all of the above-mentioned caveats about garbage in/garbage out. Using an execution framework like Spark, they could probably run their models much faster, on much more data, for less than what they’re doing today.

      • It can be. Again, it’s just a platform that lets you store and process more data more cheaply and quickly. For most workloads, it’s the same model/algorithm/whatever, just cheaper and faster. I work with a few companies that went from around a day to sequence a genome in their old environment to between 10 minutes and an hour. On a smallish cluster. Nothing new, but it allows them to do a few orders of magnitude more sequences, so that many, many things that wouldn’t have been explored before can be now.
        Because of the economics and hardware architecture generally used in legacy HPC, there are some use cases that weren’t possible in a legacy environment that are now. But mostly think of more questions, not better ones, in a very general sense.
        As with any HW/SW platform, GIGO is always a possibility. Just think of it as a bigger, faster, cheaper mousetrap or tool, depending on your level of cynicism. 🙂

    • Being a big data practitioner, and having built several large gigaflop high-performance compute clusters in a multi-megawatt data center to perform computational studies on petabytes of data from research in x-ray crystallography and nuclear magnetic resonance imaging to screen for candidate compounds suitable for drug development at a big pharma company, I can tell you that when I come home from work and have to explain how my day went, my twenty-year-old blind, arthritic, hypertensive, hypothyroid, weak-kidneyed example of Felis silvestris catus will only purr if she (1) has a warm lap to sit in, or (2) is allowed to “listen” while curled up on a floor register. She doesn’t care how big the data is, or how it was sifted through, or what results were returned, unless there is more warmth at the end. Oh wait … she only cares about man-made warming! I might have been harboring a seven-pound warmista all this time and never even known it.
      It IS worse than we thought!
      LOL, as I am writing this, John Cleese and Taylor Swift are sitting on Graham Norton’s couch, and John just insulted Taylor about her cat, which happens to be a Scottish Fold, just like the one in the photo above. You can watch it at …
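For readers who haven't met the MapReduce model that Hadoop implements, here is a minimal pure-Python sketch of the map, shuffle, and reduce phases using the canonical word-count example. No cluster is involved; this only illustrates the programming model, with the two partitions standing in for data split across nodes:

```python
from collections import defaultdict
from itertools import chain

def map_phase(partition):
    # Map: emit a (word, 1) pair for every word in this input partition.
    return [(word, 1) for line in partition for word in line.split()]

def shuffle(pairs):
    # Shuffle: group values by key, as the framework does between phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: combine each key's values into a final count.
    return {key: sum(values) for key, values in groups.items()}

# Input split across two "nodes" (here, just two lists of lines).
partitions = [["big data is just a platform"],
              ["a platform for data at scale"]]

mapped = chain.from_iterable(map_phase(p) for p in partitions)
counts = reduce_phase(shuffle(mapped))
print(counts)  # "data", "platform", and "a" each appear twice
```

Frameworks like Spark keep the same overall shape but hold intermediate results in memory, which is where the speed-ups the commenter mentions come from.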

  27. The most prominent and accomplished of the climate catastrophe predicting scientists are very quietly vacating and fading from the scene as the Great Catastrophic Manmade Global Warming racket steadily unravels at an ever faster rate.
    As room is now appearing at the top in the climate catastrophe sensationalist end of the media as the climate big wheels retreat from their previous dire predictions, the third, fourth and fifth rate wannabe “scientists” [ ?? ] with their breakfast cereal packet degrees are rushing in to get their share of the honours and glory as predictors of even more far fetched and extreme climatic futures.
    Catastrophic “future” predictions, always “future” predictions, which finally guarantee them a level of prominence and public exposure in the lowest intellectual strata of the sensationalist end of the media.

  28. Big Data sells itself as the magic bullet that will find the signal in the chaos … it’s pure BS … it’s a way to sell hardware and software to gullible managers … remember, big data doesn’t measure anything … it’s just storage … the problem is, and will continue to be, not enough sensors evenly distributed on the planet … big data can’t solve that …

    • Yes Jeff. You are so right. If we have trouble forecasting the weather, what does this tell us? Warnings are better than no warnings, but sometimes an extreme weather event can pop up so quickly, and affect only a small specific part of a region, that it can’t be helped.

Comments are closed.