Forbes: Energy Hungry AI Researchers Could Contribute to Global Warming

Guest essay by Eric Worrall

We’ve all heard how bitcoin miners use enough electricity to power a small country. What is less well known is the prodigious and rapidly growing energy expenditure of the big players in the AI arms race.

According to a recent estimate, the computing resources companies pour into their AI systems are doubling every 3.4 months, and energy use is growing with them. Leading AI companies include vociferously green firms like Google, Microsoft and Amazon.

Deep Learning’s Climate Change Problem

Rob Toews Contributor
Jun 17, 2020, 11:54am EDT

Earlier this month, OpenAI announced it had built the biggest AI model in history. This astonishingly large model, known as GPT-3, is an impressive technical achievement. Yet it highlights a troubling and harmful trend in the field of artificial intelligence—one that has not gotten enough mainstream attention.

Modern AI models consume a massive amount of energy, and these energy requirements are growing at a breathtaking rate. In the deep learning era, the computational resources needed to produce a best-in-class AI model have on average doubled every 3.4 months; this translates to a 300,000x increase between 2012 and 2018. GPT-3 is just the latest embodiment of this exponential trajectory.
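A quick back-of-envelope check shows the two figures in that paragraph are mutually consistent (a sketch of the arithmetic, not from the article itself):

```python
import math

# If compute doubles every 3.4 months, how long does a 300,000x increase take?
doubling_period_months = 3.4
growth_factor = 300_000

doublings = math.log2(growth_factor)          # number of doublings needed
months = doublings * doubling_period_months

print(f"{doublings:.1f} doublings over {months / 12:.1f} years")
# ~18.2 doublings over ~5.2 years, matching the 2012-2018 window
```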

The bottom line: AI has a meaningful carbon footprint today, and if industry trends continue it will soon become much worse. Unless we are willing to reassess and reform today’s AI research agenda, the field of artificial intelligence could become an antagonist in the fight against climate change in the years ahead.

Why exactly do machine learning models consume so much energy?

The first reason is that the datasets used to train these models continue to balloon in size. In 2018, the BERT model achieved best-in-class NLP performance after it was trained on a dataset of 3 billion words. XLNet outperformed BERT based on a training set of 32 billion words. Shortly thereafter, GPT-2 was trained on a dataset of 40 billion words. Dwarfing all these previous efforts, a weighted dataset of roughly 500 billion words was used to train GPT-3.

Neural networks carry out a lengthy set of mathematical operations (both forward propagation and back propagation) for each piece of data they are fed during training, updating their parameters in complex ways. Larger datasets therefore translate to soaring compute and energy requirements.
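The per-example cost described above can be made concrete with a minimal sketch of one training step on a tiny two-layer network in plain numpy (illustrative only, nothing like the scale of the models discussed):

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny two-layer network: every training example costs one forward
# pass and one backward pass, so total work scales with dataset size.
W1, W2 = rng.normal(size=(4, 8)), rng.normal(size=(8, 1))

def train_step(x, y, lr=0.01):
    global W1, W2
    # Forward propagation
    h = np.tanh(x @ W1)          # hidden activations
    y_hat = h @ W2               # prediction
    # Backward propagation (gradients of squared error)
    err = y_hat - y
    grad_W2 = h.T @ err
    grad_W1 = x.T @ ((err @ W2.T) * (1 - h ** 2))
    # Parameter update
    W1 -= lr * grad_W1
    W2 -= lr * grad_W2
    return float((err ** 2).mean())

x = rng.normal(size=(1, 4))
y = np.array([[1.0]])
loss_before = train_step(x, y)
loss_after = train_step(x, y)   # repeating the step shrinks the loss
```

Multiply this forward/backward cycle by hundreds of billions of words and billions of parameters and the energy bill follows.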

Another factor driving AI’s massive energy draw is the extensive experimentation and tuning required to develop a model. Machine learning today remains largely an exercise in trial and error. Practitioners will often build hundreds of versions of a given model during training, experimenting with different neural architectures and hyperparameters before identifying an optimal design.
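The multiplicative effect of this trial-and-error process is easy to see. With a hypothetical search grid (the numbers below are illustrative, not from the article), the run count explodes:

```python
from itertools import product

# Hypothetical hyperparameter grid -- each combination is a full training run.
learning_rates = [1e-3, 3e-4, 1e-4]
num_layers = [12, 24, 48]
batch_sizes = [32, 64, 128, 256]
seeds = range(3)  # repeated runs to average out initialization luck

configs = list(product(learning_rates, num_layers, batch_sizes, seeds))
print(len(configs))  # 3 * 3 * 4 * 3 = 108 full training runs
```

Even this modest grid multiplies the energy cost of a single training run by more than a hundred.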

Read more:

Google and friends could choose to put saving the planet ahead of profits, by pausing their AI expansion programme while they research ways of improving energy efficiency. There is unequivocal evidence that drastic energy efficiency improvements are possible; in many respects the human brain outclasses any AI ever built, yet unlike these multi-acre artificial monstrosities, it runs on less power than a high-end desktop PC.

But a company which chose to pause its brute force expansion of AI capability to help the planet would almost certainly cede the prize to their rivals. In the headlong race to build a superhuman artificial intelligence, winner takes all; there is no prize for second place.

33 thoughts on “Forbes: Energy Hungry AI Researchers Could Contribute to Global Warming”

    • noaaprogrammer:
      At least the AI researchers are trying to accomplish something useful. Klimate science is a complete and utter waste of scarce human resources, invented as a political ploy to fraudulently make the public believe there is a crisis.

    • AI is comparable to the massive increase in huge power hungry data centers, to support the cloud and all those billions of devices connected to the web.
      All those extinction rebellion types are the ones driving the increase in power use and destroying the planet according to themselves.
      Introspection not a strong point

  1. The double-edged scalpel of virtue signalling. So, what indulgences will they purchase in order to redeem their corporate shares?

    • Thanks. First learned about reversible computing reading a Ray Kurzweil book, but didn’t know about Skyrmions.

  2. How about using some of that energy to provide heating, cooking and electricity for the hundreds of millions of the world’s poorest?

    • Anything which begins with the words “Tackling Climate Change” is a waste of time. Either pseudo-science or AOC type virtue signalling.

  3. This gets to the heart of the Artificial “Intelligence” conundrum. Neural networks do not “know” what they are presented with. For example, to train one to recognise, say, a dog (no easy matter because there are so many varieties), they have to be shown millions of pictures of dogs. This may seem counterintuitive, but it is necessary because all the network sees is an arrangement of pixels. A labrador may be obvious to a human even when viewed from every possible angle, but for a neural network, unless it “sees” a labrador in a perspective that matches one of those it was trained on, it could fail to match.

    Nor do the programmers know what criteria the network is using when it does make an identification. About a year ago, a team of AI programmers were perplexed when their neural network mistook a husky for a wolf. Eventually they discovered what it was that the network was using to match the definition of a wolf – it was a snowy background. It confused the husky with a wolf because both the husky, and all the wolves it was trained on, had a snowy background. Bizarrely, most of the features of the dogs were ignored.

    What this is leading to is that more and more examples are needed to train these networks. There aren’t even enough images on the internet to provide the numbers, so they create variations of the stock images using software to change angles and positions. I think this is a dead end pathway.
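    The augmentation trick this commenter describes can be sketched in a few lines of numpy (an illustration of the general idea, not any particular team’s pipeline):

```python
import numpy as np

# One stock "image" (here just an 8x8 array) expanded into many training
# variants by simple geometric transforms, as described in the comment above.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(8, 8))

variants = []
for k in range(4):                       # 0, 90, 180, 270 degree rotations
    rotated = np.rot90(image, k)
    variants.append(rotated)
    variants.append(np.fliplr(rotated))  # plus a mirror image of each

print(len(variants))  # 8 variants from a single source image
```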

    • I hear you, but Google etc. are making advances. Remember 20 years ago when the first few pages of Google search results were links to porn sites, no matter what search term you used? They figured out how to beat that with AI.

      There are obviously fundamental principles they have overlooked, but they could all be resolved very suddenly – as AI becomes more capable, the improved AI will help speed up research into even better AI, possibly leading to an explosion of capability in a very short time period.

  4. I walked through the world’s largest computer! It was at Camp Adair, just north of Corvallis, Oregon, and it was the computer for NORAD for a huge area, Oregon to Hawaii to Alaska. The building was two stories and a basement, and was about 100 meters on each side. The entire basement was home to a computer, with rows of flashing tubes and banks of transistors, and you could walk up and down aisles inside the computer. Due to the tremendous electricity consumption there was a lot of heat inside and cooling air was circulating at high speeds. The computer was divided into two-thirds, which was Big Memory, and one-third, which was Little Memory and was the back-up computer (this was a military facility and redundancy was important!) All of this drove a large video screen with many desks and telephones for operators, and they monitored this large geographic area for sneak aerial attacks or rockets. That was in 1967. The same computing power is now found in a wrist watch.

  5. If I read the Forbes article, it says:

    “In a widely discussed 2019 study, a group of researchers led by Emma Strubell estimated that training a single deep learning model can generate up to 626,155 pounds of CO2 emissions—roughly equal to the total lifetime carbon footprint of five cars. As a point of comparison, the average American generates 36,156 pounds of CO2 emissions in a year.

    To be sure, this estimate is for a particularly energy-intensive model. Training an average-sized machine learning model today generates far less than 626,155 pounds of carbon output.

    At the same time, it is worth keeping in mind that when this analysis was conducted, GPT-2 was the largest model available for study and was treated by the researchers as an upper bound on model size. Just a year later, GPT-2 looks tiny—one hundred times smaller, in fact—compared to its successor.”

    So, in the example shown, training the model generated the same CO2 as 17 Americans do in a year. OK, the new systems are a hundred times bigger, but if the increase in electricity consumption is even vaguely linear, then the consumption to train one of these systems (measured in “American Consumers”) is still pretty small.
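    Checking this commenter’s arithmetic against the figures quoted from the article:

```python
# Figures as quoted from the Forbes article (Strubell et al. estimate).
training_co2_lbs = 626_155      # one particularly energy-intensive training run
per_american_lbs = 36_156       # average American's annual CO2 emissions

americans_equivalent = training_co2_lbs / per_american_lbs
print(round(americans_equivalent, 1))  # ~17.3, i.e. roughly 17 Americans' annual output
```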

  6. “Practitioners will often build hundreds of versions of a given model during training, experimenting with different neural architectures and hyperparameters before identifying an optimal design.”


    A bad thing would be a model that is NOT TESTED; I can think of some fantastic examples… I’ll bet we all can.

    Seriously, this is their argument against someone using an AI program? Weak. Very weak. What are they really afraid of here? That someone will mine their “data” and use AI to analyze it only to find out it’s total bunk? Too late…..the human brain already has.

  7. 500 billion words and the only important ones are:
    “I’m sorry Dave, I’m afraid I can’t do that”

  8. Processing efficiency is scarcely considered in most development, but was always built into my own work. Once a well-regarded colleague quit and I was assigned to continue his work. One of his jobs took 4 days to run — it identified duplicates — so a quick rewrite reduced its run-time to 2 hours. A typical story, I’d expect.

    • The hardware guys outran the software guys for so long they forgot how to be efficient. That’s why it takes a 3 GHz, 6-core machine with 16GB of RAM to run a word processor slightly better than Word 5.1.

      Remember, the first Mac had a whopping 128k of RAM and another massive 64k of ROM, and as recently as the mid-90’s 8meg was still a very respectable and useful amount of memory.

  9. this is pattern recognition software not AI … there is no AI … it’s just a buzzword …

    • Am I the only pedant that wonders which language has a vocabulary of 5E11 words? Perhaps the AI just inherited Max Headroom’s stutter.

Comments are closed.