Forbes: Energy Hungry AI Researchers Could Contribute to Global Warming

Guest essay by Eric Worrall

We’ve all heard how bitcoin miners use enough electricity to power a small country. What is less well known is the prodigious and rapidly growing energy expenditure of the big players in the AI arms race.

According to a recent estimate, the computing resources companies devote to their AI systems are doubling every 3.4 months. The leading AI companies include vociferously green firms like Google, Microsoft and Amazon.

Deep Learning’s Climate Change Problem

Rob Toews Contributor
Jun 17, 2020, 11:54am EDT

Earlier this month, OpenAI announced it had built the biggest AI model in history. This astonishingly large model, known as GPT-3, is an impressive technical achievement. Yet it highlights a troubling and harmful trend in the field of artificial intelligence—one that has not gotten enough mainstream attention.

Modern AI models consume a massive amount of energy, and these energy requirements are growing at a breathtaking rate. In the deep learning era, the computational resources needed to produce a best-in-class AI model have on average doubled every 3.4 months; this translates to a 300,000x increase between 2012 and 2018. GPT-3 is just the latest embodiment of this exponential trajectory.
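As a quick sanity check, the two figures quoted above are mutually consistent: a 300,000x increase at one doubling every 3.4 months implies a span of roughly five years, which fits the cited 2012–2018 window. A minimal arithmetic sketch (the constants are the article's, the calculation is ours):

```python
import math

DOUBLING_PERIOD_MONTHS = 3.4   # doubling time quoted in the article
GROWTH_FACTOR = 300_000        # overall increase quoted for 2012-2018

# Number of doublings implied by the overall growth factor.
doublings = math.log2(GROWTH_FACTOR)          # ~18.2 doublings

# Time span those doublings imply at one doubling every 3.4 months.
span_months = doublings * DOUBLING_PERIOD_MONTHS
print(f"{doublings:.1f} doublings over {span_months:.0f} months "
      f"(~{span_months / 12:.1f} years)")
```

Roughly 62 months, i.e. about 5.2 years — close to the six-year window cited.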

The bottom line: AI has a meaningful carbon footprint today, and if industry trends continue it will soon become much worse. Unless we are willing to reassess and reform today’s AI research agenda, the field of artificial intelligence could become an antagonist in the fight against climate change in the years ahead.

Why exactly do machine learning models consume so much energy?

The first reason is that the datasets used to train these models continue to balloon in size. In 2018, the BERT model achieved best-in-class NLP performance after it was trained on a dataset of 3 billion words. XLNet outperformed BERT based on a training set of 32 billion words. Shortly thereafter, GPT-2 was trained on a dataset of 40 billion words. Dwarfing all these previous efforts, a weighted dataset of roughly 500 billion words was used to train GPT-3.

Neural networks carry out a lengthy set of mathematical operations (both forward propagation and back propagation) for each piece of data they are fed during training, updating their parameters in complex ways. Larger datasets therefore translate to soaring compute and energy requirements.
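A common rule of thumb (not from the article, but widely used for transformer-style models) is that training costs about 6 floating-point operations per parameter per training token, covering the forward and backward passes. A sketch with illustrative round numbers in the GPT-3 range:

```python
# Rough training-cost estimate using the ~6 FLOPs per parameter per
# training token rule of thumb (forward + backward pass combined).
# Parameter and token counts below are illustrative round numbers.
def training_flops(params: float, tokens: float) -> float:
    return 6.0 * params * tokens

gpt3_like = training_flops(params=175e9, tokens=300e9)
print(f"{gpt3_like:.2e} FLOPs")  # ~3.15e23 FLOPs
```

Numbers of this magnitude are why larger datasets translate directly into soaring compute and energy requirements.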

Another factor driving AI’s massive energy draw is the extensive experimentation and tuning required to develop a model. Machine learning today remains largely an exercise in trial and error. Practitioners will often build hundreds of versions of a given model during training, experimenting with different neural architectures and hyperparameters before identifying an optimal design.
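The multiplier effect of this trial-and-error process is easy to see: the total energy bill is roughly the number of candidate configurations times the cost of one training run. A toy sketch (the hyperparameter names, values and per-run energy figure are all made up for illustration):

```python
from itertools import product

# Illustrative hyperparameter grid (names and values are made up).
grid = {
    "learning_rate": [1e-4, 3e-4, 1e-3],
    "num_layers":    [12, 24, 48],
    "batch_size":    [32, 64],
}

configs = list(product(*grid.values()))
ENERGY_PER_RUN_KWH = 500  # assumed cost of one full training run

print(f"{len(configs)} training runs, "
      f"~{len(configs) * ENERGY_PER_RUN_KWH} kWh total")
```

Even this tiny grid multiplies one run's cost by 18; real searches over "hundreds of versions" scale accordingly.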

Read more:

Google and friends could choose to put saving the planet ahead of profits, by pausing their AI expansion programme while they research ways of improving energy efficiency. There is unequivocal evidence that drastic energy-efficiency improvements are possible; in many respects the human brain outclasses any AI ever built, yet unlike these multi-acre artificial monstrosities, the human brain uses less power than a high-end desktop PC.

But a company which chose to pause its brute-force expansion of AI capability to help the planet would almost certainly cede the prize to its rivals. In the headlong race to build a superhuman artificial intelligence, winner takes all; there is no prize for second place.

June 17, 2020 10:09 pm

What about all the useless energy expended on climate modeling?

Reply to  noaaprogrammer
June 17, 2020 11:33 pm

The Matrix showed us the solution to the power requirements of AI.

Reply to  noaaprogrammer
June 18, 2020 1:31 am

Easy answer, Uncalqubile

Al Miller
Reply to  noaaprogrammer
June 18, 2020 7:17 am

At least the AI researchers are trying to accomplish something useful. Klimate science is a complete and utter waste of scarce human resources, invented as a political ploy to fraudulently make the public believe there is a crisis.

Pat from kerbob
Reply to  noaaprogrammer
June 18, 2020 1:36 pm

AI is comparable to the massive increase in huge power hungry data centers, to support the cloud and all those billions of devices connected to the web.
All those extinction rebellion types are the ones driving the increase in power use and destroying the planet according to themselves.
Introspection not a strong point

June 17, 2020 10:15 pm

The double-edged scalpel of virtue signalling. So, what indulgences will they purchase in order to redeem their corporate shares?

June 17, 2020 10:15 pm

There is research on using nano-scale magnetic vortices called skyrmions that compute without dissipating energy through the creation and destruction of information.

June 17, 2020 10:52 pm

How about using some of that energy to provide heating, cooking and electricity for the hundreds of millions of the world’s poorest?

June 17, 2020 11:05 pm

So climate science must not make their climate models too good? Or should they go ahead and burn up the fossil fuels to come up with innovative solutions to climate change?

Another Joe
June 17, 2020 11:24 pm

In that context this paper might be of interest:

Tackling Climate Change with Machine Learning

I thought the title alone was ironic!

Reply to  Another Joe
June 17, 2020 11:37 pm

Anything which begins with the words “Tackling Climate Change ” is a waste of time. Either pseudo-science or AOC type virtue signalling.

June 17, 2020 11:43 pm

Green pain for thee but not for me.

Chris Hanley
June 17, 2020 11:45 pm

No worries, Google claims to be ‘100% renewable’ and ‘the world’s largest corporate buyer of renewable power, with commitments reaching 2.6 gigawatts (2,600 megawatts) of wind and solar energy’.

Pat from kerbob
Reply to  Chris Hanley
June 18, 2020 1:38 pm

All lies

Vincent Causey
June 18, 2020 1:15 am

This gets to the heart of the Artificial “Intelligence” conundrum. Neural networks do not “know” what they are presented with. For example, to train one to recognise, say, a dog (no easy matter because there are so many varieties), they have to be shown millions of pictures of dogs. This may seem counter-intuitive, but it is necessary because all the network sees is an arrangement of pixels. A labrador may be obvious to a human even when viewed from every possible angle, but for a neural network, unless it “sees” a labrador in a perspective that matches one of those it was trained on, it could fail to match.

Nor do the programmers know what criteria the network is using when it does make an identification. About a year ago, a team of AI programmers were perplexed when their neural network mistook a husky for a wolf. Eventually they discovered what it was that the network was using to match the definition of a wolf – it was a snowy background. It confused the husky with a wolf because both the husky, and all the wolves it was trained on, had a snowy background. Bizarrely, most of the features of the dogs were ignored.

What this is leading to is that more and more examples are needed to train these networks. There aren’t even enough images on the internet to provide the numbers, so they create variations of the stock images using software to change angles and positions. I think this is a dead end pathway.
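The augmentation idea described here — manufacturing extra training examples by transforming existing images — can be sketched very simply. Images are represented below as 2-D lists of pixel values to keep the example dependency-free; real pipelines would use an imaging library such as Pillow:

```python
# Minimal sketch of data augmentation: generate extra training
# examples by transforming existing images (flips, rotations).

def flip_horizontal(img):
    # Reverse each row of pixels.
    return [row[::-1] for row in img]

def rotate_90(img):
    # Rotate 90 degrees clockwise: reverse the rows, then transpose.
    return [list(row) for row in zip(*img[::-1])]

original = [[1, 2],
            [3, 4]]

augmented = [original, flip_horizontal(original), rotate_90(original)]
print(augmented)  # three training examples from one image
```

Each source image yields several variants, multiplying the effective dataset size without collecting new photographs.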

Mike McMillan
Reply to  Eric Worrall
June 18, 2020 2:26 am


Vincent Causey
Reply to  Eric Worrall
June 18, 2020 8:02 am

Maybe. I am not an expert on it, but am speaking from my experience as an amateur who wrote neural networks in Python to recognise 10 objects from an open-source database of training images. It was an eye opener and a bit disappointing with the realisation that “there’s no intelligence there.”

tsk tsk
Reply to  Eric Worrall
June 18, 2020 3:03 pm

Right after fusion.

Ron Long
June 18, 2020 3:17 am

I walked through the world’s largest computer! It was at Camp Adair, just north of Corvallis, Oregon, and it was the computer for NORAD for a huge area, Oregon to Hawaii to Alaska. The building was two stories and a basement, and was about 100 meters on each side. The entire basement was home to a computer, with rows of flashing tubes and banks of transistors, and you could walk up and down aisles inside the computer. Due to the tremendous electricity consumption there was a lot of heat inside and cooling air was circulating at high speeds. The computer was divided into two-thirds, which was Big Memory, and one-third, which was Little Memory and was the back-up computer (this was a military facility and redundancy was important!) All of this drove a large video screen with many desks and telephones for operators, and they monitored this large geographic area for sneak aerial attacks or rockets. That was in 1967. The same computing power is now found in a wrist watch.

June 18, 2020 3:50 am

Reading the Forbes article, it says:

“In a widely discussed 2019 study, a group of researchers led by Emma Strubell estimated that training a single deep learning model can generate up to 626,155 pounds of CO2 emissions—roughly equal to the total lifetime carbon footprint of five cars. As a point of comparison, the average American generates 36,156 pounds of CO2 emissions in a year.

To be sure, this estimate is for a particularly energy-intensive model. Training an average-sized machine learning model today generates far less than 626,155 pounds of carbon output.

At the same time, it is worth keeping in mind that when this analysis was conducted, GPT-2 was the largest model available for study and was treated by the researchers as an upper bound on model size. Just a year later, GPT-2 looks tiny—one hundred times smaller, in fact—compared to its successor.”

So, in the example shown, training the model generated the same CO2 as 17 Americans do in a year. OK, the new systems are a hundred times bigger, but if the increase in electricity consumption is even vaguely linear, then the consumption to train one of these systems (measured in terms of “American Consumers”) is pretty small.
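The commenter's arithmetic checks out against the figures quoted from the article:

```python
MODEL_LBS_CO2 = 626_155            # Strubell et al. estimate quoted above
AMERICAN_LBS_CO2_PER_YEAR = 36_156 # average American, per the article

ratio = MODEL_LBS_CO2 / AMERICAN_LBS_CO2_PER_YEAR
print(f"Training that model ~= {ratio:.1f} American-years of CO2")
```

About 17.3 American-years, matching the "17 Americans" figure above.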

Just Jenn
June 18, 2020 4:07 am

“Practitioners will often build hundreds of versions of a given model during training, experimenting with different neural architectures and hyperparameters before identifying an optimal design.”


A bad thing would be a model that is NOT TESTED, I can think of some fantastic examples…I’ll bet we all can.

Seriously, this is their argument against someone using an AI program? Weak. Very weak. What are they really afraid of here? That someone will mine their “data” and use AI to analyze it only to find out it’s total bunk? Too late…..the human brain already has.

Tom in Florida
June 18, 2020 4:20 am

500 billion words and the only important ones are:
“I’m sorry Dave, I’m afraid I can’t do that”

D. J. Hawkins
Reply to  Tom in Florida
June 18, 2020 6:32 am

“Daisy, Daisy, give me your answer do…”

June 18, 2020 8:37 am

It’s actually central banking and funny money that creates a need for honest money.

Reply to  Zoe Phin
June 18, 2020 9:23 am

So any complaint about global warming from cryptocoins, is really an admission they are responsible.

NZ Willy
June 18, 2020 12:22 pm

Processing efficiency is scarcely considered in most development, but was always built into my own work. Once a well-regarded colleague quit and I was assigned to continue his work. One of his jobs took 4 days to run — it identified duplicates — so a quick rewrite reduced its run-time to 2 hours. A typical story, I’d expect.
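The kind of rewrite described — turning a days-long duplicate-finding job into hours — is often just a matter of replacing pairwise comparison with a single hashed pass. A minimal sketch (the data and function names are illustrative, not the commenter's actual code):

```python
# Duplicate detection two ways: quadratic pairwise comparison
# versus a single pass with a hash set.

def find_duplicates_quadratic(records):
    # O(n^2): compare every record against every earlier one.
    dups = []
    for i, r in enumerate(records):
        if r in records[:i] and r not in dups:
            dups.append(r)
    return dups

def find_duplicates_hashed(records):
    # O(n): one pass, remembering what we've already seen.
    seen, dups = set(), set()
    for r in records:
        (dups if r in seen else seen).add(r)
    return sorted(dups)

data = ["a", "b", "a", "c", "b", "a"]
print(find_duplicates_hashed(data))  # same answer, far less work
```

On millions of records the difference between the two approaches is exactly the days-versus-hours gap the comment describes.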

tsk tsk
Reply to  NZ Willy
June 18, 2020 3:18 pm

The hardware guys outran the software guys for so long they forgot how to be efficient. That’s why it takes a 3 GHz, 6-core machine with 16GB of RAM to run a word processor slightly better than Word 5.1.

Remember, the first Mac had a whopping 128k of RAM and another massive 64k of ROM, and as recently as the mid-90’s 8meg was still a very respectable and useful amount of memory.

Ian Random
Reply to  tsk tsk
June 19, 2020 5:05 am

After learning of Wirth’s Law, everything is Wirthwhile software now.

June 19, 2020 7:25 am

Step 1: Define the problem domain.

The Dark Lord
June 19, 2020 10:45 pm

this is pattern recognition software not AI … there is no AI … its just a buzzword …

Reply to  The Dark Lord
June 20, 2020 12:08 am

Am I the only pedant who wonders which language has a vocabulary of 5E11 words? Perhaps the AI just inherited Max Headroom’s stutter.
