AI Power Consumption

The channel WelchLabs on YouTube has great content on the concepts behind AI. It posted a video on September 13, 2024 discussing how the latest releases of these LLM engines continue to trace an outer bound of “efficiency” for AI models when their performance is graphed as a function of the compute burned to train them.

At 10:39 in the video, a graph is displayed showing the performance change from GPT-3 to GPT-4. Next to the two graphs is a legend that reflects an astounding statistic: the GPT-3 training run burned 3,640 petaflop-days of compute, and the GPT-4 training run took an astonishing 200,000 petaflop-days.

To put that in perspective…

A FLOP is a “FLoating Point operation,” the basic unit of work a processor performs on real-number values. A single FLOP is more expensive than a simple integer ADD to a local register, but a complex operation like the dot product of two 1024-dimension vectors is built from millions of them. A petaflop is one thousand trillion floating point operations:

1 petaflop = 1,000,000,000,000,000 floating point operations

An Intel i9 processor of the sort found in a high-end gaming desktop has 24 cores: 8 performance cores running at up to 6 GHz and 16 efficiency cores running at up to 4.4 GHz. Intel rates that CPU at 1,228 gigaflops, or

1,228,000,000,000 operations/second

It would take 1,000,000 / 1,228, or about 814, of those desktop machines to deliver one petaflop per second of computing power.

That means the GPT-3 training run of 3,640 petaflop-days would require 3,640 x 814, or 2,962,960, of those desktop computers running for a full day to match that computing power. GPT-4’s 200,000 petaflop-days would require 200,000 x 814, or 162,800,000, desktop-day equivalents.
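Here’s a quick back-of-envelope check of that arithmetic in Python (a minimal sketch using only the figures quoted above, nothing measured):

```python
# Desktop-day equivalents for the quoted training budgets.
PFLOPS = 1e15          # floating point ops per second in one petaflop/s
I9_FLOPS = 1.228e12    # Intel's 1,228-gigaflop rating for the i9

desktops_per_petaflop = PFLOPS / I9_FLOPS   # ~814.3 desktops per petaflop/s

for model, pflop_days in [("GPT-3", 3_640), ("GPT-4", 200_000)]:
    # each petaflop-day needs ~814 desktops running for one full day;
    # the post rounds 814.33 down to 814, hence its slightly smaller totals
    print(f"{model}: {pflop_days * desktops_per_petaflop:,.0f} desktop-days")
```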

Obviously, this is a bit of an exaggeration. The computers used for this training were equipped with the latest GPU (Graphics Processing Unit) boards, which are optimized for matrix math. A top-of-the-line consumer-grade GPU made by NVIDIA, branded the RTX 4090, is rated at roughly 82.6 teraflops of FP32 compute, so the training loads would be

GPT-3 training = 3,640 petaflop-days = 3,640,000 teraflop-days / 82.6 teraflops ≈ 44,000 RTX 4090 GPU-days

GPT-4 training = 200,000 petaflop-days = 200,000,000 teraflop-days / 82.6 teraflops ≈ 2,421,000 RTX 4090 GPU-days

NVIDIA’s top data-center GPU of that generation, the A100, is rated at 312 teraflops (its FP16 tensor-core rating, the mixed-precision math that training actually uses), so the same training loads would equate to these counts of A100 processors (which cost about $23,000 each):

GPT-3 training = 3,640 petaflop-days = 3,640,000 teraflop-days / 312 teraflops ≈ 11,667 A100 GPU-days

GPT-4 training = 200,000 petaflop-days = 200,000,000 teraflop-days / 312 teraflops ≈ 641,025 A100 GPU-days
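The same check for the GPU cases (the teraflop figures are NVIDIA’s published peak ratings; a real training run sustains only a fraction of peak, so these counts are optimistic):

```python
# GPU-days needed to supply a training budget at each card's peak rating.
def gpu_days(pflop_days: float, gpu_tflops: float) -> float:
    return pflop_days * 1_000 / gpu_tflops   # 1 petaflop-day = 1,000 teraflop-days

for name, tflops in [("RTX 4090 (FP32 peak)", 82.6), ("A100 (FP16 tensor peak)", 312.0)]:
    print(f"{name}: GPT-3 ~{gpu_days(3_640, tflops):,.0f} GPU-days, "
          f"GPT-4 ~{gpu_days(200_000, tflops):,.0f} GPU-days")
```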

Note that the original statistics were not in petaflops but petaflop-days. A petaflop-day is the output of hardware sustaining one petaflop per second for an entire day (about 8.64 × 10^19 operations). Here’s the rated power draw of each of the hardware examples above:

  • Intel i9 14900K processor = 360 watts
  • NVIDIA RTX 4090 consumer grade GPU card = 450 watts
  • NVIDIA A100 data center grade GPU card = 400 watts

The energy consumption of each for a full day of operation would be:

  • Intel i9 14900K processor = 0.360 kilowatts x 24 hours = 8.64 kilowatt-hours
  • NVIDIA RTX 4090 consumer grade GPU card = 0.450 kilowatts x 24 hours = 10.8 kilowatt-hours
  • NVIDIA A100 data center grade GPU card = 0.400 kilowatts x 24 hours = 9.6 kilowatt-hours

Mapping those power consumption levels onto the 200,000 petaflop-days of GPT-4 training is jaw-dropping:

  • GPT-4 training on desktop equivalents = 8.64 kWh x 162,800,000 = 1,406,592,000 kilowatt-hours
  • GPT-4 training on RTX 4090 GPUs = 10.8 kWh x 2,421,000 ≈ 26,147,000 kilowatt-hours
  • GPT-4 training on A100 GPUs = 9.6 kWh x 641,025 = 6,153,840 kilowatt-hours
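In Python, using the nameplate wattages above (actual draw varies with load, and this ignores cooling and the rest of the server):

```python
# GPT-4 training energy on each hardware option: kW x 24 h x device-days.
options = {
    "i9 desktop": (0.360, 162_800_000),   # (kilowatts per unit, unit-days)
    "RTX 4090":   (0.450,   2_421_000),
    "A100":       (0.400,     641_025),
}

for name, (kw, unit_days) in options.items():
    print(f"{name}: {kw * 24 * unit_days:,.0f} kWh")
```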

For comparison, the largest single power plant complex in the United States is Grand Coulee Dam, which is rated at a capacity of 7,097 megawatts, or 7,097,000 kilowatts. If I haven’t scrambled a units conversion somewhere, the A100 version of the GPT-4 training would have consumed the entire output of Grand Coulee Dam for about 52 minutes.
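That conversion, spelled out (capacity figure as quoted above):

```python
# Minutes of Grand Coulee's full output needed to cover the A100 training run.
GRAND_COULEE_KW = 7_097_000        # 7,097 MW capacity, in kilowatts
A100_TRAINING_KWH = 6_153_840      # GPT-4 total from the A100 line above

minutes = A100_TRAINING_KWH / GRAND_COULEE_KW * 60
print(f"~{minutes:.0f} minutes")   # ~52
```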

And that’s just the training compute. I haven’t seen a good summary anywhere that explains how the final output of a training run is then scaled out for interactive use. That inference processing is equally dependent on extremely large matrix operations, so I would presume the computing investment to run the finished model rivals what was required to train it, especially as millions of people start using it daily.

It seems obvious at this point that an accurate cost/benefit analysis of AI has not been attempted. The assumption seems to have been that this compute is just lying around doing SOMETHING, so let’s have it do THIS and see if anything interesting results. It’s only when you see announcements of one hundred billion dollars in new equipment being planned that it becomes apparent there are environmental impacts to AI that need to be considered as public policy, not solely as private investment decisions that treat power and water as freely available resources.

WTH

5 Likes

That would be spread across the grid and network, I am assuming. Is it fragmented?

If we look at training doctors to see patients, we see no money in the treatment of patients. If we ask about productivity levels? If we ask about medical care so workers can have their elderly parents cared for while the son or daughter goes back to work? Is medicine worth the spending? Probably.

In other words, the efficiencies of GPT-4 must be discussed.

Productivity gains get a shallow discussion…

If AI has productivity gains will treating humans with medical care be less productive?

The shallow discussions of productivity gains trouble me.

i.e., Jonny pushed a few buttons and his presentation to the board was taken care of. He puttered around online for two hours and went to sleep for the night. The time was used either way.

3 Likes

Claims of productivity gains are always worthy of suspicion.

Throughout my career, “labor savings” triggered special scrutiny. Any time a budget request was prepared for some new software tool and “labor savings” were cited on the justification form, my bosses would (correctly) say, “OK, show me the names of the people we’ll be laying off since we’re saving money on labor.”

Of course, very few people have the stomach to buy a new tool that costs $1 million, then pick five people on their staff to let go because they’re not needed anymore. In many cases, the argument would then be made that the “labor savings” would allow the same team to take on NEW work that can’t fit in current workloads, allowing the company to “do more” with the same staff. Okay, then identify that new work and show me the ROI on that work being done.

Of course, this never happens.

Discussions of AI frequently mention all the wonder drugs that could be synthesized or all of the improved diagnostics that could result. Okay, say that knowledge reduces the need for doctors whose expertise takes an extra four years of medical school and 3-7 years of residency to develop. But the diagnosis of new diseases and long-term impairments isn’t the major labor driver in medicine. The major labor driver in medicine is care delivery.

  • an AI won’t change a dressing on a wound
  • an AI won’t rotate an elderly patient to avoid bed sores
  • an AI won’t change the bedding after a bed-ridden patient has an accident
  • an AI cannot lift a bed-ridden patient out of bed into a wheelchair and load them in a van to take them to a doctor appointment

Also, look at the labor pyramid in medicine. I would guess there is about a 7:1 ratio of non-doctor staff to doctors. The “7” in that ratio definitely do “think work,” but it is often directly tied to physical work that requires proximity to a patient and hands-on effort. And keep in mind that most doctors are not providing actual “care” 8-10 hours a day. With the exception of surgeons, most practicing doctors probably spend thirty percent of their day staring at a computer screen documenting information required for payment by insurance companies. Is AI going to eliminate that labor drain?

Will the existence of an AI system eliminate the job of someone at Campbell’s Soup who decided, after a year-long analysis effort, to rename the company Campbell’s (sans Soup)? The people in those types of non-quantifiable positions can ALWAYS find something else to focus on and justify retaining their job. The person answering the company’s 800 support line dealing with angry customers doesn’t have the luxury of defining their own responsibilities.

WTH

5 Likes

Interestingly, when a new cryptocurrency was announced, the person promoting it was asked a question about crypto, and responded with a wandering bit about AI and power consumption.

Steve…owns electric utilities

4 Likes

Another thing to consider is the air-conditioning load that will be needed to keep those hundreds or thousands of CPUs and GPUs cool. From what I understand, a CPU operating at a cooler temperature is faster and more efficient. Also, there could be problems with failures if the temperature gets too hot.

~ ~ ~ ~ ~ ~

You may have heard about this recent announcement…

From the article:
Oracle is designing a gigawatt-scale data center that will be powered by a trio of small modular reactors (SMRs), company Chairman and Chief Technology Officer Larry Ellison told investors this week.

The cloud services giant currently has 162 data centers, live and under construction worldwide. The largest of these is 800 megawatts, and Oracle will soon begin construction of data centers that are more than a gigawatt.

Ellison said the electricity demand driven by artificial intelligence and data centers has become so “crazy,” the company is turning toward next-generation nuclear power. Building permits for the three SMRs have already been secured, Ellison said on Oracle’s 2025 Q1 earnings call.

~ ~ ~ ~ ~ ~ ~

They might be using round numbers, but if there are three nuclear plants producing a combined 1,000 MW, that is 333 MW each. That is stretching the definition of a “small modular reactor.” The largest SMR designs are about 300 MWe, so three would provide only 900 MW total.

Also, Ellison saying “building permits for the SMRs have already been secured” is not exactly accurate. They might have purchased some land and intend to do some preliminary ground work, but a full construction license approved by the NRC has certainly not been granted. The 300 MW-scale SMRs have not yet received their design certifications, so Oracle cannot start building right now.

_ Pete

3 Likes

Ellison does not have building permits for the three SMRs.

1 Like

That would be for domestic US data centers. If Oracle is going to build this 1 GW facility in some other country, it would still need approval from that country’s nuclear regulator. I am not aware of any country having given that approval for 3 SMRs, unless Oracle plans to build in Russia or China, perhaps. Even then, we would probably know about it.

_ Pete

AI will unleash Global Warming!!! Ban the stoopid thing! :imp:

The Captain :innocent:

1 Like

The global data center liquid cooling market size is predicted to grow at a CAGR of roughly 25% from 2023 to 2035…

Data Center Liquid Cooling Segmentation by Product Type

  • Modular Liquid Cooling Units
  • Integrated Rack Liquid Unit
  • Heat Exchangers for Hot Spots
  • Door Units
  • Device-Mounted Liquid

DB2

1 Like

There are some answers to your questions above at Usage/Inference vs Training Costs — Thoughts On Sustainability Goals for the LLM / AI Agents Era | by Dan Smith | Medium

Training costs
“Training GPT-3, which has 175 billion parameters, required 355 years of single-processor computing time and consumed 284,000 kWh of energy[2]. This is the same amount of energy the average US household will require to run for the next 30 years!”

Inference costs
“A ChatGPT-like application with estimated use of 11 million requests/hour produces emissions of 12.8k metric tons of CO2/year, 25 times the carbon emissions for training GPT-3. And that’s just one application, there will be many made available on the marketplace and developed by businesses taking advantage of this new tech. Optimizing inference will be critical to limiting environmental impact and power costs.”
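A quick cross-check of the quoted ratio (all figures are the article’s, not independent measurements):

```python
# The article's inference footprint implies a GPT-3 training footprint.
inference_tco2_per_year = 12_800   # metric tons CO2/yr for the example app
ratio_vs_training = 25             # article: 25x the training emissions

print(f"implied GPT-3 training: ~{inference_tco2_per_year / ratio_vs_training:.0f} t CO2")
```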

1 Like

Hoooooooooooooly ________.

WTH

Wow.

I mean, after reading this thread and some of the links, it does seem this is the relevant connecting link coupling artificial reality, imperial POWER, and silliness.

Not just rabbit holes, but the marvelous atmosphere the white rabbit creates of being LATE to something that MUST be attended to (what, we never specifically find out, only the sense of a general mood of mad spoiled despotic demands of “higher-ups”), with signs accompanying this fall of mankind (oops, huwomanity) hinting at the possibility that conceivably the future is (shocking news) about to unfold!

Balderdash! Pass the tea. And is it ever time to consider progressive taxation of energy usage…. Goodbye, Dinah!

d fb

1 Like

There is a model literally running on wheels: Tesla EVs! The inference computer in each car. I have no idea what the architecture might look like. What I do know:

  • The inference chip is ARM-based, and one of ARM’s features is low power consumption
  • There are millions on the road already
  • The same inference chip will be used by Optimus, the humanoid robot.

BTW, if the electricity is generated by solar panels, the energy/heat is net zero. Let fossil fuels rest undisturbed in their graves! :imp:

The Captain
loves ARM

since its latest IPO

1 Like

You know better than to say something in just one sentence.

Baffling people with long, drawn-out crap gets more likes.

I get the feeling no one reads the baffling stuff. LOL

Yeah but we do not need people to show up for as many days in the office. They can drive to the golf course instead.

Maybe we are already an AI simulation? And we’ll never be able to generate enough power to create a true AI.

Do We Live in a Simulation? Chances Are about 50–50 | Scientific American

If so, the simulation would most likely create perceptions of reality on demand rather than simulate all of reality all the time—much like a video game optimized to render only the parts of a scene visible to a player. “Maybe that’s why we can’t travel faster than the speed of light, because if we could, we’d be able to get to another galaxy,” said Nice, the show’s co-host, prompting Tyson to gleefully interrupt. “Before they can program it,” the astrophysicist said, delighting at the thought. “So the programmer put in that limit.”

2 Likes

More importantly, do you own electric utilities where servers, etc., are being built? I read an article a couple of months ago talking about DUK, D, and maybe AEP being beneficiaries of the AI infrastructure build-out.

Like the CA gold rush, the people who got rich weren’t the ones digging the gold but the ones selling the picks and shovels.

I hold both Duke and AEP. I used to hold the parent company of Arizona Public Service, but sold over concerns that lack of water might retard growth.

Steve