The Terawatt Time Bomb: Transformers, Trouble, and the Analog In-Memory Compute Fix

How to rescue a frying planet with a novel computer architecture
April 15, 2025

TL;DR
The AI industry faces an imminent electricity crisis, with data centers projected to consume 500 terawatt-hours annually by 2027 (equivalent to France's entire consumption). Big Tech's solution of building more power plants is unsustainable. Two promising alternatives are emerging: in-memory computing, which cuts energy needs by 10-20x, and distributing AI to edge devices to spread the energy load - each can have a many-fold impact on the problem. Unless we address this power addiction, AI progress will be constrained not only by data and algorithms, but by electricity availability as well. This is a problem in urgent need of a solution - and building higher-capacity power plants that burn fossil fuels is not it, because (do I really even need to say this?) we are going to fry ourselves. In this post, I'll talk about these new directions, how they can help us bend the energy-efficiency curve for AI, and what I am going to do about it. (Yes, see below for a personal update!)
PS: Just how urgent this problem is should be clear from the fact that the topic of this post was on the front page of the NYTimes today and was also the subject of discussion on China Talk this week.
The Inconvenient Truth About AI's Power Addiction
If you read my post on The Great Data Famine, you know I've been losing sleep over the AI industry's insatiable appetites. But because there is only so much panic to go around per post, that one focused on running out of internet to feed these digital gluttons. Here we address that oversight by talking about another, equally alarming crisis looming on the horizon - I hate to break it to you, but we're about to run out of electricity too.
And don't just take my word for it—the numbers are sobering enough to make even the most tech-optimistic VCs choke on their kombucha:
- Gartner predicts that by 2027, a whopping 40% of existing AI data centers will be operationally constrained by power availability. Not broken, not outdated—just unable to find enough juice to keep the lights on.
- Data centers, running those fancy AI-optimized servers, will gobble up 500 terawatt-hours per year by 2027, a staggering 2.6 times what they consumed in 2023. That's roughly equivalent to the entire electricity consumption of France. Oui! You read that right!
- Goldman Sachs Research (not exactly a bunch of tree-hugging alarmists) estimates that data center power demand will surge 160% by 2030, potentially jumping from 1-2% of global power consumption to 3-4%. For context, that growth rate outpaces most entire industrial sectors.
- A single ChatGPT query reportedly requires nearly 10 times as much electricity to process as a Google search. That "quick question" you asked AI about pineapple on pizza? You could have run your fridge for an hour instead (I exaggerate for effect but you get the point).

Source: https://semianalysis.com/2024/03/13/ai-datacenter-energy-dilemma-race/
Big Tech's Power Grab (Literally)
The tech giants aren't just aware of this looming power catastrophe—they're throwing billions at the problem with the frantic energy of a Silicon Valley executive who just realized their mansion's backup generator won't power their home theater during the next rolling blackout.
Consider Meta as a particularly extreme case study in AI's power gluttony. They are building a $10 billion AI data center in Louisiana (their largest to date) that's so power-hungry that it requires Entergy Louisiana to invest another $6 billion in electric infrastructure just to keep it running. We're talking about a 2,250-acre solar farm (that's ~3.5 square miles of panels), three natural gas turbines (so much for that carbon neutrality), and 100 miles of new transmission lines. All this for one company's AI ambitions!
Meta plans to bring around 1GW of compute online in 2025. For perspective, that's roughly equivalent to the power consumption of ~750,000 homes. They're projecting a fleet of more than 1.3 million GPUs by year-end—a small city of silicon all screaming for electricity. Their overall capital expenditure for AI data centers and servers is expected to hit a mind-boggling $60-65 billion.
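For the arithmetic-inclined, here's the back-of-envelope check behind that homes comparison; the average household draw is my own rough assumption (about 1.3 kW of continuous draw, roughly 11,400 kWh per year), not a figure from Meta:

```python
# Back-of-envelope check on "1 GW ~ 750,000 homes"; the household figure is an
# assumed US average of about 1.3 kW continuous draw, used only for illustration.
GIGAWATT_KW = 1_000_000   # 1 GW expressed in kilowatts
AVG_HOME_KW = 1.3         # assumed average continuous household draw

homes = GIGAWATT_KW / AVG_HOME_KW
print(f"~{homes:,.0f} homes")  # roughly 770,000, in the same ballpark as 750,000
```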
Sure, they've "committed" to matching their electricity use with 100% clean and renewable energy, but the math doesn't add up. That's why they're hedging with those natural gas turbines.
It's the same story across the industry: Microsoft is resurrecting the ghost of Three Mile Island with a $1.6 billion investment to revive the decommissioned nuclear power plant. Amazon's building four small modular nuclear reactors. Google's nuclear shopping spree is just getting started.
Some companies are even trying to plug directly into power plants, bypassing the grid entirely in their desperation for more electrons. We're witnessing the birth of a new feudal system where tech lords control not just your data and attention, but the very power plants that fuel their digital kingdoms.
How This Differs From Other Energy Crisis Narratives
While mainstream tech coverage often portrays this as a simple infrastructure challenge that can be solved with more power plants, my analysis diverges in three critical ways:
- This isn't just about quantity - The centralized nature of AI computation creates dangerous energy bottlenecks that no amount of new power plants can solve while also preserving the planet. The usual caveats about magical technologies like fusion becoming cheaply and abundantly available apply.
- The problem is architectural - Unlike many commentators who focus solely on hardware efficiency, I argue that our fundamental computing architecture (not just the chips) needs reimagining.
- The solution must be distributed - Rather than merely building bigger centralized systems, the sustainable path forward involves pushing AI computation to the edge / client where energy can be sourced locally and used more efficiently.
The Holy Trinity of AI's Energy Nightmare (Now with a Fourth Horseman)
Our current AI energy crisis breaks down into three unholy dimensions, with a fourth now emerging from the shadows:
1. Training: The Energy Black Hole
Training large models has become the computational equivalent of assembling a Death Star—a project so massive it threatens to drain resources from entire systems. Meta's latest LLM training run allegedly required a small city's worth of electricity, all so their model could get marginally better at understanding your aunt's Facebook comments about her cat.
2. Inference: Death By a Thousand Cuts
While individual inference costs look tiny compared to training, scale matters. When billions of people start asking AI to generate images of "cats dressed as Napoleon riding unicorns" billions of times per day, we're looking at energy consumption that could rival entire industrial sectors.
3. Data Center Expansion: The Concrete Carbon Bomb
The AI boom has triggered a global data center construction frenzy that's consuming concrete, steel, and land at unprecedented rates. But there's a double-whammy energy problem hiding in plain sight: first, you need massive power density to run all the centralized compute hardware, and then you ALSO need additional energy to remove the heat these power-hungry machines generate. It's a vicious cycle - more compute requires more power, which creates more heat, which demands more cooling, which consumes even more power! These facilities are essentially fighting the laws of thermodynamics 24/7. As an unapologetically unreformed physicist, I can assure you that anytime you fight the laws of thermodynamics, you lose - I will take that bet 100% of the time!
4. Inference Scaling: The Power-Hungry Perfectionist
Remember when running an AI model just meant generating one answer? Those quaint days are gone. The latest trend in AI is what I call "computational perfectionism" - running models multiple times (you can find out more about this in my post Less Magic, More Math) with techniques like:
- Parallel Sampling: Generating N different answers and picking the best one (essentially multiplying your power consumption by N)
- Beam Search: Maintaining multiple candidate solutions at each step (again, multiplying your energy bill)
- Revision-Based Approaches: Repeatedly refining answers through multiple iterations (you guessed it—more power consumption)
Each of these approaches can improve AI outputs dramatically, but at the cost of running the same expensive neural network operations many times over. It's like deciding the best way to make dinner is to cook ten different meals and keep only the tastiest one.
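To make that multiplication concrete, here's a minimal sketch of the first technique, parallel sampling (best-of-N). The generate and score functions and the per-sample energy figure are hypothetical placeholders for illustration, not any lab's actual API or measurements:

```python
import random

ENERGY_PER_SAMPLE_WH = 3.0  # assumed energy for one full generation pass (illustrative)

def generate(prompt: str) -> str:
    """Stand-in for a single LLM decoding pass."""
    return f"candidate-{random.randint(0, 999)} for: {prompt}"

def score(answer: str) -> float:
    """Stand-in for a verifier / reward model that rates an answer."""
    return random.random()

def best_of_n(prompt: str, n: int) -> tuple[str, float]:
    """Run the model n times, keep the highest-scoring answer, tally the energy."""
    candidates = [generate(prompt) for _ in range(n)]
    best = max(candidates, key=score)
    energy_wh = n * ENERGY_PER_SAMPLE_WH  # the energy bill scales linearly with n
    return best, energy_wh

answer, energy = best_of_n("Explain the memory wall", n=8)
print(f"kept: {answer}")
print(f"energy spent: {energy:.1f} Wh (vs {ENERGY_PER_SAMPLE_WH:.1f} Wh for a single pass)")
```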
Without intervention, we're racing toward a future where our most advanced technology is simultaneously solving humanity's problems and making our planet uninhabitable. Talk about an existential catch-22!
The Desperate Search for Solutions
If you've ever watched the crypto space, you might be experiencing déjà vu. "We're aware of our enormous energy consumption! We're working on it! Pinky swear!" Meanwhile, the meter keeps running and the planet keeps warming.
The responses from major AI labs have been predictable:
- The Optimistic Deflection: "We'll just use renewable energy!" (Ignoring the fact that there isn't nearly enough to go around)
- The Efficiency Promise: "Each new model generation is more efficient!" (While conveniently ignoring that they're making each model exponentially larger and that inference usage is going through the roof)
- The Carbon Offset Shell Game: "We're carbon neutral via offsets!" (Translation: "We paid someone not to cut down trees they probably weren't going to cut down anyway")
- The Delusional Deflection: "We are going to come up with AGI which will solve everything … eventually" (Where do I even start on this one? You know what, I won't dignify that amount of intellectual laziness with a response, except to say that FAFO is not a strategy)
Watching the industry's response to this looming crisis has been like watching someone try to put out a forest fire with a water pistol while simultaneously dumping gasoline around the perimeter "just in case."
Case Study: Cerebras's WSE-3 Approach
While most companies are locked in an arms race for more power, some are rethinking the fundamental approach. Cerebras Systems with their Wafer Scale Engine (WSE-3) represents an alternative vision focused on efficiency over brute force.
Their architecture reduces data movement by integrating massive amounts of compute and memory on a single wafer, achieving 4x better energy efficiency than traditional GPU clusters. When training foundation models, their systems reportedly use 35-40% less energy for equivalent computations. This approach doesn't solve all problems, but it demonstrates how architectural innovation can bend the energy curve while still advancing capabilities.
When Bandages Won't Stop the Bleeding: Rethinking AI's Energy Architecture
There are two fundamental shifts that could actually bend the energy consumption curve before we're forced to choose between running AI and keeping the lights on.
The Memory-Compute Paradigm Shift
While many point fingers at the separation of memory and processing in traditional computing (that old von Neumann architecture everyone loves to blame), the real game-changer isn't just where we do the computation but how we do it!
Specifically, the quantum leap in efficiency comes when analog computing and in-memory processing join forces—it's their marriage that creates the energy revolution, not either one alone. Digital in-memory computing? Modest gains. Analog computing without in-memory architecture? Also underwhelming. But combine them both, and suddenly we're talking about 10-20x efficiency improvements!
Here's why: Analog in-memory computing doesn't just shorten the commute between data and processing—it fundamentally changes the nature of the work itself. Instead of converting everything to ones and zeros (which is like translating a poem through three different languages before reading it), analog systems work with continuous electrical values.
The astute observer might object: “But Shwetank, what about the noise? What about the variability? Haven’t we seen this movie before?” Some companies, like EnCharge, are using incredibly clever approaches to solve exactly these issues - see the section after next.
If you suspend disbelief for a moment, though, the implications are game-changing: what currently requires a 1GW power plant could potentially run on just 100MW - a far more manageable energy footprint that doesn't require rerouting rivers or reviving nuclear plants. This isn't just trimming around the edges; it's a fundamental redefinition of AI's relationship with our power grid.
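For readers who want to see the noise question in miniature, here's a toy numerical model comparing an exact digital matrix-vector multiply against an "analog" one with readout error. The array size and the ~1% noise level are made-up illustrative numbers, not EnCharge's circuit parameters:

```python
import numpy as np

rng = np.random.default_rng(0)

weights = rng.normal(size=(256, 256))   # weights held in the memory array
activations = rng.normal(size=256)      # input vector driven onto the array

# Digital reference: exact multiply-accumulate.
digital_out = weights @ activations

# Toy "analog" version: the accumulated result picks up a small random error,
# here modeled as ~1% of the typical output magnitude (an assumption).
noise = rng.normal(scale=0.01 * np.std(digital_out), size=256)
analog_out = digital_out + noise

rel_error = np.abs(analog_out - digital_out) / (np.abs(digital_out) + 1e-9)
print(f"median relative error: {np.median(rel_error):.2%}")
```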
Personal Note: Joining the Resistance
Before diving into solutions, I wanted to share some personal news. After spending months chronicling the AI industry's various existential crises (data scarcity, evaluation chaos, synthetic data collapse), I've decided to stop throwing stones from the cheap seats and actually join the resistance.
I'm thrilled (and slightly terrified) to announce that I'm joining EnCharge AI as their Chief Scientist. This isn't a decision I made lightly, but it's one that feels right on multiple levels. Kailash and I have known each other since 2007 when we were both young scientists at IBM, working together on phase change memory and quantum computing. Our paths diverged for a while, but I've always admired his technical vision, depth, and integrity.
I cannot express how excited I am to be joining Kailash, Echere, and Naveen to work on the groundbreaking technology they've created. Their approach to in-memory computing represents exactly the kind of fundamental rethinking our industry needs right now. You can find the announcement here.
While I'll be discussing in-memory computing (and other aspects plaguing AI) in this post and future ones, I want to be transparent that the opinions expressed here are my own. My decision to join EnCharge comes from a conviction that we need to solve AI's energy crisis with better architecture across hardware, algorithms, and software and I believe their approach is among the most promising I've seen. What I cover below is informed by their architecture but broader than it.
EnCharge AI: In-Memory Computing in Action
While Cerebras tackles the scaling challenge with monolithic wafers, EnCharge takes a fundamentally different approach by reimagining how computation and memory interact. Their analog in-memory computing technology achieves an unprecedented 150 TOPS/W for 8-bit compute at the MAC block level, combined with 32GB of optimized high-speed memory - efficiency metrics that dramatically alter AI's energy trajectory and improve on the state of the art by approximately 10-20x.
The key innovation lies in EnCharge's switched-capacitor technology that solves the noise and variability challenges that have plagued previous analog computing attempts. Unlike traditional architectures where data constantly shuttles between separate memory and processing units, EnCharge performs AI calculations directly within memory arrays. By leveraging precise metal-wire capacitors instead of noise-prone transistors, they enable robust, scalable analog computing without sacrificing accuracy or programmability.
What makes this approach particularly promising is its practical implementation path. EnCharge's architecture uses standard CMOS technology compatible with existing foundries, provides a seamless software stack that works with current development environments, and has been validated across five generations of silicon. Their virtualized architecture allows flexible mapping of AI workloads across in-memory compute arrays, making the technology adaptable to diverse applications from edge devices to data centers.
The real-world implications extend far beyond lab efficiency numbers. For power-constrained environments—whether edge devices, automotive systems, or data centers—this means enabling sophisticated AI capabilities that would otherwise be impractical. Imagine running complex inference scaling techniques on laptops without draining batteries in minutes, or supporting real-time intelligent decision-making locally on phones. By bringing computation directly to where data lives, solutions like EnCharge's address AI's fundamental energy bottleneck and enable our second critical shift: distributed intelligence.
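To get a feel for what a number like 150 TOPS/W means in practice, here's a rough energy-per-token estimate. The 7B-parameter model, the two-operations-per-parameter rule of thumb, and the ~10 TOPS/W digital baseline are all my own illustrative assumptions, not vendor benchmarks:

```python
# Rough energy-per-token estimate under stated assumptions: a 7B-parameter model
# and ~2 operations (one multiply, one add) per parameter per generated token.
PARAMS = 7e9
OPS_PER_TOKEN = 2 * PARAMS

def energy_per_token_mj(tops_per_watt: float) -> float:
    ops_per_joule = tops_per_watt * 1e12        # TOPS/W is equivalent to tera-ops per joule
    return OPS_PER_TOKEN / ops_per_joule * 1e3  # joules -> millijoules

print(f"analog in-memory at 150 TOPS/W: {energy_per_token_mj(150):.2f} mJ/token")
print(f"assumed digital baseline at 10 TOPS/W: {energy_per_token_mj(10):.2f} mJ/token")
```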
Distributing Intelligence to the Edge
The hyper-centralized data center model isn't just inefficient—it's creating energy bottlenecks that grow more problematic as AI adoption accelerates. These massive facilities require extraordinary energy density and peak load planning that typically relies on fossil fuels for reliability. When a data center needs 1GW of consistent power regardless of weather conditions, those natural gas turbines start looking less like a backup and more like a necessity.
Plus, there's the cooling nightmare—concentrated compute means concentrated heat, requiring additional energy just to stop your AI from literally melting itself. It's thermodynamics' revenge at industrial scale.
Energy-efficient in-memory computing architectures enable a smarter approach: distributing AI computation to the edge. By processing data closer to where it's generated—on phones, IoT devices, cars, and local servers—we can naturally spread the energy load across millions of endpoints rather than concentrating it in a handful of power-hungry data centers.
Memory-First Computing: A Path Out of the Energy Crisis
The Analog Revolution: Making AI Sustainable
The memory wall problem - processors sitting idle while data shuttles back and forth from memory - isn't news. What's changed, though, is that inference scaling techniques (remember our "power-hungry perfectionist" from earlier?) are rapidly making the problem more acute.
The most promising approaches to solving this crisis, as demonstrated by companies like EnCharge, share three key characteristics:
- Memory-Centric Computing: Data is stored in memory arrays that can perform calculations directly, eliminating energy-intensive data movement.
- Analog Precision: Rather than converting everything to 1s and 0s, these systems work with analog signals (continuous electrical values), which is vastly more energy-efficient for AI workloads.
- Manufacturing Practicality: Unlike quantum computing or other exotic technologies, the best solutions work within existing semiconductor manufacturing processes, enabling faster adoption.
When you combine these elements, something remarkable happens: the energy cost of each inference can drop by an order of magnitude. This efficiency isn't just environmentally important—it's economically essential as inference scaling techniques like parallel sampling and beam search multiply the computational demands of modern AI.
Why This Revolution Matters Beyond Tech
The implications of memory-first computing extend far beyond technical elegance:
- Making AI Truly Accessible: When AI doesn't require a nuclear reactor's worth of power, it becomes available to regions where reliable electricity is already scarce. We're talking about democratizing access to advanced capabilities, not just making Meta's electricity bill more palatable.
- Privacy by Design: Cloud-based AI isn't just an energy hog—it's a privacy nightmare. Memory-efficient edge computing enables sophisticated AI to run locally, keeping your data under your control instead of floating around in some corporation's data center.
- Innovation Without Asterisks: The most exciting AI applications shouldn't come with the warning: "*May contribute to climate catastrophe." Energy-efficient approaches enable innovation without the existential dread.
The Path Forward: From Talk to Action
The perfect storm is brewing: just as inference scaling techniques become essential for AI quality, memory-first computing is arriving to make these approaches energy-viable. But technology alone isn't enough—we need a coordinated effort:
What You Can Do Today:
- If you're an AI user: Ask about the energy footprint of the AI services you use. Consumer demand for transparency can drive change.
- If you're a developer: Audit your AI applications for energy efficiency. Simple optimizations like batching requests or caching common responses can reduce energy use dramatically (see the sketch after this list).
- If you're an investor: Prioritize companies with sustainable AI approaches in your portfolio. The winners of tomorrow won't just have the best models, but the most efficient ones.
- If you're a policymaker or regulator: Start thinking about energy disclosure requirements for AI systems, similar to emissions standards for vehicles.
- If you're a researcher: Join communities working on energy-efficient AI. The MLPerf Power benchmarks are a good starting point.
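For the developer item above, here's a minimal sketch of those two easy wins: serving repeated prompts from a cache and batching the remaining misses into a single call. The call_model function is a hypothetical stand-in, not any real provider's API:

```python
cache: dict[str, str] = {}

def call_model(prompts: list[str]) -> list[str]:
    """Placeholder for one batched model call; batching amortizes per-request
    overhead across many prompts instead of paying it once per prompt."""
    return [f"response to: {p}" for p in prompts]

def answer_all(prompts: list[str]) -> list[str]:
    """Deduplicate, send only uncached prompts in one batch, serve the rest from cache."""
    misses = [p for p in dict.fromkeys(prompts) if p not in cache]
    if misses:
        for prompt, reply in zip(misses, call_model(misses)):
            cache[prompt] = reply
    return [cache[p] for p in prompts]

# Ten incoming requests, but only three distinct prompts ever reach the model.
requests = ["weather?", "weather?", "pizza?", "weather?", "cats?"] * 2
replies = answer_all(requests)
print(f"prompts actually sent to the model: {len(cache)} of {len(requests)}")
```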
The Bottom Line
The AI revolution doesn't have to come at the expense of our planet's future. With memory-first computing enabling sustainable inference scaling, we can have both intelligence and sustainability.
Or, as I like to put it: the smartest AI is the one that doesn't fry the planet while figuring out how to save it.
What do you think about AI's energy consumption problem? Are you concerned about the environmental impact of large AI models? Let me know in the comments below.
As always, my DMs are open if you want to discuss this further, or if you just want to tell me I've gone corporate and sold out. Both are equally welcome.
Subscribe to AI Afterhours and keep up with the latest AI breakthroughs!
https://aiafterhours.substack.com