
Running AI models is turning into a memory game

TechCrunch · Feb 17, 2026

Summary

When we talk about the cost of AI infrastructure, the focus is usually on Nvidia and GPUs — but memory is an increasingly important part of the picture.

Full Article

When we talk about the cost of AI infrastructure, the focus is usually on Nvidia and GPUs — but memory is an increasingly important part of the picture. As hyperscalers prepare to build out billions of dollars worth of new data centers, the price for DRAM chips has jumped roughly 7x in the last year. At the same time, there's a growing discipline in orchestrating all that memory to make sure the right data gets to the right agent at the right time. The companies that master it will be able to make the same queries with fewer tokens, which can be the difference between folding and staying in business.

Semiconductor analyst Dan O'Laughlin has an interesting look at the importance of memory chips on his Substack, where he talks with Val Bercovici, chief AI officer at Weka. They're both semiconductor guys, so the focus is more on the chips than the broader architecture, but the implications for AI software are pretty significant too. I was particularly struck by this passage, in which Bercovici looks at the growing complexity of Anthropic's prompt-caching documentation:

The tell is if we go to Anthropic's prompt caching pricing page. It started off as a very simple page six or seven months ago, especially as Claude Code was launching — just "use caching, it's cheaper." Now it's an encyclopedia of advice on exactly how many cache writes to pre-buy. You've got 5-minute tiers, which are very common across the industry, or 1-hour tiers — and nothing above. That's a really important tell. Then of course you've got all sorts of arbitrage opportunities around the pricing for cache reads based on how many cache writes you've pre-purchased.

The question here is how long Claude holds your prompt in cached memory: you can pay for a 5-minute window, or pay more for an hour-long window. It's much cheaper to draw on data that's still in the cache, so if you manage it right, you can save an awful lot. There is a catch, though: every new bit of data you add to the query may bump something else out of the cache window.

This is complex stuff, but the upshot is simple enough: managing memory in AI models is going to be a huge part of AI going forward. Companies that do it well are going to rise to the top. And there is plenty of progress to be made in this new field. Back in October, I covered a startup called TensorMesh that was working on one layer in the stack, known as cache optimization.

Opportunities exist in other parts of the stack, too. Lower down, there's the question of how data centers are using the different types of memory they have. (The interview includes a nice discussion of when DRAM chips are used instead of HBM, although it's pretty deep in the hardware weeds.) Higher up the stack, end users are figuring out how to structure their model swarms to take advantage of the shared cache.

As companies get better at memory orchestration, they'll use fewer tokens and inference will get cheaper. Meanwhile, models are getting more efficient at processing each token, pushing the cost down still further. As server costs drop, a lot of applications that don't seem viable now will start to edge into profitability.

Russell Brandom has been covering the tech industry since 2012, with a focus on platform policy and emerging technologies. He previously worked at The Verge and Rest of World, and has written for Wired, The Awl and MIT's Technology Review. He can be reached at russell.brandom@techcrunch.com or on Signal at 412-401-5489.
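To make the mechanics concrete, here is a minimal sketch of what prompt caching looks like from the developer's side, using the Anthropic Python SDK's documented cache_control block; an optional "ttl" of "5m" or "1h" selects the tier Bercovici describes. The model name and the shared-context variable are illustrative placeholders, not taken from the article, and the 1-hour tier has at times required a beta flag, so check the current docs before relying on this shape.

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

LONG_SHARED_CONTEXT = "..."  # the big reusable prefix: docs, code, instructions

response = client.messages.create(
    model="claude-sonnet-4-5",  # illustrative model name; substitute a current one
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": LONG_SHARED_CONTEXT,
            # Mark this prefix cacheable. Omitting "ttl" gives the default
            # 5-minute window; "1h" buys the longer tier at a higher write cost.
            "cache_control": {"type": "ephemeral", "ttl": "1h"},
        }
    ],
    messages=[{"role": "user", "content": "What changed since yesterday?"}],
)

As for the arbitrage Bercovici mentions, it falls out of the relative prices of cache writes and cache reads. The multipliers in the sketch below are assumptions based on Anthropic's published pricing at the time of writing (roughly 1.25x base input to write the 5-minute cache, 2x to write the 1-hour cache, 0.1x to read a cached token); the break-even logic is the point, not the exact numbers.

# Relative input-token costs (assumed multipliers; verify against current pricing).
BASE = 1.00      # normal, uncached input token
WRITE_5M = 1.25  # writing the 5-minute cache
WRITE_1H = 2.00  # writing the 1-hour cache
READ = 0.10      # reading a token that is still in the cache

def cached_cost(prefix_tokens: int, queries: int, write_mult: float) -> float:
    # One cache write up front, then cheap reads, assuming every query
    # lands while the prefix is still inside the cache window.
    return prefix_tokens * (write_mult + READ * (queries - 1))

def uncached_cost(prefix_tokens: int, queries: int) -> float:
    return prefix_tokens * BASE * queries

prefix, queries = 50_000, 20  # a large shared prefix reused across 20 queries
print(f"uncached: {uncached_cost(prefix, queries):,.0f} token-units")
print(f"5m cache: {cached_cost(prefix, queries, WRITE_5M):,.0f} token-units")
print(f"1h cache: {cached_cost(prefix, queries, WRITE_1H):,.0f} token-units")

With these assumed numbers, 20 queries over a 50,000-token prefix cost 1,000,000 token-units uncached versus about 157,500 with the 5-minute cache. The catch from the article also shows up directly in the model: any change to the prefix forces a fresh cache write at the full write multiplier.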

