Enterprise AI is entering a new phase, one in which the central question is no longer what can be built, but how to get the most out of the money already being spent on AI.
In the latest session of VentureBeat’s AI Impact Tour, Brian Gracely, director of portfolio strategy at Red Hat, described the operational reality of large organizations: the expansion of AI, rising inference costs, and limited visibility into what those investments are actually returning.
It’s “Day 2”: the point when pilots give way to production, and costs, governance and sustainability become harder problems than building the system was in the first place.
“We’ve seen customers say, ‘I have 50,000 Copilot licenses. I don’t really know what people are getting out of it. But I do know that I’m paying for the most expensive computing in the world, because it’s GPUs,’” Gracely said. “‘How am I going to control it?’”
For much of the past two years, cost was not the primary concern for organizations evaluating generative AI. The experimental phase gave teams cover to spend freely, and the promise of productivity gains justified aggressive investment. That dynamic is changing as companies enter their second and third budget cycles with AI, and the focus has shifted from “Can we build something?” to “Are we getting what we paid for?”
Companies that made big early bets on managed AI services are conducting hard reviews of whether those investments deliver measurable value. The problem isn’t just that GPU computing is expensive; it’s that many organizations lack the instrumentation to connect spend to results, making it nearly impossible to justify renewals or scale responsibly.
The dominant AI procurement model in recent years has been simple: pay a vendor per token, per seat, or per API call, and let someone else manage the infrastructure. That model made sense as a starting point, but organizations that have been through a full AI cycle and have enough experience to compare alternatives are starting to rethink it.
“Instead of being purely a token consumer, how can I start being a token generator?” Gracely said. “Are there use cases and workloads that make sense for me to have more of? Maybe it means running GPUs. It might mean renting GPUs. And then asking, ‘Does this workload need the next-generation model? Are there more capable open models or smaller models that will fit?'”
The decision is not binary. The right answer depends on the workload, the organization and its risk tolerance, but the math gets more complicated as the number of capable open models grows, from DeepSeek to the models now available through cloud marketplaces. Companies now have real alternatives to the handful of vendors that dominated the landscape two years ago.
Some business leaders argue that locking in infrastructure investments now could mean significantly overpaying in the long term, pointing to Anthropic CEO Dario Amodei’s statement that AI inference costs are falling by about 60% annually. At that rate, a workload that costs $1 million to run today would cost roughly $160,000 two years from now.
Over the past three years, the emergence of open-source models such as DeepSeek has significantly expanded the strategic options available to companies willing to invest in the underlying infrastructure.
But while costs per token are falling, usage is accelerating at a rate that more than offsets the efficiency gains. It is a version of the Jevons paradox, the economic principle that improvements in resource efficiency tend to increase total consumption rather than decrease it, as lower cost allows for wider adoption.
For business budget planners, this means that falling unit costs do not translate into falling total invoices. An organization that triples its use of AI while cutting its unit cost in half still ends up spending 50% more than it did before. The question becomes which workloads really require the more capable, more expensive models, and which can be handled well by smaller, cheaper alternatives.
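To make that arithmetic concrete, here is a minimal sketch; the token volumes and per-million-token prices are hypothetical, chosen only to illustrate the dynamic, not figures from the article or from any vendor.

```python
# Illustrative sketch: falling per-token prices can still produce a rising
# total bill when usage grows faster than unit costs fall (Jevons paradox).

def total_spend(tokens: float, price_per_million_tokens: float) -> float:
    """Total monthly spend for a given token volume at a given unit price."""
    return tokens / 1_000_000 * price_per_million_tokens

# Hypothetical baseline: 1B tokens/month at $10 per million tokens.
baseline = total_spend(1_000_000_000, 10.00)   # $10,000/month

# Next cycle, per the article's framing: unit cost halved, usage tripled.
next_cycle = total_spend(3_000_000_000, 5.00)  # $15,000/month

print(f"Baseline: ${baseline:,.0f}/mo, next cycle: ${next_cycle:,.0f}/mo")
# Total spend is 50% higher even though each token costs half as much.
```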
The prescription is not to slow investment in AI, but to build for flexibility. The organizations that will win are not necessarily those that move the fastest or spend the most; they are the ones that build infrastructure and operating models capable of absorbing the next unexpected development.
“The more you can build some abstractions and give yourself some flexibility, the more you can experiment without increasing costs, but also without jeopardizing your business. That’s just as important as asking yourself if you’re doing best practices right now,” Gracely explained.
But despite how entrenched AI discussions have become in business planning cycles, the hands-on experience most organizations have is still measured in years, not decades.
“It seems like we’ve been doing this forever. We’ve been doing this for three years,” Gracely added. “It’s early and it’s moving very fast. You don’t know what’s coming next. But the characteristics of what’s coming next, you should have an idea of what it’s like.”
For business leaders still calibrating their AI investment strategies, this may be the most useful takeaway: The goal is not to optimize the current cost structure, but to create the organizational and technical flexibility to adapt when, not if, it changes again.