Cloud Capital

Report · 2026

The Economics of AI Infrastructure

Understanding the cost structure, unit economics, and margin impact of AI-enabled software


01

Why AI Breaks Traditional SaaS Economics

Your engineering team ships an AI feature. Adoption is strong. Usage climbs. Then the invoice arrives, and the cost of serving that feature is three or four times what Finance modeled, driven by usage patterns no one forecasted and billed by a vendor Finance hadn’t been tracking closely.

This scenario is playing out at growth-stage SaaS companies with increasing frequency, and it reflects a change in how software costs behave at the unit level.

Traditional SaaS infrastructure costs run 10–20% of revenue. Gross margins sit at 75–85%. The marginal cost of serving one additional user is close to zero. AI changes that equation, and the change is not uniform.

AI-native companies, where inference is core to value delivery, run infrastructure at 40–50% of revenue, with gross margins at 20–40% today and projected paths to 50–65% at maturity. AI-enabled SaaS companies adding features to existing products see compression that varies widely with AI intensity and pricing alignment, typically landing in the 60–80% range. Light AI integration with usage-based pricing approaches traditional SaaS at the high end; heavy AI features absorbed into flat subscription pricing fall toward AI-native territory at the low end.

The Margin Gap: Average Gross Margins, 2026
Traditional SaaS: 75–85% (near-zero marginal cost per user)
AI-Enabled: 60–80% (varies with AI intensity and pricing alignment)
AI-Native: 20–40% (projected path to 50–65% at maturity)

Source: Cloud Capital Cost of Compute 2026 · Q4 2025 CFO Survey (n=100)

This shift arrived fast. AI workloads already consume 22% of cloud spend for growth-stage technology companies, per Cloud Capital’s Cost of Compute 2026 report. That level of cost exposure typically builds over several planning cycles. For most of these companies, it arrived in one. And 73% of the CFOs surveyed expect cloud’s share of revenue to increase over the next twelve months.

22%: share of cloud spend that is AI
73%: CFOs who expect cloud’s share of revenue to rise
2.6×: more likely to report margin decline when AI exceeds 20% of cloud spend
11×: margin risk gap without governance

The margin consequences are visible. Organizations where AI exceeds 20% of cloud spend are 2.6× more likely to report gross margin decline than those with moderate exposure. But the impact is far from uniform. In Cloud Capital’s data, the difference between companies absorbing AI costs and those experiencing erosion correlates more strongly with how a company operates (its governance and cost-visibility systems) than with the level of AI investment. There is an 11× difference in margin risk between organizations with and without the right governance in place.

The economics are challenging. They are also demonstrably manageable. Understanding how starts with what you’re actually paying for.

All Cloud Capital data cited in this piece draws on a Q4 2025 survey of 100 senior finance leaders at growth-stage technology companies, published as the Cost of Compute 2026 report.


02

What You’re Actually Paying For

Where is the money going?

It's the first question a CFO asks when an AI-related invoice arrives that's materially larger than expected. It's also harder to answer than it should be, because AI costs are distributed across multiple vendors, classified differently depending on function, and driven by variables that don't appear in traditional cloud cost reporting.

Three cost categories make up AI infrastructure spend. The diagram below provides the reference framework. What follows focuses on what the diagram cannot convey: which categories matter most to your situation, where the classification complexity lives, and what to watch for as AI features scale.

01 · Training · ~0%
R&D OpEx. For companies using pre-trained model APIs, this line is near zero. Relevant only when building or fine-tuning your own models.

02 · Inference · 80–90%
Billed per token, for both input and output. The most volatile category. Scales with active users, usage per user, tokens per interaction, and model choice. Often a separate invoice.

03 · Infrastructure · 10–20%
Vector databases, orchestration layers, observability platforms, and egress. Several can rival the model API bill at scale, and none existed in pre-AI cost reporting.

Training

Most growth-stage SaaS companies calling pre-trained models through an API (OpenAI, Anthropic, Google) have minimal or zero training costs. If that describes your company, the two categories below are where the cost structure discussion becomes relevant.

For companies that do build or fine-tune their own models, training is the cost of that development work: GPU and accelerated compute charges from your cloud provider or a GPU-specialist provider. Training belongs in R&D Operating Expenses and scales with model size, retraining frequency, and experimentation intensity.

Inference

Inference is the cost incurred each time a model processes a request in production. For the vast majority of AI-enabled SaaS companies, this is the largest and most volatile cost category.

Inference is billed by your model provider (OpenAI, Anthropic, etc.) or by your cloud provider through a managed service (AWS Bedrock, Google Vertex AI). Finance teams accustomed to managing a single cloud vendor relationship should note that this often means a separate bill from a separate provider, potentially tracked with less rigor than the established cloud relationship.

The billing unit is the token: a unit of text processed by the model. You pay for both the tokens you send (input) and the tokens the model generates (output). Inference scales with active users, usage per user, tokens per interaction, and model choice. Each of those variables moves independently, and together they determine the most consequential component of AI infrastructure cost.
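A minimal sketch of per-interaction billing, assuming input and output tokens are priced separately per 1,000 tokens. The prices are hypothetical placeholders rather than any provider's published rates, and `ModelPrice` and `interaction_cost` are illustrative names. The token counts are also illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPrice:
    # Hypothetical prices in dollars per 1,000 tokens; real rates vary by provider and model.
    input_per_1k: float
    output_per_1k: float


def interaction_cost(price: ModelPrice, input_tokens: int, output_tokens: int) -> float:
    """Cost of one model call: tokens sent (input) and tokens generated (output) are billed separately."""
    input_cost = input_tokens / 1000 * price.input_per_1k
    output_cost = output_tokens / 1000 * price.output_per_1k
    return input_cost + output_cost


# Hypothetical model whose output tokens cost 4x its input tokens.
price = ModelPrice(input_per_1k=0.0025, output_per_1k=0.01)
cost = interaction_cost(price, input_tokens=7000, output_tokens=1000)
print(f"${cost:.4f} per interaction")
```

Because the output rate is higher in this example, the 1,000 generated tokens cost more than half as much as the 7,000 tokens sent.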

A critical detail: inference costs can be either COGS or R&D OpEx depending on what the inference is doing. Customer-facing inference in production hits gross margin. Internal inference for development and testing sits below it.

Infrastructure

Infrastructure is the cloud compute, storage, and networking required to support AI features: the servers running your application logic, the databases storing embeddings and vectors, the data pipelines feeding your models, and the networking connecting everything.

Like inference, infrastructure splits between COGS and R&D OpEx based on function. Production workloads serving customers belong in COGS. Staging, development, and QA environments belong in R&D. These costs are more familiar to Finance teams because they behave like traditional cloud costs.

The new wrinkle is that AI features require categories of infrastructure that did not exist in the pre-AI stack. Vector databases store the embeddings that power retrieval. Orchestration layers coordinate multi-step workflows and agent calls. Observability platforms monitor model output quality, trace cost per call, and catch guardrail violations. Egress charges accrue each time data moves between cloud regions or out to model providers.

None of these line items existed in pre-AI cost reporting, and several can rival or exceed the model API bill itself at scale. Finance teams scanning the cloud bill for AI cost surprises often miss them, in part because they are not always tagged as AI workloads in existing cost reporting.


03

What Drives the Invoice

The cost structure tells you what you’re paying for. The harder question: what determines how much?

Many variables influence AI infrastructure cost, from training frequency to data pipeline complexity. But for companies running AI features in production, four variables tend to have the most direct and measurable impact on the invoice. Some are engineering decisions. Some are rate decisions analogous to vendor negotiations. Some are commercial decisions about how the product is packaged and priced. What they share: each one moves cost per interaction, cost per user, or both, and none sit neatly within a single function's control.

Usage Intensity

1× per day: $0.60/mo · 10× per day: $6.00/mo
Same model, same feature: 10× higher cost

How often do users trigger the model, and how deeply is AI embedded in the product?

A product with a single AI-powered search feature generates fewer model calls per user than a product where AI is woven into every workflow. A customer who uses an AI feature ten times a day costs ten times more to serve than one who uses it once.

But usage intensity is not entirely outside Finance's influence. How many interactions a user can initiate per day, whether compute-intensive features are gated behind pricing that reflects their cost to serve, whether there's a ceiling on output volume: these are product and pricing decisions that control inference volume at the source. An AI feature with uncapped usage and no pricing alignment is an open-ended cost commitment that scales with every user who discovers it.
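The $0.60 and $6.00 figures are consistent with $0.02 per interaction over a 30-day month. A minimal sketch of how a per-user daily cap bounds that exposure at the source; `monthly_cost` is an illustrative name and the cap of 5 is hypothetical.

```python
from typing import Optional


def monthly_cost(cost_per_interaction: float, uses_per_day: int,
                 daily_cap: Optional[int] = None, days: int = 30) -> float:
    """Monthly inference cost for one user; an optional daily cap bounds billable interactions."""
    billable_per_day = uses_per_day if daily_cap is None else min(uses_per_day, daily_cap)
    return cost_per_interaction * billable_per_day * days


scenarios = {
    "1x/day": monthly_cost(0.02, uses_per_day=1),
    "10x/day": monthly_cost(0.02, uses_per_day=10),
    "10x/day, capped at 5": monthly_cost(0.02, uses_per_day=10, daily_cap=5),  # hypothetical cap
}
for label, cost in scenarios.items():
    print(f"{label}: ${cost:.2f}/mo")
```

The cap turns an open-ended commitment into a bounded one: no user on this plan can cost more than $3.00 a month, however often they return.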

Model Selection

Specialized model: $0.002 per interaction
Frontier model: $0.020 per interaction (10× more)

A 10× cost difference between two models is common. A customer support workflow running on a smaller, specialized model might cost $0.002 per interaction. The same workflow on a frontier model could cost $0.02 or higher. Across millions of interactions, that difference determines whether a feature contributes to margin or erodes it.

How does a choice this consequential get made? Typically by Engineering, based on capability requirements, often with limited visibility into the P&L impact. But the evaluation itself is not inherently technical. Comparing the cost-per-interaction of two models, at a given quality threshold, is a financial analysis. Finance doesn't need to choose the model. Finance does need to be in the room when the cost implications are evaluated.
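A minimal sketch of that financial analysis: pick the cheapest model that clears a quality bar, then price the difference at volume. The quality scores, the 0.90 threshold, and the 5 million monthly interactions are hypothetical; the per-interaction costs reuse the $0.002 and $0.02 figures above.

```python
from typing import NamedTuple, Optional


class Candidate(NamedTuple):
    name: str
    cost_per_interaction: float  # dollars, ideally measured on your own traffic
    quality: float               # hypothetical eval score (0-1) on a fixed test set


def cheapest_passing(candidates: list[Candidate], quality_threshold: float) -> Optional[Candidate]:
    """Lowest-cost candidate whose quality meets the threshold, or None if none qualify."""
    passing = [c for c in candidates if c.quality >= quality_threshold]
    return min(passing, key=lambda c: c.cost_per_interaction, default=None)


candidates = [
    Candidate("specialized", cost_per_interaction=0.002, quality=0.91),
    Candidate("frontier", cost_per_interaction=0.020, quality=0.95),
]
monthly_interactions = 5_000_000  # hypothetical volume

for c in candidates:
    print(f"{c.name}: ${c.cost_per_interaction * monthly_interactions:,.0f}/month")

choice = cheapest_passing(candidates, quality_threshold=0.90)
print("cheapest model meeting the bar:", choice.name if choice else "none")
```

The threshold is where the two functions meet: Engineering defines the quality bar, and the cost comparison follows from it.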

Token Volume

System prompt: 4,000 tokens · Context: 2,000 · Query: 1,000 · Response: 1,000
The system prompt alone is half the tokens (and, at equal input and output prices, half the bill) before the user types anything.

Every AI interaction involves tokens: the text sent to the model (input) and the text the model generates (output). Both are billed. The more context you send and the more the model generates, the higher the cost per interaction.

Long system prompts, verbose instructions, and large context windows quietly increase cost per interaction. A support agent architecture that includes 8,000 tokens of background context with every query might spend four times more on context than on the actual answer the model produces.
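The "half the bill" figure assumes input and output tokens cost the same. A minimal sketch that splits one interaction's cost by component; `cost_shares` is an illustrative name, prices are in relative units, and the 4× output-to-input price ratio is a hypothetical assumption (many providers do price output tokens above input tokens).

```python
def cost_shares(components: dict[str, int], output_keys: set[str],
                input_price: float, output_price: float) -> dict[str, float]:
    """Fraction of one interaction's cost attributable to each token component."""
    costs = {name: tokens * (output_price if name in output_keys else input_price)
             for name, tokens in components.items()}
    total = sum(costs.values())
    return {name: cost / total for name, cost in costs.items()}


tokens = {"system_prompt": 4000, "context": 2000, "query": 1000, "response": 1000}
equal = cost_shares(tokens, {"response"}, input_price=1.0, output_price=1.0)
skewed = cost_shares(tokens, {"response"}, input_price=1.0, output_price=4.0)  # hypothetical 4x output price
print(f"system prompt share, equal prices: {equal['system_prompt']:.0%}")
print(f"system prompt share, output at 4x: {skewed['system_prompt']:.0%}")
```

With equal prices the system prompt is 50% of the cost; at the assumed 4× output price it drops to about 36%, matching the cost of the response itself.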

Token volume compounds with usage intensity. More users, longer interactions, and higher token counts per interaction multiply rather than add: doubling all three raises cost eightfold.

Why Falling Prices Are Not Falling Costs

Per-token prices have been falling across major model providers. Token consumption per task has been rising faster. Reasoning models generate substantial intermediate output before producing a final answer. Agentic workflows chain multiple model calls per user request. Net cost per task is rising even as unit prices decline, and forecasts that miss this dynamic will systematically underestimate cost growth.
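A minimal sketch of why net cost per task can rise while unit prices fall, assuming a hypothetical 30% annual per-token price decline and a doubling of tokens per task each year. Neither rate comes from the report; the point is that cost per task compounds at (1 + price change) × (1 + token growth) per year, here 0.7 × 2 = 1.4.

```python
def cost_per_task(base_price_per_1k: float, base_tokens: int,
                  price_change: float, token_growth: float, years: int) -> float:
    """Net cost per task after `years` of compounding price and consumption changes."""
    price = base_price_per_1k * (1 + price_change) ** years   # per-token price keeps falling
    tokens = base_tokens * (1 + token_growth) ** years        # tokens per task keep rising
    return tokens / 1000 * price


for year in range(3):
    c = cost_per_task(0.01, 2000, price_change=-0.30, token_growth=1.0, years=year)
    print(f"year {year}: ${c:.4f} per task")
```

Under these assumptions cost per task goes from $0.0200 to $0.0280 to $0.0392, a 40% annual increase despite a 30% annual price cut.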

Architecture Design

1 request → Safety classifier → Embedding lookup → Primary model call → Eval pass → Retry (if needed)
Cost per user action: 3–5×

How many model calls actually happen behind the scenes for each user-facing interaction? The answer is almost always more than one.

A single user action expands into a tree of model calls, not a single cost event. Each call may trigger downstream calls (retries, supervision passes, evaluation steps), and the full cost of the interaction is the sum across that tree. This is where most forecast variance originates. The architecture may include a safety classifier that screens the request, an embedding lookup against a vector database, the primary model call, a quality evaluation pass, and a retry if the first response fails. Each step has a cost. Guardrails, evaluations, fallbacks, and retries can multiply the true cost per user action by 3× to 5× or more.
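A minimal sketch of a cost tree as a recursive structure: each call's expected cost includes everything it triggers, and optional steps such as retries carry a firing probability conditional on their parent. The step costs and the 25% retry rate are hypothetical, and `Call` and `expected_cost` are illustrative names.

```python
from dataclasses import dataclass, field


@dataclass
class Call:
    """One step in a cost tree; children are the calls this step triggers."""
    name: str
    cost: float                 # dollars for this call alone (hypothetical figures below)
    probability: float = 1.0    # chance this call fires, given that its parent fired
    children: list["Call"] = field(default_factory=list)


def expected_cost(call: Call) -> float:
    """Expected cost of a call plus everything it triggers downstream."""
    downstream = sum(expected_cost(child) for child in call.children)
    return call.probability * (call.cost + downstream)


primary = Call("primary model call", 0.02, children=[
    Call("eval pass", 0.02),
    Call("retry", 0.02, probability=0.25, children=[Call("eval pass", 0.02)]),
])
action = Call("user action", 0.0, children=[
    Call("safety classifier", 0.004),
    Call("embedding lookup", 0.001),
    primary,
])

total = expected_cost(action)
print(f"expected cost per action: ${total:.4f} ({total / primary.cost:.2f}x the primary call)")
```

Under these made-up numbers the action costs about 2.75× the primary call alone; heavier guardrails or higher retry rates push the multiple toward the 3–5× range cited above.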

These costs also compound. A supervision layer that monitors model output triggers its own model call for every message it reviews. Messages that fail a policy check trigger retries, and each retry passes through the supervisor again. One AI-native company found that reducing inter-agent message volume by 50% cut associated costs by 60%, because eliminating messages also eliminated the downstream calls those messages generated.

Rigorous teams get ahead of this by modeling the full cost tree for each AI feature before building it: estimating API calls per flow, scenario-testing model and routing changes, and validating the model against real usage after launch. The formula below is how that cost tree gets built.


A 50% reduction in inter-agent message volume cut associated AI infrastructure costs by 60%, because eliminating messages eliminates the downstream calls they generate. The cost tree compounds invisibly until you measure it.

04

From Tokens to Gross Margin

The four cost drivers above feed a formula that connects token volume and model price to cost per user and gross margin:

Cost Per Interaction = Tokens Per Interaction × Cost Per Token
Cost Per User = Cost Per Interaction × Interactions Per User
Gross Margin Impact = Cost Per User ÷ Revenue Per User

Interaction cost

Tokens consumed per call × model price determines per-call cost. Varies by feature, user input, and system prompt size.

User cost

How often users trigger AI features drives total cost per seat. Power users can cost 50–100× more than occasional users.

Margin impact

The ratio that matters. If cost per user exceeds what that user generates in gross profit, the feature is margin-negative.

The formula is straightforward. The inputs are not. “Cost Per Interaction” is the sum of every call in the cost tree that interaction generates, not just the cost of the primary model call. “Tokens Per Interaction” is the total tokens consumed across that tree, and “Cost Per Token” is a blended figure across whatever mix of models each step invokes. The formula tells you where to look first: if costs are escalating faster than usage, the answer is almost always in the first line.

The denominator is a choice. “Interactions Per User” is one valid unit, but not the only one. As AI features mature, the better unit is often work performed (resolutions delivered, tasks completed, agent actions executed) or outcomes produced (hours saved, revenue influenced, headcount avoided). What you measure determines what you can price on, what you can put in board reporting, and ultimately how AI gross margin behaves at scale. Few growth-stage companies have built measurement architecture beyond cost per interaction, and closing that gap is the next maturity step for finance teams that have already built cost visibility.

The $125 User

Consider a SaaS product with $100/month ARPU where users average 50 AI interactions per month. Each interaction consumes roughly 2,000 tokens at $0.01 per 1,000 tokens. That works out to $0.02 per interaction, $1.00 per user per month in inference cost. At 1% of revenue, entirely manageable.

Now change the inputs. A power user runs 500 interactions per month with longer prompts (5,000 tokens each) on a more capable model ($0.05 per 1,000 tokens). The math: $0.25 per interaction × 500 interactions = $125 per user per month. That single user costs more to serve than the revenue they generate.

Normal user: $1.00/month (1% of ARPU) · $100 ARPU · 50 interactions · 2k tokens · $0.01/1k
Power user: $125/month (125% of ARPU) · $100 ARPU · 500 interactions · 5k tokens · $0.05/1k
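The three formulas reproduce both users in a few lines. A minimal sketch; the function names are illustrative, and "Gross Margin Impact" is computed as AI cost as a share of revenue per user, per the formula's definition.

```python
def cost_per_interaction(tokens_per_interaction: int, price_per_1k_tokens: float) -> float:
    """Cost Per Interaction = Tokens Per Interaction x Cost Per Token (price quoted per 1,000 tokens)."""
    return tokens_per_interaction / 1000 * price_per_1k_tokens


def cost_per_user(tokens_per_interaction: int, price_per_1k_tokens: float,
                  interactions_per_user: int) -> float:
    """Cost Per User = Cost Per Interaction x Interactions Per User."""
    return cost_per_interaction(tokens_per_interaction, price_per_1k_tokens) * interactions_per_user


def margin_impact(cost_per_user_month: float, revenue_per_user_month: float) -> float:
    """Gross Margin Impact = Cost Per User / Revenue Per User."""
    return cost_per_user_month / revenue_per_user_month


ARPU = 100.0
normal = cost_per_user(2_000, 0.01, 50)
power = cost_per_user(5_000, 0.05, 500)
for label, cost in (("normal user", normal), ("power user", power)):
    print(f"{label}: ${cost:,.2f}/month, {margin_impact(cost, ARPU):.0%} of ARPU")
```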

Inputs can also change without anyone doing anything differently. Products that maintain conversation history or user context accumulate larger context windows over time. A user who costs $1.00 per month at onboarding may cost several multiples of that six months later, not because they changed their behavior, but because each interaction now carries the weight of every prior one. One AI-native company found that cost per user grew more than proportionally with tenure until they implemented context summarization, which reduced per-user cost by roughly 50% and returned the growth curve to something closer to linear.
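A minimal sketch of the tenure effect, assuming each call carries retained history that grows by a fixed number of tokens per month and that summarization caps it. Both parameters are hypothetical, `monthly_cost_by_tenure` is an illustrative name, and this linear model is not the cost curve the company above observed.

```python
from typing import Optional


def monthly_cost_by_tenure(months: int, interactions_per_month: int, base_tokens: int,
                           history_growth_per_month: int, price_per_1k: float,
                           summary_cap: Optional[int] = None) -> list[float]:
    """Monthly cost for one user whose every call carries accumulated history."""
    costs = []
    for month in range(months):
        history = history_growth_per_month * month   # hypothetical linear accumulation
        if summary_cap is not None:
            history = min(history, summary_cap)       # summarization bounds carried history
        tokens_per_call = base_tokens + history
        costs.append(interactions_per_month * tokens_per_call / 1000 * price_per_1k)
    return costs


unmanaged = monthly_cost_by_tenure(7, 50, 2000, 2000, 0.01)
summarized = monthly_cost_by_tenure(7, 50, 2000, 2000, 0.01, summary_cap=1000)
for month in (0, 6):
    print(f"month {month}: ${unmanaged[month]:.2f} unmanaged, ${summarized[month]:.2f} summarized")
```

Under these assumptions the same behavior costs $1.00 at onboarding, $7.00 by month six without management, and $1.50 with a 1,000-token summary cap.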

Cost per user is a curve, not a static number. Without active context management, it compounds.


05

Why the Gross Margin Number Is Often Wrong

The formula above produces a gross margin number Finance can act on. But that number is only as reliable as the COGS figure underneath it, and Cloud Capital’s Q4 2025 survey results suggest many are working with a COGS figure they can’t fully defend.

Fifty-seven percent of CFOs at growth-stage technology companies report being “very confident” in the accuracy of their cloud COGS reporting. Only 39% describe their visibility into what drives cloud spend as “excellent” (clear breakdown by product, feature, and customer). Working the intersection: more than a quarter of CFOs are confident in a COGS number they cannot break down by product, feature, or customer. Among the 57% claiming high COGS confidence, nearly half concede their driver visibility is only “good, not granular or consistent.”

This gap shows up in practice. Only 58% of surveyed companies use cost tagging and allocation by product or customer: the basic instrumentation that makes accurate classification possible. For the remaining 42%, the question of whether a given AI workload is COGS or OpEx is answered by approximation rather than by data.

Tagging is necessary but not sufficient. Tagging gets Finance to the workload level: this cluster, this database, this API key. Defending a COGS number for an AI feature requires going one level deeper, into the cost tree itself. Without visibility into how many calls a single user interaction generates, what model each call hits, and how often retries fire, Finance can attribute cost to a feature in aggregate but cannot explain why that cost moved when it did. The COGS number is structurally hard to defend without that decomposition.

“A 5-point gross margin decline can translate to a 25% valuation decrease at constant multiples.”

Todd Gardner · Managing Director, SaaSonomics

The stakes show up in valuation. As Todd Gardner, managing director at SaaSonomics, wrote in a recent analysis of the survey data, a 5-point gross margin decline can translate to a 25% valuation decrease at constant multiples. Cloud costs are harder to cut than other operating expenses without immediate product consequences, and investors scrutinize AI-related margin changes accordingly. A 5-point gross margin decline driven by misclassification looks identical to a 5-point decline driven by real cost escalation, and both affect valuation the same way.
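The quote does not state its assumptions. One set under which the arithmetic holds, offered as an illustration rather than Gardner's method: valuation is a constant multiple of operating profit, revenue and operating expenses are unchanged, and operating margin starts at 20%. A 5-point gross margin decline then removes a quarter of operating profit. The company profile below is hypothetical.

```python
def change_on_profit_multiple(gross_margin: float, opex_ratio: float, decline: float) -> float:
    """Valuation change if value is a constant multiple of operating profit and
    revenue and operating expenses (as a share of revenue) are unchanged."""
    before = gross_margin - opex_ratio
    after = gross_margin - decline - opex_ratio
    return after / before - 1


def change_on_gross_profit_multiple(gross_margin: float, decline: float) -> float:
    """Valuation change if value is a constant multiple of gross profit."""
    return (gross_margin - decline) / gross_margin - 1


# Hypothetical company: 75% gross margin, operating expenses at 55% of revenue (20% operating margin).
print(f"operating-profit basis: {change_on_profit_multiple(0.75, 0.55, 0.05):.0%}")
print(f"gross-profit basis: {change_on_gross_profit_multiple(0.75, 0.05):.1%}")
```

On an operating-profit basis the decline is 25%; on a gross-profit basis it is about 6.7%, which is why the valuation basis matters when quoting the figure.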

The Functional Test

AI infrastructure costs follow the same functional classification test as any other cloud cost: does the workload exist to deliver the product to customers? If yes, it qualifies as COGS. If no, it qualifies as OpEx. The test applies by workload function, not by resource type or model sophistication. The table below covers the common cases.

| Cost Bucket | COGS | OpEx (R&D) | Notes |
| --- | --- | --- | --- |
| Training | None | Always OpEx (R&D) | Includes pre-release fine-tuning and experimentation |
| Inference: customer-facing | Customer-facing model calls in production | None | Scales with active users and requests |
| Inference: internal | None | Development tools, testing, employee-facing applications | Classification follows function, not model type |
| Post-deployment fine-tuning | Runs improving the live production model | Exploratory runs for future model versions | Evaluate by function |
| Model monitoring and evaluation | Contractually required for customer delivery | Internal quality assurance | Evaluate by function |
| Infrastructure | Production workloads | Staging, development, QA | Apply allocation where workloads are mixed |
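A minimal sketch of the functional test applied to tagged usage, with mixed workloads split by usage share. The purpose tags, the `Workload` type, and the dollar figures are hypothetical, and allocating by usage share is one simple allocation method, not the CIAS methodology.

```python
from dataclasses import dataclass

# Hypothetical purpose tags. Each answers the functional test from the table:
# does this usage exist to deliver the product to customers?
DELIVERS_PRODUCT = {
    "customer_inference": True,      # customer-facing model calls in production
    "live_model_finetune": True,     # post-deployment runs improving the live production model
    "contractual_monitoring": True,  # monitoring contractually required for customer delivery
    "production_infra": True,        # production workloads
    "training": False,               # pre-release training, fine-tuning, experimentation
    "internal_inference": False,     # development tools, testing, employee-facing apps
    "exploratory_finetune": False,   # runs for future model versions
    "internal_qa": False,            # internal quality assurance
    "nonprod_infra": False,          # staging, development, QA environments
}


@dataclass
class Workload:
    name: str
    monthly_cost: float
    usage_by_purpose: dict[str, float]  # share of usage per purpose tag; shares sum to 1


def split_cogs_opex(workloads: list[Workload]) -> tuple[float, float]:
    """Allocate each workload's cost to COGS or R&D OpEx by the share of usage serving each function.

    An unknown purpose tag raises KeyError, so untagged usage surfaces instead of being guessed.
    """
    cogs = opex = 0.0
    for w in workloads:
        for purpose, share in w.usage_by_purpose.items():
            amount = w.monthly_cost * share
            if DELIVERS_PRODUCT[purpose]:
                cogs += amount
            else:
                opex += amount
    return cogs, opex


workloads = [
    Workload("model API", 10_000, {"customer_inference": 0.9, "internal_inference": 0.1}),
    Workload("vector database", 3_000, {"production_infra": 1.0}),
    Workload("GPU training runs", 5_000, {"training": 1.0}),
    Workload("shared cluster", 2_000, {"production_infra": 0.6, "nonprod_infra": 0.4}),
]
cogs, opex = split_cogs_opex(workloads)
print(f"COGS ${cogs:,.0f}, R&D OpEx ${opex:,.0f}")
```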

For workloads that fall outside these scenarios, the Cloud Infrastructure Accounting Standards (CIAS) framework provides the full methodology, including allocation standards for shared infrastructure and consistency rules that prevent arbitrary period-end reclassifications.

Classification discipline, paired with the tagging infrastructure to support it, is what gives Finance the ability to isolate AI’s true margin impact and defend the gross margin figure in board and investor conversations.


42% of companies lack basic cost tagging by product or customer.

Without it, the COGS classification question (is this AI workload a product cost or an R&D expense?) gets answered by approximation, not data. That approximation shows up in the gross margin figure Finance defends to the board.

06

The Operating Model That Works

The organizations managing AI cost growth without outsized margin erosion share a reproducible pattern, and Cloud Capital’s data suggests the pattern is most pronounced where AI exposure is highest. Among CFOs at companies where AI represents more than 20% of cloud spend, 51% report excellent visibility into cost drivers. Among those with moderate AI exposure (5–20%), only 26% do. The same shape shows up in unit-cost tracking (75% vs. 58% tracking rigorously and using it in decision-making) and in forecast cadence (29% vs. 19% re-forecasting monthly). Visibility into cost drivers is the #1 operating priority among AI-heavy CFOs (53%); at moderate exposure it drops to 30%.

51%: excellent driver visibility (high AI exposure)
26%: excellent driver visibility (moderate exposure)
75%: track unit costs rigorously (high exposure)
53%: cite visibility as #1 priority (high exposure)

Read the pattern this way: heavier AI exposure has forced the investment in instrumentation. The CFOs with the most at stake have already built the systems that work, and those systems are reproducible. Three elements show up consistently.

Three Lanes of Ownership

Finance, Engineering, and Product each own a distinct portion of AI unit economics. None of the three can manage it alone.

Finance

Owns the envelope

Spend authority, forecast variance, and gross margin impact. Finance sets the spending envelope and owns the variance explanation to the board. Today only 26% of growth-stage companies have explicit joint ownership of cloud cost between Finance and Engineering — the other 74% have a structural gap somewhere in the operating model.

Spend authority · Forecast variance · Gross margin · COGS classification

Engineering

Owns the drivers

Model selection, token efficiency, retry logic, context management, and architecture decisions. These are the variables that move the invoice, and they sit inside engineering decisions made continuously. Engineering operates within Finance’s envelope and surfaces the specific token, call, and retry patterns driving variance.

Model selection · Token efficiency · Retry logic · Architecture

Product

Owns feature economics

Usage design, gating, caps, and pricing alignment. Whether power-user behavior is bounded by design or absorbed into base pricing has direct margin consequences. Pricing model choice is equally consequential, with each approach carrying different revenue volatility and margin implications.

Usage caps · Pricing model · Feature gating · Power-user design

In practice, the three lanes interact continuously. A feature overruns its cost model. Finance flags the variance. Engineering diagnoses the driver, often a specific change in the cost tree (a model swap, a prompt expansion, a retry loop). Product responds with the appropriate lever: a usage cap, a packaging change, a pricing adjustment. Each lane sees something the other two cannot, and the response only works when all three move together.

The Cadence

Three operating rhythms, each serving a different function:

Continuous

Forecast-vs-actual tracking on cloud and model spend, instrumented at the tagging layer. Variance is visible as it develops, not after it shows up in monthly close.
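A minimal sketch of continuous forecast-vs-actual tracking for one feature, assuming daily spend is already attributed to the feature through tags and that the monthly forecast accrues evenly across the month. The function names, the 10% alert threshold, and the spend figures are hypothetical.

```python
def mtd_variance(daily_actuals: list[float], monthly_forecast: float, days_in_month: int = 30) -> float:
    """Month-to-date spend vs. a straight-line forecast, as a fraction (0.10 = 10% over)."""
    expected_to_date = monthly_forecast * len(daily_actuals) / days_in_month
    return sum(daily_actuals) / expected_to_date - 1


def flag_variance(feature: str, daily_actuals: list[float], monthly_forecast: float,
                  threshold: float = 0.10) -> bool:
    """Print an alert and return True when month-to-date variance exceeds the threshold."""
    variance = mtd_variance(daily_actuals, monthly_forecast)
    if variance > threshold:
        print(f"{feature}: {variance:+.0%} vs forecast after {len(daily_actuals)} days")
        return True
    return False


# Hypothetical daily model-API spend for one feature against a $30,000 monthly forecast.
alerted = flag_variance("support-assistant", [1000, 1050, 1200, 1400, 1500], 30_000)
```

Five days in, spend is 23% over the straight-line forecast, so the alert fires weeks before monthly close would surface it.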

Monthly

Cost per interaction and cost per user reviewed by feature and segment. This is the operating metric review, not the accounting review. The question is whether unit economics are moving in the right direction and which features are responsible.
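A minimal sketch of the monthly review's two metrics computed from tagged usage records. The `UsageRecord` fields and the sample data are hypothetical; in practice the cost column would come from tagged billing exports.

```python
from collections import defaultdict
from typing import NamedTuple


class UsageRecord(NamedTuple):
    feature: str
    segment: str
    user_id: str
    interactions: int
    cost: float  # dollars attributed via cost tags


def unit_economics(records: list[UsageRecord]) -> dict[tuple[str, str], dict[str, float]]:
    """Cost per interaction and cost per user for each (feature, segment) pair."""
    totals = defaultdict(lambda: {"cost": 0.0, "interactions": 0, "users": set()})
    for r in records:
        t = totals[(r.feature, r.segment)]
        t["cost"] += r.cost
        t["interactions"] += r.interactions
        t["users"].add(r.user_id)
    return {
        key: {
            "cost_per_interaction": t["cost"] / t["interactions"],
            "cost_per_user": t["cost"] / len(t["users"]),
        }
        for key, t in totals.items()
    }


records = [
    UsageRecord("search", "smb", "u1", 40, 0.80),
    UsageRecord("search", "smb", "u2", 60, 1.20),
    UsageRecord("assistant", "enterprise", "u3", 500, 125.00),
    UsageRecord("assistant", "enterprise", "u4", 50, 1.00),
]
for (feature, segment), m in unit_economics(records).items():
    print(f"{feature}/{segment}: ${m['cost_per_interaction']:.3f} per interaction, "
          f"${m['cost_per_user']:.2f} per user")
```

The assistant/enterprise average of $63 per user hides one $125 power user and one $1 user, so the review benefits from looking at the distribution within each segment, not only the mean.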

Per-Launch

Every new AI feature ships with a cost model built before commitment and validated against real usage after launch. The cost model is an input to the build/no-build decision, not a post-hoc reconciliation.

Feature-Level Costing in Practice

One AI-native company built a spend simulator that models cost per user across every agent and feature in their product. The tool pulls live API pricing and lets the team scenario-test model switches, engineering optimizations, and proposed features before committing to them. When the team evaluated a new inference provider that promised 50× savings, a phased rollout to 1% of users delivered the real answer within 24 hours: 5–10×, not 50×, because the cheaper model made significantly more calls per request. The forecast was wrong. They learned that in a day, at negligible risk.
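A minimal sketch of the two pieces this story combines: a deterministic 1% canary cohort and the net-savings arithmetic. It assumes the advertised 50× is a per-call price ratio and that tokens per call do not change, both simplifications. Hash-based bucketing is one common way to pick a stable cohort, not necessarily the company's method, and the call counts are hypothetical readings chosen to land in the reported 5–10× range.

```python
import hashlib


def in_canary(user_id: str, percent: float = 1.0) -> bool:
    """Deterministically place roughly `percent`% of users in the canary cohort."""
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 10_000
    return bucket < percent * 100


def net_savings_factor(price_ratio: float, calls_old: float, calls_new: float) -> float:
    """How many times cheaper a request gets after a model switch.

    price_ratio: old per-call cost divided by new per-call cost (the advertised savings).
    calls_old, calls_new: average model calls per user request, measured in the canary.
    """
    return price_ratio * calls_old / calls_new


# Hypothetical canary readings: the cheaper model needs 5x or 10x as many calls per request.
print(net_savings_factor(50, calls_old=2, calls_new=10))   # 10.0
print(net_savings_factor(50, calls_old=2, calls_new=20))   # 5.0
```

Because assignment hashes the user ID, the same users stay in the cohort across sessions, which keeps the before-and-after comparison clean.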


Token costs will continue to fall. Model efficiency will continue to improve. The companies that win on AI economics will be the ones that treat cost as something engineered and managed: instrumented at the cost-tree level, owned across Finance, Engineering, and Product, and explained with the same rigor as any other line on the P&L.

Sources

This analysis draws on Cloud Capital’s Cost of Compute 2026 report (Q4 2025 CFO Survey, n=100), the Cloud Infrastructure Accounting Standards (CIAS) framework, the Battery Ventures State of AI Report 2025, and Cloud Capital’s ongoing research into AI infrastructure economics.

Together with Operators Guild