Gross Margin in the Age of AI

Ed Barrow

May 19, 2026

In conversation with Ben Murray of The SaaS CFO, Ed Barrow breaks down what's changing for gross margin, forecasting, and board reporting—and what Finance leaders should be doing before the next budget cycle.

TLDR

The standard SaaS growth playbook was built on assumptions AI is rewriting.

AI-native software averages 52% gross margin. Traditional SaaS runs 70-85%. That gap is the new reality.
If you haven't separated customer-facing inference from R&D inference, your margin number is meaningless.
Boards are starting to ask for AI margin by revenue stream. Most Finance teams aren't ready.
Stand up AI cost centers and tag your spend now. Don't wait for budget season.

Traditional SaaS gross margin is 70-85%. AI-native software is currently averaging closer to 52%. That 18-33 point gap is the new economic reality of building software with AI, and it is reshaping how Finance teams need to think about cost classification, forecasting, and board reporting.

Ben Murray, founder of The SaaS CFO, joined me for a live conversation with Finance leaders on what this shift means and what to do about it. The full replay is below. What follows is the substance: the framework, the data, and the actions Finance teams should be working on right now, before budget season puts harder questions on the table.

The scope of the problem

AI infrastructure is no longer a fringe cost line. In Cloud Capital's Cost of Compute 2026 survey of 100 growth-stage SaaS Finance leaders, 97.5% reported AI as material to their cloud spend, averaging around 22% of total compute. For many AI-native businesses, inference and compute have now overtaken payroll as the largest line on the P&L. We have seen customers whose AI inference spend has compounded at more than 50% month over month.

That growth is hitting gross margin directly. Lower gross margin means weaker LTV, longer CAC payback, and less cash to reinvest in sales, marketing, and R&D. The standard SaaS growth playbook was built on 70-85% margins. The new normal will require a different one.

Four takeaways

A quick orientation before the takeaways. "AI" in a SaaS context covers two distinct categories with different P&L implications: workforce AI (employees using Claude, ChatGPT, Cursor, Copilot) sits in OpEx and is measured in productivity; product AI (large language models embedded in what customers buy) sits in COGS and is measured in revenue and gross margin. The discussion below is about product AI, where margin compression is showing up most acutely.

1. Classify before you optimize.

Cloud and AI infrastructure costs belong in COGS only when the workload exists to deliver the product to customers. That is the functional test under existing GAAP and IFRS, and the foundation of the Cloud Infrastructure Accounting Standards. Training is R&D OpEx. Customer-facing inference is COGS. Internal inference (development tools, testing, employee-facing applications) is OpEx. Infrastructure follows the function of the workload it supports.

Most Finance teams are still booking AI as undifferentiated cloud spend. The blended margin number tells you nothing about where to focus. If your legacy subscription business runs at 78% and your AI layer runs at 55%, the blend drifts as AI revenue mix grows, and you lose the ability to diagnose what is happening. Margins by revenue stream are no longer optional once AI is in the product.
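A minimal sketch of that drift, assuming a two-stream business with the 78% and 55% margins used in the example above. The function name and the AI revenue shares are illustrative assumptions, not figures from the survey.

```python
def blended_gross_margin(ai_revenue_share, legacy_margin=0.78, ai_margin=0.55):
    """Revenue-weighted gross margin for a business with two revenue streams."""
    if not 0.0 <= ai_revenue_share <= 1.0:
        raise ValueError("ai_revenue_share must be between 0 and 1")
    legacy_share = 1.0 - ai_revenue_share
    return legacy_share * legacy_margin + ai_revenue_share * ai_margin


# Hypothetical AI revenue shares; neither stream's own margin changes.
for share in (0.0, 0.10, 0.25, 0.50):
    margin = blended_gross_margin(share)
    print(f"AI share of revenue {share:.0%}: blended margin {margin:.2%}")
```

The blended number falls as the mix shifts even though each stream's margin is constant, which is why the blend on its own cannot tell you whether anything inside either stream has changed.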

2. Engineering decisions are margin decisions.

Four levers determine AI unit economics: usage intensity, model selection, token volume, and architecture design. Each one is set by engineering, often without Finance visibility. A single model swap can move gross margin 21 points without anything changing on the revenue side. Architectural choices like guardrails, retries, and fallbacks can multiply the true cost per user action.

This is why Finance-Engineering collaboration is no longer optional. The same coordination Finance has long had with go-to-market on customer economics is now required with engineering on AI economics. Without joint accountability, classification drifts and the P&L tells a different story than what is actually happening inside the product.

3. The token treadmill is breaking the cheap-AI narrative.

Per-token prices have been falling, and many Finance teams are budgeting on the assumption they will keep falling. They will, somewhat. But token consumption per task has risen 10-100x since 2023, driven by reasoning models, agentic workflows, and rising user expectations. Net cost per task is rising even as token prices fall.
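The arithmetic can be sketched as follows. The prices and token counts are hypothetical and chosen only to sit inside the 10-100x range cited above. They are not survey data or any provider's price list.

```python
def cost_per_task(tokens_per_task, price_per_million_tokens):
    """Cost of one task in dollars, given token volume and a per-million-token price."""
    return tokens_per_task / 1_000_000 * price_per_million_tokens


# Hypothetical scenario: the per-token price falls 80% while tokens per task rise 20x.
before = cost_per_task(tokens_per_task=2_000, price_per_million_tokens=10.00)
after = cost_per_task(tokens_per_task=40_000, price_per_million_tokens=2.00)

print(f"Before: ${before:.2f} per task")
print(f"After:  ${after:.2f} per task")
print(f"Change: {after / before:.1f}x")
```

Under these assumptions the cost per task quadruples despite the price cut. A forecast that models only the per-token price misses the volume term entirely.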

Compounding the pressure, the major model providers are currently in market-capture mode, subsidized by equity capital and committing to large infrastructure investments. As those businesses mature toward profitability, the assumption that frontier-model prices will keep dropping becomes increasingly fragile. Finance teams that have built their AI forecast on a "tokens will get cheaper" assumption should pressure-test that assumption now.

4. Budget season is going to demand more from Finance.

The standard SaaS board deck no longer holds up. Boards and investors are starting to ask harder questions: What is your AI margin? How much of your AI cost is in COGS versus OpEx? What is the unit economics trajectory? And increasingly, beyond cost: what is the AI actually creating? Ben's four-layer measurement framework (consumption, work, outcomes, business impact) is the direction this conversation is heading. Most Finance teams are still stuck at consumption. The ones that move up the layers will tell a much stronger story.

The foundation work for all of this starts now. Separating production and development AI accounts, standing up cost centers for each AI cost category, and tagging AI spend by use case are the steps that make every later analysis possible. Leaving them until the last few weeks before budget cycles begin will make planning significantly harder than it needs to be.

Watch the full conversation

The replay is below, along with the full transcript. Ben and I went deeper on each of these themes, including specific tactics for model routing, caching, prompt optimization, and pricing approaches. If your team is preparing for board planning or budget conversations, this is the discussion to share with your CTO before those meetings happen.

If you want help getting your AI and cloud costs mapped and classified correctly, Cloud Capital does that work for free. It is the Stage 1 foundational work most Finance teams know they need but have not had the time or tooling to complete. Book a call with the Cloud Capital team.

Complete Event Transcript

Full transcript from the webinar "Gross Margin in the Age of AI," hosted by Ed Barrow (Cloud Capital) and Ben Murray (The SaaS CFO). Edited for brevity and clarity.

Introduction [02:01]

Ed Barrow: Welcome everyone. We're focused today on one of the most important and talked-about topics at the moment: AI and the impact it's having on the SaaS P&L. I'm Ed Barrow, founder and CEO of Cloud Capital, and I'm joined by the SaaS CFO himself, Ben Murray.

Ben Murray: Thanks Ed. I was a CFO by trade for founder-owned and private equity-backed businesses, and for the last five years I've been doing fractional work. I have my academy where I teach SaaS metrics and finance, and I'm spending a lot of time now on AI finance. New unit economics, new frameworks, public tech companies adjusting to all of it. Fascinating times in software right now.

Ed: We have a lot of content to cover. It's going to be practical: clear guidance on how to think about this, what impact it's going to have, and how to start working through a plan to tackle the challenges.

Product AI vs. Workforce AI [03:35]

Ed: Before we get into the substance, I want to frame what we're focused on today, and what we're not. There are two distinct areas of AI being applied to software businesses, and there is sometimes confusion about how to model them.

The first is workforce AI: employees using ChatGPT, Claude, and similar tools to improve how they work. That's about productivity and efficiency.

The second is product AI: embedding large language models into the products and services you sell to customers. That's customer-facing, and it's about revenue generation. How do you enhance your product to drive engagement, adoption, and revenue?

Both will show up on your cloud bill, sometimes from the same providers, which is part of what makes this confusing. We're focused today on product AI, because that's where the gross margin pressure is coming from.

The Scale of the Problem [05:30]

Ed: AI is one of the fastest-growing costs for software businesses. In our Cost of Compute 2026 survey of 100 growth-stage software CFOs at the start of this year, AI was already a material contributor to cloud spend for nearly everyone we surveyed. On average it was about 22% of compute spend, and growing significantly month on month. We've seen customers whose product AI costs have scaled by more than 50% per month, compounding.

Beyond contributing materially to core cloud spend, AI is having a very significant impact on gross margin. That's the main topic of today's conversation.

The AI Gross Margin Gap [07:23]

Ben: With SaaS, we've traditionally aimed for 70 to 80% gross margins. Subscription margin can go even higher, into the 80 to 90% range. Now there's a big gap. Based on data from Bessemer and ICONIQ, AI-native companies delivering LLM-based products are tracking closer to 50 to 60%. That's a significant difference, and it's requiring changes to our framework and our chart of accounts so we can understand why margins are dropping.

There are still companies producing 70 to 80% with AI products. It depends on the models you're using, how heavy the usage is, whether it's agentic. There are a lot of variables, but this is the central debate: where will AI margins land?

Ed: Gross margin has a huge knock-on impact on valuations and on how we operate the business. The standard SaaS playbook around sales and marketing investments, R&D investments, and growth and scale all relied on an underlying assumption of 70 to 85% gross margin. That allowed the kind of growth investments we've come to expect. If we're now seeing a new normal at meaningfully reduced gross margins, that has knock-on impacts across resourcing, growth plans, and retention requirements.

Ben: Investors are also expecting a clear AI product roadmap. A lot of public tech companies are seeing their growth re-accelerate because they've got AI traction. They want to show their AI product line growing 40% versus their legacy 15% to help boost valuation. We have to get serious about revenue tracking of AI versus the legacy product line.

One Decision, 21 Points of Margin [10:41]

Ben: If we're delivering AI products, we have to get really serious about margins. We didn't traditionally do this by customer unless we had a $5 million ARR contract with IBM and a dedicated team behind it. Traditionally everything was bundled into one gross margin. Now we have to get more detailed.

If you have an AI-infused product line and token costs aren't that bad, you may produce 95% gross margin. Not bad. But what happens when you have to use premium frontier models? Just walking through the scenario: with a premium model, margin drops 21 points to 74%, and gross profit dollars drop as well. There are a lot of scenarios depending on how you've built your AI product and which models you're using.
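To put numbers on the scenario Ben walks through, here is a minimal sketch. Only the 95% and 74% margins come from the discussion. The revenue and inference cost figures are hypothetical values chosen to reproduce them.

```python
def gross_margin(revenue, cogs):
    """Gross margin as a fraction of revenue."""
    if revenue <= 0:
        raise ValueError("revenue must be positive")
    return (revenue - cogs) / revenue


# Hypothetical monthly figures per customer; revenue is identical in both cases.
REVENUE = 100.00
STANDARD_MODEL_COGS = 5.00    # assumed inference cost with a cheaper model
PREMIUM_MODEL_COGS = 26.00    # assumed inference cost with a premium frontier model

standard = gross_margin(REVENUE, STANDARD_MODEL_COGS)
premium = gross_margin(REVENUE, PREMIUM_MODEL_COGS)
points_lost = (standard - premium) * 100

print(f"Standard model: {standard:.0%}")
print(f"Premium model:  {premium:.0%}")
print(f"Margin lost:    {points_lost:.0f} points")
```

Nothing changes on the revenue line in this sketch. The whole 21-point move comes from the model choice.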

The Quality Trap [12:00]

Ben: The quality trap is fascinating, especially with outcome-based pricing. Say you're using AI to resolve support tickets. For tickets you successfully resolve autonomously, you charge the customer, and you produce nice margins on what you charge. But what if you have quality issues? What if the agent can't handle the ticket and you have to escalate to a human?

Now you can't charge the customer because the AI didn't work. Margin percentage on what you actually charged still looks good, but gross profit drops materially. In the example, gross profit goes from $47K to $32K. The dashboard looks healthy. The business isn't.

There's also pricing compression. As more competitors enter the market, you may still have a good margin percentage, but volume is down and gross profit dollars take the hit. A lot of variables here, and the point is we have to get more serious about how we measure AI margins.
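The quality trap can be sketched with a simple model. All figures below are hypothetical and do not reproduce the $47K and $32K from the webinar example, whose inputs are not in the transcript. The sketch also assumes AI cost is incurred on every attempted ticket and that escalated tickets carry a human handling cost.

```python
from dataclasses import dataclass


@dataclass
class TicketMonth:
    attempted: int                    # tickets the AI agent tried to resolve
    resolved: int                     # tickets resolved autonomously (billable)
    price_per_resolved: float         # outcome-based price charged per resolution
    ai_cost_per_attempt: float        # inference cost, incurred whether or not it works
    human_cost_per_escalation: float  # assumed cost of handling a failed ticket

    @property
    def revenue(self):
        return self.resolved * self.price_per_resolved

    @property
    def billed_margin(self):
        """Margin on billed tickets only, which is what a naive dashboard shows."""
        billed_cost = self.resolved * self.ai_cost_per_attempt
        return (self.revenue - billed_cost) / self.revenue

    @property
    def gross_profit(self):
        """Gross profit after the cost of failed attempts and escalations."""
        escalated = self.attempted - self.resolved
        total_cost = (self.attempted * self.ai_cost_per_attempt
                      + escalated * self.human_cost_per_escalation)
        return self.revenue - total_cost


healthy = TicketMonth(10_000, 9_500, 5.00, 0.50, 3.00)
degraded = TicketMonth(10_000, 7_000, 5.00, 0.50, 3.00)

for label, month in (("healthy", healthy), ("degraded", degraded)):
    print(f"{label}: billed margin {month.billed_margin:.0%}, "
          f"gross profit ${month.gross_profit:,.0f}")
```

In this sketch the margin on billed tickets is identical in both months, while gross profit dollars fall by roughly half. That is the pattern Ben describes: the percentage looks healthy while the business is not.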

Breaking Down AI Costs [13:43]

Ed: Even within product AI, the cost structure is more nuanced than it appears. There are three key areas of spend.

First, training: developing and fine-tuning your own models. Over the last year a lot of companies explored this, renting or buying GPUs to train their own generative AI. This is R&D work with extensive iteration.

Second, inference: the cost incurred each time a model processes a request. That might be models you trained yourself, but increasingly it's third-party models from Anthropic, OpenAI, or xAI that you embed in your product. Someone else paid for training, but you incur inference costs every time you interact with the model.

Third, infrastructure: everything else required to support AI usage. When you embed a model, you need to gather data from your customer, send it to the model, take the response back, process it, present it, and store the information. Traditional software was often relatively light on data processing. AI-enhanced use cases require processing significantly more data, both as inputs to and outputs from the models. That drives broader compute costs from your cloud provider on top of the model costs themselves.

Classifying AI Costs on the P&L [16:01]

Ed: These three cost types should be classified differently.

Training really shouldn't be impacting your gross margin if you allocate it correctly. It's R&D OpEx. This is about developing new IP. Being able to isolate your GPU consumption and model training costs into R&D OpEx is incredibly important. These tend to be large one-off costs that scale with model size and complexity.

Inference itself splits into two areas. If you're using a model to deliver value to customers, that goes into COGS. But your engineering team will also be using inference to test and develop new capabilities. That use should go into R&D OpEx as part of product development.

Infrastructure splits the same way. As your customer-facing AI usage grows, your associated infrastructure costs grow with it. As your R&D team builds new services, their infrastructure costs sit in OpEx. The simple heuristic: does this workload exist to deliver value to a customer? If yes, COGS. If no, OpEx. If shared, allocate proportionally.

To split inference between COGS and R&D, you can deploy in separate accounts, use different API keys, or use tagging. Smart use of tagging lets you split inference calls by use case: customer production versus testing, area of the product, even by specific customer.
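As an illustration of the heuristic, here is a minimal sketch that classifies tagged cost line items. The tag values, the field names, and the sample bill are all hypothetical. Real tag schemes depend on your cloud provider, your model vendors, and your own conventions.

```python
from collections import defaultdict

# Hypothetical purpose tags. Anything not listed here is reported as unclassified.
COGS_PURPOSES = {"customer_production"}
OPEX_PURPOSES = {"training", "development", "testing", "internal_tools"}


def allocate(line_items):
    """Sum tagged cost line items into COGS, OpEx, and Unclassified buckets.

    Each item is a dict with 'amount' and 'purpose'. Items tagged 'shared'
    also need 'customer_share', the fraction that serves customers.
    """
    totals = defaultdict(float)
    for item in line_items:
        amount, purpose = item["amount"], item["purpose"]
        if purpose == "shared":
            share = item["customer_share"]
            totals["COGS"] += amount * share          # customer-facing portion
            totals["OpEx"] += amount * (1.0 - share)  # internal portion
        elif purpose in COGS_PURPOSES:
            totals["COGS"] += amount
        elif purpose in OPEX_PURPOSES:
            totals["OpEx"] += amount
        else:
            totals["Unclassified"] += amount          # surface untagged spend
    return dict(totals)


# Hypothetical monthly bill.
bill = [
    {"amount": 12_000.0, "purpose": "customer_production"},  # inference, API key A
    {"amount": 3_000.0, "purpose": "testing"},               # inference, API key B
    {"amount": 20_000.0, "purpose": "training"},             # GPU rental
    {"amount": 4_000.0, "purpose": "shared", "customer_share": 0.75},
    {"amount": 500.0, "purpose": "untagged"},
]

totals = allocate(bill)
print(totals)
```

Keeping an explicit "Unclassified" bucket is a deliberate choice in this sketch: untagged spend stays visible rather than silently landing in COGS or OpEx.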

Ben: This is going to change our chart of accounts. We'll have GL accounts and cost centers for AI in the COGS area. Maybe an AI cost center for inference costs, vector databases, AI infrastructure. We have to show investors, our board, and potential acquirers our AI traction and our AI margins. We also need to know who our heavy users are, because they could be eating into margin while light users subsidize them. We could have a usage problem that creates a pricing problem.

How AI Costs Behave [27:22]

Ed: Classification tells us where we are today. We also need to understand how these costs are going to behave as we scale.

A CFO recently described AI to me as a database: you query it, you get a response back, you use it. Except it's not a database of your own information, it's a database of the entirety of humanity's knowledge. You're making requests to the model, and the number of requests and tokens consumed within them determine cost.

Tokens are essentially the data volumes you're sending to and from the model. You pay for what you send and for what you get back. Simple optimizations apply immediately: send less information, ask the model to be more condensed.

Three things drive your AI gross margin impact. Cost per interaction equals tokens per interaction times cost per token. AI COGS per user equals cost per interaction times interactions per user. Gross margin impact equals AI COGS per user divided by revenue per user.

These compound. As your AI capabilities succeed and customers engage more, you drive more interactions, more token consumption, and more data flowing back and forth. The longer your customers stay, the deeper the data set you have to work with. These are snowball effects.
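The three relationships Ed describes can be written out directly. The input values below are hypothetical, and the sketch uses a single blended cost per token, which is a simplification since input and output tokens are billed separately in practice.

```python
def cost_per_interaction(tokens_per_interaction, cost_per_token):
    """Cost per interaction = tokens per interaction x cost per token."""
    return tokens_per_interaction * cost_per_token


def ai_cogs_per_user(interaction_cost, interactions_per_user):
    """AI COGS per user = cost per interaction x interactions per user."""
    return interaction_cost * interactions_per_user


def gross_margin_impact(cogs_per_user, revenue_per_user):
    """Gross margin impact = AI COGS per user / revenue per user."""
    if revenue_per_user <= 0:
        raise ValueError("revenue_per_user must be positive")
    return cogs_per_user / revenue_per_user


# Hypothetical monthly inputs for one user.
interaction_cost = cost_per_interaction(
    tokens_per_interaction=5_000,
    cost_per_token=4.00 / 1_000_000,  # assumed blended price of $4 per million tokens
)
user_cogs = ai_cogs_per_user(interaction_cost, interactions_per_user=300)
impact = gross_margin_impact(user_cogs, revenue_per_user=50.00)

print(f"Cost per interaction: ${interaction_cost:.4f}")
print(f"AI COGS per user:     ${user_cogs:.2f}")
print(f"Gross margin impact:  {impact:.1%} of revenue")
```

Because the three terms multiply, doubling both tokens per interaction and interactions per user quadruples the margin impact at constant revenue per user.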

The Token Treadmill [31:19]

Ben: I call this the token treadmill. Some models are very cheap. I've built apps where I'm barely using any tokens. But I built another app to calculate software metrics with larger context windows feeding a lot of computed data, and I've already used a million tokens this month.

Per-token prices may keep dropping, but we're using more and more tokens. Agentic workflows use more tokens. The latest Anthropic release uses more tokens than its predecessor for the same request. Usage is outpacing the price drop.

Ed: Every customer we work with is seeing this, from those just getting their product out the door to those thinking carefully about optimization. Even with optimization, costs aren't dropping materially because token volume is escalating so rapidly.

In some respects that's a good thing. It's an indirect signal of the value these products are delivering. More customers engaging, deeper integration, more data flowing. One customer told me that since embedding AI in their product, customers are staying longer and feeding in more information than ever before. But that means every day they're gathering more data on each customer and passing more of it to the LLM. They have to be careful about data management to contain the explosion in tokens.

I'd also be cautious about over-relying on the assumption that token prices will drop materially over the next few years. There are real infrastructure improvements making models cheaper to operate, and some of that will pass through. But the major model providers are motivated to keep their core token prices stable or in their favor. Premium frontier models will maintain premium price points.

The current pricing environment is also being subsidized. Large model providers are in market-capture mode. Anthropic reportedly spent more on AWS than they made in revenue last year. They're not in margin expansion mode, they're in growth mode. As they mature toward profitability, they'll need to show expanding gross margins. Uber is the analogy: cheap when they were buying market share, much more expensive now that they're profit-maximizing. If your AI forecast assumes tokens will keep getting cheaper, pressure-test that assumption.

The Margin Roadmap [38:20]

Ben: For finance folks, we're not expected to be the technical experts. But I've learned a lot building my own AI products. Model routing is a big thing. I've experimented with workflows where the orchestrator uses a better model and the research assistant uses a cheaper model that does the heavy data crunching.

CFOs need to be more aligned with the dev team and the CTO. We need to understand model routing, prompt caching, prompt optimization. We have to educate ourselves on these technical terms so we can have quality discussions with engineering as we build out product lines.

Ed: The overriding element here is engaging with your engineering team. Choosing the right model, caching repeated requests, prompt optimization: these are engineering-oriented decisions that have meaningful impact on your P&L. In the past, it was fairly easy to leave engineering alone to build products as they wanted and pick up the tab at the end of the month. Now these decisions need active collaboration between finance and engineering.

There are also non-engineering levers. Packaging and pricing: a lot of organizations are bundling new AI capabilities without fully reflecting that in pricing, often from a competitive concern. Procurement optimization: there are enterprise options to commit to token volumes for discounts, sometimes 20 to 30% on large-scale consumption. Those require accurate forecasting, which is why classification and cost behavior visibility matter so much.

Ben: Instrumentation has to be baked in as you build these products. Good tracking of tokens, models, features, customers. As CFO, you don't want to be told later that the usage data you need doesn't exist.

The Four Layers of AI Measurement [43:14]

Ben: We're seeing the evolution of how AI ROI is proven. Before, consumption was enough: lots of token usage, active users. That's the minimum now. Public tech companies are going beyond it.

There are four layers. First, consumption: tokens used. Second, work: what the AI did. Salesforce coined the term "agentic work unit." How many tokens does it take to complete a work unit, and how does that efficiency improve over time? Third, outcomes: actual results. Did it close the support ticket, book the demo, close the opportunity? Fourth, business impact: P&L effect for the customer. That's where we can show ROI most credibly.

Each layer requires the foundation below it. You can't have outcomes without defined work units. You can't have work units without defined consumption. We have to work through this if we're delivering AI product lines.

The 30-Day Action Plan [45:13]

Ben: For accounting and finance, start with the chart of accounts. How are you tracking inference, infrastructure, model monitoring? That's the foundation. It may mean adding an AI cost center instead of bundling everything into a general DevOps cost center, tagging those expenses, working with the dev team.

Then get into the calculations. AI cost per user, per request, per feature. We need more sophisticated metrics so we can have better conversations with our board and leadership about where margins and usage are going. This is no longer theory. There are changes we have to make to our financial infrastructure, and we have to put this instrumentation in place now, on the accounting close side and working with the dev team for the data we need.
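As a starting point for those calculations, here is a minimal sketch that rolls usage records up by customer and by feature. The record fields and sample values are hypothetical. They stand in for whatever usage log your engineering team instruments.

```python
from collections import defaultdict


def total_cost_by(records, key):
    """Sum cost_usd across usage records, grouped by the given field."""
    totals = defaultdict(float)
    for record in records:
        totals[record[key]] += record["cost_usd"]
    return dict(totals)


def cost_per_request(records):
    """Average cost per request, assuming one record per request."""
    if not records:
        raise ValueError("no usage records")
    return sum(r["cost_usd"] for r in records) / len(records)


# Hypothetical usage log: one row per AI request.
records = [
    {"customer": "acme", "feature": "summarize", "cost_usd": 0.25},
    {"customer": "acme", "feature": "chat", "cost_usd": 0.50},
    {"customer": "globex", "feature": "chat", "cost_usd": 0.75},
    {"customer": "globex", "feature": "chat", "cost_usd": 0.50},
]

by_customer = total_cost_by(records, "customer")
by_feature = total_cost_by(records, "feature")
average = cost_per_request(records)

print(by_customer)
print(by_feature)
print(f"Average cost per request: ${average:.2f}")
```

None of this works unless each request is logged with its customer, feature, and cost at the time it happens, which is why the instrumentation has to be agreed with the dev team up front.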

Ed: For anyone looking at this as a daunting list, this is the work Cloud Capital does. We help with the foundational elements: how to correctly classify costs, how to split between COGS and OpEx, how to allocate to specific products and components. We provide that capability for free, and we have dozens of growth-stage AI-native businesses using it to automatically allocate traditional compute alongside inference and AI costs, applying best-practice standards.

It's not just cost. We show cost per customer, margin per customer, and the granular margin impact of AI consumption across the business: by compute type, by customer, by product. If you're tackling Stage 1 of separating costs and tagging cost centers, or moving toward cost per user and cost per customer, we can support that. And it's built for finance.

Ben: This is so important because the upcoming budget season is going to look different than any before. We've had time to push out AI product lines. We have internal AI usage. The standard board deck isn't going to cut it. We need this detail now. Budget season is tough already. We have to start thinking about this today, or it will be more hellish than usual.

Ed: I wouldn't recommend leaving it until the last few weeks before budget cycles begin. Get this in place ahead of time.

Closing [49:59]

Ed: It's a huge topic. We could go on for hours. We'll share the Cost of Compute report and Ben's blog. If you're looking to get started with the foundations, or with modeling ahead of budget season, reach out to us at Cloud Capital.

Ben: It's top of mind for every CFO right now. We have a tech CFO meetup and this is what we're talking about. There are no textbooks on this, barely any good blog posts. The education has started. My last several blog posts have been AI focused. Feel free to check them out.

Ed: Thanks everyone for joining. We'll share materials after this. If you have questions, reach out to Ben or me directly.

Ben: Thanks for joining.
