The leadership question is shifting from who is using AI to what return it delivers: whether AI is improving delivery, margins, and outcomes.

AI adoption has moved past access and experimentation.
For the past few years, much of the AI conversation has been shaped by adoption. Teams were encouraged to try new tools, experiment with copilots, explore assistants, and find ways to apply AI across the software delivery lifecycle.
That period was necessary. It was also partially subsidized.
Many organizations experienced AI adoption as relatively low-friction. The costs were often hidden inside licenses, promotional pricing, platform incentives, bundled tools, venture-backed pricing models, or early experimentation budgets. The priority was usage, learning, and momentum.
That era is changing.
We are moving from the subsidized era of AI into the economic era of AI. The conversations I am having with industry leaders, executives, and recruiters are shifting. The question is no longer only whether teams are using AI, but how AI is being used, where the spend is going, what value is being created, and whether the return justifies the cost, risk, and operational complexity.
The invoices are arriving. Usage is growing. Tokens are being consumed across engineering workflows, internal tools, agentic systems, production support, and customer-facing product experiences.
That changes the leadership question.
AI token spend cannot be managed only as a software invoice. It needs to be understood as capacity, product economics, and part of the operating model for modern product engineering.
The question has shifted from, “How much are we spending on AI?” to, “What is that spend helping the business realize?”
The risk is not simply that AI costs more. The deeper risk is that organizations use AI to amplify the same output-driven management systems that already struggled to connect activity to outcomes.
The subsidized era rewarded adoption. The economic era will reward visibility, discipline, and measurable return.
From Subsidized Adoption to Economic Accountability
Treating token spend as part of the product engineering operating model, rather than as a vendor invoice, starts with seeing where AI shows up in the work.
In a product engineering organization, AI may support discovery, design, implementation, testing, security review, documentation, release, production support, incident response, runtime inference, and customer-facing workflows.
That makes AI another form of capacity inside the delivery system.
Senior technology leaders have always been expected to explain how effectively salaries, headcount, engineering capacity, platforms, delivery systems, and operating practices are being converted into customer value and business results.
AI expands that responsibility.
Leaders now need to understand where AI capacity is being used, what work it supports, what value comes back, and what waste, risk, or complexity it creates.
AI can make strong systems better. It can also make weak systems more expensive.
In strong systems, AI can accelerate learning, delivery, quality, and customer value. In weak systems, it can accelerate confusion, overproduction, rework, risk, and technical debt. That is why usage alone is not enough.
AI Spend Is Becoming Product Economics
AI spend differs from many traditional software costs in that it scales directly with usage intensity.
During early adoption, token spend may look like experimentation cost, innovation spend, or a software tool budget. As adoption matures, that view becomes too narrow.
Some AI spend supports the cost of building the product. Some AI spend is included in the cost of running the product. When AI is embedded in customer-facing workflows, runtime inference can become part of the variable cost to serve the customer.
That moves AI spend into the economics of the product itself. Build-time AI affects the cost of creating and delivering software. Runtime AI affects the cost of serving customers.
That distinction matters because a feature may look inexpensive during development but become expensive in production. Every customer interaction may trigger model calls, long prompts, large context windows, retrieval pipelines, image processing, retries, or agentic workflows.
A product feature may look successful because adoption is high. But if the cost of serving that feature grows faster than the value captured, the business may be subsidizing usage without realizing it.
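To make that concrete, here is a rough sketch of the arithmetic. Every number in it is an assumption chosen for illustration: the token prices, the token counts, the number of model calls per interaction, the retry rate, and the revenue per user are hypothetical and do not describe any real vendor or product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPrice:
    """Hypothetical price in dollars per one million tokens."""
    input_per_million: float
    output_per_million: float


def cost_per_interaction(price: ModelPrice, input_tokens: int, output_tokens: int,
                         calls: int = 1, retry_rate: float = 0.0) -> float:
    """Estimate the inference cost of one customer interaction.

    `calls` is the number of model calls one interaction triggers, and
    `retry_rate` is the average fraction of those calls that get repeated.
    """
    per_call = (input_tokens * price.input_per_million
                + output_tokens * price.output_per_million) / 1_000_000
    return per_call * calls * (1 + retry_rate)


def monthly_margin_per_user(revenue_per_user: float, interactions_per_user: int,
                            interaction_cost: float) -> float:
    """Value captured per user minus the AI cost to serve that user."""
    return revenue_per_user - interactions_per_user * interaction_cost


# Illustrative figures only: a feature with long prompts and four model calls
# per interaction, used 200 times a month by a customer paying $20 a month.
price = ModelPrice(input_per_million=3.0, output_per_million=15.0)
cost = cost_per_interaction(price, input_tokens=6000, output_tokens=500,
                            calls=4, retry_rate=0.1)
margin = monthly_margin_per_user(20.0, 200, cost)
print(f"cost per interaction: ${cost:.4f}")   # about $0.1122
print(f"monthly margin per user: ${margin:.2f}")  # about -$2.44
```

In this made-up case, a cost of roughly eleven cents per interaction looks harmless during development, yet a heavily used feature ends up costing more to serve than the customer pays. High adoption makes the picture worse, not better.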
That is an Engineering issue. It is also a Product, Finance, and executive team issue. The engineering leader’s role is to make those economics visible before they become a surprise.
AI Cost Should Follow the Value Stream
Many product engineering organizations do not operate through purely functional silos. They operate through cross-functional teams, product areas, platforms, services, and value streams.
AI cost follows the work, not the org chart.
A single product capability may involve AI usage across discovery, design, coding, debugging, testing, documentation, security review, deployment, production support, and runtime product behavior.
If leaders only look at AI spend by department, they may miss the real economics of delivery. The more useful view is AI usage by product, feature, initiative, environment, workflow, model, and value stream.
This is also where AI spend needs to become visible in product economics.
AI investment should be aligned to products and value streams. Some AI spend supports development and delivery and, depending on how the organization accounts for it, may show up as development expense or as part of Cost of Goods Sold (COGS). Some AI spend becomes part of runtime cost to serve, especially when AI is embedded in customer-facing workflows, production support, or operational processes.
Those costs need to be included in ROI calculations. Otherwise, leaders may overstate the return, understate the cost, and misunderstand the true margin of a product, feature, or value stream.
A product may look successful because adoption is high or delivery is faster. But if AI usage increases COGS, runtime support costs, production inference costs, or operational complexity faster than the value captured, the product economics may be weaker than they appear.
This does not mean every token should become a burdensome accounting exercise. That would create its own waste. The goal is enough visibility to make better decisions without turning every prompt into chargeback theater.
At a minimum, AI usage should be attributable by product, team, feature or initiative, environment, model, and use case. In more mature systems, usage can be connected to epics, stories, pull requests, services, customer-facing workflows, production inference events, and runtime support activities.
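One lightweight way to get that minimum level of attribution is to tag each usage record with those dimensions and roll spend up along whichever ones a decision needs. The sketch below assumes usage data can be exported with these tags; the record shape, field names, and sample values are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageRecord:
    """One tagged unit of AI usage (hypothetical schema)."""
    product: str
    team: str
    feature: str
    environment: str  # for example "dev" or "prod"
    model: str
    use_case: str
    cost_usd: float


def spend_by(records, *dimensions):
    """Total cost grouped by any combination of tag dimensions."""
    totals = defaultdict(float)
    for record in records:
        key = tuple(getattr(record, name) for name in dimensions)
        totals[key] += record.cost_usd
    return dict(totals)


# Made-up sample data for illustration.
records = [
    UsageRecord("billing", "payments", "invoice-summary", "prod", "large", "runtime", 40.0),
    UsageRecord("billing", "payments", "invoice-summary", "dev", "small", "coding", 12.5),
    UsageRecord("search", "discovery", "semantic-search", "dev", "small", "testing", 7.5),
]

print(spend_by(records, "product"))
print(spend_by(records, "product", "environment"))
```

The same records answer a department-level question and a value-stream question without a separate accounting exercise for each.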
This becomes even more important as organizations reduce team sizes, slow hiring, or expect AI-enabled capabilities to absorb work previously performed by humans.
Reducing human costs while making AI spend invisible does not improve the economics of product delivery. It moves the cost into a different part of the system.
Cost Reduction Requires Knowing What AI Supports
When organizations need to reduce costs, they often look at headcount. That is painful, but the management model is familiar. Leaders can estimate which roles or teams are affected, what capacity is lost, and what work may slow down or stop.
AI creates a different version of the same problem.
If an organization needs to reduce AI spend, what exactly should it cut?
Licenses? Token budgets? Access to certain models? Agentic workflows? Customer-facing AI features? Runtime inference? Internal engineering enablement?
A simple percentage reduction across every team may feel fair, but it can damage the wrong parts of the system. It is similar to cutting engineering capacity without understanding which products, value streams, customer workflows, or strategic initiatives depend on that capacity.
AI capacity needs the same management discipline leaders already apply to human capacity.
If spending needs to come down, leaders should understand which uses create value, which are experimental, which are wasteful, which are tied to revenue, which are improving delivery, and which are quietly creating costs without a measurable return.
Without that visibility, leaders may cut token budgets that are helping teams move valuable work faster while leaving expensive, low-value workflows untouched.
AI cost management should focus on matching capacity to real needs so spending stays intentional and sustainable.
AI-SDLC Is Becoming a Leadership Discipline
Engineering leaders need to bring AI inside the SDLC rather than allowing it to operate around the edges of the delivery system.
An AI-enabled SDLC does not need to become heavyweight governance. It means AI use is visible, intentional, risk-aware, and tied to how software moves from idea to production.
Using AI to summarize internal notes is different from using it to generate production code.
Generating a unit test is different from modifying infrastructure.
Creating a product requirement summary is different from sending customer data into a model.
Using an AI assistant during development is different from operating a customer-facing AI feature in production.
A practical AI-SDLC makes those distinctions clear. It gives teams guidance on approved tools, data boundaries, review expectations, quality controls, security considerations, cost implications, and production requirements without requiring permission for every small decision.
It also creates feedback loops.
If AI-assisted work results in defects, rework, unexpected costs, security concerns, maintenance burdens, or production issues, the organization should learn from them and improve the system.
Without that discipline, token management becomes after-the-fact reporting. Teams use AI. Invoices arrive. Finance asks questions. Engineering tries to reconstruct where the money went. That is archaeology, not management.
Model Routing and Agents Change the Cost Model
Different types of work need different levels of AI support.
Routine tasks may only need a smaller model. Time-sensitive work may prioritize speed. Complex problems may require deeper reasoning or stronger coding performance. Sensitive data may require approved models with the right controls. Some work may not be appropriate for AI at all.
Sending every request to the most capable model may feel simple, but it can be financially careless.
Model selection needs to consider cost, latency, quality, reliability, security, data sensitivity, and business value.
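A routing policy does not have to be elaborate to be useful. The sketch below is a minimal rule-based example; the tier names ("small-fast", "large-reasoning", "approved-controlled"), the task fields, and the order of the rules are all assumptions for illustration, not a recommended standard.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Sensitivity(Enum):
    PUBLIC = 1
    INTERNAL = 2
    RESTRICTED = 3


@dataclass(frozen=True)
class Task:
    complexity: str            # "routine" or "complex"
    latency_sensitive: bool
    sensitivity: Sensitivity
    ai_allowed: bool = True    # some work may not be appropriate for AI at all


def route(task: Task) -> Optional[str]:
    """Pick a hypothetical model tier for a task, or None when AI should not be used."""
    if not task.ai_allowed:
        return None
    # Data controls come first: sensitive data only goes to approved models.
    if task.sensitivity is Sensitivity.RESTRICTED:
        return "approved-controlled"
    # Time-sensitive and routine work go to the cheaper, faster tier.
    if task.latency_sensitive or task.complexity == "routine":
        return "small-fast"
    # Only complex, non-urgent work pays for the most capable tier.
    return "large-reasoning"


print(route(Task("routine", False, Sensitivity.INTERNAL)))
print(route(Task("complex", False, Sensitivity.PUBLIC)))
```

The point of writing the rules down is that the expensive tier becomes a deliberate choice rather than the default.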
The same principle applies to agents.
A chat interaction has a relatively simple cost shape. The user asks, the model responds, and the human decides what to do next.
An agentic workflow is different.
It can plan, inspect a codebase, read files, call tools, modify code, run tests, fail, retry, spawn subtasks, summarize progress, re-read context, change approach, and continue looping until it reaches a stopping condition.
Every loop has a cost. Every retry consumes more tokens. Every spawned subagent can multiply the spend.
If the agent solves the wrong problem, introduces unnecessary complexity, or generates a solution that the team does not understand, the spend has already happened.
The team may now face two costs.
The first is the token cost of the agentic run. The second is the human cost of review, correction, rework, simplification, rollback, or operational support.
That is why goal-driven agents need boundaries: clear intent, defined scope, cost limits, stopping rules, review checkpoints, and evidence of completion.
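Two of those boundaries, cost limits and stopping rules, can be sketched in a few lines. This is an illustration under stated assumptions rather than how any particular agent framework works: `RunBudget`, `run_agent`, and the `step` callable that reports its own cost are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class RunBudget:
    """Cost and step limits for a single agentic run (hypothetical)."""
    max_cost_usd: float
    max_steps: int
    spent_usd: float = 0.0
    steps: int = 0

    def charge(self, cost_usd: float) -> None:
        self.spent_usd += cost_usd
        self.steps += 1

    def exhausted(self) -> bool:
        return self.spent_usd >= self.max_cost_usd or self.steps >= self.max_steps


def run_agent(step: Callable[[], Tuple[float, bool]], budget: RunBudget) -> str:
    """Run `step` until it reports completion or the budget runs out.

    `step` returns (cost_usd, done). The budget is checked before each step,
    so the final step can push spend slightly past the limit.
    """
    while not budget.exhausted():
        cost_usd, done = step()
        budget.charge(cost_usd)
        if done:
            return "completed"
    return "stopped: budget exhausted"


# A stand-in agent step that keeps looping and never finishes.
def looping_step() -> Tuple[float, bool]:
    return 0.25, False


budget = RunBudget(max_cost_usd=1.0, max_steps=20)
print(run_agent(looping_step, budget), budget.steps, budget.spent_usd)
```

A guard like this does not make the agent smarter. It only ensures that a run solving the wrong problem stops at a known cost and hands control back to a human.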
Without those constraints, agentic AI can quietly create a new form of waste. Automated overproduction.
AI Can Create a Double Spend
AI can generate a lot of code quickly. Sometimes that code is useful. Sometimes it may even be high quality in isolation. But that does not mean it is the right code, the simplest code, the most maintainable code, or the code the team should accept into the system.
AI can create a double spend.
The organization pays once in token spend to generate the work. Then it pays again through review burden, rework, complexity, technical debt, maintenance cost, operational risk, and slower future delivery.
The pull request may look complete. The summary may sound confident. The tests may pass. The change may appear productive. But passing tests does not always mean the design is right.
If a reviewer cannot reasonably understand what changed, why it changed, whether it was necessary, and how it affects the rest of the system, accountability has weakened.
The organization may think it has created efficiency. In reality, it may have created approval theater. Every line of code that enters production becomes part of the system the team must understand, secure, test, operate, and evolve.
Human Accountability Still Matters
AI adoption can create a subtle accountability gap.
It is easy to say the model suggested it, the agent changed it, the assistant wrote it, or the tool generated it. That language may be convenient, but it is dangerous.
Production systems need accountable humans.
If AI writes code, the engineer owns the code. If AI generates a test, the team is responsible for its validity. If AI creates a configuration, the team owns the operational behavior. If AI summarizes customer feedback, the product team owns the interpretation. If AI recommends a decision, the leader owns the decision.
This is also where leaders need to watch for the AI Rockstar anti-pattern. This is the person who appears incredibly productive because they can use AI to generate large solutions quickly. On the surface, that can look impressive. Senior leaders need to look deeper.
Did the team understand what changed? Can the team explain the design? Can the team support it in production? Can the team secure, maintain, and evolve it six months from now? Can the business justify the cost and complexity created by the workflow?
The risk is not that talented engineers use AI well. We want engineers to use AI well. The risk emerges when AI-enabled output outruns shared understanding. That creates a different kind of technical debt. Not only code debt. Comprehension debt.
Software does not become production-ready because someone generated it quickly. It becomes production-ready when the team understands it, validates it, owns it, and can operate it safely.
Flow, Realization, and the Management System
Do the metrics need to change as AI becomes part of the delivery system?
Not entirely.
Many of the system metrics already exist. What needs to change is how leaders connect AI usage to those metrics, and whether they use them to understand Flow, quality, cost, and outcomes.
The number of prompts, AI-assisted commits, or generated lines of code may tell us that activity increased. They do not tell us whether the system improved.
Useful metrics connect AI usage to flow, quality, cost, and outcomes, and many of them are already available today as Flow Metrics, delivery performance measures, and outcome measures.
Are teams getting valuable features to market faster (Flow Time)? Is throughput improving (Flow Velocity)? Is work spending less time waiting and more time moving (Flow Efficiency)? Is quality holding? Is change failure rate going down, staying stable, or getting worse (Change Failure Rate)? Is work more complete and accurate when it reaches the next step (Percent Complete and Accurate)? Are incidents easier to resolve (Mean Time to Recover)? Are we improving customer outcomes, or are we simply producing more work (Realization / Outcomes)?
This is where Flow and Realization matter. Flow shows whether work is moving more efficiently. Realization shows whether the movement created the intended business result.
AI may improve Flow while hurting Realization if teams ship faster but create more defects, more rework, a heavier support burden, higher cloud costs, greater token spend, margin pressure, or technical debt.
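A small before-and-after comparison shows how that can happen. The sketch below computes average Flow Time, Flow Efficiency (active time divided by total flow time), and change failure rate from a handful of work items. The work items are invented, and change failure rate stands in here only as a quality signal; Realization itself still needs outcome measures that this example does not model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkItem:
    active_days: float
    waiting_days: float
    caused_failure: bool  # did the change lead to a production failure?


def average_flow_time(items) -> float:
    """Mean elapsed days from start to delivery."""
    return sum(i.active_days + i.waiting_days for i in items) / len(items)


def flow_efficiency(items) -> float:
    """Share of total elapsed time in which work was actively moving."""
    active = sum(i.active_days for i in items)
    total = sum(i.active_days + i.waiting_days for i in items)
    return active / total


def change_failure_rate(items) -> float:
    """Fraction of delivered changes that caused a failure."""
    return sum(1 for i in items if i.caused_failure) / len(items)


# Invented data: delivery gets faster after AI adoption, but failures double.
before = [WorkItem(2, 6, False)] * 3 + [WorkItem(2, 6, True)]
after = [WorkItem(1, 3, True)] * 2 + [WorkItem(1, 3, False)] * 2

for label, items in (("before", before), ("after", after)):
    print(label, average_flow_time(items), flow_efficiency(items),
          change_failure_rate(items))
```

In this made-up data, Flow Time is cut in half while the change failure rate doubles. A dashboard that only showed the first number would report a win.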
Managing AI adoption is about helping teams move faster without losing control of cost, quality, security, reliability, maintainability, and accountability.
That requires a management system.
A practical executive operating model for AI adoption should include approved use cases, data-handling guidance, model-selection standards, an AI-SDLC, cost visibility, model routing, budget guardrails, anomaly detection, agentic workflow boundaries, quality controls, security controls, and human accountability principles.
The organizations that win with AI will understand where AI improves Flow. They will measure whether AI improves Realization. They will manage AI spend as part of product economics. They will route work to the right models for the right reasons. They will use agents where the value justifies the autonomy and cost. They will make AI spend visible across the value stream. They will keep humans accountable for what reaches customers.
AI adoption is now a product engineering leadership discipline.
The leaders who manage it well will help their teams move faster, learn faster, and deliver better outcomes without losing sight of cost, quality, accountability, and the long-term health of the systems they are responsible for.
So the next time AI spend shows up in a dashboard or invoice, the question is no longer, “How much did we spend?” The better question is, “What did that spend help the business realize?”
Are you looking at AI spend as an invoice, a team-level budget, a product economics question, a capacity-management question, or part of the broader software delivery operating model?
I would enjoy comparing notes.