Rethink Your Understanding

Transforming Software Delivery

AI Is Improving Software Engineering. But It’s Only One Piece of the System

July 31, 2025 by philc

5 min read

A follow-up to my last post, Leading Through the AI Hype in R&D, this piece explores how strong AI adoption still needs systems thinking, responsibility, and better leadership focus.

Leaders are moving fast to adopt AI in engineering. The urgency is real, and the pressure is growing. But many are chasing the wrong kind of improvement, or rather, focusing too narrowly.

AI is transforming software engineering, but it addresses only one part of a much larger system. Speeding up code creation doesn’t solve deeper issues like unclear requirements, poor architecture, or slow feedback loops, and in some cases, it can amplify dysfunction when the system itself is flawed.

Engineers remain fully responsible for what they ship, regardless of how the code is written. The real opportunity is to increase team capacity and deliver value faster, not to reduce cost or inflate output metrics.

The bigger risk lies in how senior leaders respond to the hype. When buzzwords instead of measurable outcomes drive expectations, focus shifts to the wrong problems. AI is a powerful tool, but progress requires leadership that stays grounded, focuses on system-wide improvement, and prioritizes accountability over appearances.

A team member recently shared Writing Code Was Never the Bottleneck by Ordep. It cut through the noise. Speeding up code writing doesn’t solve the deeper issues in software delivery. That article echoed what I’ve written and experienced myself. AI helps, but currently not where many think it does.

This post builds on my earlier piece, Leading Through the AI Hype in R&D. That post challenged hype-driven expectations. This one continues the conversation by focusing on responsibility, measurement, and real system outcomes.

Code Implementation Is Rarely the Bottleneck

Tools like Copilot, Claude Code, Cursor, Devin, and others can help developers write code faster. But that’s not where most time is lost.

Delays come from vague requirements, missing context, architecture problems, slow reviews, and late feedback. Speeding up code generation in that environment doesn’t accelerate delivery. It accelerates dysfunction.

I Use AI in My Work

I’ve used agentic AI tools to implement code, write services, and improve documentation. It’s productive, but it takes consistent review. I’ve paused, edited, and rewritten plenty of AI-generated output.

That’s why I support adoption. I created a tutorial to help engineers in my division learn to use AI effectively. It saves time. It adds value. But it’s not automatic. You still need structure, process, and alignment.

Engineers Must Own Impact, Not Just Output

Using AI doesn’t remove responsibility. Engineers are still accountable for what their code does once it runs.

They must monitor quality, performance, cost, and user impact. AI can generate a function. But if that function causes a spike in memory usage or breaks under scale, someone has to own that.

I covered this in Responsible Engineering: Beyond the Code – Owning the Impact. AI makes output faster. That makes responsibility more critical, not less. Code volume isn’t the goal. Ownership is.

Code Is One Step in a Larger System

Software delivery spans more than development. It includes discovery, planning, testing, release, and support. AI helps one step. But problems often live elsewhere.

If your system is broken before and after the code is written, AI won’t help. You need to fix flow, clarify ownership, and reduce friction across the whole value stream.

Small Teams Increase Risk Without System Support

Some leaders believe AI allows smaller teams to do more. That’s only true if the system around them improves too.

Smaller teams carry more scope. Cognitive load increases. Knowledge becomes harder to spread. Burnout rises.

Support pressure also grows. The same few experts get pulled into production issues. AI doesn’t take the call. It doesn’t debug or triage. That load falls on people already stretched thin.

When someone leaves, the risk grows. The team becomes fragile. Response times slip. Delivery suffers.

The Hard Part Is Not Writing the Code

One of my engineers said it well. Writing code is the easy part. The hard part is designing systems, maintaining quality, onboarding new people, and supporting the product in production.

AI helps with speed. It doesn’t build understanding.

AI Is a Tool. Not a Strategy

I support using AI. I’ve adopted it in my work and encourage others to do the same. But AI is a tool. It’s not a replacement for thinking.

Use it to reduce toil. Use it to improve iteration speed. But don’t treat it as a strategy. Don’t expect it to replace engineering judgment or improve systems on its own.

Some leaders see AI as a path to reduce headcount. That’s short-sighted. AI can increase team capacity. It can help deliver more features, faster. That can drive growth, expand market share, and increase revenue. The opportunity is to create more value, not simply lower cost.

The Metrics You Show Matter

Senior leaders face pressure to show results. Investors want proof that AI investments deliver value. That’s fair.

The mistake is reaching for the wrong metrics. Commit volume, pull requests, and code completions are easy to inflate with AI. They don’t reflect real outcomes.

This is where hype causes harm. Leaders start chasing numbers that match the story instead of measuring what matters. That weakens trust and obscures the impact.

If AI is helping, you’ll see better flow. Fewer delays. Faster recovery. More predictable outcomes. If you’re not measuring those things, you’re missing the point.
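
To make that concrete, here is a minimal sketch in Python of the kind of sanity check I have in mind: if an easy-to-inflate volume metric jumps while flow and quality signals stay flat, the “improvement” is probably noise. The numbers and the 20%/10% thresholds are hypothetical illustrations, not recommendations.

```python
# Hypothetical quarterly signals pulled from Git and delivery tooling.
last_q = {"commits": 1800, "median_flow_time_days": 9.0, "change_failure_rate": 0.12}
this_q = {"commits": 3100, "median_flow_time_days": 8.8, "change_failure_rate": 0.15}

# Arbitrary illustrative thresholds: volume up >20%, flow time down >10%.
volume_up = this_q["commits"] > 1.2 * last_q["commits"]
flow_better = this_q["median_flow_time_days"] < 0.9 * last_q["median_flow_time_days"]
quality_better = this_q["change_failure_rate"] <= last_q["change_failure_rate"]

if volume_up and not (flow_better or quality_better):
    print("Warning: output volume is inflating without delivery improvement.")
else:
    print("Volume and delivery signals are moving together.")
```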

AI Is No Longer Optional

AI adoption in software development is no longer a differentiator. It’s the new baseline.

Teams that resist it will fall behind. No investor would approve a team using hammers when nail guns are available. The expectation is clear. Adopt modern tools. Deliver better outcomes. Own the results.

What to Focus On

If you lead AI adoption, focus on the system, not the noise.

  • Improve how work moves across teams
  • Reduce delays between steps
  • Align teams on purpose and context
  • Use AI to support engineers, not replace them
  • Measure success with delivery metrics, not volume metrics
  • Expect engineers to own what they ship, with or without AI

You don’t need more code. You need better outcomes. AI can help, but only if the system is healthy and the people are accountable.

The hype will keep evolving. So will the tools. But your responsibility is clear. Focus on what’s real, what’s working, and what delivers value today.

Poking Holes

I invite your perspective on my posts. What are your thoughts?

Let’s talk: phil.clark@rethinkyourunderstanding.com


References

  1. Clark, Phil. Leading Through the AI Hype in R&D. Rethink Your Understanding. July 2025. Available at: https://rethinkyourunderstanding.com/2025/07/leading-through-the-ai-hype-in-rd
  2. Ordep. Writing Code Was Never the Bottleneck. Available at: https://ordep.dev/posts/writing-code-was-never-the-bottleneck
  3. Clark, Phil. Responsible Engineering: Beyond the Code – Owning the Impact. Rethink Your Understanding. March 2025. Available at: https://rethinkyourunderstanding.com/2025/03/responsible-engineering-beyond-the-code-owning-the-impact

Filed Under: Agile, AI, DevOps, Engineering, Leadership, Metrics, Product Delivery, Software Engineering

Leading Through the AI Hype in R&D

July 27, 2025 by philc

7 min read

Note: AI is evolving rapidly, transforming workflows faster than expected. Most of us can’t predict how quickly or to what extent AI will change our teams or workflows. My focus for this post is on the current state, the pace of change, and reality versus hype at the enterprise level. I promote the adoption of AI and encourage every team member to embrace it.

I’ve spent the past few weeks deeply immersed in “vibe coding” and experimenting with agentic AI tools during my nights and weekends, learning how specialized agents can orchestrate like real product teams when given proper context and structure. But in my day job as a senior technology leader, the tone shifts. I’ve found myself in increasingly chaotic meetings with senior leaders, chief technology officers, chief product officers, and engineering VPs, all trying to out-expert each other on the transformative power of AI on product and development (R&D) teams.

The energy often feels like a pitch room, not a boardroom. Someone declares Agile obsolete. Another suggests we can replace six engineers with AI agents. A few toss around claims of “30× productivity.” I listen, sometimes fascinated, often frustrated, at how quickly the conversation jumps to conclusions without asking the right questions. More troubling, many of these executives are under real pressure from investors and ownership to show ROI. If $1M is spent on AI adoption, how do we justify the return? What metrics will we use to report back?

Hearing the Hype (and Feeling the Exhaustion)

One executive confidently declared, “Agile and Lean are dead,” citing the rise of autonomous AI agents that can plan, code, test, and deploy without human guidance. His opinion echoed a recent blog post, Agile Is Dead: Long Live Agentic Development, which criticized Agile rituals like daily stand-ups and sprints as outdated and encouraged teams to let agents take over the workflow¹. Meanwhile, agile coaches argue that bad Agile, not Agile itself, is the real problem, and that AI can strengthen Agile if applied thoughtfully.

The hype escalates when someone shares stories of high-output engineering from a senior developer keeping pace with AI capabilities: 70 AI-assisted commits in a single night, barely touching the keyboard. Another proposes shrinking an 8-person team to just two engineers, one writing prompts and one overseeing quality, as the AI agents do the rest. These stories are becoming increasingly common, especially as research suggests that AI can dramatically reduce the number of engineers needed for many projects². Elad Gil even claimed most engineering teams could shrink by 5×–10×.

But these same reports caution against drawing premature conclusions. They warn that while AI enables productivity gains, smaller teams risk creating knowledge silos, reduced quality, and overloading the remaining developers². Other sources echo this risk: Software Engineering Intelligence (SEI) tools have flagged increased fragility and reduced clarity in AI-generated code when review practices and documentation are lacking³.

What If We’re Already Measuring the Right Things?

While executives debate whether Agile is dead, I find myself thinking: we already have the tools to measure AI’s impact; we just need to use them.

In my organization’s division, we’ve spent years developing a software delivery metrics strategy centered on Value Stream Management, Flow Metrics, and team sentiment. These metrics already show how work flows through the system, from idea to implementation to value. They include:

  • Flow metrics like distribution, throughput, time, efficiency, and load
  • Quality indicators like change failure rate and security defect rate
  • Sentiment and engagement data from team surveys
  • Outcome-oriented metrics like anticipated outcomes and goal (OKR) alignment

Recently, I aligned our Flow Metrics with the DX Core 4 Framework⁴, organizing them into four key categories: speed, effectiveness, quality, and impact. We made these visual and accessible, with a chart that shows how each metric relates to delivery health. These metrics don’t assume Agile is obsolete or that AI is the solution. They track how effectively our teams are delivering value.

So when senior leaders asked, “How will we measure AI’s impact?” I reminded them: we already are. If AI helps us move faster, we’ll see it in flow time. If it increases capacity, we’ll see it in throughput (flow velocity). If it maintains or improves quality, our defect rates and sentiment scores will reflect that. The same value stream lens that shows us where work gets stuck will also reveal whether AI helps us unstick it.
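
As a small illustration of that lens, here is a minimal Python sketch that compares median flow time and throughput for work finished before and after an assumed AI rollout date. The work-item records and the date are hypothetical placeholders for whatever your delivery tooling exports.

```python
from datetime import date
from statistics import median

# Hypothetical work items: (started, finished) dates exported from a delivery tool.
work_items = [
    (date(2025, 5, 1), date(2025, 5, 9)),
    (date(2025, 5, 12), date(2025, 5, 20)),
    (date(2025, 6, 2), date(2025, 6, 6)),
    (date(2025, 6, 10), date(2025, 6, 13)),
]

AI_ROLLOUT = date(2025, 6, 1)  # assumed adoption date for the comparison

def flow_summary(items):
    """Median flow time in days and throughput (count finished) for a set of items."""
    flow_times = [(finished - started).days for started, finished in items]
    return {"median_flow_time_days": median(flow_times), "throughput": len(items)}

before = flow_summary([i for i in work_items if i[1] < AI_ROLLOUT])
after = flow_summary([i for i in work_items if i[1] >= AI_ROLLOUT])
print("before rollout:", before)
print("after rollout: ", after)
```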

Building on Existing Metrics: The AI Measurement Framework

Instead of creating an entirely new system, I layered an AI Measurement Framework on top of our existing performance metrics⁵. The framework includes three categories:

  1. Utilization:
    • % of AI-generated code
    • % of developers using AI tools
    • Frequency of AI-agent use per task
  2. Impact:
    • Changes in flow metrics (faster cycle time)
    • Developer satisfaction or frustration
    • Delivered value per team or engineer
  3. Cost:
    • Time saved vs. licensing and premium token cost
    • Net benefit of AI subscriptions or infrastructure

This approach answers the following questions: Are developers using AI tools? Does that usage make a measurable difference? And does the difference justify the investment?
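
As a rough illustration of how those three categories can roll up into one view, here is a minimal Python sketch. Every figure and variable name is a hypothetical input that surveys or tooling might supply, not data from my organization, and the four-weeks-per-month conversion is a deliberate simplification.

```python
# Utilization inputs (hypothetical)
engineers = 120
engineers_using_ai = 78
ai_assisted_prs, total_prs = 450, 1400

# Impact inputs (hypothetical, e.g. from developer surveys)
hours_saved_per_engineer_week = 2.5
loaded_cost_per_hour = 95.0

# Cost inputs (hypothetical)
license_cost_per_engineer_month = 30.0
premium_token_spend_month = 4000.0

utilization = engineers_using_ai / engineers
ai_pr_share = ai_assisted_prs / total_prs

monthly_hours_saved = engineers_using_ai * hours_saved_per_engineer_week * 4  # ~4 weeks/month
monthly_value = monthly_hours_saved * loaded_cost_per_hour
monthly_cost = engineers_using_ai * license_cost_per_engineer_month + premium_token_spend_month

print(f"Utilization: {utilization:.0%} of engineers, {ai_pr_share:.0%} of PRs AI-assisted")
print(f"Impact: ~{monthly_hours_saved:,.0f} hours/month reclaimed (~${monthly_value:,.0f})")
print(f"Cost: ${monthly_cost:,.0f}/month -> net ${monthly_value - monthly_cost:,.0f}")
```

Even a rollup like this only answers the cost question; the impact numbers still need to be corroborated by the flow and quality metrics described above.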

In a recent leadership meeting, someone asked, “What percentage of our engineers are using AI to check in code?” That’s an adoption metric, not a performance one. Others have asked whether we can measure AI-generated commits per engineer to report to the board. While technically feasible with specific developer tools, this approach risks reinforcing vanity metrics that prioritize motion over value. Without impact and ROI metrics, adoption alone can lead to gaming behavior, and teams might flood the system with low-value tasks to appear “AI productive.” What matters is whether AI is helping us deliver better, faster, and smarter.

I also recommend avoiding vanity metrics, such as lines of code or commits. These often mislead leaders into equating motion with value. Many vendors boast “AI wrote 50% of our code,” but as developer-experience researcher Laura Tacho explains, this usually counts accepted suggestions, not whether the code was modified, deleted, or even deployed.⁵ We must stay focused on outcomes, not outputs.

The Risk of Turning AI into a Headcount Strategy

One of the more concerning trends I’m seeing is the concept of “headcount conversion,” which involves reducing team size and utilizing the savings to fund enterprise AI licenses. If seven people can be replaced by two and an AI license, along with a premium token budget, some executives argue, then AI “pays for itself.” However, this assumes that AI can truly replace human capability and that the work will maintain its quality, context, and business value.

That might be true for narrow, repeatable tasks, or for small organizations and startups struggling with costs and revenue. But it’s dangerous to generalize. AI doesn’t hold tribal knowledge, coach junior teammates, or understand long-term trade-offs. It can’t take responsibility for cultural dynamics, systems thinking, or ethical decisions.

Instead of shrinking teams, we should consider expanding capacity. AI can help us do more with the same people. Developer productivity research indicates that engineers typically reinvest AI-enabled time savings into refactoring, enhancing test coverage, and implementing cross-team improvements², which compounds over time into stronger, more resilient software.

Slowing Down to Go Fast

Leaving those leadership meetings, I felt a mix of energy and exhaustion. Many people wanted to appear intelligent, but few were asking thoughtful questions. We were racing toward solutions without clarifying what problem we were solving or how we’d measure success.

So here’s my suggestion: Let’s slow down. Let’s agree on how we’ll track the impact of AI investments. Let’s integrate those measurements into systems we already trust. And let’s stop treating AI as a replacement for frameworks that still work; instead, let’s use it as a powerful tool that helps us deliver better, faster, and with more intention.

AI isn’t a framework. It’s an accelerator. And like any accelerator, it’s only valuable if we’re steering in the right direction.

Poking Holes

I invite your perspective on my posts. What are your thoughts?

Let’s talk: phil.clark@rethinkyourunderstanding.com


References

  1. Leschorn, J. (2025, May 29). Agile Is Dead: Long Live Agentic Development. Superwise. https://superwise.ai/blog/agile-is-dead-long-live-agentic-development/
  2. Ameenza, A. (2025, April 15). The New Minimum Viable Team: How AI Is Shrinking Software Development Teams. https://anshadameenza.com/blog/technology/ai-small-teams-software-development-revolution/
  3. Circei, A. (2025, March 13). Measuring AI in Engineering: What Leaders Need to Know About Productivity, Risk and ROI. Waydev. https://waydev.co/ai-in-engineering-productivity-risk-roi/
  4. Saunders, M. (2025, January 6). DX Unveils New Framework for Measuring Developer Productivity. InfoQ. https://www.infoq.com/news/2025/01/dx-core-4-framework/
  5. GetDX. (2025). Measuring AI Code Assistants and Agents. DX Research. https://getdx.com/research/measuring-ai-code-assistants-and-agents/

Filed Under: Agile, AI, Delivering Value, DevOps, Engineering, Leadership, Lean, Metrics, Product Delivery, Software Engineering, Value Stream Management

Bets, Budgets, and Reframing Software Delivery as Continuous Discovery

June 7, 2025 by philc

8 min read

This post is a follow-up to my articles on estimation and product operating models, exploring how adaptive investment, value discovery, and team ownership align with Vasco Duarte’s call for agility beyond the team.

In my earlier posts, “Software Delivery Teams, Deadlines, and the Challenge of Providing Reliable Estimates” and “How Value Stream Management and Product Operating Models Complement Each Other”, I explored two core challenges that continue to hold organizations back: the illusion of predictability in software estimation, and the inefficiency of funding work through rigid project-based models. I argued that software delivery requires a shift toward probabilistic thinking, value stream alignment, and investment in products and initiatives, not fixed-scope, time-bound projects.

Software implementation and delivery estimations have been a constant theme throughout my career. Often seen as a mix of art and science, they remain highly misunderstood. While teams tend to dread them, organizations rely on them for effective planning. Despite their contentious nature, software estimations are an essential part of the process, sparking countless articles, discussions, and debates in the industry. I’m not arguing against estimation or planning. Organizations must plan. Leaders need to make investment decisions, prioritize resource allocation, and create financial forecasts. That doesn’t change. What does need to change is how we think about estimates, how we communicate their confidence, and how we act on the signals that follow.

This is a nuance that can be hard to understand unless you’ve lived both sides, delivering software inside an Agile team and leading business decisions that depend on forecasts. Estimates aren’t the enemy. One lesson I’ve learned, and others often mention, is that the real issue lies in how rigidly we stick to assumptions and how slow we are to adjust them when real-world complexities arise. What we need is to improve both how the business relies on estimates and how delivery teams develop the capability to estimate, update, and communicate confidence levels over time.

A team member recently shared notes from an Agile meetup featuring Vasco Duarte’s talk, “5 Things Destroying Your Ability to Deliver, Even When You’re Agile.” While I didn’t attend the talk, I’ve followed Vasco’s work for years. The talk referenced a 2024 podcast episode of his on Investing in Software¹, which I hadn’t listened to until now. That episode inspired this follow-up article.

In this episode, Vasco highlights an important point: traditional project management, often seen in boardrooms and annual plans, is based on a flawed assumption that we can predict outcomes weeks in advance and expect nothing to change. Software development, much like the weather, is unpredictable and chaotic.

Even today, many people treat software estimates as if they were comparable to predicting timelines for manufacturing physical products or managing predictable projects, such as constructing a house or bridge. They expect precision, often clinging to the initial estimate as an unyielding benchmark and holding teams accountable to it. However, software development is an entirely different realm. It’s invisible, knowledge-driven work filled with unknowns and unpredictability. In complex systems, even a small input change can trigger dramatically different outcomes. We’ve all encountered the “simple request” that unexpectedly spiraled into a significant architectural overhaul. I appreciate how Vasco ties this to Edward Lorenz’s 1961 discovery that small changes in initial conditions can lead to drastically different outcomes in weather models. That idea became the foundation of chaos theory.

Sound familiar?

In software development, we refer to this as “new work with unknowns,” “technical debt,” “rewrite,” or “refactor.” But we rarely treat it with the same respect we give to unknowns in other disciplines. Instead, we often pretend we know what we’re doing, and then demand that others commit to it. That’s the real chaos.

In addition to my focus on probability-based estimations and the Product Operating Model, Vasco’s four-point manifesto supports a shift I’ve long advocated for in team estimates and product leadership. It encourages an approach to software delivery that prioritizes adaptability, relies on real-time feedback, and views investment as an ongoing process rather than a one-time decision. This mindset isn’t about removing unpredictability but about working effectively within it.

1. From Estimates to Bets: Embracing Uncertainty with Confidence

Vasco encourages us to think like investors, not project managers. Investors expect returns, but they also accept risk and uncertainty. They recognize that not every bet pays off, and they adjust their approach based on the feedback they receive. This mindset aligns closely with how I’ve approached probabilistic estimation.

In knowledge work, “unknown unknowns” aren’t the exception. They’re the norm. You don’t just do the work, you learn what the work is along the way. What appears simple on the surface may uncover deep design flaws, coupling, or misalignment. That’s why I advocate for making estimates that improve over time, where confidence and learning signals are more important than arbitrary story point velocity.

Instead of forcing certainty, we can ask:

“How confident are we right now?”

“What would increase or decrease that confidence?”

“Are we ready to double down, or should we pause and reassess?”

That’s what makes it a bet. And bets are revisited, not rubber-stamped.
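
One common way to express that confidence, sketched below in Python, is a simple Monte Carlo forecast over historical throughput. The throughput samples and backlog size are hypothetical, and this is an illustration of probabilistic estimation in general, not a prescription from Vasco’s talk.

```python
import random

# Hypothetical weekly throughput (items finished per week) from recent history.
weekly_throughput = [3, 5, 2, 6, 4, 3, 5, 4]
remaining_items = 40
SIMULATIONS = 10_000

def weeks_to_finish():
    """Resample historical weeks until the remaining backlog is done."""
    done, weeks = 0, 0
    while done < remaining_items:
        done += random.choice(weekly_throughput)
        weeks += 1
    return weeks

results = sorted(weeks_to_finish() for _ in range(SIMULATIONS))
p50 = results[int(0.50 * SIMULATIONS)]
p85 = results[int(0.85 * SIMULATIONS)]
print(f"50% confident within {p50} weeks; 85% confident within {p85} weeks")
```

Re-running the forecast as new throughput data arrives is what turns the estimate into a bet you revisit.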

2. Budgeting for Change, Not Certainty

The second point in Vasco’s manifesto hits close to home: fund software like you invest in the stock market, bit by bit, adjusting as you go. This reinforces what I wrote in my product operating model article: modern organizations must stop budgeting everything up front for the year and assuming the original plan will hold.

Annual planning works for infrastructure, but not for innovation and knowledge work.

In a product-based funding model, teams are funded by their value stream or product, not by project deliverables estimated, or guessed, a year in advance. They receive investment to continuously discover, deliver, and evolve, reassessing value rather than completing a fixed scope against a dated estimate. This model gives you flexibility: invest more in what’s working, cut back where it’s not, and shift direction without resetting your entire operating plan.

3. Experiments Are the New Status Report

Vasco’s third point is deceptively simple: experiment by default. But what he’s talking about is creating adaptive intelligence at the portfolio level, not just team agility.

When we fund work incrementally and view features or epics as bets, we need signals to tell us whether to continue. In our organization, that signal often comes in the form of experiments, lightweight tests, spikes, MVPs, or “feature toggles” that generate fast feedback.

These aren’t just engineering tactics. They’re governance mechanisms.

When teams experiment, they reduce waste, increase alignment, and surface learning early. But more importantly, they feed information back into the portfolio process. A product manager might learn that a new feature doesn’t solve the core problem. A tech lead might identify a performance bottleneck before it becomes a support nightmare. A value stream might kill a half-funded initiative before it eats more cost.

Experiments give you clarity. Gantt charts give you theater.

4. End-to-End Ownership Enables Real Agility

The fourth point in Vasco’s manifesto is about end-to-end ownership, and it resonates deeply with how our teams are structured. When teams own their products from idea to delivery to operation, they don’t just ship; they deliver. They learn, adapt, and inform the next bet.

This kind of ownership isn’t a luxury, it’s a prerequisite to agility at scale.

In our transition to a product operating model, we restructured our delivery teams to align with value streams. We gave them clarity of purpose, full-stack capability, and autonomy to act. But what we hope to get in return isn’t just faster output; it’s better signals.

Teams close to the work produce insights you can trust. Teams trapped in delivery factories or matrixed dependencies can’t.

The Three Ways Still Apply

Listening to Vasco’s manifesto again, I was struck by how strongly it aligns with a set of principles we’ve had since at least 2021: The Three Ways, as described by Gene Kim and coauthors in The DevOps Handbook.

  • The First Way emphasizes flow and systems thinking, focusing on how value moves across the entire stream, not just within teams or silos.
  • The Second Way amplifies feedback loops, not just testing or monitoring, but real signals about whether we’re solving the right problems.
  • The Third Way advocates for a culture of continuous experimentation and learning: accepting uncertainty, embracing risk, and using practice to build mastery.

These are all still relevant today. But what often goes unspoken is that these principles must extend beyond the delivery teams into planning, budgeting, prioritization, and governance.

Vasco’s idea of funding software like investments and treating initiatives as “bets” highlights the need to strengthen feedback loops across the portfolio. Experimentation has shifted from simple automated testing to focusing on strategic funding and continuous learning. Similarly, flow isn’t just about deployment pipelines anymore; it’s about speeding up the process from business decisions to tangible, measurable results.

If we’re going to truly embrace agility across the business, we must apply the Three Ways at every level of the system, especially where strategy meets funding and planning.

The Real Work: Planning for Chaos, Leading with Signals

Here’s where I’ll close, echoing Vasco’s message: the fundamental constraint in software isn’t at the team level. It’s at the leadership level, where we cling to project thinking, demand estimates without context, and build plans on the illusion of certainty.

I strongly advocate for incorporating confidence levels and probability estimations in our organization. However, we operate on an annual funding model, planning the entire year’s operating plan, including product development investments, in advance. I hope to eventually work with product-funded budgets instead. Only time will tell. However, we can still evaluate our product development investments as we go and adjust our direction if needed.

To effectively lead a modern software organization, treat funding like an investor, not a contractor. Measure progress based on learning, not hitting milestones. Enable teams to provide actionable insights, not just reports. Structure governance models around value-driven feedback, not activity tracking.

Because you’re no longer managing projects, you’re managing bets in a chaotic system. And the sooner we stop pretending otherwise, the better our outcomes will be.

Poking Holes

I invite your perspective on my posts. What are your thoughts?

Let’s talk: phil.clark@rethinkyourunderstanding.com


References

  1. Duarte, Vasco (Host). “Xmas Special: Investing in Software: Alternatives to Project Management for Software Businesses.” December 27, 2024. Scrum Master Toolbox Podcast: Agile storytelling from the trenches [Audio podcast]. Apple Podcasts, https://podcasts.apple.com/us/podcast/scrum-master-toolbox-podcast-agile-storytelling-from/id963592988

Related Articles

  1. “Software Delivery Teams, Deadlines, and the Challenge of Providing Reliable Estimates”. Phil Clark. rethinkyourunderstanding.com
  2. “How Value Stream Management and Product Operating Models Complement Each Other”. Phil Clark. rethinkyourunderstanding.com

Filed Under: Agile, DevOps, Leadership, Product Delivery, Software Engineering, Value Stream Management

When Team Structure Collides with Role Alignment

May 26, 2025 by philc

How Merging Engineering Models Can Disrupt What Works, and What to Do About It

11 min read

After a recent merger, I was asked to advise an engineering organization that needed to align two very different delivery models.

One part of the organization used small, long-term, cross-functional teams with distributed leadership (self-managed). The other followed a traditional Engineering Manager (EM) model, where one manager handled people, delivery, and agile practices. The company wanted to unify job responsibilities, eliminate performance ambiguity, and ensure fair development opportunities across all teams. The executive leader of the larger organization articulated a clear vision: one company with a single, thoughtfully designed career path built on a foundation of care and respect.

These are worthy goals. I’ve helped lead engineering through nine acquisitions and know firsthand the importance of consistent titles and expectations. But I’ve also learned something else:
“Aligning job titles and responsibilities without fixing team design, architecture, role responsibilities, and delivery structure doesn’t solve the real issues. It just hides them and creates tension and career friction across the division.”

It’s not about being right. It’s about being aligned.

Alignment takes time, planning, and honest conversation.

I’m aligned with the executive leader’s vision: to unify as one company with a shared career path, achieved with care, not urgency. Whether that takes six months or a year and a half, the focus should be on clarity and collaboration, rather than speed.

The real challenge isn’t just structural, it’s cultural. The larger organization’s strong-willed leadership team has never worked within a self-managed team structure. Fixed perspectives can stall progress if we don’t create space to explore why the models differ, not just how they do. We need to identify the root causes of the structural divergence and assess the potential risks to team culture, autonomy, and product alignment, particularly for high-performing, self-managed teams.

The executive leader also emphasized that integration shouldn’t be imposed; it should evolve at the pace of shared understanding. Once we reach that point, we owe it to the teams to communicate with clarity before information leaks and assumptions or uncertainty take hold. The real challenge arose from the other senior leaders within the group. I won’t say which model is better, as it depends on the context. Instead, this article explores the challenges that can occur when we centralize accountability and responsibilities without considering the unique context. It also looks at how well-meaning integration efforts can unintentionally disrupt high-performing teams.

Why This Matters: Fairness vs. Fit

After a merger or acquisition, it’s natural and smart for engineering leaders to unify role definitions, career paths, and performance frameworks. Inconsistent job titles and responsibilities across similar roles can create confusion, slow promotions, and introduce bias. If two managers hold the same title but lead very different types of teams, performance expectations become subjective. That’s not fair to them or to the engineers they support.

So, I understood the goals of the integration effort:

  • Establish unified job responsibilities across teams
  • Minimize churn, ensuring no team member feels alienated or unsupported during the transition
  • And maintain high-performing teams that can support product delivery and operational efficiency

The goals weren’t the problem. The real challenge was the implementation.

How can you use a shared career framework when team structures and responsibilities differ?

The difference in team design and responsibilities is where the friction emerged and where the search for solutions began.

Two Team Models in Contrast

The Engineering Manager Model

In the parent (acquiring) organization’s Engineering Manager (EM)-led structure, a single person is responsible for managing people, overseeing delivery, driving agile practices, and partnering with product. EMs are accountable for both team output and individual performance and development. In many cases, they also serve as the technical lead.

Each Engineering Manager typically works directly with a team of 6-10 software engineers. The team does not have a Scrum Master or Agile Coach; the EM is responsible for Agile accountability. Similarly, there is no dedicated QA team member, so quality accountability falls on the EM and the software engineers.

This EM model was framed as a version of the “Iron Triad” or “Iron Triangle,” centered on Engineering, Product, and (presumably) UX or Delivery. However, in practice, the Engineering Manager often became the default source of team process, performance, and planning.

This structure isn’t inherently wrong. It works best when:

  • Teams are large and need strong coordination
  • The architecture is monolithic or tightly coupled
  • Product and engineering require direct managerial alignment

However, when scaled broadly or applied without nuance, it can quickly lead to role overload and reliance on individuals rather than systems to drive outcomes.

The Self-Managed Cross-Functional Model

The smaller teams in the acquired organization followed a different model entirely. These were long-lived, cross-functional teams of 8 to 12 people, including 2-4 software engineers, 1 QA, 1 product manager/owner, and in many cases, agile delivery leads or scrum masters. They had everything they needed to deliver software without needing to coordinate with other teams in most cases.

In this structure:

  • Responsibilities are distributed across roles instead of consolidated under a single leader.
  • Engineering Managers exist—but act primarily as career coaches and mentors, not team leads.
  • Agile delivery is facilitated by dedicated Scrum Masters or Agile Leaders embedded in the team.
  • Managers typically oversee 5 to 7 engineers across multiple teams and contribute technically as ICs when appropriate.

These teams naturally align with microservices, subdomains, or product value streams. They work well when the architecture allows for autonomy, and the organization invests in clarity, trust, and lightweight governance.

The acquired organization structured its teams to align with clear architectural boundaries, with each team focused on a specific subdomain or service. This approach made the teams both cross-functional and architecturally cohesive, reflecting Conway’s Law by ensuring the team structure matched the design of the software.

Key Difference: Accountability Consolidation

Both models contain the same essential responsibilities: engineering, product collaboration, quality, and delivery. However, in one, accountability is centralized under a manager, while in the other, it is distributed across the team.

The solution isn’t just about structure. It’s about how tightly the team model mirrors the system it’s building.

Conway’s Law tells us that our software systems mirror our organizational communication structures. When architecture is monolithic or tightly integrated, centralized accountability makes sense. But when architecture is modular and service-oriented, small, autonomous teams that map directly to system boundaries and subdomains can accelerate delivery and reduce coordination overhead.

And structure doesn’t just affect outcomes, it shapes culture.

In centralized models, decision-making authority and responsibility often rest with the Engineering Manager. This can bring clarity, especially for early-career engineers or less mature teams. But it can also reduce autonomy or create learned dependence, where teams hesitate to act without explicit approval.

In distributed models, autonomy is expected, and with it, psychological safety becomes critical. Teams must feel trusted to make decisions, fail safely, and adjust course without manager intervention. When done well, this fosters ownership and speed. However, without strong role clarity, trust, and support systems, it can lead to confusion or misalignment.

So, while the surface question is, “What does the Engineering Manager own?” the deeper question is, “Does the team structure support the system architecture and the culture you want to build?”

Where It Breaks: Role Titles vs. Role Expectations

On paper, this integration effort was about consistency: standardizing job titles, aligning role definitions, and applying a shared career framework across teams.

In practice, that consistency masked a deeper misalignment: the same title, Engineering Manager, carried very different expectations depending on the model it came from.

In the Engineering Manager-led model:

  • The EM is accountable for people leadership, delivery, agile practice, team velocity, and technical direction.
  • There is no embedded Scrum Master or Agile Coach.
  • The EM is expected to own outcomes, from sprint or iteration health to individual growth to team throughput.

In the self-managed, cross-functional model:

  • The EM is a career manager and mentor, often contributing technically as a senior IC.
  • Agile facilitation is handled by a dedicated team member (e.g., Scrum Master, Agile Leader, Agile Delivery Manager).
  • Delivery ownership and accountability are shared across the team; no single role “owns” performance.

From the outside, both are “Engineering Managers.” But their responsibilities are fundamentally different. When performance reviews, promotion criteria, and development paths are built around the broader EM model, it disadvantages leaders from the self-managed structure or forces the organization to reshape successful teams just to fit the title.

The concern is that unifying role definitions without accounting for structural context can cause real harm.

That harm doesn’t just affect managers. It ripples through teams.

In EM-led models, where one person is accountable for delivery, agile practice, and performance metrics, teams often defer decisions upward, even when they have the skills and context to act. This dynamic can unintentionally train teams to wait for approval, eroding autonomy and making collaboration feel more performative than empowered.

By contrast, long-lived, self-managed teams tend to develop strong psychological safety over time. With clear boundaries and shared ownership, they solve problems together. However, when leadership begins redefining responsibilities around titles instead of how the team works, even these teams can start to hesitate.

Autonomy suffers not because self-managed models lack structure but because outside systems try to reimpose control where clarity already exists.

The friction isn’t theoretical. It appears in performance evaluations, hiring misalignment, and career planning confusion. Eventually, it reaches the team level where roles blur, ownership is second-guessed, and the structure that supported speed and trust begins to unravel.

Legacy Thinking and Structural Blind Spots

One of the biggest challenges in transformations like this isn’t technical. It’s cultural.

I’ve seen firsthand how legacy thinking, even well-meaning thinking, can shape decisions in ways that unintentionally resist growth. During this engagement, I saw it again.

In our initial conversation regarding team structures, an executive leader for the larger organization made the strategic decision:

“We’re not going to shift 40 teams to the self-managed model. It’s too resource-intensive. The smaller teams will need to align with our Engineering Manager model.”

In a follow-up conversation that I wasn’t part of, a VP from the larger organization said:

“I’ve been using the Engineering Manager model for most of my career. It works.”

These statements weren’t malicious. They were confident, experienced, and full of certainty.

Relying too much on past success can sometimes prevent us from seeing what fits the current situation. What worked earlier in your career or in a different system might not work now. True transformation requires more than confidence. It requires curiosity.

In yet another conversation, I heard secondhand that one of these same leaders, after our first meeting on the topic, asked:

“Has Phil ever been a software engineer?”

That question stuck with me because I wondered why my interest in how software is delivered was being treated as a proxy for my technical expertise. While the leader questioned my background (all he had to do was look at my LinkedIn profile or ask for my resume), his comment revealed a mindset: if someone doesn’t share our experience, maybe their perspective doesn’t count.

These moments aren’t about ego. They’re about reflection, about recognizing how deeply personal experience can cloud structural objectivity. When leaders dismiss unfamiliar models because they don’t match their playbook, they don’t just reject ideas. They limit what the organization is allowed to become.

“Great leaders aren’t defined by how long they’ve done something. They’re defined by how often they’re willing to rethink it.”

What Self-Managed Teams Need to Work

To be clear, I’m not arguing that self-managed, cross-functional teams are inherently better. They only work when they’re supported intentionally.

In this case, the acquired teams didn’t stumble into autonomy. They evolved, shaped by architectural changes, growing product complexity, and deliberate investment in role clarity and delivery practices.

Self-managed teams work best when:

  • Team boundaries are aligned with system boundaries (Conway’s Law in action)
  • Each team has all the roles it needs to deliver independently: product, UX, engineering, QA, agile leadership
  • Leadership trusts the team to make decisions and solve problems
  • There are clear expectations for ownership, accountability, and feedback loops
  • The organization invests in agile coaching and systems thinking, not just delivery metrics

Autonomy is powerful, but it’s not a substitute for structure. It’s a different structure, distributed rather than centralized, but no less rigorous.

When organizations assume self-managed teams can succeed without support, they fail. But when they try to control teams that already have what they need to succeed, they risk breaking what’s working.

If you dismantle a working model to standardize roles without investing in the conditions that made those teams successful, you’re not gaining alignment; you’re sacrificing outcomes.

I see the challenge of finding the right hybrid solution, either in role responsibilities or team structure, during this transition. Only time will tell how these efforts turn out.

A Path Forward

While we started the conversation about picking one model over the other, the next set of conversations should be about understanding what each one needs to succeed and recognizing what might be lost by trying to force one to fit the other’s framework.

In this transition, I’m not advocating for a reversal of the decision. The leadership team has chosen the Engineering Manager model as the long-term structure. My role is to support that transition in a way that minimizes disruption, preserves what’s working, and honors the intent behind the change.

But that doesn’t mean copying a model wholesale. It means asking harder questions:

  • Can we implement the EM model without breaking value stream alignment or team autonomy?
  • Can we support delivery accountability without assigning an EM to every team if doing so fragments the architecture or inflates management layers?
  • Can we evolve role definitions to respect the existing strengths of self-managed teams instead of stripping them out?

I’ve noticed that the most effective organizations aren’t strict about sticking to rigid structures. Instead, they focus on designs that are fit for purpose.

Consider blending elements of both models:

  • Some teams may have embedded EMs; others may operate with distributed leadership and shared delivery ownership.
  • Agile responsibilities can be flexibly assigned based on team maturity, not hierarchy.
  • Career frameworks can accommodate different types of Engineering Managers as long as expectations are clear and fair and performance is measured in context.

You don’t need to choose between alignment and autonomy.

You need to design for both, based on the work, the system, and the people you have.

It isn’t easy; sometimes, a hybrid model might not scale perfectly. However, it’s often a better option than forcing consistency, which can harm results.

Final Reflection: Fit Over Familiarity

At the heart of this transition is a challenge I’ve seen a few times:

How do you unify an organization without undoing what’s already working?

The desire to standardize roles, expectations, and performance frameworks comes from a good place. But when titles are aligned without understanding the structural and cultural context that surrounds them, friction follows, quiet at first, then louder over time.

I’ve spent years helping engineering organizations navigate these types of changes, sometimes from the inside, sometimes as an advisor. And here’s what I’ve learned:

  • Job titles are not the problem, misaligned expectations are.
  • Structure should reflect system architecture, not management tradition.
  • Psychological safety and autonomy aren’t side effects of good teams, they’re preconditions for them.
  • Legacy success can cloud future-fit decisions, especially when we assume what worked before must work again.
  • Great teams thrive in models that are clear, intentional, and well-supported, whether they are EM-led or self-managed.

There is no perfect model. But there is such a thing as the right model for the moment, the product, and the architecture.

This integration effort isn’t just a structural change, it’s a chance to define what kind of engineering organization this will become.

If we stay curious, focus on outcomes, and respect the conditions that made teams effective to begin with, we can build a unified system that enables scale without sacrificing flow, clarity, or trust.

The outcome of this effort will depend on time and attitudes.

Key Takeaways

  • The EM and self-managed models are not interchangeable. Each comes with different responsibilities, accountability structures, and cultural implications.
  • Standardizing job titles without context can create unintended harm. Especially when one title represents two very different sets of expectations.
  • Misalignment erodes autonomy and psychological safety. Teams work best when they know where decisions live, and are trusted to make them.
  • Conway’s Law still applies. If team structure doesn’t mirror system architecture, coordination costs increase and ownership suffers.
  • A hybrid approach may be necessary. Especially in the short term, where context, maturity, and system constraints vary across teams.
  • You can support a transition while still protecting what works. Integration doesn’t have to mean erasure.

In the end, our goal is to establish clear and unified job responsibilities across teams, minimize churn, and ensure that no team member feels alienated or unsupported during the transition. We aim to build high-performing teams that can deliver on existing commitments while maintaining operational efficiency.


Poking Holes

I invite your perspective on my posts. What are your thoughts?

Let’s talk: phil.clark@rethinkyourunderstanding.com

Filed Under: Agile, DevOps, Engineering, Leadership, Lean, Product Delivery, Software Engineering, Value Stream Management

We Have Metrics. Now What?

May 11, 2025 by philc

7 min read

A Guide for Legacy-Minded Leaders on Using Metrics to Drive the Right Behavior

From Outputs to Outcomes

A senior executive recently asked a VP of Engineering and the Head of Architecture for industry benchmarks on Flow Metrics. At first, this seemed like a positive step, shifting the focus from individual output to team-level outcomes. However, the purpose of the request raised concerns. These benchmarks were intended to evaluate engineering managers’ performance for annual reviews and possibly bonuses.

That’s a problem. Using system-level metrics to judge individual performance is a common mistake. It might look good on paper, but it often hides deeper issues: it turns system-level signals into personal scorecards, creating the very dysfunction those metrics are meant to reveal and improve, and it invites gaming over genuine improvement. This guide is for senior leaders adopting team-level metrics who want to use them effectively. You’ve chosen better metrics. Now let’s make sure they work as intended.

To clarify, the executive’s team structure follows the Engineering Manager (EM) model, where EMs are responsible for the performance of the delivery team. In contrast, I support an alternative approach with autonomous teams built around team topologies. These teams include all the roles needed to deliver value, without a manager embedded in the team. These are two common but very different models of team structure and performance evaluation.

This isn’t the first time I’ve seen senior leaders misuse qualitative metrics, and it likely won’t be the last. So I asked myself: Now that more leaders have agreed to adopt the right metrics, do they know how to use them responsibly?

I will admit that I was frustrated to learn of this request, but the event inspired me to create a guide for leaders, especially those used to traditional, output-focused models who are new to Flow Metrics and team-level measurement. This article shares my approach to metrics, focusing on curiosity, care, and a learning mindset. It’s not a set of rules. You’ve already chosen team-aligned metrics, and now I’ll explain how we use them to drive improvement while avoiding the pitfalls of judgment or manipulation.

A Note on Industry Benchmarks

As noted at the beginning of this post, the senior leader requested industry benchmarks, specifically for Flow Metrics. When benchmarks are treated as targets or internal scorecards, they can reduce transparency. Teams might focus on meeting the numbers instead of addressing challenges openly.

Benchmarks are helpful, but only when applied thoughtfully. They’re most effective at the portfolio or organizational level rather than as performance targets for individual teams. Teams differ significantly in architecture, complexity, support workload, and business focus. Comparing an infrastructure-heavy team to a greenfield product team isn’t practical or fair.

Use benchmarks to understand patterns, not to assign grades. Ask instead: “Is this team improving against their baseline? What’s helping or getting in the way?”

How to Use Team Metrics Without Breaking Trust or the System

1. Start by inviting teams into the process

  • Don’t tell them, “Flow Efficiency must go up 10%.”
  • Ask instead: “Here’s what the data shows. What’s behind this? What could we try?”

Why: Positive intent. Teams already want to improve. They’ll take ownership if you bring them into the process and give them time and space to act. Top-down mandates might push short-term results, but they usually kill long-term improvement.

2. Understand inputs vs. outputs

  • Output metrics (like Flow Time, PR throughput, or change failure rate) are results. You don’t control them directly.
  • Input metrics (like review turnaround time or number of unplanned interruptions) reflect behaviors teams can change.

Why: If you set targets on outputs, teams won’t know what to do. That’s when you get gaming or frustration. Input metrics give teams something they can improve. That’s how you get real system-level change.

I’ve been saying this for a while, and I like how Abi Noda and the DX team explain it: input vs. output metrics. It’s the same thing as leading vs. lagging indicators. Focus on what teams can influence, not just what you want to see improve.
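
To make the distinction concrete, here is a minimal Python sketch using hypothetical pull-request timestamps. Review turnaround is an input the team can act on this week; cycle time is the output that follows.

```python
from datetime import datetime
from statistics import mean

# Hypothetical pull-request records: opened, first review, merged.
prs = [
    {"opened": datetime(2025, 5, 1, 9), "first_review": datetime(2025, 5, 2, 15), "merged": datetime(2025, 5, 6, 11)},
    {"opened": datetime(2025, 5, 3, 10), "first_review": datetime(2025, 5, 3, 16), "merged": datetime(2025, 5, 4, 9)},
]

def hours(delta):
    return delta.total_seconds() / 3600

# Input (leading) metric: how quickly reviews start -- behavior the team controls.
review_turnaround = mean(hours(p["first_review"] - p["opened"]) for p in prs)

# Output (lagging) metric: how long changes take end to end -- a result, not a lever.
cycle_time = mean(hours(p["merged"] - p["opened"]) for p in prs)

print(f"avg review turnaround: {review_turnaround:.1f} h (input)")
print(f"avg PR cycle time:     {cycle_time:.1f} h (output)")
```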

3. Don’t turn metrics into targets

When a measure becomes a target, it stops being useful.

  • Don’t turn system health metrics into KPIs.
  • If people feel judged by a number, they’ll focus on making the number look good instead of fixing the system.

Why: You’ll get shallow progress, not real change. And you won’t know the difference because the data will look better. The cost? Lost trust, lower morale, and bad decisions.

4. Always add context

  • Depending on the situation, a 10-day Flow Time might be great or terrible.
  • Ask about the team’s product, the architecture, the kind of work they do, and how much unplanned work they handle.

Why: Numbers without context are misleading. They don’t tell the story. If you act on them without understanding what’s behind them, you’ll create the wrong incentives and fix the wrong things.

5. Set targets the right way

  • Not every metric needs a goal.
  • Some should trend up; others should stay stable.
  • Don’t use blanket rules like “improve everything by 10%.”

Why: Metrics behave differently. Some take months to move. Others can be gamed easily. Think about what makes sense for that metric in that context. Real improvement takes time; chasing the wrong number can do more harm than good.

6. Tie metrics back to outcomes and the business

  • Don’t just say, “Flow Efficiency improved.” Ask, what changed?
    • Did we deliver faster?
    • Did we reduce the cost of delay?
    • Did we create customer value?

If you’ve read my other posts, I recommend tying every epic and initiative to an anticipated outcome. That mindset also applies to metrics. Don’t just look at the number. Ask what value it represents.

Also, it’s critical that teams use metrics to identify their bottlenecks. That’s the key. Real flow improvement comes from fixing the most significant constraint. If you’re improving something upstream or downstream of the bottleneck, you’re not improving flow. You’re just making things look better in one part of the system. It’s localized and often a wasted effort.

Why: If the goal is better business outcomes, you must connect what the team does with how it moves the needle. Metrics are just the starting point for that conversation.

7. Don’t track too many things

  • Stick to 3-5 input metrics at a time.
  • Make these part of retrospectives, not just leadership dashboards.

Why: Focus drives improvement. If everything is a priority, nothing is. Too many metrics dilute the team’s energy. Let them pick the right ones and go deep.

8. Build a feedback loop that works

  • Metrics are most useful when teams review them regularly.
  • Make time to reflect and adapt.

We’re still experimenting with what cadence works best. Right now, monthly retrospectives are the minimum. That gives teams short feedback loops to adjust their improvement efforts. A quarterly check-in is still helpful for zooming out. Both are valuable. We’re testing these cycles, but they give teams enough time to try, reflect, and adapt.

Why: Improvement requires learning. Dashboards don’t improve teams. Feedback does. Create a rhythm where teams can test ideas, measure progress, and shift direction.

A Word of Caution About Using Metrics for Performance Reviews

Some leaders ask, “Can I use Flow Metrics to evaluate my engineering managers?” You can, but it’s risky.

Flow Metrics tell you how the system is performing. They’re not designed to evaluate individuals. If you tie them to bonuses or promotions, you’ll likely get:

  • Teams gaming the data
  • Managers focusing on optics, not problems
  • Reduced trust and openness

Why: When you make metrics part of a performance review, people stop using them for improvement. They stop learning. They play it safe. That hurts the team and the system.

Here’s what you can do instead:

In manager-led models, Engineering Managers are typically held accountable for team delivery. In cross-functional models, Agile Delivery Managers help guide improvement but don’t directly own delivery outcomes. In either case, someone helps the team improve.

That role should be evaluated, but not based on the raw numbers alone. Instead, assess how they supported improvement.

Thoughts on assessing “Guiding Team Improvement”:

Bottleneck Identification

  • Did they help surface and clarify constraints?
  • Are bottlenecks discussed and addressed?

Team-Led Problem Solving

  • Did they enable experiments and reflection, not dictate fixes?

Use of Metrics for Insight, Not Pressure

  • Did they foster learning and transparency?

Facilitation of Improvement Over Time

  • Do the trends show intentional learning?

Cross-Team Alignment and Issue Escalation

  • Are they surfacing systemic issues beyond their team?

Focus on influence, not control:

  • Use metrics to guide coaching conversations, not to judge.
  • Evaluate managers based on how they improve the system and support their teams.
  • Reward experimentation, transparency, and alignment to business value.

Performance is bigger than one number. Metrics help tell the story, but they aren’t the story.

Sidebar: What if Gamification Still Improves the Metric?

I’ve heard some folks say, “I’m okay with gamification. If the number gets better, the team’s getting better.” That logic might work in the short term but breaks down over time. Here’s why:

  1. It often hides real issues.
  2. It focuses on optics instead of outcomes.
  3. It breaks feedback loops that drive learning.
  4. It leads to local, not systemic, improvement.

So, while gamification might improve the score, it doesn’t consistently improve the system, and seldom does so as efficiently or sustainably.

If the goal is long-term performance, trust the process. Let teams learn from the data. Don’t let the number become the mission.

Metrics are just tools. If you treat them like a scoreboard, you’ll create fear. If you treat them like a flashlight, they’ll help you and your teams see what’s happening.

Don’t use metrics to judge individuals. Use them to guide conversations, surface problems, and support improvement. That’s how you build trust and better systems.


Poking Holes

I invite your perspective on my posts. What are your thoughts?

Let’s talk: phil.clark@rethinkyourunderstanding.com

Filed Under: Agile, DevOps, Leadership, Lean, Metrics, Product Delivery, Software Engineering, Value Stream Management

A Self-Guided Performance Assessment for Agile Delivery Teams

May 3, 2025 by philc

12 min read

This all started with a conversation and a question: “We do performance reviews for individuals, but what about teams?” If we care about how individuals perform, shouldn’t we also care about how teams perform together?

Why do we even work in teams?

It’s a strategic decision. In modern software delivery, teams are the core drivers of value. A strong team can achieve results far greater than what individuals can accomplish alone. How well we think and work together as a team (collective intelligence) is more impactful than the individual performance of its members. That’s why improving team effectiveness is so important.

But what does team effectiveness enable?

  • Execution: High-performing teams work faster and better meet customer needs. They focus on the right priorities, adjust quickly, and recover faster when problems arise.
  • Engagement and Retention: People stay in workplaces where they feel their contributions matter, where they’re supported, and where they feel safe to share ideas. Strong teams build this kind of environment.
  • Sustainable Performance: Burnout occurs when individuals take on too much on their own. Strong teams share the workload, support one another, and collaborate to solve problems.

Many organizations still evaluate individuals in isolation through individual performance assessments, overlooking how each person performs within the team, what they contribute to it, and the overall dynamics and health of the team.

So, let’s ask a better question: How well does your team work together?

  • What strengths and skills is the team using?
  • Which areas need more development or clarification?
  • How often does your team take time to review its performance together?
  • Do you have a system in place for gathering feedback and implementing ongoing improvements?

Just like individuals, even high-performing teams experience slumps or periods of lower performance. Acknowledging this is the first step toward helping the team return to excellence.

This article provides a self-assessment tool to help teams evaluate their current working practices at a specific point in time. The goal isn’t to place blame or measure productivity but to spark open conversations and create clarity that leads to improvement. When teams get feedback on performance and collaborate effectively, everything improves: delivery speed, developer satisfaction, and overall business impact.

A Reflection More Than a Framework

This isn’t a manager’s tool or a leadership scorecard. It’s a guide for teams looking to improve how they collaborate with purpose. It’s for delivery teams that value their habits just as much as their results.

Use it as a retro exercise. A quarterly reset. A mirror.

Why Team Reflection Matters

We already measure delivery performance. DORA. Flow. Developer Experience.
But those metrics don’t always answer:

  • Are we doing what we said mattered, like observability and test coverage?
  • Are we working as a team or as individuals executing in parallel?
  • Do we hold each other accountable for delivering with integrity?

This is the gap: how teams work together. This guide helps fill it, not to replace metrics, but to deepen the story they tell.

What This Is (And Isn’t)

You might ask: “Don’t SAFe, SPACE, DORA, or Flow Metrics already do this?”
Yes and no. Those frameworks are valuable. But they answer different questions:

  • DORA & Flow: How fast and stable is our delivery?
  • DX Core 4 & SPACE: How do developers feel about their work environment?
  • Maturity Models: How fully have we adopted Agile practices?
  • SAFe Measure and Grow: For organizations implementing SAFe, it evaluates enterprise agility across dimensions such as team agility, product delivery, and lean leadership.

What they don’t always show is:

  • Are we skipping discipline under pressure?
  • Do we collaborate across roles or operate in silos?
  • Are we shipping through red builds and hoping for the best?

But the question stuck with me:
If we hold individuals accountable for how they show up, shouldn’t we do the same for teams?

What follows is a framework and a conversation starter, not a mandate. It’s just something to consider because, in many organizations, teams are where the real impact (or dysfunction) lives.

Suggested Team Reflection Dimensions

You don’t need to use all ten categories. Start with the ones that matter most to your team, or define your own. This section is designed to help teams reflect on how they work together, not just what they deliver.

But before diving into individual dimensions, start with this simple but powerful check-in.

Would We Consider Ourselves Underperforming, Performing, or High-Performing?

This question encourages self-awareness without any external judgment. The team should decide together: no scorecards, no leadership evaluations, just a shared reflection on your experience as a delivery team.

From there, explore:

  • What makes us feel that way?
    What behaviors, habits, or examples support our self-assessment?
  • What should we keep doing?
    What’s already working well that we want to protect or double down on?
  • What should we stop doing?
    What’s causing friction, waste, or misalignment?
  • What should we start doing?
    What’s missing that could improve how we operate?
  • Do we have the skills and knowledge needed to meet our work demands?

This discussion often surfaces more actionable insight than metrics alone. It grounds the assessment in the team’s shared experience and sets the tone for improvement, not judgment.

A Flexible Self-Evaluation Scorecard

While this isn’t designed as a top-down performance tool, teams can use it as a self-evaluation scorecard if they choose. The reflection tables that follow can help teams:

  • Identify where they align today: underperforming, performing, or high-performing.
  • Recognize the dimensions where they excel and where they have room to improve.
  • Prioritize the changes that will have the greatest impact on how they deliver.

No two teams will see the same patterns, and that’s the point. Use the guidance below not as a measurement of worth but as a compass to help your team navigate toward better outcomes together.
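For teams that want something lighter than a survey platform, a spreadsheet is usually enough. Purely as an illustration of the idea, the hypothetical sketch below records one 3-point self-rating per dimension (1 = underperforming, 2 = performing, 3 = high-performing) and surfaces the lowest-rated dimensions as candidates to discuss first; the dimension names and scores are examples, not prescriptions.

```python
# Hypothetical team self-ratings on a 3-point scale:
# 1 = underperforming, 2 = performing, 3 = high-performing.
ratings = {
    "Execution & Ownership": 2,
    "Collaboration & Communication": 3,
    "Flow & Efficiency": 1,
    "Operational Readiness & Observability": 1,
    "Data-Driven Improvement": 2,
}

# Surface the lowest-rated dimensions as the first topics for the retro.
lowest = min(ratings.values())
focus = [dim for dim, score in ratings.items() if score == lowest]

print(f"Lowest self-rating: {lowest}")
print("Discuss first:", ", ".join(focus))
```

The output is only a prompt for conversation; the dialogue about why a dimension feels low, and what to try next, is where the value is.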

10-Dimension Agile Team Performance Assessment Framework

These dimensions serve as valuable tools for self-assessments, retrospectives, or leadership reviews, offering a framework to evaluate not just what teams deliver, but how effectively they perform.

  1. Execution & Ownership: Do we plan realistically, adapt when needed, and take shared responsibility for outcomes?
  2. Collaboration & Communication: Do we collaborate openly, communicate effectively, and stay aligned across roles?
  3. Flow & Efficiency: Is our work moving steadily through the system with minimal delays or waste?
  4. Code Quality & Engineering Practices: Do we apply consistent technical practices that support high-quality, sustainable code?
  5. Operational Readiness & Observability: Are we ready to monitor, support, and improve the solutions we deliver?
  6. Customer & Outcome Focus: Do we understand who we’re building for and how our work delivers real-world value?
  7. Role Clarity & Decision Making: Are roles well understood, and do we share decisions appropriately across the team?
  8. Capabilities & Growth: Do we have the skills to succeed, and are we growing individually and as a team?
  9. Data-Driven Improvement: Do we use metrics, retrospectives, and feedback to improve how we work?
  10. Business-Technical Integration: Do we balance delivery of business and customer value with investment in technical health?

These dimensions help teams focus not just on what they’re delivering but also on how their work contributes to long-term success.

Reflection Table

This sample table is a great way to start conversations. It works well for retrospectives, quarterly check-ins, or when something feels off. Each category includes a key question and signs that may indicate your team is facing challenges in that area. These can be used as a team survey as well.

Execution & Ownership
Reflection Prompts: Do we plan realistically and follow through on what we commit to? Are we updating estimates and plans as new information emerges? Do we raise blockers or risks early? Are we collectively responsible for outcomes?
Signs of Struggle: Missed or overly optimistic goals, reactive work, unclear priorities or progress, estimates are outdated or disconnected from reality, team blames others or avoids accountability when things go wrong.

Collaboration & Communication
Reflection Prompts: Do we communicate openly, show up for team events, and work well across roles? How do we share knowledge and maintain alignment?
Signs of Struggle: Silos, missed handoffs, unclear ownership, frequent miscommunication.

Flow & Efficiency
Reflection Prompts: How efficiently does work move through our system? Are we managing context switching, controlling work in progress, and minimizing delays or rework?
Signs of Struggle: Ignored bottlenecks, context switching, stale or stuck work.

Code Quality & Engineering Practices
Reflection Prompts: Do we value quality in every commit? Are testing, automation, and clean code part of our culture? Do we apply consistent practices to ensure high-quality, maintainable code?
Signs of Struggle: Bugs, manual processes, high rework, tech debt increasing.

Operational Readiness & Observability
Reflection Prompts: Can we detect, troubleshoot, and respond to issues quickly and confidently?
Signs of Struggle: No monitoring, poor alerting, users report issues before we know.

Customer & Outcome Focus
Reflection Prompts: Do we understand the “why” behind our work (the anticipated outcome)? Do we measure whether we’re delivering impact and not just features?
Signs of Struggle: Misaligned features, lack of outcome tracking, limited feedback loops.

Role Clarity & Decision Making
Reflection Prompts: Are team roles clear to everyone on the team? Do we share decision-making across product, tech, and delivery?
Signs of Struggle: Conflicting priorities, top-down decision dominance, slow resolution.

Capabilities & Growth
Reflection Prompts: Do we have the right skills to succeed and time to improve them? Do we have the capabilities required to deliver work?
Signs of Struggle: Skill gaps, training needs ignored, dependence on specialists or other teams.

Data-Driven Improvement
Reflection Prompts: Do we use metrics, retrospectives, and feedback to improve how we work?
Signs of Struggle: Metrics ignored, retros lack follow-through, repetitive problems.

Accountability & Ownership
Reflection Prompts: Can we be counted on? Do we take shared responsibility for our delivery and raise risks early?
Signs of Struggle: Missed deadlines, hidden blockers, avoidance of tough conversations.

Business-Technical Integration
Reflection Prompts: Are we balancing product delivery with long-term technical health and business needs?
Signs of Struggle: Short-term thinking, ignored tech debt, disconnected roadmap and architecture.

How this appears in table format:

Dimension | Reflection Prompts | Signs of Struggle
1. Execution & Ownership | Do we plan realistically and follow through on what we commit to? Are we updating estimates and plans as new information emerges? Do we raise blockers or risks early? Are we collectively responsible for outcomes? | Missed or overly optimistic goals, reactive work, unclear priorities or progress, estimates are outdated or disconnected from reality, team blames others or avoids accountability when things go wrong.
2. Collaboration & Communication | Do we communicate openly, show up for team events, and work well across roles? How do we share knowledge and maintain alignment? | Silos, missed handoffs, unclear ownership, frequent miscommunication.
3. Flow & Efficiency | How efficiently does work move through our system? Are we managing context switching, controlling work in progress, and minimizing delays or rework? | Ignored bottlenecks, context switching, stale or stuck work.
4. Code Quality & Engineering Practices | Do we value quality in every commit? Are testing, automation, and clean code part of our culture? Do we apply consistent practices to ensure high-quality, maintainable code? | Bugs, manual processes, high rework, tech debt increasing.
5. Operational Readiness & Observability | Can we detect, troubleshoot, and respond to issues quickly and confidently? | No monitoring, poor alerting, users report issues before we know.
6. Customer & Outcome Focus | Do we understand the “why” behind our work (the anticipated outcome)? Do we measure whether we’re delivering impact and not just features? | Misaligned features, lack of outcome tracking, limited feedback loops.
7. Role Clarity & Decision Making | Are roles clear? Do we share decision-making across product, tech, and delivery? | Conflicting priorities, top-down decision dominance, slow resolution.
8. Capabilities & Growth | Do we have the right skills to succeed and time to improve them? Do we have the capabilities required to deliver work? | Skill gaps, training needs ignored, dependence on specialists or other teams.
9. Data-Driven Improvement | Do we use metrics, retrospectives, and feedback to improve how we work? | Metrics ignored, retros lack follow-through, repetitive problems.
10. Business-Technical Integration | Are we balancing product delivery with long-term technical health and business needs? | Short-term thinking, ignored tech debt, disconnected roadmap and architecture.

Detailed Assessment Reference

For teams looking for assessment levels, the next section breaks down each reflection category. It explains what “Not Meeting Expectations,” “Meeting Expectations,” and “Exceeding Expectations” look like in practice.

Execution & Ownership
Do we plan realistically, adapt when needed, and take shared responsibility for outcomes?

  • Not Meeting Expectations:
    No planning rhythm; commitments are missed; estimates are rarely updated; blockers are hidden.
  • Meeting Expectations:
    Team plans regularly, meets most commitments, revises estimates as needed, and raises blockers transparently.
  • Exceeding Expectations:
    Plans adapt with agility; estimates are realistic and actively managed; the team owns outcomes and proactively addresses risks.

Collaboration & Communication
Do we collaborate openly, communicate effectively, and stay aligned across roles?

  • Not Meeting Expectations: Works in silos; communication is inconsistent or unclear; knowledge isn’t shared. Team members are not attending meetings or conversations regularly.
  • Meeting Expectations: Team collaborates effectively and communicates openly across roles.
  • Exceeding Expectations: Team creates shared clarity, collaborates regularly, and actively drives alignment across all functions.

Flow & Efficiency
Is our work moving steadily through the system with minimal delays or waste?

  • Not Meeting Expectations: Work is consistently blocked or stuck; high WIP and frequent context switching slow delivery.
  • Meeting Expectations: Team manages WIP, removes blockers, and maintains steady delivery flow.
  • Exceeding Expectations: Team actively optimizes flow end-to-end; bottlenecks are identified and resolved.

Code Quality & Engineering Practices
Do we apply consistent technical practices that support high-quality, sustainable code?

  • Not Meeting Expectations: Defects are frequent; automation, testing, and refactoring are lacking.
  • Meeting Expectations: Defects are infrequent; code reviews and testing are standard; quality practices are applied regularly.
  • Exceeding Expectations: Quality is a shared team value; clean code, automation, and sustainable practices are embedded.

Operational Readiness & Observability
Are we ready to monitor, support, and improve the solutions we deliver?

  • Not Meeting Expectations: Monitoring is missing or insufficient; issues are discovered by users.
  • Meeting Expectations: Alerts and monitoring are in place; team learns from post-incident reviews.
  • Exceeding Expectations: Observability is proactive; issues are detected early and inform ongoing improvements.

Customer & Outcome Focus
Do we understand who we’re building for and how our work delivers real-world value?

  • Not Meeting Expectations: Work is disconnected from business goals; outcomes are not communicated or measured.
  • Meeting Expectations: Team understands customer or business impact and loosely ties delivery to anticipated outcomes and value.
  • Exceeding Expectations: Business or customer impact drives planning and iteration; outcomes are tracked and acted upon.

Role Clarity & Decision Making
Are roles well understood, and do we share decisions appropriately across the team?

  • Not Meeting Expectations: Decision-making and prioritization are top-down or unclear; roles overlap or are siloed.
  • Meeting Expectations: Team members understand their roles, prioritize, and make decisions collaboratively.
  • Exceeding Expectations: Teams co-own prioritization and decisions with transparency, clear tradeoffs, and joint accountability.

Capabilities & Growth
Do we have the skills to succeed, and are we growing individually and as a team?

  • Not Meeting Expectations: Skill gaps persist; team lacks growth opportunities or training support.
  • Meeting Expectations: The team has the right skills for current work and seeks help when needed.
  • Exceeding Expectations: Team proactively builds new capabilities, shares knowledge, and adapts to new challenges.

Data-Driven Improvement
Do we use metrics, retrospectives, and feedback to improve how we work?

  • Not Meeting Expectations: Feedback is anecdotal; metrics are misunderstood, ignored, or unused in retrospectives.
  • Meeting Expectations: Team uses metrics and feedback to inform improvements regularly.
  • Exceeding Expectations: Metrics drive learning, experimentation, and meaningful change.

Business-Technical Integration
Do we balance delivery of business and customer value with investment in technical health?

  • Not Meeting Expectations: Technical health is ignored or sidelined in favor of speed and features.
  • Meeting Expectations: Product and engineering collaborate on both business value and technical needs.
  • Exceeding Expectations: Long-term technical health and business alignment are integrated into delivery decisions.

How this appears in table format:

10-Dimension Agile Team Performance Assessment Framework (3-Point Scale)

Dimension | Not Meeting Expectations | Meeting Expectations | Exceeding Expectations
1. Execution & Ownership | No planning rhythm; missed commitments; outdated estimates; blockers hidden. | Regular planning; estimates revised; blockers raised transparently. | Plans adapt with agility; estimates are managed; team owns outcomes and addresses risks proactively.
2. Collaboration & Communication | Siloed work; unclear communication; knowledge hoarded. | Open, cross-role communication; knowledge shared. | Team drives shared clarity and proactive alignment with others.
3. Flow & Efficiency | Work stalls; high WIP; frequent context switching. | Steady flow; WIP managed; blockers removed. | Flow optimized across the system; bottlenecks surfaced and resolved quickly.
4. Code Quality & Engineering | Frequent defects; minimal automation; unmanaged tech debt. | Testing and reviews in place; debt tracked. | Clean, sustainable code is a team norm; quality and automation prioritized.
5. Operational Readiness | Monitoring lacking; users detect issues. | Monitoring and alerting in place; incident reviews occur. | Team detects issues early; observability drives proactive improvement.
6. Customer & Outcome Focus | Little connection to business value or user needs. | Team aware of goals; some outcome alignment. | Delivery prioritized around customer value; outcomes measured and iterated on.
7. Role Clarity & Decision Making | Roles unclear; top-down decisions. | Roles defined; collaborative decision-making. | Shared decision ownership; tradeoffs transparent and understood.
8. Capabilities & Growth | Skill gaps ignored; no focus on development. | Right skills in place; asks for help when needed. | Team proactively grows skills; cross-training and adaptability are norms.
9. Data-Driven Improvement | Metrics ignored; retros repetitive or shallow. | Data and feedback used in team improvement. | Metrics and feedback drive learning and meaningful change.
10. Business-Technical Integration | Technical health neglected; short-term focused. | Business and tech needs discussed and planned. | Business outcomes and technical resilience are co-prioritized in delivery.

The assessment is meant to start conversations. Use it as a guide, not a strict scoring system, and revisit it as your team grows and changes. High-performing teams reflect regularly as part of their routine, not just occasionally.

How to Use This and Who Should Be Involved

This framework isn’t just a performance review. It’s a reflection tool designed for teams to assess themselves, clarify their goals, and identify areas for growth.

Here’s how to make it work:

1. Run It as a Team

Use this framework during retrospectives, quarterly check-ins, or after a major delivery milestone. Let the team lead the conversation. They’re closest to the work and best equipped to evaluate how things feel.

The goal isn’t to assign grades. It’s to pause, align, and ask: How are we doing?

2. Make It Yours

There’s no need to use all ten dimensions. Start with the ones that resonate most. You can rename them, add new ones, or redefine what “exceeding expectations” looks like in your context.

The more it reflects your team’s values and language, the more powerful the reflection becomes.

3. Use Metrics to Support the Story, Not Replace It

Delivery data like DORA, Flow Metrics, or Developer Experience scores can add perspective. But they should inform, not replace, the conversation. Numbers are helpful, but they don’t capture how it feels to deliver work together. Let data enrich the dialogue, not dictate it.

4. Invite Broader Perspectives

Some teams gather anonymous 360° feedback from stakeholders or adjacent teams to surface blind spots and validate internal perceptions.

Agile Coaches or Delivery Leads can also bring an outside-in view, helping the team see patterns over time, connecting the dots across metrics and behaviors, and guiding deeper reflection. Their role isn’t to evaluate but to support growth.

5. Let the Team Decide Where They Stand

As part of the assessment, ask the team:
Would we consider ourselves underperforming, performing, or high-performing?
Then explore:

  • What makes us feel that way?
  • What should we keep doing?
  • What should we stop doing?
  • What should we start doing?

These questions give the framework meaning. It turns observation into insight and insight into action.

This Is About Ownership, Not Oversight

This reflection guide and its 10 dimensions can serve as a performance management tool, but I strongly recommend using it as a check-in resource for teams. It’s designed to build trust, encourage honest conversations, and offer a clear snapshot of the team’s current state. When used intentionally, it enhances team cohesion and strengthens overall capability. For leaders, focusing on recurring themes rather than individual scores reveals valuable patterns that can inform coaching efforts rather than impose control. Adopting it is in your hands and your team’s.

Final Thoughts

This all started with a conversation and a question: “We do performance reviews for individuals, but what about teams?” If we care about how individuals perform, shouldn’t we also care about how teams perform together?

High-performing teams don’t happen by accident. They succeed by focusing on both what they deliver and how they deliver it.

High-performing teams don’t just meet deadlines; they adapt, assess themselves, and improve together. This framework provides a starting point to make that happen.

I’ll create a Google Form with these dimensions, using a 3-point Likert scale for our teams to fill out.


Related Articles

If you found this helpful, here are a few related articles that explore the thinking behind this framework:

  • From Feature Factory to Purpose-Driven Development: Why Anticipated Outcomes Are Non-Negotiable
  • Decoding the Metrics Maze: How Platform Marketing Fuels Confusion Between SEI, VSM, and Metrics
  • Navigating the Digital Product Workflow Metrics Landscape: From DORA to Comprehensive Value Stream Management Platform Solutions

Poking Holes

I invite your perspective on my posts. What are your thoughts?

Let’s talk: phil.clark@rethinkyourunderstanding.com

Filed Under: Agile, DevOps, Leadership, Metrics, Product Delivery, Software Engineering, Value Stream Management

