Building Out Loud · Entry 09

Measuring the wrong thing

AI made shipping code fast. So why are we still measuring success by how much ships, and who gets lost in the gap?

· 7 min read

Today I had a conversation I can’t stop thinking about. The kind of conversation where you feel heard and you’re learning at the same time. I walked away from it feeling like I wanted to think about it more, so I’m writing it down. That’s the whole reason this blog exists.

Here’s what’s been rattling around.

When a lot of companies talk about AI right now, the success stories all sound the same. This much code shipped. This many products launched. A year of work done in two weeks. This many seats deployed. This many roles cut because we don’t need them anymore. The numbers are big and they’re going up and to the right, and everyone smiles and nods.

I keep waiting to hear a different conversation underneath that one, and I’m mostly not hearing it. I’ve seen it touched on here and there. But here and there isn’t enough for something this important, so I’m going to say it out loud, even if it’s just into my own little void.

The question I’m not hearing is: are we measuring success in the right way? And underneath that, a question that matters more to me: is anyone measuring the success of the humans, or are we just measuring what comes out of them, the code they produce?

let me be honest about where I’m standing

I am not an engineer. I didn’t go to school for it. I can’t sit down and write you a program from nothing. I’m good in a command line, I know my way around PowerShell, but if you took AI away from me and asked me to build you software, it would take me years of real training to get there.

And yet I can and do ship products. I do it because AI enables me. AI turns people like me, who are decent with tech and know how to solve a problem but can’t write code, into something I’ll call a quasi-engineer, or maybe a wannabe engineer. I can produce what an engineer produces without having earned what an engineer has had to earn to get there.

For me, that’s a gift. Going from zero to anything is an infinite improvement. I’m not losing anything by using AI; I’m gaining a capability I never had.

But for an actual engineer, someone who went to school to write code, AI is a completely different animal. It’s not giving them a skill they lacked. It’s multiplying judgment they’ve already spent years building. The experience to know what good architecture looks like. The instinct to catch the subtle thing the model got confidently wrong. The restraint to know when not to build. Same output as mine, on the surface. Completely different work happening underneath.

A leadership team looking at an output dashboard cannot tell us apart. The dashboard shows code shipped, products launched, velocity up. It’s not showing the difference between my output and a senior engineer’s output, because the difference isn’t just in the shipped code anymore. It’s in the judgment underneath it, and not enough people are tracking that.

the role changed but the measurement didn’t

If you were an engineer five years ago and you’re still one today, your job probably doesn’t look the same. You’re not spending your days hand-typing code. You’re not reading line by line hunting for the one character that broke the build. AI does a lot of that now. Your time goes to prompting, reviewing, specifying, deciding. The work moved up a level, from producing the code to directing and judging it along the way.

That’s a major shift. And most of the ways we measure engineering success were built for the old shape of the job. Story points, tickets closed, lines shipped, PRs merged. Those metrics reward exactly the thing AI just made fast. The job changed, but we’re still looking at success from the wrong direction.

So when leadership looks at an engineering team going ten times faster and calls it a win, I want to ask them: a win at what? Shipping code ten times faster? Adoption? Laffy Taffy? And is anyone checking the things the measurements can’t see?

the considerations the dashboard misses

A few things sit in the gap between “we shipped a lot, fast” and “this was actually a success.”

The first one is practical, and it’s about whether what we built will survive. Code shipped today isn’t frozen in amber. Six months from now the model changes. Will the next model be able to read and reason about what this one wrote? Will it maintain it cleanly, or will it start adding bugs into something nobody on the team ever fully reasoned through themselves? When you ship a year of work in two weeks and no human deeply understands the guts of it, the meat and bones, you’ve also created something that might be very hard to maintain when the tool that built it shifts underneath you. That’s a real risk and it doesn’t show up anywhere on a velocity chart. The chart says you won. The dashboard shows you’re ahead and the maintenance bill arrives later.

The second one I’m going to raise carefully, because it’s not my world and I’d be overstepping to pretend I have the answer. Engineers used to become senior by writing the unglamorous code and getting it torn apart in review. That was the training ground. The boring work was how judgment got built. If AI writes that code now, where do junior engineers go to build the judgment that turns them into seniors worth listening to? I genuinely don’t know. I suspect someone closer to it than I am has thought about this harder, and I’d rather ask the question honestly than answer it badly here. But it seems like it matters, and it’s the kind of thing that’s invisible until it’s staring you in the face.

The third thing is the thing the whole post is really about. The humans. When the only things you reward are more code and faster merges, you are quietly telling experienced people that the part of the job they spent years getting good at no longer counts as work worth their time, and that the part that does count is volume that seemingly anyone can now produce. That’s a fast road to burning out the exact people you most want to keep motivated and engaged. The ones you’ve invested time and money in. The tried and true ones who actually know things from experience. If the metric only sees the output, it can’t see a great engineer running on empty, and it can’t see the moment they decide this isn’t the job they signed up for anymore.

the actual question

So here’s where I land, and, spoiler alert, it’s not on an answer.

If shipping code fast is no longer a rare skill, because now nearly anyone can do it, then how much is it really worth as a measure of success? And if it’s worth less than we think, what should we be measuring instead?

I don’t have the new measurement on hand, but I’m fairly sure that the companies measuring AI success purely by output are measuring the easy thing, not the true thing, and that the gap between those two is where a lot of good people are quietly going to get lost in this bubble.

And it’s worth saying that the conversation that sparked this was between two people, thinking out loud together. That wasn’t an accident. The whole thing I’m worried about losing is the human underneath the output. It feels right that the question came from a human exchange, not from a dashboard, and not from a tool.

I’d genuinely like to hear from people who are closer to this than I am. Especially the engineers whose jobs have changed and continue to shift. What does success actually look like now, from inside the seat?


find me on LinkedIn if the vibes feel right: linkedin.com/in/oliviakeiter

Signed,
