Excerpt:
“Even within the coding, it’s not working well,” said Smiley. “I’ll give you an example. Code can look right and pass the unit tests and still be wrong. The way you measure that is typically in benchmark tests. So a lot of these companies haven’t engaged in a proper feedback loop to see what the impact of AI coding is on the outcomes they care about. Lines of code, number of [pull requests], these are liabilities. These are not measures of engineering excellence.”
Measures of engineering excellence, said Smiley, include metrics like deployment frequency, lead time to production, change failure rate, mean time to restore, and incident severity. And we need a new set of metrics, he insists, to measure how AI affects engineering performance.
“We don’t know what those are yet,” he said.
One metric that might be helpful, he said, is measuring tokens burned to get to an approved pull request – a formally accepted change in software. That’s the kind of thing that needs to be assessed to determine whether AI helps an organization’s engineering practice.
To underscore the consequences of not having that kind of data, Smiley pointed to a recent attempt to rewrite SQLite in Rust using AI.
“It passed all the unit tests, the shape of the code looks right,” he said. It’s 3.7x more lines of code that performs 2,000 times worse than the actual SQLite. Two thousand times worse for a database is a non-viable product. It’s a dumpster fire. Throw it away. All that money you spent on it is worthless."
All the optimism about using AI for coding, Smiley argues, comes from measuring the wrong things.
“Coding works if you measure lines of code and pull requests,” he said. “Coding does not work if you measure quality and team performance. There’s no evidence to suggest that that’s moving in a positive direction.”



When I entered the workforce in the late '90s, people were still saying this about putting PCs on every employee’s desk. This was at a really profitable company. The argument was they already had telephones, pen and paper. If someone needed to write something down, they had secretaries for that who had typewriters. They had dictating machines. And Xerox machines.
And the truth was, most of the higher level employees were surely still more profitable on the phone with a client than they were sitting there pecking away at a keyboard.
Then, just a handful of years later, not only would the company have been toast had it not pushed ahead, but was also deploying BlackBerry devices with email, deploying laptops with remote access capabilities to most staff, and handheld PDAs (Palm pilots) to many others.
Looking at the history of all of this, sometimes we don’t know what exactly will happen with newish tech, or exactly how it will be used. But it’s true that the companies that don’t keep up often fall hopelessly behind.
If AI is so good at what it does, then it shouldn’t matter if you fall behind in adopting it… it should be able to pick up from where you need it. And if it’s not mature, there’s an equally valid argument to be made for not even STARTING adoption until it IS - early adopters always pay the most.
There’s practically no situation where rushing now makes sense, even if the tech eventually DOES deliver on the promise.