Amazon blames human employees for an AI coding agent’s mistake | Two minor AWS outages have reportedly occurred as a result of actions by Amazon’s AI tools

Chris Remington@beehaw.org · edited 1 month ago

Limerance@piefed.social · 1 month ago

Realistically what happens is the code review is done under time pressure and not very thoroughly.

TehPers@beehaw.org · 1 month ago

This is what happens to us. People put out a high volume of AI-generated PRs, nobody has time to review them, and the code becomes an amalgamation of mixed paradigms, dependency spaghetti, and partially tested (and horribly tested) code.

Also, the people putting out the AI-generated PRs are the same people rubber-stamping the other PRs, which means PRs merge quickly, but nobody actually does a review.

The code is a mess.

heluecht@pirati.ca · 1 month ago

@TehPers @Limerance why didn’t you have time to review it? Every minute in review pays off because it saves you from hours of debugging and dealing with angry customers.

TehPers@beehaw.org · 1 month ago

Because if I spent my whole day reviewing AI-generated PRs and walking through the codebase with them only for the next PR to be AI-generated unreviewed shit again, I’d never get my job done.

I’d love to help people learn, but nobody will use anything they learn because they’re just going to ask an LLM to do their task for them anyway.

This is a people problem, and primarily at a high level. The incentive is to churn out slop rather than do things right, so that’s what people do.

heluecht@pirati.ca · 1 month ago

Who creates these AI-generated PRs? Colleagues in your company or third-party contributors in an open source project? I guess if this happened to me, and the PRs were created by colleagues, I would escalate it to my line manager. And I guess that she would understand why this is a problem and why it has to stop.

TehPers@beehaw.org · edited 1 month ago

Colleagues, and the issue is top-down. I’ve raised it as an issue already. My manager can’t do anything about it.

Limerance@piefed.social · edited 1 month ago

Sure, that’s the theory. In practice code review often looks like this:

- A quick glance to see if the code plausibly does what it claims, for longer patches
- A long argument about some stylistic choice, for short patches

In other words – people were barely reading merge requests before. Code reviews have limited effects as well. You won’t catch all bugs or see if it actually works just by looking at the code. Code reviews mainly serve to spread knowledge about the code among the team. The more code exists in a project, the harder it is to understand. You don’t want huge areas of code that only one person has ever seen.

Project managers don’t necessarily talk to angry customers directly. They might also choose to chase more features instead of allocating resources to fixing bugs. It depends on what the bosses prioritize. If they want AI and lots of new features, that’s what they will get. Fixing bugs, improved stability, better performance, etc. are rarely the priority.

heluecht@pirati.ca · edited 1 month ago

@Limerance Well, on Friday I spent around 1.5 hours just reviewing a single PR. And I’m not done. I will have to continue working on it on Monday. Reviewing in our company means understanding the connected use case, then checking whether the code does what the use case defines. We also check whether the code follows our internal style guide. Since our review is normally done by at least two people (for most of our apps, two people have to approve the PR before it can be merged), one person will see what the other missed. And we often talk about what the other missed, so that we learn.

Concerning angry customers: Our apps are used by tens of thousands of users. And although our group doesn’t have direct customer contact, we get the bug reports and have to fix them anyway, or we have to support the teams who work directly with the customers.

And I just realized that I’m in a very lucky situation. In our company each use case is tested thoroughly by the responsible QA and PO. And for each use case we write half a dozen (or more) test functions that check the functionality. Normally coding the tests takes more time than coding the use case itself.

Our company is very AI-driven, but at the same time we hear about customer satisfaction in the regular town halls. And the goal there is to increase it steadily. Our customers are companies, so maybe that’s the difference.

TehPers@beehaw.org · 1 month ago

To put into perspective what our code looks like, there are very few tests (which may or may not pass), no formatter or linter for most of the code, no pipelines to block PRs, no gates whatsoever on PRs, and the code is somewhat typed sometimes (the Python, anyway). Our infrastructure was created ad hoc, it’s not reproducible, there’s only one environment shared between dev and prod, etc.

I’ve been in multiple meetings with coworkers and my manager talking about how embarrassing it is that this is what we’re shipping. For context, I haven’t been on this project for very long, but multiple projects we’re working on are like this.

Two years ago, this would have been unacceptable. Our team has worked on and shipped products used by millions of people. Today the management is just chasing the hype, and we can barely get one customer to stay with us.

The issue lies with the priorities from the top down. They want new stuff. They don’t care if it works, how maintainable it is, or even what the cost is. All they care about is “AI this” and “look at our velocity” and so on. Nobody cares if they’re shipping something that works, or even shipping the right thing.

heluecht@pirati.ca · 1 month ago

@TehPers That sounds like a very bad situation!