That’s actually the correct approach; they just got the wrong humans. It’s the executives forcing developers to use AI, and measuring them on it, who are responsible here, not the developers themselves.
A computer (AI) can never be held accountable. Therefore, a computer (AI) must never make any human decision.
It blows my mind that people are going forward with this AI nonsense and that it has infected key infrastructure. I feel like I’m taking crazy pills here. I could kind of understand it if it actually worked. Like, if it genuinely worked as well as they said? I could totally understand it. I would still question it, but it would make more sense.
If a person is going to be blamed, it should be the one that mandated use of the AI systems… Because that’s exactly what Amazon was doing.
Talk about an extra slap in the fuckin face… getting blamed for something your replacement did. Cool.
That’s in the SOP for management.
True. In this case, these poor saps are being tricked into “training” the AIs that will eventually render their jobs obsolete.
Yes. “obsolete” in that Amazon doesn’t give a shit about reliability anymore, so an AI reliability engineer is fine, now. Haha.
This is a terrible idea for Amazon, the cloud services company.
But for Amazon, the AI company? This is them illustrating the new grift that almost any company can do: use AI to keep a plausible mirage of your company going while reducing opex, and sacrifice humans when necessary to dodge accountability.
But his job wasn’t even to supervise the chatbot adequately (single-handedly fact-checking 10 lists of 15 items is a long, labor-intensive process). Rather, it was to take the blame for the factual inaccuracies in those lists. He was, in the phrasing of Dan Davies, “an accountability sink” (or as Madeleine Clare Elish puts it, a “moral crumple zone”).
https://locusmag.com/feature/commentary-cory-doctorow-reverse-centaurs/
Would said employees have voluntarily used the agent if Amazon didn’t demand it? If no, this isn’t on them. They shouldn’t be responsible for forced use of unvetted tools.
described the outages as “small but entirely foreseeable.”
LMAO
AI is working great!
It’s working great to convince moronic executives to leave Windows when it fucks up majorly due to AI coding, which is a win for everyone.
I mean, I’ll applaud any push toward Linux.
Yay! Extra mental load of having to ask the AI “correctly” and then keep up one’s skills to be able to review the AI’s work! Extra bonus for being blamed for letting anything slip past.
At least the junior that fucked up will learn something from the experience and can buy a round of beers (if the junior is paid well enough, otherwise the seniors have to buy the junior a beer while talking it out).
I’m reminded of a time I was in a bar in Georgia at a conference. It was in the hotel, and a high-ranking editor for the then-reputable Washington Post bought me a beer. He let me take a sip before launching into how much “immature shit [I] need to get out of [my] system” before being ready to be “Post material.”
Where is any industry going to be in a decade, when no one’s been mentored?
AI can never fail, it can only be failed
Well, AI code should be reviewed prior to merge into master, the same as any other code merged into master.
We have git for a reason.
So I would definitely say this was a human fault: either the reviewer’s, or that of whoever decided that no review process (or an AI-driven one) was needed.
If I managed DevOps, I would demand that AI code be signed off on commit by a human who takes responsibility, with the expectation that they review the AI’s changes before pushing.
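For what it’s worth, that kind of sign-off can be enforced mechanically. Here’s a minimal sketch of a commit-msg hook; the “AI-Reviewed-by” trailer is a made-up convention for illustration (git itself only standardizes trailers like “Signed-off-by”):

```python
#!/usr/bin/env python3
# Sketch of a commit-msg hook (save as .git/hooks/commit-msg, make executable).
# It rejects commits unless a human review trailer is present.
# "AI-Reviewed-by" is a hypothetical trailer name, not a git standard.
import re
import sys

TRAILER = re.compile(r"^AI-Reviewed-by: .+ <.+@.+>$", re.MULTILINE)

def has_human_signoff(message: str) -> bool:
    """True if the commit message carries the human review trailer."""
    return bool(TRAILER.search(message))

if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8") as f:
        if not has_human_signoff(f.read()):
            sys.stderr.write(
                "commit rejected: add 'AI-Reviewed-by: Name <email>' "
                "after actually reading the diff\n"
            )
            sys.exit(1)
```

Git’s built-in `Signed-off-by` (added with `git commit -s`) would work just as well; the point is only that the trailer names a specific human who takes responsibility for the change.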
And you would get burned. Today’s AI does one thing really, really well: create output that looks correct to humans.
You are correct that mandatory review is our best hope.
Unfortunately, the studies are showing we’re fucked anyway.
Because whether the AI output is right or wrong, it is highly likely to at least look correct, because creating correct-looking output is exactly where what we call “AI” today shines.
Realistically, what happens is that the code review is done under time pressure and not very thoroughly.
This is what happens to us. People put out a high volume of AI-generated PRs, nobody has time to review them, and the code becomes an amalgamation of mixed paradigms, dependency spaghetti, and partially tested (and horribly tested) code.
Also, the people putting out the AI-generated PRs are the same people rubber stamping the other PRs, which means PRs merge quickly, but nobody actually does a review.
The code is a mess.
@TehPers @Limerance why didn’t you have time to review it? Every minute spent in review pays off, because it saves you hours of debugging and dealing with angry customers.
Because if I spent my whole day reviewing AI-generated PRs and walking through the codebase with them only for the next PR to be AI-generated unreviewed shit again, I’d never get my job done.
I’d love to help people learn, but nobody will use anything they learn because they’re just going to ask an LLM to do their task for them anyway.
This is a people problem, and primarily at a high level. The incentive is to churn out slop rather than do things right, so that’s what people do.
Who creates these AI-generated PRs? Colleagues in your company, or third parties in an open source project? If this happened to me and the PRs were created by colleagues, I would escalate it to my line manager. And I’d expect her to understand why this is a problem and why it has to stop.
Colleagues, and the issue is top-down. I’ve raised it as an issue already. My manager can’t do anything about it.
Sure, that’s the theory. In practice code review often looks like this:
- a quick glance to see if the code plausibly does what it claims, for longer patches
- a long argument about some stylistic choice, for short patches
In other words – people were barely reading merge requests before. Code reviews have limited effects as well. You won’t catch all bugs, or see whether the code actually works, just by looking at it. Code reviews mainly serve to spread knowledge about the code among the team. The more code exists in a project, the harder it is to understand. You don’t want huge areas of code that only one person has ever seen.
Project managers don’t necessarily talk to angry customers directly. They might also choose to chase more features instead of allocating resources to fixing bugs. It depends on what the bosses prioritize. If they want AI and lots of new features, that’s what they will get. Fixing bugs, improved stability, better performance, etc. are rarely the priority.
@Limerance Well, on Friday I spent around 1.5 hours just reviewing a single PR. And I’m not done; I will have to continue on Monday. Reviewing in our company means understanding the connected use case, then checking whether the code does what the use case defines. We also check that the code follows our internal style guide. Since our reviews are normally done by at least two people (for most of our apps, two people have to approve a PR before it can be merged), one person will see what the other missed. And we often talk about what the other missed, so that we learn.
Concerning angry customers: our apps are used by several tens of thousands of users. And although our group doesn’t have direct customer contact, we get the bug reports and have to fix them anyway, or we have to support the teams who work directly with the customers.
And I just realized that I’m in a very lucky situation. In our company, each use case is tested thoroughly by the responsible QA and PO. And for each use case we write half a dozen (or more) test functions that check the functionality. Normally, coding the tests takes more time than coding the use case itself.
Our company is very AI-driven, but at the same time we hear in the regular town halls about customer satisfaction, and the goal there is to increase it steadily. Our customers are companies, so maybe that’s the difference.
To put some perspective into what our code looks like, there are very few tests (which may or may not pass), no formatter or linter for most of the code, no pipelines to block PRs, no gates whatsoever on PRs, and the code is somewhat typed sometimes (the Python, anyway). Our infrastructure was created ad-hoc, it’s not reproducible, there’s only one environment shared between dev and prod, etc.
I’ve been in multiple meetings with coworkers and my manager talking about how embarrassing it is that this is what we’re shipping. For context, I haven’t been on this project for very long, but multiple projects we’re working on are like this.
Two years ago, this would have been unacceptable. Our team has worked on and shipped products used by millions of people. Today the management is just chasing the hype, and we can barely get one customer to stay with us.
The issue lies with the priorities from the top down. They want new stuff. They don’t care if it works, how maintainable it is, or even what the cost is. All they care about is “AI this” and “look at our velocity” and so on. Nobody cares if they’re shipping something that works, or even shipping the right thing.
@TehPers That sounds like a very bad situation!
@Petter1 @remington at our company every PR needs to be reviewed by at least one lead developer. And the PRs of the lead developers have to be reviewed by architects. And we encourage the other developers to perform reviews as well. Our company encourages the use of Copilot. But none of our reviewers would approve code that they don’t understand.
🥰nice!
@Petter1 I’m a lead developer. And often I hear from my architect when I missed stuff in some PR that I just checked.
I’ve worked at a lot of different software companies over the last 35 years. And this company has by far the highest standards. It’s sometimes really annoying when you’ve coded 8 hours on a use case, only to spend 10-12 additional hours on the test cases, and maybe another 1-2 hours because the QA or the PO found something that needs to be changed. But in the end we can be proud of what we coded.