AI agents wrong ~70% of time: Carnegie Mellon study

Jaden Norman@lemmy.world · 26 days ago

jsomae@lemmy.ml · 25 days ago

The problem is that the attempts are not i.i.d., so this doesn’t really work. It works a bit, which is in my opinion why chain-of-thought is effective (it gives the LLM a chance to posit a couple of answers first). However, we’re already looking at “agents,” so they’re probably already doing chain-of-thought.

Knock_Knock_Lemmy_In@lemmy.world · 25 days ago

Very fair comment. In my experience, even when you increase the temperature you get stuck in local minima.

I was just trying to illustrate how 70% failure rates can still be useful.
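
For anyone who wants the arithmetic behind this exchange, here is a rough sketch, not a model of any real agent. It assumes you can retry a task and recognize a correct result when you see one. The first function treats attempts as i.i.d. with a 70% failure rate. The second is a made-up toy mixture (the `hard_fraction` and `easy_success` values are hypothetical, chosen only so the single-attempt success rate is still 30%) to show how correlated failures cap the benefit of retrying.

```python
def success_within(n_attempts: int, failure_rate: float = 0.7) -> float:
    """Chance that at least one of n attempts succeeds, ASSUMING the
    attempts are independent and identically distributed (i.i.d.)."""
    return 1.0 - failure_rate ** n_attempts


def success_within_correlated(
    n_attempts: int,
    hard_fraction: float = 0.5,  # hypothetical: share of tasks the model never solves
    easy_success: float = 0.6,   # hypothetical: per-attempt success on the other tasks
) -> float:
    """Toy non-i.i.d. model: some tasks fail on every attempt, so retries
    only help on the remaining ones. Single-attempt success is
    (1 - hard_fraction) * easy_success = 0.3 with the defaults."""
    return (1.0 - hard_fraction) * (1.0 - (1.0 - easy_success) ** n_attempts)


if __name__ == "__main__":
    for n in (1, 2, 5, 10):
        print(
            n,
            round(success_within(n), 3),
            round(success_within_correlated(n), 3),
        )
```

Under the i.i.d. assumption, five attempts already give about an 83% chance of at least one success. In the toy correlated model the same five attempts top out just under 50%, and no number of retries gets past `1 - hard_fraction`.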