Model Evaluation and Threat Research is an AI research charity that looks into the threat of AI agents! That sounds a bit AI doomsday cult, and they take funding from the AI doomsday cult organisat…
They could be programmed to do some double/triple checking, and return “i dont know” when the checks are negative.
I guess that would compromise the apparence of oracle that their parent companies seem to dissimulately push onto them.
they don’t check. you gotta think in statistics terms.
based on the previously inputed words (tokens actually, but I’ll use words for the sake of simplicity), which is the system prompt + user prompt, the LLM generates a list of the next possible words that makes most sense, then picks one from the top few. How much it goes down the list on lower possible words is based on temperature configuration. Then the next word, and the next, etc, each time looking back.
I haven’t checked on the reasoning models, what that step actually does, but I assume it just expands the user prompt to fill in stuff that thr LLM thinks the user was lazy to input, then works on the final answer.
so basically is like tapping on your phone keyboard next word prediction.
They could be programmed to do some double/triple checking, and return “i dont know” when the checks are negative. I guess that would compromise the apparence of oracle that their parent companies seem to dissimulately push onto them.
they don’t check. you gotta think in statistics terms.
based on the previously inputed words (tokens actually, but I’ll use words for the sake of simplicity), which is the system prompt + user prompt, the LLM generates a list of the next possible words that makes most sense, then picks one from the top few. How much it goes down the list on lower possible words is based on temperature configuration. Then the next word, and the next, etc, each time looking back.
I haven’t checked on the reasoning models, what that step actually does, but I assume it just expands the user prompt to fill in stuff that thr LLM thinks the user was lazy to input, then works on the final answer.
so basically is like tapping on your phone keyboard next word prediction.