AI hallucinations occur when models like OpenAI’s ChatGPT or Google’s Bard fabricate information entirely while presenting it as fact. One example: In Google’s own February promotional video for Bard, the chatbot makes an untrue claim about the James Webb Space Telescope. More recently, a New York federal court filing cited “bogus” cases that ChatGPT had made up, and the New York attorneys involved may face sanctions.
“Even state-of-the-art models are prone to producing falsehoods—they exhibit a tendency to invent facts in moments of uncertainty,” the OpenAI researchers wrote in the report. “These hallucinations are particularly problematic in domains that require multi-step reasoning, since a single logical error is enough to derail a much larger solution.”
OpenAI’s potential new strategy for fighting the fabrications: Train AI models by rewarding each individual, correct step of reasoning on the way to an answer, instead of rewarding only a correct final conclusion. The approach is called “process supervision,” as opposed to “outcome supervision,” and could lead to better explainable AI, according to the researchers, because the strategy encourages models to follow a more human-like chain-of-“thought” approach.
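To make the difference between the two kinds of supervision concrete, here is a minimal Python sketch. It is an illustration, not OpenAI’s implementation: the `Step` class, the hard-coded per-step `p_correct` scores, and the choice to combine step scores by multiplying them (so a solution only scores well if every step is likely correct) are all assumptions made for this example. In a real system, step-level feedback would have to come from somewhere, such as human labelers or a learned model, rather than being written in by hand.

```python
from dataclasses import dataclass
from math import prod


@dataclass
class Step:
    text: str
    p_correct: float  # hypothetical score: probability this step is correct


def outcome_score(final_answer_correct: bool) -> float:
    # Outcome supervision: a single signal that looks only at the final answer.
    return 1.0 if final_answer_correct else 0.0


def process_scores(steps: list[Step]) -> list[float]:
    # Process supervision: every intermediate step gets its own feedback.
    return [s.p_correct for s in steps]


def solution_score(steps: list[Step]) -> float:
    # Assumed aggregation: probability that *every* step is correct,
    # computed as the product of the per-step probabilities.
    return prod(s.p_correct for s in steps)


# Toy solution to "3 * 4 + 2" (correct answer: 14).
# Two errors cancel out, so the final answer happens to be right.
steps = [
    Step("3 * 4 = 11", 0.0),   # wrong: off by -1
    Step("11 + 2 = 14", 0.0),  # wrong arithmetic: off by +1
    Step("Answer: 14", 1.0),
]

print(outcome_score(final_answer_correct=True))  # 1.0
print(process_scores(steps))                     # [0.0, 0.0, 1.0]
print(solution_score(steps))                     # 0.0
```

In this toy run, outcome supervision gives full credit to a chain of reasoning containing two arithmetic mistakes because the final number is right, while the step-level scores point to exactly where the reasoning went wrong.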