
DeepMind’s SCoRe shows LLMs can use their internal knowledge to correct their mistakes

While large language models (LLMs) are becoming increasingly effective at complicated tasks, there are many cases where they can’t get the correct answer on the first try. This is why there is growing interest in enabling LLMs to spot and correct their mistakes, also known as “self-correction.” However, current attempts at self-correction are limited and have requirements that often cannot be met in real-world situations.

In a new paper, researchers at Google DeepMind introduce Self-Correction via Reinforcement Learning (SCoRe), a novel technique that significantly improves the self-correction capabilities of LLMs using only self-generated data. SCoRe can be a valuable tool for making LLMs more robust and reliable, and opens new possibilities for enhancing their reasoning and problem-solving abilities.

“Self-correction is a capability that greatly enhances human thinking,” Aviral Kumar, research scientist at Google DeepMind, told VentureBeat. “Humans often spend more time thinking, trying out multiple ideas, correcting their mistakes, to finally then solve a given challenging question, as opposed to simply in one-shot producing solutions for challenging questions. We would want LLMs to be able to do the same.”

Ideally, an LLM with strong self-correction capabilities should be able to review and refine its own answers until it reaches the correct response. This is especially important because LLMs often possess the knowledge needed to solve a problem internally but fail to use it effectively when generating their initial response.
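The review-and-refine behavior described here is easy to picture in code. Below is a minimal, hypothetical sketch of an inference-time self-correction loop: the model produces a first attempt, then is prompted to review and revise its own answer using only its internal knowledge, with no external feedback. The names `query_llm`, `CORRECTION_PROMPT`, and `self_correct` are illustrative assumptions, and the revision prompt is a generic example rather than the one used in the paper. Note that SCoRe itself is a training method that makes a model better at the revision step; it is not this loop.

```python
# Minimal sketch of intrinsic self-correction at inference time.
# All names here are hypothetical; SCoRe is a training technique that
# improves the model's second (revision) turn in a loop like this one.

def query_llm(prompt: str) -> str:
    """Hypothetical model call; swap in a real chat-completion client."""
    return "stub answer"  # placeholder so the sketch runs end to end


# Generic revision instruction (illustrative, not the paper's prompt).
CORRECTION_PROMPT = (
    "There may be an error in the solution above. "
    "Review it step by step and give a corrected final answer."
)


def self_correct(question: str, turns: int = 2) -> str:
    """First attempt, then `turns - 1` rounds of self-revision.

    The model sees only the question and its own prior answer --
    no oracle feedback or external hints, matching the intrinsic
    self-correction setting the article describes.
    """
    answer = query_llm(question)
    for _ in range(turns - 1):
        answer = query_llm(
            f"Question: {question}\n\n"
            f"Proposed solution: {answer}\n\n"
            f"{CORRECTION_PROMPT}"
        )
    return answer


if __name__ == "__main__":
    print(self_correct("What is 17 * 24?"))
```

A known pitfall of this naive loop, and part of the motivation for training methods like SCoRe, is that an untrained model often leaves a correct first answer unchanged or revises it into a wrong one, so simply adding revision turns does not by itself yield reliable self-correction.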

Full story: DeepMind’s SCoRe shows LLMs can use their internal knowledge to correct their mistakes.