Date: 2024-09-17

OpenAI released its new o1 models on Thursday, giving ChatGPT users their first chance to try AI models that pause to “think” before they answer. There’s been a lot of hype building up to these models, codenamed “Strawberry” inside OpenAI. But does Strawberry live up to the hype?

Sort of.

Compared to GPT-4o, the o1 models feel like one step forward and two steps back. OpenAI o1 excels at reasoning and answering complex questions, but the model is roughly four times more expensive to use than GPT-4o. OpenAI’s latest model lacks the tools, multimodal capabilities, and speed that made GPT-4o so impressive. In fact, OpenAI even admits on its help page that “GPT-4o is still the best option for most prompts,” and notes elsewhere that o1 struggles at simpler tasks.

“It’s impressive, but I think the improvement is not very significant,” said Ravid Shwartz Ziv, an NYU professor who studies AI models. “It’s better at certain problems, but you don’t have this across-the-board improvement.”

For all of these reasons, it’s important to use o1 only for the questions it’s truly designed to help with: big ones. To be clear, most people are not using generative AI to answer these kinds of questions today, largely because today’s AI models are not very good at it. However, o1 is a tentative step in that direction.

OpenAI o1 is unique because it “thinks” before answering, breaking down big problems into small steps and attempting to identify when it gets one of those steps right or wrong. This “multi-step reasoning” isn’t entirely new (researchers have proposed it for years, and You.com uses it for complex queries), but it hasn’t been practical until recently.

