Written By Shubham Arora
Published By: Shubham Arora | Published: Dec 05, 2025, 12:49 PM (IST)
Google has announced a new reasoning-focused mode called Gemini 3 Deep Think, now available to users subscribed to the Gemini Ultra tier. The feature, which began rolling out on Thursday, aims to handle more complex and multi-step tasks than previous versions of Google’s AI models. It is positioned as the company’s most capable reasoning system yet.
Deep Think is built for questions that require step-by-step reasoning in areas like maths, logic, and science. Rather than committing to a single possible answer, it explores several different approaches and then picks the one that best fits the query. The aim is to help the system tackle problems that call for structured reasoning rather than simple text generation.
According to Google, internal testing shows Deep Think performing significantly better than earlier reasoning models. On the “Humanity’s Last Exam” benchmark, it reportedly scored 41% without the use of external tools. With code execution enabled, it reached 45.1% on the ARC-AGI-2 benchmark. The company says these results make it the strongest reasoning system in the Gemini lineup so far.
The mode builds on work done for Gemini 2.5 Deep Think, which gained attention earlier this year for reaching gold-medal-level performance in select international competitions, including the International Mathematical Olympiad. With Gemini 3, Google says it has improved accuracy, context handling, and overall stability.
Gemini 3 Deep Think is already live for users on the Ultra subscription. The mode can be turned on within the Gemini app by tapping the Deep Think option and selecting Gemini 3 Pro as the active model. It works across phones, tablets, and desktops, giving users the ability to test its reasoning output on a range of tasks – from basic queries to more technical prompts.