Google reveals Gemini 2.5 Flash, its ‘most cost-efficient thinking model’

Just weeks after unveiling Gemini 2.5 Pro, Google is on to its next top-performing model.

On Thursday, the company released an “early version” of Gemini 2.5 Flash in preview in the Gemini API, AI Studio, and Vertex AI. The model has a knowledge cutoff of January 2025. It can take text, image, video, and audio prompts, and has a one-million-token context window.

Google says the new version expands on Flash 2.0 with improved reasoning, but “without compromising its renowned speed or cost.” Reasoning models spend more time “thinking” (that is, interpreting a query) before responding, which results in more thorough and direct output that, ideally, aligns better with a user’s needs, compared to earlier models that prioritize speed. Models that reason are also better equipped to accurately deliver on multi-step problems or tasks.

“Gemini 2.5 Flash performs strongly on Hard Prompts in Chatbot Arena, second only to 2.5 Pro,” Google notes in the announcement.

Referring to the new model as its most cost-efficient, Google notes that 2.5 Flash “allows developers to configure the amount of thinking it does to maximize efficiency.” This gives developers a “thinking budget,” or the ability to pay for reasoning only when they need it most. With reasoning on, the output price jumps from 60 cents per one million tokens to $3.50.
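To put that price gap in concrete terms, here is a minimal Python sketch that estimates output cost at the two rates quoted above. The function name and the two-million-token workload are hypothetical and chosen only for illustration; the sketch covers output tokens only and ignores input pricing, which the article does not quote.

```python
# Output prices quoted in the article, in US dollars per one million output tokens.
PRICE_PER_MILLION_REASONING_OFF = 0.60
PRICE_PER_MILLION_REASONING_ON = 3.50


def output_cost_usd(output_tokens: int, reasoning: bool) -> float:
    """Estimate the output-token cost in USD at the quoted rates."""
    rate = PRICE_PER_MILLION_REASONING_ON if reasoning else PRICE_PER_MILLION_REASONING_OFF
    return output_tokens / 1_000_000 * rate


# Hypothetical workload: two million output tokens.
tokens = 2_000_000
print(f"reasoning off: ${output_cost_usd(tokens, reasoning=False):.2f}")
print(f"reasoning on:  ${output_cost_usd(tokens, reasoning=True):.2f}")
```

At these rates, output generated with reasoning on costs a little under six times as much per token, which is why a per-request budget matters for high-volume workloads.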

If developers don’t give the model a budget, it determines the query’s thinking needs itself by evaluating the request for complexity. For example, it will identify prompts with minimal reasoning needs, like “How many states are there in the US?”, separately from multi-step math problems. Google notes that to replicate Flash 2.0 latency and cost, developers should set the budget to 0.
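As a rough illustration of the three cases described here (no budget, a budget of 0, and an explicit budget), the Python sketch below builds a JSON request body without sending it anywhere. The helper name `build_request_body` is hypothetical, and the field names (`generationConfig`, `thinkingConfig`, `thinkingBudget`) are an assumption about how the Gemini API exposes this setting; they should be checked against Google's current API reference before use.

```python
import json
from typing import Optional


def build_request_body(prompt: str, thinking_budget: Optional[int] = None) -> dict:
    """Build a request body; omit the budget to let the model decide for itself."""
    body = {"contents": [{"parts": [{"text": prompt}]}]}
    if thinking_budget is not None:
        # Assumed field names; verify against the official API reference.
        body["generationConfig"] = {
            "thinkingConfig": {"thinkingBudget": thinking_budget}
        }
    return body


# No budget given: the model evaluates the prompt's complexity on its own.
auto_body = build_request_body("How many states are there in the US?")

# Budget of 0: per the article, this replicates Flash 2.0 latency and cost.
no_thinking_body = build_request_body("How many states are there in the US?", 0)

print(json.dumps(no_thinking_body, indent=2))
```

Leaving the budget out of the request entirely, rather than sending a placeholder value, keeps the "model decides" case distinct from an explicit budget of 0.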

Gemini 2.5 Flash scored 12% on Humanity’s Last Exam (HLE), a new, alternative benchmark to industry tests that have become too easy for rapidly evolving models. This score outperformed competitor models, including Claude 3.7 Sonnet and DeepSeek R1, but not OpenAI’s just-launched o4-mini, which came in at 14% on the test.

You can try Gemini 2.5 Flash in preview through the Gemini API in Google AI Studio and Vertex AI.
