On the final day of OpenAI's 12 days of "shipmas," the company unveiled its latest models, o3 and o3-mini, which excel at reasoning and even outperform o1 on a series of benchmarks, including math and science. At the time, OpenAI CEO Sam Altman said o3-mini was slated to arrive at the end of January, and today, the company made good on its promise.
o3-mini
On Friday, OpenAI released o3-mini to the public. It is the most cost-efficient model in OpenAI's reasoning series, which until now consisted of o1 and o1-mini. Like its predecessor, the model is particularly strong in science, math, and coding, according to the company.
OpenAI o3-mini is now available in ChatGPT and the API.
Pro users will have unlimited access to o3-mini and Plus & Team users will have triple the rate limits (vs o1-mini).
Free users can try o3-mini in ChatGPT by selecting the Reason button under the message composer.
OpenAI (@OpenAI) on X, January 31, 2025
When o3-mini is selected, it uses medium reasoning effort, which balances speed and accuracy. The original o1 model still has broader general knowledge than o3-mini, but the new model's main advantages over o1-mini are its faster responses and higher performance.
Benchmark performance
When comparing o3-mini with o1-mini, expert testers found that o3-mini delivered more accurate, better-reasoned, and clearer responses. According to OpenAI's announcement post, the testers preferred o3-mini's responses 56% of the time and observed a 39% reduction in major errors.
Beyond human preference evaluations, o3-mini with medium reasoning effort, which is what ChatGPT users get by default, outperformed o1-mini on several STEM benchmarks, including Competition Math (AIME 2024), PhD-level Science Questions (GPQA Diamond), and Competition Code (Codeforces).
Also notable is that o3-mini with high reasoning effort came close to o1's performance on the benchmarks, sometimes even surpassing it, as seen in the AIME 2024 results above and on the Software Engineering (SWE-bench Verified) benchmark. With medium reasoning effort, o3-mini matched o1's performance on the Codeforces benchmark.
Safety
Ahead of the public release, OpenAI assessed o3-mini's safety through jailbreak and disallowed-content evaluations, and found that the model significantly outperforms GPT-4o on both. OpenAI posted the evaluation results below and also released an o3-mini System Card, a 37-page PDF with the detailed results of the evaluations.
How to access
All subscribers to OpenAI's paid tiers, including ChatGPT Plus, Team, and Pro, can access OpenAI o3-mini starting today. Plus and Team users now get three times the rate limit, going from 50 messages per day with o1-mini to 150 messages per day with o3-mini. ChatGPT Enterprise access is coming in a week.
The o3-mini model will replace o1-mini in the model picker, since it is suited to the same tasks but offers lower latency and higher rate limits. As a paid user, at the time of writing, I did not yet have access to o3-mini and was still seeing the o1-mini option instead.
If you don't have a subscription, no worries: you can see whether o3-mini is worth the hype from your free account. All free ChatGPT users have to do is click "Reason" in the message composer or regenerate a response. OpenAI CEO Sam Altman confirmed free access in a post on X. Until now, all of the reasoning models have been kept behind a paywall, and OpenAI did not specify any usage limits on the new model for free users.