Shortly after Hunter Lightman joined OpenAI as a researcher in 2022, he watched his colleagues launch ChatGPT, one of the fastest-growing products ever. Meanwhile, Lightman quietly worked on a team teaching OpenAI's models to solve high school math competitions.

Today that team, known as MathGen, is considered instrumental to OpenAI's industry-leading effort to create AI reasoning models: the core technology behind AI agents that can do tasks on a computer like a human would.

"We were trying to make the models better at mathematical reasoning, which at the time they weren't very good at," Lightman told Trendster, describing MathGen's early work.

OpenAI's models are far from perfect today: the company's latest AI systems still hallucinate, and its agents struggle with complex tasks.

But its state-of-the-art models have improved significantly at mathematical reasoning. One of OpenAI's models recently won a gold medal at the International Math Olympiad (IMO), a math competition for the world's brightest high school students. OpenAI believes these reasoning capabilities will translate to other subjects and ultimately power the general-purpose agents the company has always dreamed of building.

ChatGPT was a happy accident, a low-key research preview that became a viral consumer business, but OpenAI's agents are the product of a years-long, deliberate effort within the company.

"Eventually, you'll just ask the computer for what you need and it'll do all of these tasks for you," said OpenAI CEO Sam Altman at the company's first developer conference in 2023. "These capabilities are often talked about in the AI field as agents. The upsides of this are going to be tremendous."
Whether agents will meet Altman's vision remains to be seen, but OpenAI shocked the world with the release of its first AI reasoning model, o1, in the fall of 2024. Less than a year later, the 21 foundational researchers behind that breakthrough had become the most highly sought-after talent in Silicon Valley.

Mark Zuckerberg recruited five of the o1 researchers to work on Meta's new superintelligence-focused unit, offering some compensation packages north of $100 million. One of them, Shengjia Zhao, was recently named chief scientist of Meta Superintelligence Labs.
The reinforcement learning renaissance
The rise of OpenAI's reasoning models and agents is tied to a machine learning training technique known as reinforcement learning (RL). RL gives an AI model feedback on whether its choices were correct or not in simulated environments.
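That feedback loop can be sketched in a few lines. Below is a minimal, illustrative Q-learning agent in a toy six-cell gridworld (none of this is OpenAI's code): the only feedback the agent receives is a reward for reaching the goal cell, and the value table it learns from that signal eventually encodes the right decisions.

```python
import random

random.seed(0)  # reproducible toy run

GOAL, N_STATES = 5, 6
ACTIONS = (-1, 1)  # step left or right on a 1-D track
q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
alpha, gamma, epsilon = 0.5, 0.9, 0.1  # learning rate, discount, exploration

def best_action(state):
    # Greedy action with random tie-breaking.
    top = max(q[(state, a)] for a in ACTIONS)
    return random.choice([a for a in ACTIONS if q[(state, a)] == top])

for episode in range(500):
    state = 0
    while state != GOAL:
        # Epsilon-greedy: mostly exploit, occasionally explore.
        action = random.choice(ACTIONS) if random.random() < epsilon else best_action(state)
        nxt = min(max(state + action, 0), GOAL)
        reward = 1.0 if nxt == GOAL else 0.0  # the environment's feedback signal
        # Q-learning update: nudge the estimate toward reward + discounted future value.
        best_next = max(q[(nxt, a)] for a in ACTIONS)
        q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
        state = nxt

# After training, the learned policy heads right (+1) from every cell.
policy = {s: best_action(s) for s in range(GOAL)}
print(policy)
```

The same principle, reward-driven updates rather than imitation of labeled examples, is what scales up (with far more machinery) to training large models.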
RL has been used for decades. For instance, in 2016, about a year after OpenAI was founded in 2015, an AI system created by Google DeepMind using RL, AlphaGo, gained global attention after beating a world champion in the board game Go.

Around that time, one of OpenAI's first employees, Andrej Karpathy, began pondering how to leverage RL to create an AI agent that could use a computer. But it would take years for OpenAI to develop the necessary models and training techniques.

By 2018, OpenAI had pioneered its first large language model in the GPT series, pretrained on vast amounts of internet data and large clusters of GPUs. GPT models excelled at text processing, eventually leading to ChatGPT, but struggled with basic math.

It took until 2023 for OpenAI to achieve a breakthrough, initially dubbed "Q*" and later "Strawberry," by combining LLMs, RL, and a technique called test-time computation. The latter gave the models extra time and computing power to plan and work through problems, verifying their steps, before producing an answer.

This allowed OpenAI to introduce a new approach called "chain-of-thought" (CoT), which improved AI performance on math questions the models hadn't seen before.
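The core idea of test-time computation, spend more compute per question and check the intermediate steps, can be illustrated with a toy sample-and-verify loop. Here `noisy_model` is a hypothetical stand-in for an LLM that writes out its steps but sometimes slips; nothing below reflects OpenAI's actual method.

```python
import random

random.seed(0)  # reproducible toy run

def noisy_model(a, b, c, error_rate=0.3):
    # Stand-in "model" answering a*b + c: it first computes the product
    # (an intermediate chain-of-thought step), then adds c, but it
    # sometimes slips on the multiplication.
    slip = random.choice([-1, 1]) if random.random() < error_rate else 0
    product = a * b + slip
    return product, product + c  # (intermediate step, final answer)

def verify(a, b, c, product, answer):
    # Re-check each intermediate step before trusting the answer.
    return product == a * b and answer == product + c

def answer_with_budget(a, b, c, n_samples):
    # More test-time compute = more sampled chains of thought.
    for _ in range(n_samples):
        product, answer = noisy_model(a, b, c)
        if verify(a, b, c, product, answer):
            return answer
    return answer  # no chain verified; fall back to the last attempt

# Accuracy on 7*8 + 5 = 61 rises as the per-question compute budget grows.
accuracy = {}
for budget in (1, 5):
    trials = 1000
    correct = sum(answer_with_budget(7, 8, 5, budget) == 61 for _ in range(trials))
    accuracy[budget] = correct / trials
print(accuracy)
```

The point of the sketch is only the shape of the trade-off: a single sample inherits the model's raw error rate, while a verified multi-sample budget drives errors down sharply.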
"I could see the model beginning to reason," said El Kishky, an OpenAI researcher. "It would notice mistakes and backtrack, it would get frustrated. It really felt like reading the thoughts of a person."

Though individually these techniques weren't novel, OpenAI uniquely combined them to create Strawberry, which directly led to the development of o1. OpenAI quickly recognized that the planning and fact-checking abilities of AI reasoning models could be useful for powering AI agents.

"We had solved a problem that I had been banging my head against for years," said Lightman. "It was one of the most exciting moments of my research career."
Scaling reasoning
With AI reasoning models, OpenAI determined it had two new axes along which to improve its AI: using more computational power during post-training, and giving models more time and processing power while answering a question.

"OpenAI, as a company, thinks a lot about not just the way things are, but the way things are going to scale," said Lightman.

Shortly after the 2023 Strawberry breakthrough, OpenAI spun up an "Agents" team led by OpenAI researcher Daniel Selsam to make further progress on this new paradigm, two sources told Trendster. Although the team was called "Agents," OpenAI didn't initially differentiate between reasoning models and agents as we think of them today. The company just wanted to make AI systems capable of completing complex tasks.

Eventually, the work of Selsam's Agents team became part of a larger project to develop the o1 reasoning model, with leaders including OpenAI co-founder Ilya Sutskever, chief research officer Mark Chen, and chief scientist Jakub Pachocki.

OpenAI had to divert precious resources, primarily talent and GPUs, to create o1. Throughout OpenAI's history, researchers have had to negotiate with company leaders to obtain resources; demonstrating breakthroughs was a surefire way to secure them.

"One of the core components of OpenAI is that everything in research is bottom up," said Lightman. "When we showed the evidence [for o1], the company was like, 'This makes sense, let's push on it.'"

Some former employees say the startup's mission to develop AGI was the key factor in achieving breakthroughs around AI reasoning models. By focusing on creating the smartest-possible AI models, rather than products, OpenAI was able to prioritize o1 above other efforts. That kind of large investment in ideas wasn't always possible at competing AI labs.

The decision to try new training methods proved prescient. By late 2024, several leading AI labs started seeing diminishing returns on models created through traditional pretraining scaling. Today, much of the AI field's momentum comes from advances in reasoning models.
What does it mean for an AI to "reason"?

In many ways, the goal of AI research is to recreate human intelligence with computers. Since the launch of o1, ChatGPT's UX has been filled with more human-sounding features such as "thinking" and "reasoning."

When asked whether OpenAI's models were truly reasoning, El Kishky hedged, saying he thinks about the concept in terms of computer science.

"We're teaching the model how to efficiently spend compute to get an answer. So if you define it that way, yes, it is reasoning," said El Kishky.

Lightman's approach is to focus on the model's outcomes, and not as much on the means or their relation to human brains.

"If the model is doing hard things, then it is doing whatever necessary approximation of reasoning it needs in order to do that," said Lightman. "We can call it reasoning, because it looks like these reasoning traces, but it's all just a proxy for trying to make AI tools that are really powerful and useful to a lot of people."

OpenAI's researchers note that people may disagree with their nomenclature or definitions of reasoning (and indeed, critics have emerged), but they argue that matters less than the capabilities of their models. Other AI researchers tend to agree.

Nathan Lambert, an AI researcher with the nonprofit AI2, compares AI reasoning models to airplanes in a blog post. Both, he says, are man-made systems inspired by nature, human reasoning and bird flight, respectively, but they operate through entirely different mechanisms. That doesn't make them any less useful, or any less capable of achieving similar outcomes.

A group of AI researchers from OpenAI, Anthropic, and Google DeepMind agreed in a recent position paper that AI reasoning models are not well understood today, and that more research is needed. It may be too early to confidently declare what exactly is going on inside them.
The next frontier: AI agents for subjective tasks

The AI agents on the market today work best in well-defined, verifiable domains such as coding. OpenAI's Codex agent aims to help software engineers offload simple coding tasks. Meanwhile, Anthropic's models have become particularly popular in AI coding tools like Cursor and Claude Code; these are some of the first AI agents that people are willing to pay for.

However, general-purpose AI agents like OpenAI's ChatGPT Agent and Perplexity's Comet struggle with many of the complex, subjective tasks people want to automate. When trying to use these tools for online shopping or finding a long-term parking spot, I've found the agents take longer than I'd like and make silly mistakes.

Agents are, of course, early systems that will undoubtedly improve. But researchers must first figure out how to better train the underlying models to complete tasks that are more subjective.

"Like many problems in machine learning, it's a data problem," said Lightman, when asked about the limitations of agents on subjective tasks. "Some of the research I'm really excited about right now is figuring out how to train on less verifiable tasks. We have some leads on how to do these things."

Noam Brown, an OpenAI researcher who helped create the IMO model and o1, told Trendster that OpenAI has new general-purpose RL techniques that let it teach AI models skills that aren't easily verified. This was how the company built the model that achieved a gold medal at IMO, he said.

OpenAI's IMO model was a newer AI system that spawns multiple agents, which simultaneously explore several ideas and then choose the best possible answer. These kinds of AI models are growing in popularity; Google and xAI have recently released state-of-the-art models using this technique.
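This pattern is often called best-of-n or parallel sampling: run several independent attempts, then let a judge pick the winner. The sketch below uses `propose` and `score` as hypothetical stand-ins for a model's sampler and an answer-ranking judge; it does not reflect OpenAI's actual IMO system.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def propose(rng_seed):
    # One "agent" explores independently and returns a candidate answer.
    # Toy task: guess the x that maximizes -(x - 3)^2.
    rng = random.Random(rng_seed)
    return rng.uniform(-10, 10)

def score(candidate):
    # Judge that ranks candidates (here, closeness to the true optimum, 3).
    return -(candidate - 3) ** 2

def best_of_n(n):
    # Spawn n agents in parallel, then keep the highest-scoring answer.
    with ThreadPoolExecutor(max_workers=n) as pool:
        candidates = list(pool.map(propose, range(n)))
    return max(candidates, key=score)

# More parallel agents -> the chosen answer tends to land closer to 3.
print(abs(best_of_n(2) - 3), abs(best_of_n(64) - 3))
```

The design trade-off is straightforward: quality scales with the number of parallel attempts, at the cost of n times the compute per question, and everything hinges on having a judge that can actually rank the candidates.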
"I think these models will become more capable at math, and I think they'll get more capable in other reasoning areas as well," said Brown. "The progress has been incredibly fast. I don't see any reason to think it will slow down."

These techniques may help OpenAI's models become more performant, gains that could show up in the company's upcoming GPT-5 model. OpenAI hopes to assert its dominance over competitors with the launch of GPT-5, ideally offering the best AI model to power agents for developers and consumers.

But the company also wants to make its products simpler to use. El Kishky says OpenAI wants to develop AI agents that intuitively understand what users want, without requiring them to select specific settings. He says OpenAI aims to build AI systems that understand when to call up certain tools, and how long to reason for.

These ideas paint a picture of an ultimate version of ChatGPT: an agent that can do anything on the internet for you, and understand how you want it done. That's a much different product than ChatGPT is today, but the company's research is squarely headed in this direction.

While OpenAI undoubtedly led the AI industry a few years ago, the company now faces a tranche of worthy competitors. The question is no longer just whether OpenAI can deliver its agentic future, but whether it can do so before Google, Anthropic, xAI, or Meta beats it to the punch.





