Microsoft built a fake marketplace to test AI agents — they failed in surprising ways


On Wednesday, researchers at Microsoft launched a new simulation environment designed to test AI agents, along with new research showing that current agentic models may be vulnerable to manipulation. Conducted in collaboration with Arizona State University, the research raises new questions about how well AI agents will perform when working unsupervised, and how quickly AI companies can make good on promises of an agentic future.

The simulation environment, dubbed the "Magentic Marketplace" by Microsoft, is built as a synthetic platform for experimenting on AI agent behavior. A typical experiment might involve a customer agent trying to order dinner according to a user's instructions, while agents representing various restaurants compete to win the order.

The team's initial experiments included 100 separate customer-side agents interacting with 300 business-side agents. Because the source code for the marketplace is open source, it should be easy for other groups to adapt the code to run new experiments or reproduce findings.

Ece Kamar, CVP and managing director of Microsoft Research's AI Frontiers Lab, says this kind of research will be critical to understanding the capabilities of AI agents. "There is really a question about how the world is going to change by having these agents collaborating and talking to each other and negotiating," said Kamar. "We want to understand these things deeply."

The initial research looked at a mix of leading models, including GPT-4o, GPT-5, and Gemini-2.5-Flash, and found some surprising weaknesses. In particular, the researchers found several techniques businesses could use to manipulate customer agents into buying their products. The researchers noticed a particular falloff in efficiency as a customer agent was given more options to choose from, overwhelming the agent's attention space.

"We want these agents to help us with processing a lot of options," Kamar says. "And we are seeing that the current models are actually getting really overwhelmed by having too many options."

The agents also ran into trouble when they were asked to collaborate toward a common goal, apparently unsure which agent should play what role in the collaboration. Performance improved when the models were given more explicit instructions on how to collaborate, but the researchers still viewed the models' inherent capabilities as in need of improvement.


"We can instruct the models, like we can tell them, step by step," Kamar said. "But if we are inherently testing their collaboration capabilities, I would expect these models to have these capabilities by default."
