IQ of AI: 15+ AI Models That are Smarter Than You

The average human IQ is 100. That is a statistical reality, not an insult. For decades, that number has quietly defined what we mean by “normal intelligence.” But in 2025, something unusual is happening. Machines with no consciousness, no emotions, and no lived experience are now scoring higher than humans on the very tests designed to measure human intelligence. Does that mean AI models, especially the latest ones like Gemini 3 and GPT-5.2, are smarter than most of us humans?

Several large language models have been tested on IQ-style benchmarks over the past year. These include logic puzzles, abstract reasoning tests, pattern recognition tasks, and problem-solving challenges. The results are hard to ignore. Model after model is matching, and in many cases surpassing, the performance of an average human. Not in one particular task, but across several dimensions of reasoning that IQ tests care about.

This article looks at 15+ AI models that are smarter than you, at least by IQ-style standards. We’ll break down what “smart” really means here, how these models are evaluated, and why this shift matters.

First, let’s figure out how…

…Can We Assign IQ to an AI?

Strictly speaking, we cannot. IQ was designed to measure human intelligence, shaped by biology, experience, and consciousness. An AI doesn’t think, feel, or understand the world the way humans do. So, assigning it a literal IQ score would be scientifically incorrect.

But in practice, these comparisons are made a little differently.

Basically, instead of asking whether an AI has an IQ, researchers examine how an AI model performs on IQ-style tasks. Imagine a system consistently solving logic puzzles, pattern-recognition tasks, and reasoning problems that humans with an IQ of 120 or 130 typically solve. If the AI model does so reliably, it becomes reasonable to map its performance to an equivalent IQ range, right?

And that’s exactly how we associate IQ with an AI model.

This isn’t a psychological diagnosis. Think of it as a performance benchmark. IQ here acts as a shared language, a way to compare how well different systems reason under controlled conditions. And by that yardstick, several modern LLMs are already operating well above the human average.
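To make the mapping concrete, here is a minimal sketch of how a performance level could be translated into an IQ-equivalent number. It assumes the conventional IQ scale (mean 100, standard deviation 15), which this article does not state explicitly, and the function name `iq_equivalent` and its input are hypothetical, chosen only for illustration.

```python
from statistics import NormalDist

# Assumption: IQ is scaled to a mean of 100 and a standard deviation of 15.
# The article does not state the scale; this is the common convention.
IQ_SCALE = NormalDist(mu=100, sigma=15)


def iq_equivalent(fraction_outperformed: float) -> float:
    """Return the IQ at which a person outperforms the given fraction of people.

    `fraction_outperformed` is a hypothetical input: the share of human
    test-takers whose score the model matches or beats (strictly between 0 and 1).
    """
    if not 0.0 < fraction_outperformed < 1.0:
        raise ValueError("fraction_outperformed must be strictly between 0 and 1")
    return IQ_SCALE.inv_cdf(fraction_outperformed)


if __name__ == "__main__":
    for share in (0.50, 0.90, 0.98):
        print(f"beats {share:.0%} of people -> IQ-equivalent ~{iq_equivalent(share):.0f}")
```

Under these assumptions, beating half of all test-takers maps to 100, and beating 98% of them lands just above 130. Real IQ norming is done per test against a reference sample, so treat this as an illustration of the idea rather than how any benchmark actually assigns scores.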

Which IQ Tests Evaluate AI Models?

These are classic IQ tests, or at least the online versions of them. The tasks within these challenges measure reasoning, abstraction, and problem-solving rather than memorisation. These tests are either directly adapted from human IQ exams or closely mimic the same cognitive skills.

For instance, one of the most common IQ tests is Raven’s Progressive Matrices. It is a visual pattern-recognition test that has long been considered culture-fair. Several LLMs now solve these puzzles at or above the level of high-IQ humans. Then there are Mensa-style logic tests, which include sequence completion, image reasoning, and deductive logic. Modern AI models have shown consistently strong performance in these.

However, language-heavy sections of IQ tests are where LLMs really shine. Verbal reasoning, analogies, and arithmetic problems, similar to WAIS subtests, play directly to their strengths. On top of that, modern benchmarks like BIG-Bench Hard, ARC-style reasoning tasks, and academic evaluations such as MMLU and Humanity’s Last Exam serve as practical stand-ins for IQ testing. While they aren’t labelled as “IQ tests,” they measure the same underlying abilities. The important part: LLMs are increasingly outperforming the majority of humans on these tests.

See for your self.

Top-rated AI models on IQ tests

For this particular list, we will focus specifically on the Mensa Norway test, and rank the AI models by their scores.

1. GPT-5.2 Pro

Mensa Norway IQ: 147

This is the root of this entire discussion of AI models and their IQs. Having recently made its debut, GPT-5.2 Pro has now beaten the all-time IQ score for LLMs. Its score: 147. As Derya Unutmaz mentions in his tweet, this kind of intelligence is found in “only less than 1 in 1000 people.”

GPT-5.2 Pro has now broken a new IQ record and achieved 147 IQ! This is 3 standard deviations above the population mean, and only less than 1 in 1000 people or ~0.1% of humans have this level of intelligence! Next stop over 150 IQ! Source: https://t.co/a74dL2p61n pic.twitter.com/fUHKzsTSVN

— Derya Unutmaz, MD (@DeryaTR_) December 16, 2025

GPT-5.2 Pro consistently demonstrates this edge over humans, especially in multi-step logic, abstract reasoning, and professional-grade problem solving. While this doesn’t necessarily mean it is smarter than humans in all aspects, it does indicate a powerful shift in where the upper bounds of test-measured intelligence now sit.
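The “less than 1 in 1000” figure can be checked with a few lines of arithmetic. The sketch below assumes IQ follows a normal distribution with mean 100 and standard deviation 15 (an assumption, since neither the tweet nor this article names the scale), and `rarity` is a hypothetical helper written for this illustration.

```python
from statistics import NormalDist


def rarity(iq: float, mean: float = 100.0, sd: float = 15.0) -> tuple[float, float]:
    """Return (z-score, fraction of people at or above `iq`) under a normal model.

    The default mean and standard deviation are assumptions, not values
    taken from the Mensa Norway test itself.
    """
    z = (iq - mean) / sd
    share_above = 1.0 - NormalDist().cdf(z)
    return z, share_above


if __name__ == "__main__":
    z, share = rarity(147)
    print(f"z-score: {z:.2f}")
    print(f"share of people at or above 147: {share:.4%}")
```

With these assumptions, 147 sits a little more than three standard deviations above the mean, and the share of people at or above it comes out slightly under 0.1%, which is consistent with the claim quoted above.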

2. GPT-5.2 Thinking

Mensa Norway IQ: 141

Next up is the thinking sibling of the newly launched GPT-5.2. On the Mensa Norway IQ test, GPT-5.2 Thinking scores around 141, placing it well beyond the human average of 100 and comfortably above the typical Mensa qualification threshold. In human terms, this score sits well inside the top 1% of the population (roughly the top 0.3%, assuming the usual mean of 100 and standard deviation of 15), purely on abstract reasoning and pattern recognition.

What this result actually tells us is very specific. GPT-5.2 Thinking performs exceptionally well on tasks that require identifying relationships, recognizing visual or logical patterns, and applying consistent rules across multiple steps. These are the exact abilities IQ tests are designed to isolate, independent of language, emotion, or domain knowledge.

This basically means that, as far as structured reasoning under controlled conditions is concerned, GPT-5.2 Thinking operates at a level most humans never reach.

3. Gemini 3 Pro Preview

Mensa Norway IQ: 141

Right alongside GPT-5.2 Thinking sits Gemini 3 Pro Preview, matching its Mensa Norway IQ score exactly. This places Google’s flagship reasoning model firmly in elite territory, far above the human baseline and well past the threshold typically associated with high intellectual ability.

In practical terms, it means Gemini 3 Pro Preview performs reliably on abstract reasoning challenges. Such tests usually require rule discovery, pattern continuation, and logical elimination. These are problems where guessing fails quickly. You can only score this high with structured inference.

This score thus reflects Gemini 3 Pro Preview’s strength in controlled reasoning environments.

4. Grok 4 Expert Mode

Mensa Norway IQ: 137

Of course, you can’t speak of intelligence and keep an Elon Musk-backed product off the list. Close behind the top scorers sits Grok 4 Expert Mode. While slightly lower than the very top tier, the model is well within the range of exceptional human intelligence and comfortably above the average benchmark of 100.

The score highlights Grok 4 Expert Mode’s ability to handle logic-driven tasks with clarity and control. It performs well on pattern recognition, abstract relationships, and elimination-based reasoning, the core components of IQ-style tests.

In simple terms, Grok 4 Expert Mode demonstrates strong analytical reasoning under test conditions. While it may not top the chart, its performance confirms that it operates far beyond human-average reasoning levels when evaluated purely on logic and pattern-based intelligence.

5. GPT-5.2 Pro (Vision)

Mensa Norway IQ: 135

Not far behind its text-only counterpart is GPT-5.2 Pro Vision, scoring 135 on the Mensa Norway test. This still places it firmly within the range of very high human intelligence, well above both the global average and the typical threshold associated with advanced reasoning ability.

Note that this score comes from a vision-enabled model, an AI model that can process and reason over visual information (like input images), and not just text. This means GPT-5.2 Pro Vision performs strongly on abstract reasoning tasks even when visual interpretation is required.

Now imagine an AI so intelligent that it scores 135 on an IQ test, even after interpreting complex images and visual patterns. Up until a few years ago, we would have thought that possible only in a sci-fi movie.

6. GPT-5.2

Mensa Norway IQ: 126

After the Pro and Thinking models are done with, OpenAI’s latest standard model takes the stage. But mind you, it is by no means lacking when it comes to intelligence, especially in comparison with humans. A score of 126 already places it above roughly 96% of the human population (assuming the usual mean of 100 and standard deviation of 15), firmly separating it from what we consider average human reasoning ability.

This score reflects GPT-5.2’s strength in handling classic IQ-style tasks such as pattern recognition, logical sequencing, and rule-based problem solving. While it doesn’t push into the extreme upper ranges like its Pro or Thinking variants, it remains consistently strong across structured reasoning challenges.

In practical terms, GPT-5.2 represents the point where AI reasoning clearly crosses into elite human territory. It may not top the charts, but even at this level, it outperforms the vast majority of people on controlled intelligence tests.

7. Kimi K2 Thinking

Mensa Norway IQ: 124

Next up is Kimi K2 Thinking, a model that may not grab headlines as loudly as some Western counterparts. Yet, it still resonates among AI enthusiasts globally, and for good reason. A score of 124 places it clearly above the human average, and well into the range associated with strong analytical ability.

This result highlights Kimi K2 Thinking’s capability on structured reasoning tasks. In practical terms, Kimi K2 Thinking demonstrates that high-level abstract reasoning is no longer limited to a small group of flagship models. Even outside the absolute top scorers, modern LLMs are now consistently operating above average human intelligence on standardised tests. Is it a trend? Or a fact waiting to be established? We will find out in time.

8. Claude Opus 4.5

Mensa Norway IQ: 124

Matching Kimi K2 Thinking is Claude Opus 4.5, Anthropic’s flagship reasoning model, with a Mensa Norway IQ score of 124. That is above the human average, and a firm indicator of strong analytical and problem-solving ability.

The score reflects Claude Opus 4.5’s competence on abstract reasoning tasks that demand consistency and logical control. In other words, Claude Opus 4.5 demonstrates strong, above-human-average reasoning, even though it sits outside the top scorers on this list.

9. Gemini 3 Pro Preview (Vision)

Mensa Norway IQ: 123

Below its text-only counterpart (141) sits Gemini 3 Pro Preview Vision, with a Mensa Norway IQ score of 123. This score is all the more notable as it comes from a vision-enabled model, which means Gemini 3 Pro Preview Vision is required to interpret visual patterns and relationships before applying logic.

In other words, the shift from text-only to vision-based inputs lowers its score, but it does not break its reasoning performance. Even under tougher-than-usual conditions, it continues to perform at a level most humans don’t reach on standardised intelligence tests.

10. Claude Sonnet 4.5

Mensa Norway IQ: 123

Sharing the same Mensa Norway IQ score of 123 is Claude Sonnet 4.5, Anthropic’s more balanced reasoning model. While not positioned as the most extreme thinker in the lineup, it comfortably outperforms the human baseline in terms of logical reasoning ability.

The result reflects Claude Sonnet 4.5’s steady performance on structured problem-solving tasks. It is worth noting that even in a more efficient form, Sonnet 4.5 exceeds the reasoning capabilities of most humans.

11. GPT-5.2 Thinking (Vision)

Mensa Norway IQ: 111

Let me be clear here: an IQ-style test is unforgiving to vision-enabled systems. Before a model can reason its way to a solution and get a high score, it must first correctly interpret shapes, patterns, and spatial relationships. Essentially, this is exactly how we humans interpret information. We see, interpret, and then reason. However, doing so for AI is a whole other task in itself.

So, do not think of GPT-5.2 Thinking Vision’s IQ score of 111 as ordinary by any means. It basically means that this model is doing something harder: thinking while seeing. A single mistake made in interpretation will certainly trickle down to the solution.

GPT-5.2 Thinking Vision thus doesn’t chase elite abstract scores. However, it demonstrates something much more important: usable intelligence in messy, multimodal environments. And as AI moves closer to the real world, that may become the most desirable feature in an AI model, if it is not already.
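The cost of “thinking while seeing” can be read directly off the scores in this list. The sketch below only subtracts the vision score from the text-only score for the three models that appear here in both forms; the dictionary and function names are hypothetical, and the numbers are the ones quoted in this article.

```python
# Mensa Norway scores as listed in this article: (text-only, vision-enabled).
TEXT_AND_VISION_SCORES = {
    "GPT-5.2 Pro": (147, 135),
    "GPT-5.2 Thinking": (141, 111),
    "Gemini 3 Pro Preview": (141, 123),
}


def vision_gaps(scores: dict[str, tuple[int, int]]) -> dict[str, int]:
    """Return how many IQ points each model loses when moving to visual input."""
    return {model: text - vision for model, (text, vision) in scores.items()}


if __name__ == "__main__":
    for model, gap in vision_gaps(TEXT_AND_VISION_SCORES).items():
        print(f"{model}: {gap} points lower with vision")
```

On these figures the gap is 12 points for GPT-5.2 Pro, 18 for Gemini 3 Pro Preview, and 30 for GPT-5.2 Thinking. Three data points are too few to generalise from, but they do show that visual interpretation carries a visible cost on this test.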

12. Manus

Mensa Norway IQ: 111

Sitting at an IQ score of 111 is Manus, a model that proves intelligence doesn’t always mean “extreme.” A score like this already places Manus above the human average, but more importantly, it signals dependable reasoning and consistency.

This basically means that it may not solve the hardest puzzles at record speed, but it avoids the kinds of breakdowns that often plague weaker models. This is usable intelligence at its best.

13. GPT-4o

Mensa Norway IQ: 109

With a Mensa Norway IQ score of 109, GPT-4o sits just above the human average. While this may seem modest compared to the models higher up the list, it still marks a clear departure from what was considered “capable” AI not too long ago.

This score reflects GPT-4o’s ability to handle basic abstract reasoning and pattern recognition without falling apart. It may not excel at complex multi-step puzzles, but it performs reliably on simpler logic tasks. This is exactly what most people, including myself, need for everyday problem solving.

In a way, this represents accessible intelligence. While it’s not built to dominate IQ charts, it shows how AI models can slightly exceed average human reasoning and be helpful with our daily tasks.

14. DeepSeek R1

Mensa Norway IQ: 109

Matching GPT-4o is DeepSeek R1, with a Mensa Norway IQ score of 109. As with GPT-4o, this is competent reasoning, accessible to people around the globe, and without the sharp drop-offs seen in less capable systems.

In simple terms, you may consider DeepSeek R1 as dependable baseline intelligence. It shows that even models well outside the top of this ranking can still meet, and slightly exceed, average human reasoning on standardised IQ-style tests.

15. Llama 4 Maverick

Mensa Norway IQ: 107

With a Mensa Norway IQ score of 107, Llama 4 Maverick sits slightly above the average human baseline. At the very least, it depicts a level of intelligence that is meaningfully better than chance or shallow pattern matching.

Think of Llama 4 Maverick as entry-level reasoning competence among modern LLMs. It shows that even models not designed for advanced problem-solving can be of use to people on tasks that an average human would find difficult.

16. DeepSeek V3

Mensa Norway IQ: 103

Closing the list is DeepSeek V3, with a Mensa Norway IQ score of 103. This places the model only just above the human average IQ. It also means that DeepSeek V3 can handle elementary pattern recognition and simple logical relationships without major errors.

This is the lower end of what the models on this list achieve on intelligence benchmarks. Even at this level, the takeaway is clear: AI systems have crossed the threshold where average human reasoning is no longer the bar to clear. It is the baseline.
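For readers who want the whole ranking in one place, the sketch below restates the scores from this list as data and estimates the share of people each score would exceed. The percentile estimate assumes a normal distribution with mean 100 and standard deviation 15, which is an assumption rather than something the Mensa Norway test or this article specifies, and all names in the code are hypothetical.

```python
from statistics import NormalDist

# Mensa Norway scores exactly as listed in this article.
MENSA_NORWAY_SCORES = {
    "GPT-5.2 Pro": 147,
    "GPT-5.2 Thinking": 141,
    "Gemini 3 Pro Preview": 141,
    "Grok 4 Expert Mode": 137,
    "GPT-5.2 Pro (Vision)": 135,
    "GPT-5.2": 126,
    "Kimi K2 Thinking": 124,
    "Claude Opus 4.5": 124,
    "Gemini 3 Pro Preview (Vision)": 123,
    "Claude Sonnet 4.5": 123,
    "GPT-5.2 Thinking (Vision)": 111,
    "Manus": 111,
    "GPT-4o": 109,
    "DeepSeek R1": 109,
    "Llama 4 Maverick": 107,
    "DeepSeek V3": 103,
}

# Assumption: mean 100, standard deviation 15.
IQ_SCALE = NormalDist(mu=100, sigma=15)


def share_below(iq: float) -> float:
    """Estimated fraction of people scoring below `iq` under the assumed scale."""
    return IQ_SCALE.cdf(iq)


def ranked(scores: dict[str, int]) -> list[tuple[str, int]]:
    """Sort models by score, highest first; ties keep their listed order."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for model, iq in ranked(MENSA_NORWAY_SCORES):
        print(f"{model:32s} {iq:3d}  above ~{share_below(iq):.1%} of people")
```

Every model on the list lands above the 50% mark under this assumption, which is the point the list is making: the lowest score here is still slightly above the human average.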

What This List Implies

Don’t think of this list as a definitive leaderboard of the smartest AI models. While it does rank them in a way, the ranking is not an absolute representation of smartness.

Its real value lies elsewhere: it makes a strong point that structured reasoning is no longer limited to humans. Across models, architectures, and organisations, AI systems are now matching, and often exceeding, human performance on IQ tests that were once considered difficult even for trained humans.

That said, the context here will always be limited. This ranking doesn’t imply creativity, consciousness, or human-like understanding. These models don’t possess intent, emotions, or self-awareness. They don’t “think” in the way humans do. What they show with their respective scores instead is something far narrower, yet profound. AI can now solve abstract, logic-driven problems just as well as, if not better than, humans.

Conclusion

This article is not meant to comment on the intelligence battle of AI vs humans. It simply makes one point: human-level reasoning is no longer the ceiling. This list shows how quickly large language models have crossed thresholds that once defined exceptional intelligence, at least in test-measured terms.

At the same time, these scores remind us what intelligence isn’t. They don’t imply creativity, consciousness, or understanding. What they do show is that structured reasoning has become cheap, fast, and scalable. And because of that, the real differentiator shifts back to humans. We can now decide what problems to solve, instead of how to solve them.

Technical content strategist and communicator with a decade of experience in content creation and distribution across national media, Government of India, and private platforms

