The best AI for coding in 2025 (and what not to use – including DeepSeek R1)

I’ve been around technology long enough that very little excites me, and even less surprises me. But shortly after OpenAI’s ChatGPT was released, I asked it to write a WordPress plugin for my wife’s e-commerce site. When it did, and the plugin worked, I was indeed surprised.

That was the beginning of my deep exploration into chatbots and AI-assisted programming. Since then, I’ve subjected 14 large language models (LLMs) to four real-world tests.

Unfortunately, not all chatbots can code alike. It’s been almost two years since that first test, and even now, five of the 14 LLMs I tested can’t create working plugins.

In this article, I’ll show you how each LLM performed against my tests. There are two chatbots I recommend you use, but they cost $20/month. The free versions of the same chatbots do well enough that you could probably get by without paying. But the rest, whether free or paid, are not so great. I won’t risk my programming projects with them or recommend that you do until their performance improves.

I’ve written a lot about using AIs to help with programming. Unless it’s a small, simple project, like my wife’s plugin, AIs can’t write entire apps or programs. But they excel at writing a few lines and are not bad at fixing code.

Rather than repeat everything I’ve written, go ahead and read this article: How to use ChatGPT to write code: What it can and can’t do for you.

If you want to understand my coding tests, why I’ve chosen them, and why they’re relevant to this review of the 14 LLMs, read this article: How I test an AI chatbot’s coding ability – and you can too.

Let’s start with a comparative look at how the chatbots performed:

Next, let’s look at each chatbot individually. I’ll discuss 13 chatbots, even though the above chart shows 14 LLMs. The results for GPT-4 and GPT-4o are both included in ChatGPT Plus. Ready? Let’s go.

ChatGPT Plus

Pros

Passed all tests
Solid coding results
Mac app

Cons

Hallucinations
No Windows app yet
Sometimes uncooperative

Price: $20/mo
LLM: GPT-4o, GPT-4, GPT-3.5
Desktop browser interface: Yes
Dedicated Mac app: Yes
Dedicated Windows app: No
Multi-factor authentication: Yes
Tests passed: 4 of 4

ChatGPT Plus with GPT-4 and GPT-4o passed all my tests. One of my favorite features is the availability of a dedicated app. When I test web programming, I have my browser set on one thing, my IDE open, and the ChatGPT Mac app running on a separate screen.

In addition, Logitech’s Prompt Builder, which pops up using a mouse button, can be set up to use the upgraded GPT-4o and connect to your OpenAI account, making it a simple thumb-tap to run a prompt, which is very convenient.

The only thing I didn’t like was that one of my GPT-4o tests resulted in a dual-choice answer, and one of those answers was wrong. I’d rather it just gave me the correct answer. Even so, a quick test showed which answer would work. But that issue was a bit annoying. I didn’t have that issue in GPT-4, so for now, that’s the LLM setting I use with ChatGPT when coding.
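To illustrate what that kind of quick test can look like, here is a minimal sketch in Python. The two candidate functions are hypothetical stand-ins for a dual-choice answer (they are not the code ChatGPT produced), and the task, normalizing whitespace in a string, is an assumption chosen only for illustration.

```python
def candidate_a(text: str) -> str:
    """Hypothetical answer A: collapse every run of whitespace, trim the ends."""
    return " ".join(text.split())


def candidate_b(text: str) -> str:
    """Hypothetical answer B: replaces double spaces in one pass (subtly wrong)."""
    return text.replace("  ", " ").strip()


# Hand-written cases as (input, expected output) pairs.
CASES = [
    ("hello world", "hello world"),
    ("  hello   world  ", "hello world"),
    ("tabs\tand\nnewlines", "tabs and newlines"),
]


def passes_all(func) -> bool:
    """Return True only if func gives the expected output for every case."""
    return all(func(given) == expected for given, expected in CASES)


if __name__ == "__main__":
    for func in (candidate_a, candidate_b):
        verdict = "works" if passes_all(func) else "fails"
        print(f"{func.__name__}: {verdict}")
```

A few cases like these take a minute to write and settle the question without having to trust either answer on sight.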

Perplexity Pro

Pros

Multiple LLMs
Search criteria displayed
Good sourcing

Cons

Email-only login
No desktop app

Price: $20/mo
LLM: GPT-4o, Claude 3.5 Sonnet, Sonar Large, Claude 3 Opus, Llama 3.1 405B
Desktop browser interface: Yes
Dedicated Mac app: No
Dedicated Windows app: No
Multi-factor authentication: No
Tests passed: 4 of 4

I seriously considered listing Perplexity Pro as the best overall AI chatbot for coding, but one failing kept it out of the top slot: how you log in. Perplexity doesn’t use username/password or passkey, and doesn’t have multi-factor authentication. All the tool does is email you a login PIN. The AI also doesn’t have a separate desktop app, as ChatGPT does for Macs.

What sets Perplexity apart from other tools is that it can run multiple LLMs. While you can’t set an LLM for a given session, you can easily go into the settings and choose the active model.

For programming, you’ll probably want to stick with GPT-4o, because that aced all our tests. But it might be interesting to cross-check code across the different LLMs. For example, if you have GPT-4o write some regular expression code, you might consider switching to a different LLM to see what that LLM thinks of the generated code.

As we’ll see below, most LLMs are unreliable, so don’t take the results as gospel. However, you can use the results to give you more things to check in your original code. It’s sort of like an AI-driven code review.

Just remember to switch back to GPT-4o.
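Whatever a second LLM says about generated code, a handful of concrete inputs is a cheap sanity check to run alongside it. The sketch below assumes Python and a hypothetical pattern for dollars-and-cents amounts; neither the pattern nor the sample inputs come from the tests described in this article.

```python
import re

# Hypothetical pattern standing in for AI-generated code: an optional "$",
# one or more digits, then an optional decimal part of one or two digits.
AMOUNT_PATTERN = re.compile(r"\$?\d+(\.\d{1,2})?")

# Inputs chosen by hand for illustration.
SHOULD_MATCH = ["5", "$5", "12.5", "$1234.56"]
SHOULD_REJECT = ["", "abc", "$", "1.234", "12.", "-5"]


def is_valid_amount(text: str) -> bool:
    """True when the whole string, not just a prefix, matches the pattern."""
    return AMOUNT_PATTERN.fullmatch(text) is not None


def check_pattern() -> list[str]:
    """Return a description of every input the pattern handles incorrectly."""
    problems = []
    for text in SHOULD_MATCH:
        if not is_valid_amount(text):
            problems.append(f"rejected valid input {text!r}")
    for text in SHOULD_REJECT:
        if is_valid_amount(text):
            problems.append(f"accepted invalid input {text!r}")
    return problems


if __name__ == "__main__":
    issues = check_pattern()
    print("pattern looks OK" if not issues else "\n".join(issues))
```

Using `fullmatch` rather than `match` is a deliberate choice here: `match` would accept "1.234" because the first four characters fit the pattern, which is exactly the kind of slip a review by a second LLM may or may not catch.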

Grok

Pros

Different LLM than ChatGPT
Good descriptions
Free access

Cons

Only available in browser mode
Free access likely only temporary

Price: Free (for now)
LLM: Grok-1
Desktop browser interface: Yes
Dedicated Mac app: No
Dedicated Windows app: No
Multi-factor authentication: Yes
Tests passed: 3 of 4

I have to say, Grok surprised me. I guess I didn’t have high hopes for an LLM that seemed tacked onto the Social Network Formerly Known As Twitter. But then again, X is now owned by Elon Musk, and two of Musk’s companies, Tesla and SpaceX, have towering AI capabilities.

It’s not clear how much of the Tesla and SpaceX AI DNA went into Grok, but we can fairly assume that there will likely be more work. As it is now, Grok is the only LLM not based on OpenAI LLMs that made it into the recommended list.

Grok did make one mistake, but it was a relatively minor one that could be easily remedied by a slightly more comprehensive prompt. Yes, it failed the test. But by passing the others, and by doing a nearly good-enough job on the one it failed, it earned itself a spot as a contender.

Stay tuned. This is one to watch.

ChatGPT Free

Cons

Prompt throttling
Could cut you off in the middle of whatever you’re working on

Price: Free
LLM: GPT-4o, GPT-3.5
Desktop browser interface: Yes
Dedicated Mac app: Yes
Dedicated Windows app: No
Multi-factor authentication: Yes
Tests passed: 3 of 4 in GPT-3.5 mode

ChatGPT is available to anyone for free. While both the Plus and free versions support GPT-4o, which passed all my programming tests, there are limitations when using the free app.

OpenAI treats free ChatGPT users as if they’re in the cheap seats. If traffic is high or the servers are busy, ChatGPT will only make GPT-3.5 available to free users. The tool will only allow you a certain number of queries before it downgrades or shuts you off.

I’ve had several occasions when the free version of ChatGPT effectively told me I’d asked too many questions.

ChatGPT is a great tool, as long as you don’t mind getting shut down sometimes. Even GPT-3.5 did better on the tests than all the other chatbots, and the test it failed was for a fairly obscure programming tool produced by a lone programmer in Australia.

So, if budget is important to you and you can wait when cut off, go for ChatGPT free.

Perplexity AI

Pros

Free
Passed most tests
Range of research tools

Cons

Limited to GPT-3.5
Throttles prompt results

Price: Free
LLM: GPT-3.5
Desktop browser interface: Yes
Dedicated Mac app: No
Dedicated Windows app: No
Multi-factor authentication: No
Tests passed: 3 of 4

I’m threading a pretty fine needle here, but because Perplexity AI’s free version is based on GPT-3.5, the test results were measurably better than the other AI chatbots.

From a programming perspective, that’s pretty much the whole story. But from a research and organization perspective, my ZDNET colleague Steven Vaughan-Nichols prefers Perplexity over the other AIs.

He likes how Perplexity provides more complete sources for research questions, cites its sources, organizes the replies, and offers questions for further searches.

So if you’re programming, but also doing other research, consider the free version of Perplexity.

DeepSeek V3

Pros

Free
Open source
Efficient resource utilization

Cons

Weak general knowledge
Small ecosystem
Limited integrations

Price: Free for chatbot, fees for API
LLM: DeepSeek MoE
Desktop browser interface: Yes
Dedicated Mac app: No
Dedicated Windows app: No
Multi-factor authentication: No
Tests passed: 3 of 4

While DeepSeek R1 is the new reasoning hotness from China that has all the pundits punditing, the real power right now (at least according to our tests) is DeepSeek V3. This chatbot passed almost all of our coding tests, doing as well as the (now mostly discontinued) ChatGPT 3.5.

Where DeepSeek V3 fell down was in its knowledge of somewhat more obscure programming environments. Still, it beat out Google’s Gemini, Microsoft’s Copilot, and Meta’s Meta AI, which is quite the accomplishment all by itself. We’ll be keeping a close watch on each DeepSeek model, so stay tuned.

Chatbots to avoid for programming help

I tested 14 LLMs, and seven passed most of my tests. The other chatbots, including a few pitched as great for programming, each only passed one of my tests, and Microsoft’s Copilot didn’t pass any.

I’m mentioning them here because people will ask, and I did test them thoroughly. Some bots do just fine for other work, so I’ll point you to their general reviews if you’re just curious about how they function.

DeepSeek R1

Unlike DeepSeek V3, the advanced reasoning version DeepSeek R1 did not showcase its reasoning capabilities when it came to our programming tests. It was odd that the new failure area was one that’s not all that hard, even for a basic AI: the regular expression code for our string function test.

But that’s why we’re running these real-world tests. It’s never clear where an AI will hallucinate or just plain fail, and before you go believing all the hype about DeepSeek R1 taking the crown away from ChatGPT, run some programming tests. So far, while I’m impressed with the much reduced resource utilization and the open source nature of the product, the quality of its coding output is inconsistent.

GitHub Copilot

GitHub’s Copilot integrates quite seamlessly with VS Code. It makes asking for coding help very quick and productive, especially when working in context. That’s why it’s so disappointing that the code it writes can often be so very wrong.

I can’t, in good conscience, recommend you use the GitHub Copilot extensions for VS Code. I’m concerned that the temptation will be too great to just insert blocks of code without sufficient testing, and that the code GitHub Copilot produces is just not ready for production use. Try again next year.

Meta AI

Meta AI is Facebook’s general-purpose AI. As you can see above, it failed three of our four tests.

The AI did generate a nice user interface but with zero functionality. And it did find my annoying bug, which is a fairly serious challenge. Given the specific knowledge required to find the bug, I was surprised it choked on a simple regular expression challenge. But it did.

Meta Code Llama

Meta Code Llama is Facebook’s AI designed specifically for coding help. It’s something you can download and install on your server. I tested it running on a Hugging Face AI instance.

Weirdly, although both Meta AI and Meta Code Llama choked on three of four of my tests, they choked on different problems. AIs can’t be counted on to give the same answer twice, but this result was a surprise. We’ll see if that changes over time.

Claude 3.5 Sonnet

Anthropic claims the 3.5 Sonnet version of its Claude AI chatbot is ideal for programming. After failing all but one test, I’m not so sure.

If you’re not using it for programming, Claude may be a better choice than the free version of ChatGPT.

My ZDNET colleague Maria Diaz reports that Claude can handle uploaded files, process more words than the free version of ChatGPT, provide information roughly a year more current than GPT-3.5, and access websites.

Gemini Advanced

Gemini Advanced is Google’s $20 pro version of its Gemini (formerly Bard) chatbot. I expected the tool to do better than one out of four. Interestingly, it passed the one test that every AI other than GPT-4/4o failed: knowledge of that fairly obscure programming language produced by one programmer in Australia.

So, if it knew that language, why couldn’t it handle basic regular expressions or other first-year programming student problems?

Microsoft Copilot

You’d think the company with the “Developers! Developers! Developers!” mantra in its DNA would have an AI that does better on the programming tests. Microsoft produces some of the best coding tools on the planet. And yet, Copilot did badly.

The one positive thing is that Microsoft always learns from its mistakes. So, I’ll check back later and see if this result improves.

But I like [insert name here]. Does this mean I have to use a different chatbot?

Probably not. I’ve limited my tests to day-to-day programming tasks. None of the bots has been asked to talk like a pirate, write prose, or draw a picture. In the same way we use different productivity tools to accomplish specific tasks, feel free to choose the AI that helps you complete the task at hand.

The only issue is if you’re on a budget and are paying for a pro version. Then, find the AI that does most of what you want, so you don’t have to pay for too many AI add-ons.

It’s only a matter of time

The results of my tests were fairly surprising, especially given the significant investments of Microsoft and Google. But this area of innovation is improving at warp speed, so we’ll be back with updated tests and results over time. Stay tuned.

Have you used any of these AI chatbots for programming? What has your experience been? Let us know in the comments below.

You can follow my day-to-day project updates on social media. Be sure to subscribe to my weekly update newsletter, and follow me on Twitter/X at @DavidGewirtz, on Facebook at Facebook.com/DavidGewirtz, on Instagram at Instagram.com/DavidGewirtz, and on YouTube at YouTube.com/DavidGewirtzTV.
