I put DeepSeek AI’s coding skills to the test – here’s where it fell apart

DeepSeek exploded into the world’s consciousness this past weekend. It stands out for three powerful reasons:

It’s an AI chatbot from China, rather than the US
It’s open source
It uses vastly less infrastructure than the big AI tools we’ve been looking at

Given the US government’s concerns over TikTok and possible Chinese government involvement in that code, a new AI bursting onto the scene from China is bound to generate attention. ZDNET’s Radhika Rajkumar did a deep dive into those issues in her article, Why China’s DeepSeek could burst our AI bubble.

In this article, we’re avoiding politics. Instead, I’m putting DeepSeek through the same set of AI coding tests I’ve thrown at ten other large language models.

The short answer is this: impressive, but not perfect. Let’s dig in.

Test 1: Writing a WordPress plugin

This test was actually my first test of ChatGPT’s programming prowess, way back in the day. My wife needed a plugin for WordPress that would help her run an involvement device for her online group.

Her needs were fairly simple. It needed to take in a list of names, one name per line. It then had to sort the names, and if there were duplicate names, separate them so they weren’t listed side-by-side.
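As a rough illustration of that requirement (not the plugin itself, which is WordPress code), here is a minimal Python sketch. The function name separate_duplicates is hypothetical, and the sketch assumes that keeping duplicates apart takes priority over strict alphabetical order: it places the most frequent remaining name first and breaks ties alphabetically.

```python
import heapq
from collections import Counter


def separate_duplicates(raw_text):
    """Arrange names (one per line) so identical names are not adjacent.

    Illustrative sketch only: the most frequent remaining name is placed
    first, with ties broken alphabetically.
    """
    names = [line.strip() for line in raw_text.splitlines() if line.strip()]
    counts = Counter(names)

    # heapq is a min-heap, so store negative counts to pop the largest first.
    heap = [(-count, name) for name, count in counts.items()]
    heapq.heapify(heap)

    result = []
    held = None  # the name just placed, kept out of the heap for one turn
    while heap:
        neg_count, name = heapq.heappop(heap)
        result.append(name)
        if held is not None:
            heapq.heappush(heap, held)
        remaining = neg_count + 1  # one copy used up
        held = (remaining, name) if remaining < 0 else None

    # If one name dominates the list, its last copies cannot be separated.
    if held is not None:
        result.extend([held[1]] * (-held[0]))
    return result


print(separate_duplicates("Bob\nAlice\nBob\nCarol\nAlice\nBob"))
# ['Bob', 'Alice', 'Bob', 'Alice', 'Bob', 'Carol']
```

When a single name makes up more than half the list, no arrangement can keep every copy apart, so the sketch simply appends the leftovers at the end.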

I didn’t really have time to code it for her, so I decided to give the AI the challenge on a whim. To my huge surprise, it worked.

Since then, it’s been my first test for AIs when evaluating their programming skills. It requires the AI to know how to set up code for the WordPress framework and follow prompts clearly enough to create both the user interface and program logic.

Only about half of the AIs I’ve tested can fully pass this test. Now, however, we can add one more to the winner’s circle.

DeepSeek created both the user interface and program logic exactly as specified. So far, DeepSeek has passed one of four tests.

Test 2: Rewriting a string function

A user complained that he was unable to enter dollars and cents into a donation entry field. As written, my code only allowed dollars. So, the test involves giving the AI the routine that I wrote and asking it to rewrite it to allow for both dollars and cents.

Usually, this results in the AI generating some regular expression validation code. DeepSeek did generate code that works, although there’s room for improvement. The code that DeepSeek wrote was unnecessarily long and repetitious. My biggest concern is that the DeepSeek validation checks for up to two decimal places, but if a value with a long fractional part comes through (like 0.30000000000000004), the use of parseFloat applies no explicit rounding.
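To make that concern concrete, here is a minimal sketch in Python. It is not DeepSeek’s output or my original routine, which uses parseFloat and so isn’t Python. The pattern, the function name parse_donation, and the choice of Decimal are all assumptions for illustration: validate with one regular expression, then convert to a fixed two-place decimal instead of a binary float.

```python
import re
from decimal import Decimal, ROUND_HALF_UP

# Whole dollars, optionally followed by a point and one or two cent digits.
AMOUNT_PATTERN = re.compile(r"[0-9]+(\.[0-9]{1,2})?")


def parse_donation(text):
    """Return the amount as a two-place Decimal, or None if it is invalid."""
    cleaned = text.strip()
    if not AMOUNT_PATTERN.fullmatch(cleaned):
        return None
    # Decimal keeps cents exact; quantize makes the rounding rule explicit.
    return Decimal(cleaned).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


print(parse_donation("25"))                    # 25.00
print(parse_donation("25.5"))                  # 25.50
print(parse_donation("0.30000000000000004"))   # None (too many decimal places)

# Binary floats are where a value like 0.30000000000000004 comes from:
print(0.1 + 0.2 == 0.3)                        # False
```

The design choice here is to reject over-long input at the validation step and make rounding explicit at the conversion step, so neither depends on how a floating-point parser happens to behave.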

I’d give this to DeepSeek because neither of these issues would cause the program to break when run by a user, and the code would generate the expected results.

And that gives DeepSeek two wins out of four.

Test 3: Finding an annoying bug

This is a test created when I had a very annoying bug that I had difficulty tracking down. Once again, I decided to see if ChatGPT could handle it, which it did.

The challenge is that the answer isn’t obvious. Actually, the challenge is that there is an obvious answer, based on the error message. But the obvious answer is the wrong answer. This not only caught me, but regularly catches some of the AIs.

Solving this bug requires understanding how specific API calls within WordPress work, being able to see beyond the error message to the code itself, and then knowing where to find the bug.

DeepSeek passed this one as well, bringing us to three out of four wins. That already puts DeepSeek ahead of Gemini, Copilot, Claude, and Meta.

Will DeepSeek score a home run? Let’s find out.

Test 4: Writing a script

And another one bites the dust. This is a challenging test because it requires the AI to understand the interplay between three environments: AppleScript, the Chrome object model, and a Mac scripting tool called Keyboard Maestro.

I would have called this an unfair test, because Keyboard Maestro is not a mainstream programming tool. But ChatGPT handled the test easily, understanding exactly what part of the problem is handled by each tool.

Unfortunately, DeepSeek didn’t have this level of knowledge. It didn’t know that it needed to split the task between instructions to Keyboard Maestro and Chrome. It also had fairly weak knowledge of AppleScript, writing custom routines for operations that are already native to the language.

This leaves DeepSeek with three correct tests and one fail.

Final thoughts

I found that DeepSeek’s insistence on using a public cloud email address like gmail.com (rather than my normal email address with my corporate domain) was annoying. It also had a number of responsiveness fails that made doing these tests take longer than I would have liked.

I wasn’t sure I’d be able to write this article because for most of the day, I got this error when trying to sign up:

DeepSeek’s online services have recently faced large-scale malicious attacks. To ensure continued service, registration is temporarily limited to +86 phone numbers. Existing users can log in as usual. Thanks for your understanding and support.

Then, I got in and was able to run the tests.

DeepSeek seems to be overly loquacious in terms of the code it generates. The AppleScript code in Test 4 was both wrong and excessively long. The regular expression code in Test 2 was correct, but it could have been written in a way that made it much more maintainable.

I’m definitely impressed that DeepSeek beat out Gemini, Copilot, and Meta. But it appears to be at the old GPT-3.5 level, which means there’s definitely room for improvement.

For a brand-new tool running on far less infrastructure than the other tools, this could be an AI to watch.

What do you think? Have you tried DeepSeek? Are you using any AIs for programming assistance? Let us know in the comments below.

You can follow my day-to-day project updates on social media. Be sure to subscribe to my weekly update newsletter, and follow me on Twitter/X at @DavidGewirtz, on Facebook at Facebook.com/DavidGewirtz, on Instagram at Instagram.com/DavidGewirtz, on Bluesky at @DavidGewirtz.com, and on YouTube at YouTube.com/DavidGewirtzTV.