I put GitHub Copilot’s AI to the test – its mixed success at coding baffled me

The thing I find most baffling about the programming tests I’ve been running is that tools based on the same large language model tend to perform quite differently.

For example, ChatGPT, Perplexity, and GitHub Copilot are all based on the GPT-4 model from OpenAI. But, as I’ll show you below, while ChatGPT and Perplexity’s pro plans performed excellently, GitHub Copilot failed as often as it succeeded.

I tested GitHub Copilot embedded inside a VS Code instance. I’ll explain how to set that up and use GitHub Copilot in an upcoming step-by-step article. But first, let’s run through the tests.

If you want to know how I test and the prompts for each individual test, feel free to read how I test an AI chatbot’s coding ability.

TL;DR: GitHub Copilot passed two and failed two.

Test 1: Writing a WordPress plugin

So, this failed miserably. This was my first test, so I can’t tell yet whether GitHub Copilot is terrible at writing code or whether the context in which one interacts with it is limiting to the point where it can’t meet this requirement.

Let me explain.

This test involves asking the AI to create a fully functional WordPress plugin, complete with admin interface elements and operational logic. The plugin takes in a set of names, sorts them, and, if there are duplicates, separates the duplicates so they’re not side by side.
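To make that requirement concrete, here is a minimal sketch of the sort-and-separate logic. It is written in Python purely for illustration and is not the PHP any of the AIs produced. The function name spread_duplicates is hypothetical, and the sketch assumes that keeping identical names apart matters more than strict alphabetical order.

```python
import heapq
from collections import Counter


def spread_duplicates(names):
    """Reorder names so identical entries are not adjacent, where possible.

    Assumption: separation takes priority over strict alphabetical order.
    Names with the same remaining count are taken alphabetically.
    """
    counts = Counter(names)
    # heapq is a min-heap, so store negative counts to pop the most
    # frequent remaining name first.
    heap = [(-count, name) for name, count in counts.items()]
    heapq.heapify(heap)

    result = []
    held = None  # the name just placed, kept out of the heap for one turn
    while heap:
        neg_count, name = heapq.heappop(heap)
        result.append(name)
        if held is not None:
            heapq.heappush(heap, held)
        remaining = neg_count + 1  # one fewer copy left (counts are negative)
        held = (remaining, name) if remaining < 0 else None

    # If one name outnumbers all the others combined, some copies cannot
    # be separated; append whatever is left.
    if held is not None:
        result.extend([held[1]] * -held[0])
    return result
```

Calling spread_duplicates(["Bob", "Alice", "Bob", "Carol"]) returns the same four names with the two "Bob" entries kept apart. A real plugin would wrap logic like this in an admin form, which is the part of the test that goes beyond the algorithm itself.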

This was a real-world application that my wife needed as part of an involvement system she runs on her very active Facebook group as part of her digital goods e-commerce business.

Most of the other AIs passed this test, at least in part. Five of the 10 AI models tested passed the test completely. Three of them passed part of the test. Two (including Microsoft Copilot) failed completely.

The thing is, I gave GitHub Copilot the same prompt I give all of them, but it only wrote PHP code. To be clear, this problem can be solved using PHP code alone. But some AIs like to include some JavaScript for the interactive features. GitHub Copilot included code for using JavaScript but never actually generated the JavaScript that it tried to use.

What’s worse, when I created a JavaScript file and, from within the JavaScript file, tried to get GitHub Copilot to run the prompt, it gave me another PHP script, which also referenced a JavaScript file.

Inside the randomizer.js file, it tried to enqueue (basically, to bring in and run) the randomizer.js file itself, and the code it wrote was PHP, not JavaScript.

Test 2: Rewriting a string function

This test is fairly simple. I wrote a function that was supposed to test for dollars and cents but wound up only testing for integers (dollars). The test asks the AI to fix the code.

GitHub Copilot did rework the code, but there were a bunch of problems with the code it produced.

It assumed the value passed in was always a non-empty string. If it was empty, the code would break.
The revised regular expression code would break if a trailing decimal point (e.g., “3.”) was entered, if a leading decimal point (e.g., “.3”) was entered, or if leading zeros were included (e.g., “00.30”).

For something that was supposed to test whether currency was entered correctly, failing with code that would crash on edge cases is not acceptable.
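For comparison, here is a minimal Python sketch of a validator that handles those edge cases without crashing. It is not the code from my test or from GitHub Copilot. The name is_valid_amount is hypothetical, and the policy it encodes (accept a trailing decimal point, a leading decimal point, and leading zeros, with at most two digits of cents) is an assumption about what "entered correctly" should mean.

```python
import re

# Assumed policy: digits with an optional decimal part of up to two digits
# ("3", "3.", "3.5", "00.30"), or a leading decimal point followed by one
# or two digits (".3", ".30"). A bare "." is rejected.
_AMOUNT_PATTERN = re.compile(r"(?:[0-9]+(?:\.[0-9]{0,2})?|\.[0-9]{1,2})")


def is_valid_amount(value):
    """Return True if value looks like a dollars-and-cents amount."""
    # Guard first: None, numbers, and other non-strings are rejected
    # instead of raising an exception.
    if not isinstance(value, str):
        return False
    # fullmatch requires the whole (trimmed) string to fit the pattern,
    # so an empty string simply fails to match.
    return _AMOUNT_PATTERN.fullmatch(value.strip()) is not None
```

The type check up front is what keeps an empty or missing value from breaking the function, and the two-branch pattern is one way to cover inputs like “3.”, “.3”, and “00.30” explicitly rather than by accident.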

So, we have another fail.

Test 3: Finding an annoying bug

GitHub Copilot got this right. This is another test pulled from my real-life coding escapades. What made this bug so annoying (and difficult to figure out) is that the error message isn’t directly related to the actual problem.

The bug is kind of the coder equivalent of a trick question. Solving it requires understanding how specific API calls in the WordPress framework work and then applying that knowledge to the bug in question.

Microsoft Copilot, Gemini, and Meta Code Llama all failed this test. But GitHub Copilot solved it correctly.

Test 4: Writing a script

Here, too, GitHub Copilot succeeded where Microsoft Copilot failed. The challenge here is that I’m testing the AI’s ability to create a script that knows about coding in AppleScript, the Chrome object model, and a little Mac-only third-party coding utility called Keyboard Maestro.

To pass this test, the AI has to be able to recognize that all three coding environments need attention and then tailor individual lines of code to each of those environments.

Final thoughts

Given that GitHub Copilot uses GPT-4, I find the fact that it failed half of the tests discouraging. GitHub is just about the most popular source management environment on the planet, and one would hope that its AI coding help would be reasonably reliable.

As with all things AI, I’m sure performance will get better. Let’s stay tuned and check back in a few months to see if the AI is more effective by then.

Do you use an AI to help with coding? What AI do you prefer? Have you tried GitHub Copilot? Let us know in the comments below.

You can follow my day-to-day project updates on social media. Be sure to subscribe to my weekly update newsletter, and follow me on Twitter/X at @DavidGewirtz, on Facebook at Facebook.com/DavidGewirtz, on Instagram at Instagram.com/DavidGewirtz, on Bluesky at @DavidGewirtz.com, and on YouTube at YouTube.com/DavidGewirtzTV.