The best AI for coding in 2024 (and what not to use)

I’ve been around technology long enough that very little excites me, and even less surprises me. But shortly after OpenAI’s ChatGPT was released, I asked it to write a WordPress plugin for my wife’s e-commerce site. When it did, and the plugin worked, I was genuinely surprised.

That was the start of my deep exploration into chatbots and AI-assisted programming. Since then, I’ve subjected 10 large language models (LLMs) to four real-world tests.

Unfortunately, not all chatbots code equally well. It has been 18 months since that first test, and even now, five of the 10 LLMs I tested can’t create working plugins.

In this article, I’ll show you how each LLM performed against my tests. There are two chatbots I recommend you use, but they cost $20/month. The free versions of the same chatbots do well enough that you could probably get by without paying. But the rest, whether free or paid, are not so great. I won’t risk my programming projects with them, or recommend that you do, until their performance improves.

Also: How I test an AI chatbot’s coding ability – and you can too

I’ve written a lot about using AIs to help with programming. Unless it’s a small, simple project, like my wife’s plugin, AIs can’t write entire apps or programs. But they excel at writing a few lines and are not bad at fixing code.

Rather than repeat everything I’ve written, go ahead and read this article: How to use ChatGPT to write code: What it can and can’t do for you.

If you want to understand my coding tests, why I’ve chosen them, and why they’re relevant to this review of the 10 LLMs, read this article: How I test an AI chatbot’s coding ability – and you can too.

Let’s start with a comparative look at how the chatbots performed:

David Gewirtz/ZDNET

Next, let’s look at each chatbot individually. I’ll discuss nine chatbots, even though the chart above shows 10 LLMs, because the results for GPT-4 and GPT-4o are both covered under ChatGPT Plus. Ready? Let’s go.

ChatGPT Plus

Pros
  • Passed all tests
  • Solid coding results
  • Mac app
Cons
  • Hallucinations
  • No Windows app yet
  • Sometimes uncooperative

  • Price: $20/mo
  • LLM: GPT-4o, GPT-4, GPT-3.5
  • Desktop browser interface: Yes
  • Dedicated Mac app: Yes
  • Dedicated Windows app: No
  • Multi-factor authentication: Yes
  • Tests passed: 4 of 4

ChatGPT Plus with GPT-4 and GPT-4o passed all my tests. One of my favorite features is the availability of a dedicated app. When I test web programming, I have my browser on one screen, my IDE open, and the ChatGPT Mac app running on a separate screen.

Also: I put GPT-4o through my coding tests and it aced them – except for one weird result

In addition, Logitech’s Prompt Builder, which pops up at the press of a mouse button, can be set up to use the upgraded GPT-4o and connect to your OpenAI account. That makes running a prompt a simple thumb-tap, which is very convenient.

The only thing I didn’t like was that one of my GPT-4o tests produced a two-choice answer, and one of those answers was wrong. I’d rather it just gave me the correct answer. A quick test confirmed which answer would work, but that was a bit annoying. I didn’t have that issue with GPT-4, so for now, that’s the LLM setting I use with ChatGPT when coding.

Perplexity Pro

Pros
  • Multiple LLMs
  • Search criteria displayed
  • Good sourcing
Cons
  • Email-only login
  • No desktop app

  • Price: $20/mo
  • LLM: GPT-4o, Claude 3.5 Sonnet, Sonar Large, Claude 3 Opus, Llama 3.1 405B
  • Desktop browser interface: Yes
  • Dedicated Mac app: No
  • Dedicated Windows app: No
  • Multi-factor authentication: No
  • Tests passed: 4 of 4

I seriously considered listing Perplexity Pro as the best overall AI chatbot for coding, but one failing kept it out of the top slot: how you log in. Perplexity doesn’t use a username/password or passkey, and doesn’t have multi-factor authentication. All it does is email you a login PIN. It also doesn’t have a separate desktop app, as ChatGPT does for Macs.

What sets Perplexity apart from other tools is that it can run multiple LLMs. While you can’t set an LLM for a given session, you can easily go into the settings and choose the active model.

Also: Can Perplexity Pro help you code? It aced my programming tests – thanks to GPT-4

For programming, you’ll probably want to stick with GPT-4o, because it aced all our tests. But it can be worthwhile to cross-check code across the different LLMs. For example, if you have GPT-4o write some regular expression code, you might switch to a different LLM to see what that LLM thinks of the generated code.

As we’ll see below, most LLMs are unreliable, so don’t take their results as gospel. However, you can use those results to give you more things to check in your original code. It’s sort of like an AI-driven code review.

Simply do not forget to change again to GPT-4o.
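One way to make that cross-check concrete is to stop relying on a second model’s opinion alone and run every candidate answer against the same hand-written test cases. The Python sketch below is only an illustration. The task (validating a dollar amount with optional cents), both patterns, and the test cases are hypothetical stand-ins. They are not the regular expressions from my tests, and they are not output from any particular chatbot.

```python
import re

# Hypothetical candidate patterns, standing in for regex answers that two
# different LLMs might give for the same task: match a non-negative dollar
# amount with optional cents (e.g. "12", "12.5", "12.50").
CANDIDATES = {
    "model_a": r"^\d+(\.\d{1,2})?$",
    "model_b": r"^\d*\.?\d{0,2}$",
}

# Hand-written expectations (also hypothetical): input -> should it match?
CASES = {
    "12": True,
    "12.5": True,
    "12.50": True,
    "0.99": True,
    "": False,
    ".": False,
    "12.": False,
    "12.505": False,
    "abc": False,
}


def failures(pattern: str) -> list[str]:
    """Return the inputs where the pattern disagrees with CASES."""
    compiled = re.compile(pattern)
    # fullmatch requires the whole string to match, so the anchors are redundant
    return [s for s, expected in CASES.items()
            if bool(compiled.fullmatch(s)) != expected]


if __name__ == "__main__":
    for name, pattern in CANDIDATES.items():
        bad = failures(pattern)
        print(f"{name}: {'OK' if not bad else 'fails on ' + repr(bad)}")
```

Running it prints `model_a: OK` and `model_b: fails on ['', '.', '12.']`. In other words, the looser pattern accepts an empty string, a lone dot, and a trailing decimal point. Those are exactly the kinds of edge cases a second opinion should send you back to check.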

ChatGPT free

Cons
  • Prompt throttling
  • Could cut you off in the middle of whatever you’re working on

  • Price: Free
  • LLM: GPT-4o, GPT-3.5
  • Desktop browser interface: Yes
  • Dedicated Mac app: Yes
  • Dedicated Windows app: No
  • Multi-factor authentication: Yes
  • Tests passed: 3 of 4 in GPT-3.5 mode

ChatGPT is available to anyone for free. While both the Plus and free versions support GPT-4o, which passed all my programming tests, there are limitations when using the free app.

OpenAI treats free ChatGPT users as if they’re in the cheap seats. If traffic is high or the servers are busy, the free ChatGPT will only make GPT-3.5 available to free users. The tool will also only allow you a certain number of queries before it downgrades or shuts you off.

Also: How to use ChatGPT: What you need to know now

I’ve had several occasions when the free version of ChatGPT effectively told me I’d asked too many questions.

ChatGPT is a great tool, as long as you don’t mind getting shut down sometimes. Even GPT-3.5 did better on the tests than all the other chatbots, and the test it failed was for a fairly obscure programming tool produced by a lone programmer in Australia.

So, if budget is important to you and you can wait when cut off, go for ChatGPT free.

Perplexity free

Pros
  • Free
  • Passed most tests
  • Range of research tools
Cons
  • Limited to GPT-3.5
  • Throttles prompt results

  • Price: Free
  • LLM: GPT-3.5
  • Desktop browser interface: Yes
  • Dedicated Mac app: No
  • Dedicated Windows app: No
  • Multi-factor authentication: No
  • Tests passed: 3 of 4

I’m threading a pretty fine needle here, but because Perplexity AI’s free version is based on GPT-3.5, its test results were measurably better than those of the other AI chatbots.

Also: 5 reasons why I prefer Perplexity over every other AI chatbot

From a programming perspective, that’s pretty much the whole story. But from a research and organization perspective, my ZDNET colleague Steven Vaughan-Nichols prefers Perplexity over the other AIs.

He likes how Perplexity provides more complete sources for research questions, how it cites those sources, how it organizes its replies, and how it suggests questions for further searches.

So if you’re programming but also doing other research, consider the free version of Perplexity.

Chatbots to avoid for programming help

I tested nine chatbots, and four passed most of my tests. Of the other five, including a few pitched as great for programming, four passed only one of my tests each, and Microsoft’s Copilot didn’t pass any.

I’m mentioning them here because people will ask, and I did test them thoroughly. Some of them do just fine for other work, so I’ll point you to their more general reviews if you’re curious about how they function.

Meta AI

David Gewirtz/ZDNET

Meta AI is Facebook’s general-purpose AI. As you can see above, it failed three of our four tests.

Also: How to get started with Meta AI in Facebook, Instagram, and more

The AI did generate a nice user interface, but with zero functionality. It did find my annoying bug, which is a fairly serious challenge. Given the specific knowledge required to find that bug, I was surprised it choked on a simple regular expression challenge. But it did.

Meta Code Llama

David Gewirtz/ZDNET

Meta Code Llama is Facebook’s AI designed specifically for coding help. It’s something you can download and install on your own server. I tested it running on a Hugging Face AI instance.

Also: Can Meta AI code? I tested it against Llama, Gemini, and ChatGPT – it wasn’t even close

Weirdly, even though Meta AI and Meta Code Llama each choked on three of my four tests, they choked on different problems. AIs can’t be counted on to give the same answer twice, but this result was a surprise. We’ll see if that changes over time.

Claude 3.5 Sonnet

David Gewirtz/ZDNET

Anthropic claims the 3.5 Sonnet version of its Claude AI chatbot is ideal for programming. After it failed all but one test, I’m not so sure.

If you’re not using it for programming, Claude may be a better choice than the free version of ChatGPT.

Also: 4 things Claude AI can do that ChatGPT can’t

My ZDNET colleague Maria Diaz reports that Claude can handle uploaded files, process more words than the free version of ChatGPT, provide information roughly a year more current than GPT-3.5, and access websites.

Gemini Advanced

David Gewirtz/ZDNET

Gemini Advanced is Google’s $20 pro version of its Gemini (formerly Bard) chatbot. I expected the tool to do better than one out of four. Interestingly, the one test it passed was the one that every AI except GPT-4/4o failed: knowledge of that fairly obscure programming language produced by one programmer in Australia.

Also: 3 ways Gemini Advanced beats other AI assistants, according to Google

So, if it knew that language, why couldn’t it handle basic regular expressions or other first-year programming student problems?

Microsoft Copilot

David Gewirtz/ZDNET

You’d think the company with the “Developers! Developers! Developers!” mantra in its DNA would have an AI that does better on programming tests. Microsoft produces some of the best coding tools on the planet. And yet, Copilot didn’t pass a single test.

Also: What are Microsoft’s other Copilots? Here are the differences and how you can use them

The one positive thing is that Microsoft always learns from its mistakes. So, I’ll check back later and see if this result improves.

It’s only a matter of time

The results of my tests were fairly surprising, especially given the huge investments of Microsoft and Google. But this area of innovation is improving at warp speed, so we’ll be back with updated tests and results over time. Stay tuned.

Have you used any of these AI chatbots for programming? What has your experience been? Let us know in the comments below.


You can follow my day-to-day project updates on social media. Be sure to subscribe to my weekly update newsletter, and follow me on Twitter/X at @DavidGewirtz, on Facebook at Facebook.com/DavidGewirtz, on Instagram at Instagram.com/DavidGewirtz, and on YouTube at YouTube.com/DavidGewirtzTV.