As generative AI fever has swept the tech world, one of the most hyped use cases has been the ability for large models to write computer programmes. French startup Poolside recently raised a $126m seed round to build a ChatGPT-like system trained on code.
But generative AI models often make mistakes, which might be OK if you’re writing marketing copy but is less OK when one tiny error in a line of code can break a whole system.
As hyped-up GenAI startups try to solve these problems and win over their first clients, one Oxford-based company is already automating code for some of the world’s largest financial institutions.
Meet Diffblue
Diffblue — which was spun out of the same Oxford University AI labs that gave birth to DeepMind — has built a code automation tool that’s already being used by JP Morgan, Citi, Goldman Sachs, Amazon Web Services and Cisco.
Like DeepMind’s board-game-master-beating AlphaGo system, Diffblue’s product uses an AI approach known as reinforcement learning.
Unlike generative AI systems like ChatGPT, the model is not trained on vast quantities of data that allow it to guess the solution to a problem. Instead, the model is trained on the logic and rules of the problem it’s given, and then attempts to solve it. It is rewarded when it arrives at a better solution, as it gradually iterates to find the best answer.
In Diffblue’s case, this method is used to test lines of Java code (the most commonly used enterprise coding language) to make sure that computer programmes are working as they should.
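To make the "rewarded when it arrives at a better solution" idea concrete, here is a toy sketch in Java. It is not Diffblue's method, and it is far simpler than reinforcement learning proper: it is a plain reward-guided random search. The class `RewardGuidedTestSearch`, the method under test `classify`, and the reward rule (one point for reaching a branch not seen before) are all assumptions made up for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Toy illustration only: a reward-guided search for test inputs.
// Every name here is hypothetical and not taken from Diffblue's product.
final class RewardGuidedTestSearch {

    // Hypothetical method under test: the returned label names the branch taken.
    static String classify(int x) {
        if (x < 0) return "negative";
        if (x == 0) return "zero";
        if (x > 1000) return "large";
        return "small";
    }

    // Tries candidate inputs and keeps those that earn a reward, where the
    // reward is 1 for reaching a branch not covered so far and 0 otherwise.
    // Returns a map from branch label to the first input found that reaches it.
    static Map<String, Integer> search(long seed, int maxAttempts) {
        Random random = new Random(seed);
        Map<String, Integer> covered = new LinkedHashMap<>();
        List<Integer> keptInputs = new ArrayList<>();

        // Start from a known input so there is always something to mutate.
        covered.put(classify(0), 0);
        keptInputs.add(0);

        for (int attempt = 0; attempt < maxAttempts && covered.size() < 4; attempt++) {
            // Mutate one of the inputs that was rewarded earlier.
            int base = keptInputs.get(random.nextInt(keptInputs.size()));
            int candidate = base + random.nextInt(2001) - 1000; // step in [-1000, 1000]

            String branch = classify(candidate);
            int reward = covered.containsKey(branch) ? 0 : 1;
            if (reward > 0) {
                covered.put(branch, candidate);
                keptInputs.add(candidate);
            }
        }
        return covered;
    }

    public static void main(String[] args) {
        // Each printed line is effectively a candidate test case: an input
        // paired with the branch it exercises.
        search(7L, 10_000).forEach((branch, input) ->
            System.out.println(branch + " <- classify(" + input + ")"));
    }
}
```

The loop needs no examples of human-written tests to learn from. It only needs to run the code and score each attempt, which is the contrast with data-trained models drawn above.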
CEO Matthew Lodge tells Sifted that, in contrast to GenAI coding companions, Diffblue doesn’t need a “human-in-the-loop” to find the right piece of code, which is important given the scale of the projects they’re working on.
“One of our customers might have a million-line application, so they’d be running 40k-50k tests on that application,” he explains. “If you want that to result in saving real time for a development team, then it has to be unattended — it’s got to be autonomous.”
Winning the enterprise prize
When he joined Diffblue as CEO in 2019, Lodge brought with him significant experience of selling software to big corporates in Silicon Valley. He says that this involves not just thinking about how the product works, but how it fits into the business context of highly regulated industries like financial services.
“It is about understanding the operational environment. Our product is self-contained, so they can run it on their own systems. It’s not a cloud service that they have to send their code to, which could be compromised,” he explains.
“We also indemnify everything, so if there’s ever any legal problem with any code that we generate, we will take the case if you get sued. We have insurance for that.”
Lodge says that Diffblue will likely raise a Series B round in 2024, and will be focussing on beefing up its commercial and customer service teams in the US, where 90% of its business comes from.
What’s the market like?
Besides Poolside, Diffblue is competing with companies like Silicon Valley-based CircleCI and with DeepMind, which itself uses reinforcement learning to generate code. But Lodge says that the biggest competitor to its product is still humans.
Diffblue recently added a new coding language to its product — Kotlin — and Lodge says it will slowly increase the number of capabilities. But it will not try to scale too quickly.
“One of the things for a small company is you’ve got to be really, really good at one thing to get started.”
He also doesn’t seem too worried by the threat of GenAI code generation just yet, as the industry hype from ChatGPT has yet to be fulfilled.
“The promise of these large models has been way ahead of the delivery,” says Lodge.
“[GenAI for coding] is essentially better autocomplete, and software development environments have had autocomplete for decades. It’s a bit like saying, ‘We’re going to type faster,’ but speed of typing is not really what determines speed of software development.”