The world's most powerful AI programmer: built on GPT-4o, it goes from requirement to finished code in just 84 seconds

2024.08.14

Large models are moving rapidly towards replacing human programmers.

In March this year, the AI software engineer Devin set the AI community abuzz. The product is powered by a large language model (LLM) based on OpenAI's GPT-4 and can autonomously write and edit code from natural-language instructions.

But in generative AI, rapid development is the norm, and the technology has now advanced another step.

This week, a Y Combinator-backed startup called Cosine announced its own new autonomous AI engineer, Genie, which the company says easily outperforms Devin, scoring 30% on the third-party benchmark SWE-Bench, compared to Devin’s 13.8%.

The new tool even surpassed Amazon’s Q and Factory’s Code Droid, which both scored 19%, making it the best-performing AI programmer in the world on this benchmark.

Figure: Genie's performance on the SWE-Bench benchmark compared with other AI coding models.

“This model is much more than a benchmark: it’s trained from scratch to think and act like a human SWE (software engineer),” said Alistair Pullen, co-founder and CEO of Cosine.

A Genie that can fix bugs and write code

As an advanced AI software engineering model, Genie can autonomously handle various coding tasks according to the instructions of human engineers, including bug fixing, feature building, code refactoring, code testing, etc.

Genie can operate completely autonomously or collaborate with users to complete tasks.

According to the technical report, it supports a range of programming languages, including JavaScript, Python, TypeScript, TSX, Java, C#, C++, C, Rust, Scala, Kotlin, Swift, Golang, PHP, and Ruby.

Cosine claims that Genie can simulate the cognitive process of human engineers. "It observes how human engineers work and imitates the process," said Alistair Pullen.

Security is a perennial concern. The code Genie generates is stored in the user's own GitHub repository, and Cosine does not retain a copy, which avoids the security risks that retaining one would create.

In addition, Cosine’s software platform integrates with Slack and system notifications, so Genie behaves like an AI colleague that reports its status to users or flags issues.

Alistair Pullen demonstrated how to use Genie to solve a real-world problem. The target is an issue on GitHub, and the user only needs to paste in a link to it. Genie analyzes the problem automatically and works out which files it needs to solve it, continuing until the requirements are met.

Genie then breaks the problem down into a series of solution steps and generates code.

The next step is to run the code. If the generated code fails, Genie automatically locates the problem, analyzes and fixes it, and then runs the code again.
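The run, diagnose, and retry cycle described above can be illustrated with a short Python sketch. This is not Cosine's implementation, which has not been published in this form. The names `repair_loop`, `toy_generate`, and `toy_run` are hypothetical, and the "model" here is a stub that returns a buggy function first and a corrected one once it receives error feedback.

```python
from typing import Callable, Optional, Tuple


def repair_loop(
    generate: Callable[[Optional[str]], str],
    run: Callable[[str], Tuple[bool, str]],
    max_attempts: int = 3,
) -> Tuple[Optional[str], int]:
    """Generate code, run it, and retry with the failure message as feedback.

    Returns the first passing code and the number of attempts used,
    or (None, max_attempts) if every attempt failed.
    """
    feedback: Optional[str] = None
    for attempt in range(1, max_attempts + 1):
        code = generate(feedback)
        ok, output = run(code)
        if ok:
            return code, attempt
        feedback = output  # pass the error back to the generator
    return None, max_attempts


def toy_generate(feedback: Optional[str]) -> str:
    # Hypothetical stand-in for a model call: buggy first, fixed after feedback.
    if feedback is None:
        return "def add(a, b):\n    return a - b\n"
    return "def add(a, b):\n    return a + b\n"


def toy_run(code: str) -> Tuple[bool, str]:
    # Execute the candidate code and check it against one simple test.
    namespace: dict = {}
    try:
        exec(code, namespace)
        assert namespace["add"](2, 3) == 5, "add(2, 3) should be 5"
    except Exception as exc:
        return False, f"{type(exc).__name__}: {exc}"
    return True, "all checks passed"


code, attempts = repair_loop(toy_generate, toy_run)
```

In this toy run the first attempt fails its check and the second passes, so the loop stops after two attempts. A real system would replace the stub with a model call and the single assertion with the project's test suite.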

Final output: two files, 17 tests, and only 84 seconds.

This is far faster than a human programmer could manage.

Long context powered by OpenAI models

Unlike many AI models that rely on a base model supplemented by a few tools, Genie was developed through a proprietary process.

As for the model, Genie is built on a variant of GPT-4o that is not (currently) generally available, which OpenAI allowed Cosine to train as part of its experimental access program.

According to the technical report, when the researchers started building Genie, they could only fine-tune models with relatively short context windows, in the range of 16k to 32k tokens.

To work around this, the team did a lot of early exploration with these models and trained them on large datasets of more than 100 million tokens. Although they found that the architecture had certain advantages, they were still limited by how much information the model could process at one time.

After trying various compression/chunking methods, the team decided that the only solution was to use a larger context model, although none was available at the time.
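To make the limitation concrete, here is a minimal Python sketch of one generic chunking approach: splitting a long token sequence into fixed-size, optionally overlapping windows. The article does not say which compression or chunking methods Cosine actually tried, so the function `chunk_tokens` and its `window` and `overlap` parameters are illustrative assumptions only.

```python
from typing import List, Sequence


def chunk_tokens(
    tokens: Sequence[str], window: int, overlap: int = 0
) -> List[List[str]]:
    """Split tokens into windows of at most `window` items.

    Consecutive windows share `overlap` tokens so that some context
    carries across each boundary.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    if not 0 <= overlap < window:
        raise ValueError("overlap must satisfy 0 <= overlap < window")

    step = window - overlap
    chunks: List[List[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(list(tokens[start:start + window]))
        if start + window >= len(tokens):
            break  # the last window already reaches the end
    return chunks


# Toy input: ten placeholder tokens, windows of four with one token shared.
tokens = [f"t{i}" for i in range(10)]
chunks = chunk_tokens(tokens, window=4, overlap=1)
```

Each window sees only a slice of the input, so any relationship between code that lands in different windows is invisible to the model during that step. That is the kind of constraint a larger context window removes.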

Fortunately, not long afterwards, an OpenAI model became available that could be trained with long contexts.

Cosine said in its blog post that compiling the dataset took nearly a year. In the most recent training run, Genie was trained on billions of tokens of data, selected to cover the programming languages that users currently care about most.

Figure: proportion of Genie's training data by programming language.

Figure: proportion of Genie's training data by task type, such as bug fixes and refactoring.

In terms of price, Pullen revealed that Genie will initially be priced in two tiers:

  • The entry-level option is priced at around $20. This tier has some feature and usage limits and is suited to individuals and small teams;
  • The enterprise option offers extended functionality and virtually unlimited use, like having an AI colleague who is proficient in code, but it will be priced higher.

The launch of Genie has far-reaching implications for software development teams, especially those looking to increase productivity and reduce the time spent on routine tasks. With its ability to autonomously handle complex programming challenges, Genie may change the way engineering resources are allocated, allowing teams to focus on more strategic initiatives.

Pullen said that no longer being constrained by engineering resources is a huge motivation for him, especially since starting the company. He believes the value of an AI colleague that can quickly get up to speed on an unfamiliar code base and solve problems it has never seen is obvious, and that it will have a huge impact on the world.

In the future, the company intends to expand its model portfolio to include small models for simple tasks and larger models that can handle more complex challenges. In addition, Cosine plans to extend its work to the open-source community.

Genie is currently available to a limited number of users; wider access has not yet opened.

Apply for access: https://cosine.sh/register

Founding team: only five people

Cosine, the startup that came up with Genie, was founded in 2022 by Pullen, Sam Stenner, and Yang Li with a mission to push the boundaries of AI by applying human reasoning to solve complex problems. Obviously, their efforts started with software engineering.


Among them, Yang Li, who is Chinese, holds a master's degree from the University of Oxford and was named to the Forbes 30 Under 30 Europe list in 2021.

Cosine has raised $2.5 million in seed funding from Uphonest and SOMA Capital, with participation from Lakestar, Focal and others.

Although the team is small, Cosine has made significant progress in the field of AI, and Genie is just the beginning.

“We believe we can build human-level reasoning capabilities for any job and industry,” Pullen said in the announcement post. “Software engineering is just the most intuitive starting point, and we’ll soon show you everything else we’re working on.”