Musk launches Grok-1.5, close to GPT-4 level performance

2024.04.01

Musk noted that Grok-1.5 will power xAI’s ChatGPT-rival chatbot on the X platform, while its successor, Grok-2, is still in training. He said the next version should be able to "outperform current AI on all metrics," but did not say when it might become available.

What does Grok-1.5 bring?

xAI announced Grok-1 last November, describing it as an AI modeled on "The Hitchhiker's Guide to the Galaxy" that could answer almost any question and help humanity in its quest for understanding and knowledge, regardless of background or political views. According to data shared by xAI, Grok-1 outperforms Llama-2-70B and GPT-3.5 on benchmarks such as GSM8K, HumanEval and MMLU.

xAI noted in a blog post: "In our tests, Grok-1.5 achieved a score of 50.6% on the MATH benchmark and 90% on the GSM8K benchmark, two mathematical benchmarks covering a wide range of competition problems from elementary to high school level. Additionally, it scored 74.1% on the HumanEval benchmark, which evaluates code generation and problem-solving abilities."

In addition, xAI confirmed that Grok-1.5 has a context window of up to 128,000 tokens (a token is a whole or partial unit of a word, image, video, audio, or code). This lets the model take in 16 times as much information at once as Grok-1, making it better suited to analyzing, summarizing and extracting information from long documents. It can also handle longer, more complex prompts while still following instructions.
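As a rough illustration of what those figures imply, the sketch below works backward from the 128,000-token window and the 16x factor, then estimates whether a document would fit. The names `estimate_tokens` and `fits_in_context`, the 4-characters-per-token heuristic, and the reply reserve are assumptions made for illustration; they are not part of any xAI API, and real token counts depend on the model's tokenizer.

```python
import math

# Figures quoted in the article.
GROK_15_CONTEXT_TOKENS = 128_000
CONTEXT_GROWTH_FACTOR = 16


def implied_previous_context(current_tokens: int, factor: int) -> int:
    """Context size implied for the earlier model by the quoted growth factor."""
    return current_tokens // factor


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Very rough token estimate (assumed heuristic: ~4 characters per token)."""
    return math.ceil(len(text) / chars_per_token)


def fits_in_context(
    text: str,
    context_tokens: int = GROK_15_CONTEXT_TOKENS,
    reserved_for_reply: int = 1_000,  # hypothetical budget left for the answer
) -> bool:
    """Check whether the estimated prompt size leaves room for a reply."""
    return estimate_tokens(text) + reserved_for_reply <= context_tokens


if __name__ == "__main__":
    print(implied_previous_context(GROK_15_CONTEXT_TOKENS, CONTEXT_GROWTH_FACTOR))
    print(fits_in_context("a" * 400_000))
```

Under these assumptions, the quoted numbers imply a window of about 8,000 tokens for Grok-1, and a document of roughly 400,000 characters would fit in the larger window.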

Close to OpenAI and Anthropic

With enhanced reasoning and problem-solving capabilities, Grok-1.5 not only outperforms its predecessor in benchmarks, but also comes close to popular open and closed-source models, including Gemini 1.5 Pro, GPT-4, and Claude 3.

For example, on MMLU, Grok-1.5’s score of 81.3% surpasses the recently launched Mistral Large, but lags behind Gemini 1.5 Pro (83.7%), GPT-4 (86.4%, as of March 2023) and Claude 3 Opus (86.8%). A similar gap was noted on the GSM8K benchmark, with the xAI model trailing only offerings from Google, OpenAI, and Anthropic.
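To make the size of the MMLU gap concrete, the short sketch below computes how many percentage points each competitor leads Grok-1.5 by, using only the scores quoted in this article. The helper name `gaps_to` is hypothetical, and Mistral Large is omitted because no score for it is given here.

```python
# MMLU scores (percent) as quoted in the article.
MMLU_SCORES = {
    "Grok-1.5": 81.3,
    "Gemini 1.5 Pro": 83.7,
    "GPT-4": 86.4,
    "Claude 3 Opus": 86.8,
}


def gaps_to(reference: str, scores: dict[str, float]) -> dict[str, float]:
    """Percentage-point lead of every other model over the reference model."""
    ref_score = scores[reference]
    return {
        name: round(score - ref_score, 1)  # round away floating-point noise
        for name, score in scores.items()
        if name != reference
    }


if __name__ == "__main__":
    for model, gap in gaps_to("Grok-1.5", MMLU_SCORES).items():
        print(f"{model}: +{gap} points over Grok-1.5")
```

On these figures, Grok-1.5 trails Gemini 1.5 Pro by 2.4 points, GPT-4 by 5.1 and Claude 3 Opus by 5.5.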

Notably, the only benchmark where Grok-1.5 appears to have an advantage is HumanEval, where it outperforms all models except Claude 3 Opus. xAI hopes to build on these gains with Grok-2, which according to Musk should surpass current AI on all metrics. That model is currently being trained.

Technical consultant Brian Roemmele said that based on his work with Grok-1, Grok-2 "will be one of the most powerful LLM AI platforms at launch. It will surpass OpenAI in almost every metric."

Availability of Grok-1.5

As for Grok-1.5, xAI plans to begin deployment next week. The company says the model will initially be available to early testers and existing users of the Grok chatbot on the X platform (formerly Twitter), with real-time access to all posts on the platform. The rollout will proceed in phases: the company will refine the model and introduce several new features, possibly including a new untethered fun mode, while gradually making it available to a wider user base.

When Musk launched Grok on X, the move was seen as a way to drive adoption of both Grok and X. He first offered the AI as part of the platform's "Premium+" subscription, priced at $16 per month. However, just a few days ago, the billionaire shared that the chatbot will also be enabled for Premium subscribers, who pay $8 per month. In another update, he confirmed that accounts with a certain number of verified subscriber followers will receive the benefits of Premium and Premium+ subscriptions, including Grok, for free.
