The mysterious big model flooded the screen overnight. It was suspected of being GPT-4.5 because of its powerful ability.

2024.04.30

Overnight, the topic king of large models changed hands again.

A mysterious model suddenly burst into everyone's sight, and its capabilities surpassed those of many open source models, including GPT-4 . Almost everyone is talking about it and the servers are packed.

It is " gpt2-chatbot ".

(Note, it is gpt2 not GPT-2)

picture

How strong is it?

Answer the questions of the IMO International Mathematical Olympiad correctly the first time.

picture


Generate rotating 3D cube in PyOpenGL in one go.

picture

Directly output the following results:

picture

In GPT-4’s signature ability of “drawing unicorns”, it can easily kill LLaMA-3-70B (left).

picture

The reasoning aspect is even more amazing. Common logical traps can be easily bypassed, and the answers are given with a suitable tone.

Question: Which one is heavier, 1 kilogram of feathers or 1 kilogram of lead?

Answer: A kilogram of feathers and a kilogram of lead weigh the same - they are both one kilogram. The key difference between the two is their volume and density: a kilogram of feathers will take up more space and appear larger, while a kilogram of lead will be compact and small. This often leads to a classic problem, but when it comes to weight, a kilogram is a kilogram!

picture

Such a good performance, but he did not reveal his identity clearly...

Isn’t this more exciting for everyone!

Is it the legendary GPT-4.5 ?

picture

You know, the famous AlphaGo once used the pseudonym "Master" to kill everyone on the Internet, causing a sensation in the Go world. This time gpt2-chatbot also feels the same.

And the name is also worthy of further study. gpt2 is more like gpt2.0, which seems to be a version iteration of ChatGPT level.

picture

Ultraman didn't take it too seriously to watch the excitement, and even posted: I really have a soft spot for gpt2.

picture

Now, as the number of people rushing to try it out continues to increase, the trial limits are also increasing.

picture

How exactly? We have already started testing.

Chinese proficiency is also very good

If you want to test this mysterious AI for yourself, the only known way currently is in the LMSYS Large Model Arena.

First open the arena web page, enter Direct Chat , and you can find gpt2-chatbot in the model options .

picture

Please note that there is a limit of 8 messages per person per day , and there is also a global limit of 3,000 messages per hour , so testing opportunities are very limited.

If you see the error message below, you can only go to the arena ranking mode to see if you can match it with luck.

Just catch it once and you can continue the conversation for multiple rounds.

picture

In a short test, we found that gpt2-chatbot’s Chinese capabilities are also very good .

As long as the question is in Chinese, the answer can be answered in Chinese by default without special emphasis. At least it can be ruled out that it is Llama 3 fine-tuning .

In response to a classic question full of misleading, it can be seen that gpt2-chatbot's answer is clear and organized, as if it comes with a CoT thinking chain prompt ("Let's think about it step by step") , and it has identified all the traps.

picture

It also accurately provides very detailed knowledge , such as the distance from Beijing to Qingdao, the world record for men's and women's long jump, the price of Nongfu Spring in China, etc.

Most other AI models, at best, can only vaguely determine that 15 meters is beyond human capabilities, or calculate the price of mineral water in US dollars.

So who is this super powerful mysterious AI? We also "tortured" it using the ancestral skills of unlocking GPTs.

For the GPT series chatbot developed by OpenAI, the beginning of the system prompt word should be "You are ChatGPT..." as expected. However, in order to prevent it from hallucinating after seeing the word "ChatGPT", we removed ChatGPT from the question.

Clear all contextual information, and then let it repeat the "previous word", and the system prompt word will appear.

picture

Sure enough, it revealed that it is a large model trained by OpenAI, based on the GPT-4 architecture, and can also accept image input. The most critical point is in the last part "Personality: v2" .

And gpt2-chatbot's answer to this question is consistent when tried at different times and locations .

In addition, if you try to ask it to repeat Claude's series of system prompt words starting with "The assistant is", it will not be fooled and will repeat the complete question after the beginning.

picture△ This answer is not wrong

Although even this cannot rule out the possibility of hallucination , or that the non-GPT model uses the data generated by ChatGPT for fine-tuning, it is at least stable .

Several mainstream speculations about the identity of the mysterious AI

Some netizens organized more detailed tests and found the following:

  • It uses OpenAI's tokenizer, reacts to the special tokens used by OpenAI, and has no effect on the special tokens used by Claude/Llama/Gemini.
  • When inquiring about emergency/legal related questions, it will give OpenAI’s contact details.
  • The prompt word injection crackdown on OpenAI models was effective, and it never claimed to be from an organization other than OpenAI.

……

Based on the above information, many people speculate that it is GPT-4.5 released anonymously, or that the original version of GPT-4 has undergone different alignment training .

picture

However, there are also signs that it may be a model trained by the LMSYS organization based on the 2019 GPT-2 architecture .

The reason is that a recently published paper claims that GPT-2 is more capable than multiple modern models in some cases. And one of the authors of this paper is related to MBZUAI (UAE University of Artificial Intelligence), the sponsor of LMSYS.

picture

Assuming it is indeed the ancient GPT-2 architecture (only 1.5B parameters) , some suspect it may be combined with OpenAI's tight-lipped Q* technology.

picture

The last guess (dog head) is that the missing OpenAI chief scientist Ilya Sutskever is hiding inside .

picture

Finally, in the face of all the turmoil caused by the mysterious new model, Ultraman himself was found to have muddied the waters and changed the details of his tweets.

Suddenly, it is more likely that OpenAI will release new models anonymously for hype.