AI-generated "horror" gymnastics video with flailing limbs; LeCun: video generation models don't understand physics at all

2024.07.02

An AI-generated gymnastics video drew nearly one million views, and LeCun and other prominent figures even ended up arguing over it.

Is this a gymnastics performance? Emmmm, well, who's to say it isn't?

Judging from the watermark in the upper right corner, the video was generated by Dream Machine (from Luma AI), which was once hailed as the "next generation" of text-to-video models.

The video caused quite a stir. The discussion centered on a familiar question in the field of AI video: does AI understand the laws of physics?

LeCun put it bluntly:

Video generation models don’t understand basic physics. Let alone the human body.

[Image]

Pedro Domingos, a professor of computer science at the University of Washington, was also unimpressed after seeing it:

AGI may not be as imminent as some people expect.

[Image]

Outrageously absurd

Since Sora's release, the question of whether AI understands the laws of physics has attracted more and more attention.

This Sora-generated clip of "a hermit crab using a light bulb as its shell at night" is a classic example. The interplay between the waves and the beach is finely rendered, and the tiny hairs on the hermit crab's legs look lifelike.

[Image]

Compared with real photos of similar scenes, the only obvious flaw is that the bulb is lit even though it has no power source.

[Image]

The same goes for Luma AI's Dream Machine, which recently generated a highly realistic first-person walkthrough of an abandoned house:

[Image]

As a result, many people believe that video generation models such as Sora and Luma's have already grasped simple physical laws.

This latest video, however, is truly outrageous.

Not only do the gymnast's legs and feet fly in all directions, but the routine is also full of "miracles":

[Image]

And this difficult mid-air somersault would make Newton furious:

[Image]

So much so that netizens said it wasn't so much scary as hilarious.

[Image]

The result is so absurd that LeCun flatly stated that video generation models do not understand physics.

He added that Sora and other video generation models have similar problems, and that video generation technology will undoubtedly improve over time.

But:

A learning system that truly understands physics will not be generative. Birds and mammals understand physics far better than any video generation system, yet none of them can generate detailed videos.

[Image]

A related question was also raised:

Even if AI video generation models keep improving and the videos they generate eventually become "perfect", does that mean they understand physics?

[Image]

The views of LeCun and others were immediately met with skepticism from netizens:

Birds and mammals also produce detailed videos; it's just that their brains can't output them in a form others can see.

[Image]

However, this rebuttal did not convince LeCun.

[Image]

Many others also disagreed with LeCun.

For example, Lucas Beyer, a researcher at Google DeepMind/Brain, pointed out:

It's like taking an image generated by DALL·E mini a few years ago and declaring that image generation methods are doomed to fail.

After all, images from earlier image generation models looked like this:

[Image]

So why would the model generate such an outrageous video?

Some netizens attribute it to a lack of gymnastics training data. Others believe that blurring of body parts prevents the model from learning the structure of the human body, so it cannot keep limb movements continuous.

[Image]

Video generation is computationally more complex and highly context-dependent, so it places greater demands on carefully annotated training data, a need that is currently underserved.

[Image]

Not long ago, SD 3 flopped with similarly poor human body generation, and netizens discussed the same issue then: overly strict data filtering may have mistakenly removed some harmless adult images, hurting the model's understanding of human body structure.

[Image]

One More Thing

In addition to the gymnastics video generated by Luma AI's Dream Machine, Runway's Gen-3 also...

[Image]

The same "three heads and six arms":

[Image]

The same mid-air levitation tricks:

[Image]

Reference links:
[1] https://x.com/ylecun/status/1807497091964449266
[2] https://x.com/giffmana/status/1807511985807908926
[3] https://x.com/EricDai_BioE/status/1807540558216454281
[4] https://x.com/Grady_Booch/status/1807556807982010451