Apple shows off new AI model MGIE, which can edit pictures from a single sentence

2024.02.13

According to reports on February 8, Apple's approach to AI has been far more low-key than Microsoft's rapid push, but that does not mean Apple has made no progress in the field. Apple recently released a new open-source artificial intelligence model called "MGIE" that edits images based on natural-language instructions.

Image source: VentureBeat, made with Midjourney

MGIE stands for MLLM-Guided Image Editing. It uses a multimodal large language model (MLLM) to interpret user instructions and carry out pixel-level edits. MGIE understands natural-language commands and can perform Photoshop-style modifications, global photo optimization, and local editing.

Researchers from Apple and the University of California, Santa Barbara, are publishing their MGIE research at the 2024 International Conference on Learning Representations (ICLR), one of the top conferences for artificial intelligence research.

Before getting into MGIE, IT House will first explain MLLMs. An MLLM is a powerful artificial intelligence model that can process text and images at the same time, which makes it well suited to instruction-based image editing. MLLMs have shown strong cross-modal understanding and can generate visually aware responses, but they have not yet been widely applied to image editing tasks.

MGIE brings MLLMs into the image editing process in two ways. First, it uses the MLLM to derive expressive instructions from the user's input. These instructions are concise and give the editing process explicit guidance.

For example, given the input "make the sky bluer", MGIE can generate the instruction "increase the saturation of the sky area by 20%".
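To make the derived instruction concrete, here is a minimal sketch of what "increase the saturation of the sky area by 20%" means at the pixel level. It is not MGIE's code: MGIE performs edits with learned models, whereas this hand-written function simply converts each masked pixel to HSV, multiplies its saturation by 1.2, and converts back. The image format (nested lists of RGB floats in [0, 1]), the boolean "sky" mask, and the function name `boost_saturation` are all illustrative assumptions.

```python
import colorsys

def boost_saturation(image, mask, factor=1.2):
    """Scale HSV saturation by `factor` for pixels where mask is True.

    image: list of rows, each row a list of (r, g, b) floats in [0, 1]
    mask:  list of rows of booleans with the same shape (True = "sky" pixel)
    """
    out = []
    for img_row, mask_row in zip(image, mask):
        new_row = []
        for (r, g, b), selected in zip(img_row, mask_row):
            if selected:
                h, s, v = colorsys.rgb_to_hsv(r, g, b)
                s = min(1.0, s * factor)  # +20% saturation, clamped to the valid range
                new_row.append(colorsys.hsv_to_rgb(h, s, v))
            else:
                new_row.append((r, g, b))  # pixels outside the mask are left untouched
        out.append(new_row)
    return out

# A pale-blue "sky" pixel (masked) next to a green "grass" pixel (unmasked).
image = [[(0.5, 0.5, 1.0), (0.2, 0.6, 0.2)]]
mask = [[True, False]]
result = boost_saturation(image, mask)
print(result[0][0])  # approximately (0.4, 0.4, 1.0): same hue and brightness, more saturated
```

The mask plays the role of "the sky area" in the instruction; how MGIE actually localizes regions is not described here, so the mask is supplied by hand in this sketch.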

Second, it uses the MLLM to generate a "visual imagination", that is, a latent representation of the desired edit. This representation captures the essence of the edit and is used to guide the pixel-level operations. MGIE uses a new end-to-end training scheme that jointly optimizes the instruction derivation, visual imagination, and image editing modules.
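The data flow described above (user text, then an expressive instruction, then a latent "visual imagination", then the edited image) can be sketched as follows. Every component here is a hypothetical toy stand-in: in MGIE these stages are learned neural networks trained jointly end to end, while `toy_derive_instruction`, `toy_imagine`, and `toy_editor` are hand-written functions with no learnable parameters. The sketch shows only how the stages hand results to one another, with the "image" reduced to a list of per-pixel saturation values.

```python
import re
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Guidance:
    expressive_instruction: str      # stage 1: concise, explicit instruction
    visual_imagination: List[float]  # stage 2: latent description of the edit

def toy_derive_instruction(user_text: str) -> str:
    # Hypothetical stand-in for the MLLM: a lookup table instead of a language model.
    table = {"make the sky bluer": "increase the saturation of the sky area by 20%"}
    return table.get(user_text.strip().lower(), user_text)

def toy_imagine(instruction: str) -> List[float]:
    # Hypothetical stand-in for visual imagination: encodes the edit strength as a 1-D latent.
    match = re.search(r"(\d+)%", instruction)
    return [int(match.group(1)) / 100.0] if match else [0.0]

def toy_editor(saturations: List[float], latent: List[float]) -> List[float]:
    # Hypothetical stand-in for the editing model: the latent alone drives the pixel change.
    return [min(1.0, s * (1.0 + latent[0])) for s in saturations]

def run_pipeline(
    image: List[float],
    user_text: str,
    derive: Callable[[str], str] = toy_derive_instruction,
    imagine: Callable[[str], List[float]] = toy_imagine,
    edit: Callable[[List[float], List[float]], List[float]] = toy_editor,
):
    instruction = derive(user_text)                       # stage 1: instruction derivation
    guidance = Guidance(instruction, imagine(instruction))  # stage 2: visual imagination
    return guidance, edit(image, guidance.visual_imagination)  # stage 3: pixel-level edit

guidance, edited = run_pipeline([0.5, 0.9], "Make the sky bluer")
print(guidance.expressive_instruction)  # increase the saturation of the sky area by 20%
print(edited)                           # roughly [0.6, 1.0]; the second value is clamped
```

Passing each stage in as a parameter mirrors the modular description in the article; in a jointly trained system the editor learns to read the latent rather than having its meaning fixed by hand as it is here.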

MGIE can handle a wide range of edits, from simple color adjustments to complex object manipulation. It can also apply global or local edits depending on what the user wants. MGIE's main features include:

  • Expressive Instruction-Based Editing: MGIE generates concise, explicit instructions that guide the editing process effectively. This improves both editing quality and the overall user experience.
  • Photoshop-Style Editing: MGIE can perform common Photoshop-style edits such as cropping, resizing, rotating, flipping, and adding filters. The model can also apply more advanced edits, such as changing the background, adding or removing objects, and blending images.
  • Global Photo Optimization: MGIE can improve the overall quality of a photo, including brightness, contrast, sharpness, and color balance. The model can also apply artistic effects such as sketching, painting, and caricature.
  • Local Editing: MGIE can edit specific regions or objects in an image, such as faces, eyes, hair, clothing, and accessories. The model can also modify attributes of these regions or objects, such as shape, size, color, texture, and style.

MGIE is an open-source project on GitHub, where users can find the code, data, and pre-trained models. The project also provides a demo notebook showing how to use MGIE for a variety of editing tasks.