Meta FAIR Releases Transfusion for Multimodal AI Training

Last updated August 29, 2024

Transfusion is a state-of-the-art approach to unifying text and image modalities

In a collaborative effort with Waymo and the University of Southern California, Meta FAIR released its research on multi-modal generative models. Transfusion aims to bridge the gap between discrete sequence modeling and continuous media generation.

The Transfusion Model 

The model is trained on an equal mix of text and image data. Per Meta, Transfusion outperforms the common alternative of quantising images into discrete tokens and training a language model over them, and its performance can be further enhanced through modality-specific encoding and decoding layers. On text, the model predicts the next token in a sequence, with training minimising the difference between its predictions and the actual tokens; images are handled as continuous data via diffusion. Notably, at 7 billion parameters and 2 trillion multi-modal tokens, Transfusion is on par with larger models that generate both images and text, and outperforms models like DALL-E 2 and SDXL on image generation. It also beats Chameleon, consuming less compute while producing better results.
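The training recipe described above combines two objectives in one model: a next-token cross-entropy loss on text and a denoising (noise-prediction) loss on image representations, summed with a balancing coefficient. The sketch below illustrates that combination with toy numpy tensors standing in for model outputs; the shapes, the `lam` value, and the helper names are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def lm_loss(logits, targets):
    # next-token cross-entropy: penalise the gap between
    # predicted token distributions and the actual next tokens
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))
    return -log_probs[np.arange(len(targets)), targets].mean()

def diffusion_loss(pred_noise, true_noise):
    # mean-squared error between the noise the model predicts
    # and the noise actually added to the image latents
    return ((pred_noise - true_noise) ** 2).mean()

# toy stand-ins for model outputs (hypothetical shapes)
vocab_size, seq_len = 32, 10
logits = rng.normal(size=(seq_len, vocab_size))       # text head output
targets = rng.integers(0, vocab_size, size=seq_len)   # actual next tokens
noise_hat = rng.normal(size=(16, 8))                  # predicted image noise
noise = rng.normal(size=(16, 8))                      # true image noise

lam = 5.0  # balancing coefficient between the two losses (assumed value)
total = lm_loss(logits, targets) + lam * diffusion_loss(noise_hat, noise)
print(total)
```

Because both losses flow into a single scalar, one transformer backbone receives gradients from both modalities at every training step, which is what lets Transfusion train end to end rather than stitching together separately trained components.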

One limitation, perhaps, is that diffusion models do not yet perform on par with traditional language models. Much research remains to be done in this area to improve overall performance.

Transfusion’s Uniqueness & the Future of Innovation in AI Research

What differentiates Transfusion from the rest is its unified architecture that runs end to end to generate both text and images. Existing models like Flamingo, LLaVA, GILL, and DreamLLM combine separate architectures for different types of data, each trained separately.

The goal of Transfusion is to synergise the two modalities in a single joint model, with each fulfilling its own objective. The payoff is a model that is versatile, resource-efficient, and cost-effective at handling different types of data without extra components.


Aditi Suresh

Aditi is a political science graduate, and is interested in the intersection of technology and culture, and its impact on society.
