Meta has announced a new AI model that links together multiple streams of data, including text, audio, visual data, temperature, and movement readings.
The model, ImageBind, is only a research project at this point, but it shows how future generative AI systems could create immersive, multisensory experiences.
Image generators like DALL-E and Midjourney rely on systems that pair text with images during training, and Meta's ImageBind takes this a step further by combining six types of data (images, text, audio, depth, thermal, and motion readings) into a single embedding space.
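To give a rough sense of what a shared embedding space means in practice, here is a minimal PyTorch sketch. It is not Meta's actual ImageBind code; the encoder names, dimensions, and dummy inputs are illustrative assumptions. The idea it demonstrates is that each modality gets its own encoder, but all encoders project into the same vector space, so embeddings from different modalities can be compared directly.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical toy encoders: each modality has its own network,
# but both project into the same shared embedding dimension.
EMBED_DIM = 512

class TextEncoder(nn.Module):
    def __init__(self, vocab_size=10_000, embed_dim=EMBED_DIM):
        super().__init__()
        self.embed = nn.EmbeddingBag(vocab_size, embed_dim)

    def forward(self, token_ids):
        # token_ids: (batch, num_tokens) -> (batch, EMBED_DIM)
        return self.embed(token_ids)

class AudioEncoder(nn.Module):
    def __init__(self, n_features=128, embed_dim=EMBED_DIM):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(n_features, 1024),
            nn.ReLU(),
            nn.Linear(1024, embed_dim),
        )

    def forward(self, audio_features):
        # audio_features: (batch, n_features) -> (batch, EMBED_DIM)
        return self.proj(audio_features)

text_encoder = TextEncoder()
audio_encoder = AudioEncoder()

# Dummy inputs standing in for a text caption and an audio clip.
tokens = torch.randint(0, 10_000, (1, 16))   # one caption, 16 token ids
audio_feats = torch.randn(1, 128)            # one pooled audio feature vector

# Both modalities land in the same 512-dimensional space, so they can be
# compared directly, e.g. with cosine similarity, regardless of origin.
text_emb = F.normalize(text_encoder(tokens), dim=-1)
audio_emb = F.normalize(audio_encoder(audio_feats), dim=-1)

similarity = (text_emb * audio_emb).sum(dim=-1)
print(f"text-audio similarity: {similarity.item():.3f}")
```

Once embeddings from different modalities are aligned like this, one modality can be used to retrieve or condition on another, for example finding the audio clip whose embedding sits closest to a given text caption.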
Imagine a futuristic virtual reality device that generates not only audio and visual input but also your physical environment and your movement through it. ImageBind brings machines one step closer to humans' ability to learn simultaneously from many different forms of information.
Interestingly, Meta is open-sourcing the underlying model, despite concerns from some in the AI community that open-sourcing can harm creators and pose risks by allowing malicious actors to exploit state-of-the-art AI models.