ViperGPT: A Framework for Composing Vision-and-Language Models for Complex Visual Queries

March 20, 2023 23:04

Introducing ViperGPT, a framework that answers complex visual queries by using code-generation models to compose vision-and-language models into subroutines. It achieves state-of-the-art results without further training.

ViperGPT is a framework that uses code-generation models to compose vision-and-language models into subroutines that answer complex visual queries. Unlike end-to-end models, ViperGPT explicitly separates visual processing from reasoning, which makes it more interpretable and generalizable. It achieves state-of-the-art results across a range of complex visual tasks without requiring further training.
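To make the idea of composing models into subroutines concrete, here is a minimal toy sketch in Python. It is not ViperGPT's actual API: the names `Scene`, `find`, `query_attribute`, `fake_code_generator`, and `run_query` are all hypothetical, the "image" is a plain list of labelled objects, and the code-generation model is replaced by a function that returns a canned program. It only illustrates the general pattern of generating a short program and executing it against a small set of perception functions.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Scene:
    """Hypothetical stand-in for an image: a list of labelled objects."""
    objects: List[Dict[str, str]]


# Stubbed perception subroutines (assumptions for illustration only).
# In a real system each would wrap a pretrained vision-and-language model;
# here they simply read the toy scene.
def find(scene: Scene, name: str) -> List[Dict[str, str]]:
    """Return every object in the scene with the given name."""
    return [obj for obj in scene.objects if obj["name"] == name]


def query_attribute(obj: Dict[str, str], attribute: str) -> str:
    """Return an attribute of an object, or 'unknown' if it is missing."""
    return obj.get(attribute, "unknown")


def fake_code_generator(question: str) -> str:
    """Stand-in for a code-generation model.

    It ignores the question and returns a canned program that composes
    the subroutines above. The reasoning lives in this program text.
    """
    return (
        "def answer(scene):\n"
        "    mugs = find(scene, 'mug')\n"
        "    colors = sorted(query_attribute(m, 'color') for m in mugs)\n"
        "    return str(len(mugs)) + ' mugs; colors: ' + ', '.join(colors)\n"
    )


def run_query(question: str, scene: Scene) -> str:
    """Generate a program for the question and execute it on the scene."""
    program = fake_code_generator(question)
    # Expose only the perception subroutines to the generated program.
    namespace = {"find": find, "query_attribute": query_attribute}
    exec(program, namespace)  # defines answer() inside namespace
    return namespace["answer"](scene)


if __name__ == "__main__":
    scene = Scene(objects=[
        {"name": "mug", "color": "red"},
        {"name": "table", "color": "brown"},
        {"name": "mug", "color": "blue"},
    ])
    print(run_query("How many mugs are there, and what colors are they?", scene))
```

Because the reasoning is an ordinary program, each intermediate value (the list of mugs, the list of colors) can be inspected, which is one way to read the interpretability claim above. Executing generated code with `exec` is unsafe outside a sandbox, so treat this strictly as an illustration.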
