ViperGPT: A Framework for Composing Vision-and-Language Models for Complex Visual Queries

March 20, 2023 23:04

ViperGPT is a framework that uses code-generation models to compose vision-and-language models into subroutines that answer complex visual queries. Unlike end-to-end models, ViperGPT explicitly separates visual processing from reasoning, which makes it more interpretable and better at generalizing. It achieves state-of-the-art results across a range of complex visual tasks without any further training.
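To make the split between visual processing and reasoning concrete, here is a minimal sketch under loudly stated assumptions. `ImagePatch`, `Detection`, `find`, `has_attribute`, and `execute_command` are hypothetical names, not ViperGPT's actual API. The "vision models" are stubs that read from a hand-built scene instead of running real detectors. In ViperGPT, a program like `execute_command` would be generated by the code-generation model from the query; here it is written by hand to show the shape of such a program.

```python
from dataclasses import dataclass, field
from typing import List


# Hypothetical detector output: a labeled box with its horizontal center
# and a list of attributes a vision-language model might recognize.
@dataclass
class Detection:
    label: str
    x: float
    attributes: List[str] = field(default_factory=list)


@dataclass
class ImagePatch:
    """Hypothetical image region. Perception is stubbed with a fixed scene."""
    detections: List[Detection]

    def find(self, name: str) -> List["ImagePatch"]:
        # Stub standing in for an object detector (visual processing).
        return [ImagePatch([d]) for d in self.detections if d.label == name]

    def has_attribute(self, attr: str) -> bool:
        # Stub standing in for a vision-language model checking a property.
        return any(attr in d.attributes for d in self.detections)

    @property
    def horizontal_center(self) -> float:
        return sum(d.x for d in self.detections) / len(self.detections)


# A program of the kind a code generator might produce for the query
# "How many red mugs are to the left of the laptop?". Perception calls go
# through the API; the filtering, comparison, and counting are plain Python.
def execute_command(image: ImagePatch) -> int:
    laptops = image.find("laptop")
    if not laptops:
        return 0
    laptop = laptops[0]
    red_mugs_left = [
        mug for mug in image.find("mug")
        if mug.has_attribute("red")
        and mug.horizontal_center < laptop.horizontal_center
    ]
    return len(red_mugs_left)


scene = ImagePatch([
    Detection("laptop", x=0.6),
    Detection("mug", x=0.2, attributes=["red"]),
    Detection("mug", x=0.3, attributes=["blue"]),
    Detection("mug", x=0.8, attributes=["red"]),
])
print(execute_command(scene))  # 1
```

Because every intermediate value (the laptop patch, the candidate mugs, each attribute check) is an ordinary variable, a failing answer can be traced to the specific perception call or reasoning step that produced it. This is the interpretability advantage over an end-to-end model described above.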
