A Python framework that combines vision and language models through code generation to achieve state-of-the-art results
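
To make the description above concrete, here is a minimal sketch of the general pattern, not this framework's actual API. It assumes a language model writes a small Python program for each query and that program calls vision functions. The names `find`, `generate_program`, and `run_query` are hypothetical, and both models are replaced by toy stand-ins so the example runs on its own.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for a vision model. In this toy setup an "image" is
# just a list of object labels; a real system would run an object detector.
def find(image: List[str], object_name: str) -> List[str]:
    return [obj for obj in image if obj == object_name]

# Hypothetical stand-in for a code-generating language model. A real system
# would prompt a model with the query; this returns a canned program.
def generate_program(query: str) -> str:
    return (
        "def execute_query(image):\n"
        "    # Count the objects returned by the vision stand-in\n"
        "    return len(find(image, 'muffin'))\n"
    )

def run_query(image: List[str], query: str) -> int:
    source = generate_program(query)
    # Expose only the vision function to the generated code.
    namespace: Dict[str, Callable] = {"find": find}
    # Caution: exec on model-written code is unsafe without sandboxing.
    exec(source, namespace)
    return namespace["execute_query"](image)

if __name__ == "__main__":
    toy_image = ["muffin", "cup", "muffin"]
    print(run_query(toy_image, "How many muffins are there?"))
```

In this sketch the generated program is the only link between the two models: the language side never sees the image, and the vision side never sees the query.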