Arthur, the machine learning monitoring startup, is capitalizing on the interest in generative AI with Arthur Bench, a new open source tool that helps users find the best large language model (LLM) for a specific data set.
With the surge in interest around generative AI, Arthur has focused on building products that help companies work more effectively with LLMs.
Arthur Bench addresses a common challenge: measuring the effectiveness of one model against another. In a rapidly evolving field, it's crucial to identify the right model for your application.
The tool offers a suite of methods for systematically testing performance, with a focus on gauging how the prompts your users are likely to write in a specific context will fare with different LLMs.
CEO Adam Wenchel explains that you could compare various prompts and assess different LLMs, such as those from Anthropic and OpenAI, on the prompts most relevant to your user base.
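The kind of comparison described above can be illustrated with a minimal Python sketch. This is not Arthur Bench's API: the two model functions are hypothetical stand-ins for real LLM calls, and the token-overlap score is an assumed, deliberately simple metric chosen only to show the shape of the idea (same prompts, several models, one score per model).

```python
import re
from typing import Callable, Dict, List

# Hypothetical stand-ins for real LLM calls; each maps a prompt to a response.
def model_a(prompt: str) -> str:
    return "Paris is the capital of France." if "France" in prompt else "I am not sure."

def model_b(prompt: str) -> str:
    return "The capital is Paris." if "France" in prompt else "Berlin is the capital of Germany."

def tokens(text: str) -> set:
    """Lowercase word tokens with punctuation stripped."""
    return set(re.findall(r"\w+", text.lower()))

def token_overlap(candidate: str, reference: str) -> float:
    """Fraction of reference tokens that also appear in the candidate."""
    ref = tokens(reference)
    return len(ref & tokens(candidate)) / len(ref) if ref else 0.0

def compare(
    models: Dict[str, Callable[[str], str]],
    prompts: List[str],
    references: List[str],
) -> Dict[str, float]:
    """Average each model's score over the same set of prompts."""
    if not prompts:
        return {name: 0.0 for name in models}
    scores = {}
    for name, model in models.items():
        per_prompt = [token_overlap(model(p), r) for p, r in zip(prompts, references)]
        scores[name] = sum(per_prompt) / len(per_prompt)
    return scores

# Prompts and reference answers representative of a (made-up) user base.
prompts = ["What is the capital of France?", "What is the capital of Germany?"]
references = ["Paris", "Berlin"]

results = compare({"model_a": model_a, "model_b": model_b}, prompts, references)
best = max(results, key=results.get)
print(results, "best:", best)
```

In practice the stand-in functions would be replaced by calls to the candidate LLMs, and the scoring function by whatever metric matters for the application.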
Arthur Bench is available now as open source, so anyone can use it. A SaaS version will serve customers who want a simpler setup or need larger-scale testing.
The release follows the recent launch of Arthur Shield, an LLM firewall that detects model hallucinations while guarding against harmful information leaks.