Use the vitals package with ellmer to evaluate and compare the accuracy of LLMs, including writing evals to test local models.
Effortlessly deploy 500+ tools to any Linux system with a single curl command. No root, no mess, no fuss.
Open Leaderboards. Trustworthy Evaluation. Robust AI Detection. RAID is the largest & most comprehensive dataset for evaluating AI-generated text detectors. It contains over 10 million documents ...