With Evals, OpenAI hopes to crowdsource AI model testing

Mar-14-2023, 20:25:32 GMT–#artificialintelligence

Alongside GPT-4, OpenAI has open-sourced Evals, a software framework for evaluating the performance of its AI models. OpenAI says the tooling will allow anyone to report shortcomings in its models to help guide improvements. It's a sort of crowdsourcing approach to model testing, OpenAI explains in a blog post.

"We use Evals to guide development of our models (both identifying shortcomings and preventing regressions), and our users can apply it for tracking performance across model versions and evolving product integrations," OpenAI writes. "We are hoping Evals becomes a vehicle to share and crowdsource benchmarks, representing a maximally wide set of failure modes and difficult tasks."

OpenAI created Evals to develop and run benchmarks for evaluating models like GPT-4 while inspecting their performance.
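To make the idea of tracking performance across model versions concrete, here is a minimal sketch in Python. It does not use the Evals framework or its API. Every name in it (`exact_match_accuracy`, `score_all`, `find_regressions`, the two stand-in models, and the tiny benchmarks) is hypothetical and chosen only for illustration, and exact-match scoring is an assumed metric rather than one the article describes.

```python
from typing import Callable, Dict, List, Tuple

# A benchmark sample is a (prompt, expected answer) pair. Hypothetical format.
Sample = Tuple[str, str]
Model = Callable[[str], str]


def exact_match_accuracy(model: Model, samples: List[Sample]) -> float:
    """Return the fraction of samples whose stripped output equals the expected answer."""
    if not samples:
        raise ValueError("benchmark has no samples")
    correct = sum(1 for prompt, expected in samples if model(prompt).strip() == expected)
    return correct / len(samples)


def score_all(model: Model, benchmarks: Dict[str, List[Sample]]) -> Dict[str, float]:
    """Score one model on every named benchmark."""
    return {name: exact_match_accuracy(model, samples) for name, samples in benchmarks.items()}


def find_regressions(
    old_scores: Dict[str, float],
    new_scores: Dict[str, float],
    tolerance: float = 0.0,
) -> List[str]:
    """List benchmarks where the new score fell below the old score by more than tolerance."""
    return [
        name
        for name, old in old_scores.items()
        if name in new_scores and new_scores[name] < old - tolerance
    ]


# Stand-in "models": plain functions instead of real API calls (assumption).
def model_v1(prompt: str) -> str:
    return {"2+2=": "4", "Capital of France?": "Paris"}.get(prompt, "")


def model_v2(prompt: str) -> str:
    return {"2+2=": "4"}.get(prompt, "")


# Toy benchmarks, invented for this example.
BENCHMARKS: Dict[str, List[Sample]] = {
    "arithmetic": [("2+2=", "4")],
    "geography": [("Capital of France?", "Paris")],
}

if __name__ == "__main__":
    old = score_all(model_v1, BENCHMARKS)
    new = score_all(model_v2, BENCHMARKS)
    print("regressions:", find_regressions(old, new))
```

In this toy setup the second stand-in model no longer answers the geography prompt, so comparing the two score tables flags that benchmark. A crowdsourced collection of benchmarks would play the role of `BENCHMARKS` here, with each contribution covering another failure mode.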
