Inferring Capabilities from Task Performance with Bayesian Triangulation
Burden, John, Voudouris, Konstantinos, Burnell, Ryan, Rutar, Danaja, Cheke, Lucy, Hernández-Orallo, José
–arXiv.org Artificial Intelligence
As machine learning models become more general, we need to characterise them in richer, more meaningful ways. We describe a method to infer the cognitive profile of a system from diverse experimental data. To do so, we introduce measurement layouts that model how task-instance features interact with system capabilities to affect performance. These features must be triangulated in complex ways to be able to infer capabilities from non-populational data -- a challenge for traditional psychometric and inferential tools. Using the Bayesian probabilistic programming library PyMC, we infer different cognitive profiles for agents in two scenarios: 68 actual contestants in the AnimalAI Olympics and 30 synthetic agents for O-PIAAGETS, an object permanence battery. We showcase the potential for capability-oriented evaluation.
arXiv.org Artificial Intelligence
Sep-21-2023
- Country:
- Asia > Middle East
- Jordan (0.04)
- Europe
- Austria > Vienna (0.14)
- United Kingdom > England
- Cambridgeshire > Cambridge (0.04)
- Oxfordshire > Oxford (0.04)
- Asia > Middle East
- Genre:
- Research Report (0.81)
- Industry:
- Education (0.67)
- Health & Medicine (0.93)
- Technology: