MCU: A Task-centric Framework for Open-ended Agent Evaluation in Minecraft
Lin, Haowei, Wang, Zihao, Ma, Jianzhu, Liang, Yitao
arXiv.org Artificial Intelligence
To pursue the goal of creating an open-ended agent in Minecraft, an open-ended game environment with unlimited possibilities, this paper introduces MCU, a task-centric framework for Minecraft agent evaluation. Within the MCU framework, each task is measured with six distinct difficulty scores (time consumption, operational effort, planning complexity, intricacy, creativity, novelty). These scores assess a task along multiple dimensions and can therefore reveal an agent's capability on specific facets. The difficulty scores also serve as the features of each task, which creates a meaningful task space and reveals the relationships between tasks. For efficient evaluation of Minecraft agents with the MCU framework, we maintain a unified benchmark, SkillForge, which comprises representative tasks with diverse categories and difficulty distributions. We also provide convenient filters that let users select tasks to assess specific agent capabilities. We show that MCU is expressive enough to cover all tasks used in recent literature on Minecraft agents, and that it underscores the need for advances in areas such as creativity, precise control, and out-of-distribution generalization toward the goal of open-ended Minecraft agent development.

In artificial intelligence (AI), an agent is a computer program or system designed to perceive its environment, make decisions, and take actions to solve a specific task or set of tasks. An open-ended agent is, further, an agent capable of solving arbitrary tasks that are feasible and can be solved by humans. It differs crucially from a task-specific or multi-task agent, which can handle only a limited spectrum of tasks.
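One way to picture the six difficulty scores and the task filters described in the abstract is the minimal Python sketch below. It is an illustration only, not the MCU or SkillForge implementation: the class and function names, the task names, the score values, and the assumption that scores lie in [0, 1] are all hypothetical, and Euclidean distance is just one plausible way to compare tasks in the resulting task space.

```python
import math
from dataclasses import astuple, dataclass

# The six dimensions are named in the abstract; everything else here is assumed.
DIMENSIONS = (
    "time_consumption", "operational_effort", "planning_complexity",
    "intricacy", "creativity", "novelty",
)


@dataclass(frozen=True)
class Difficulty:
    """Six difficulty scores for one task (assumed to lie in [0, 1])."""
    time_consumption: float
    operational_effort: float
    planning_complexity: float
    intricacy: float
    creativity: float
    novelty: float


@dataclass(frozen=True)
class Task:
    name: str
    difficulty: Difficulty


def distance(a: Task, b: Task) -> float:
    """Euclidean distance between two tasks in the six-dimensional score space."""
    return math.dist(astuple(a.difficulty), astuple(b.difficulty))


def filter_tasks(tasks, dimension, min_score):
    """Keep the tasks whose score on `dimension` is at least `min_score`."""
    if dimension not in DIMENSIONS:
        raise ValueError(f"unknown difficulty dimension: {dimension}")
    return [t for t in tasks if getattr(t.difficulty, dimension) >= min_score]


# Made-up tasks and scores, for illustration only.
tasks = [
    Task("mine_log", Difficulty(0.1, 0.2, 0.1, 0.1, 0.0, 0.0)),
    Task("craft_table", Difficulty(0.2, 0.2, 0.3, 0.1, 0.0, 0.0)),
    Task("build_house", Difficulty(0.7, 0.6, 0.6, 0.5, 0.8, 0.4)),
]

creative = filter_tasks(tasks, "creativity", 0.5)
print([t.name for t in creative])
```

Treating the scores as a feature vector is what makes relationships between tasks computable: in this toy data the two simple tasks lie close together, while the building task sits far from both.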
Oct-12-2023
- Country:
  - Asia > China (0.14)
  - South America > Brazil (0.14)
- Genre:
  - Research Report (0.82)
- Industry:
  - Leisure & Entertainment > Games > Computer Games (1.00)
- Technology:
  - Information Technology > Artificial Intelligence
    - Games > Computer Games (1.00)
    - Machine Learning (1.00)
    - Natural Language (1.00)
    - Representation & Reasoning (1.00)