MIRAI: Evaluating LLM Agents for Event Forecasting
Chenchen Ye, Ziniu Hu, Yihe Deng, Zijie Huang, Mingyu Derek Ma, Yanqiao Zhu, Wei Wang
arXiv.org Artificial Intelligence
Recent advancements in Large Language Models (LLMs) have empowered LLM agents to autonomously collect world information and reason over it to solve complex problems. Given this capability, there is growing interest in employing LLM agents to predict international events, which can influence decision-making and shape policy development on an international scale. Despite this growing interest, a rigorous benchmark of LLM agents' forecasting capability and reliability is lacking. To address this gap, we introduce MIRAI, a novel benchmark designed to systematically evaluate LLM agents as temporal forecasters in the context of international events. Our benchmark features an agentic environment with tools for accessing an extensive database of historical, structured events and textual news articles. We refine the GDELT event database with careful cleaning and parsing to curate a series of relational prediction tasks with varying forecasting horizons, assessing LLM agents' abilities from short-term to long-term forecasting. We further implement APIs that enable LLM agents to use different tools via a code-based interface. In summary, MIRAI comprehensively evaluates the agents' capabilities in three dimensions: 1) autonomously sourcing and integrating critical information from large global databases; 2) writing code using domain-specific APIs and libraries for tool use; and 3) jointly reasoning over historical knowledge from diverse formats and time periods to accurately predict future events. Through comprehensive benchmarking, we aim to establish a reliable framework for assessing the capabilities of LLM agents in forecasting international events, thereby contributing to the development of more accurate and trustworthy models for international relations analysis.
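To make the idea of a relational prediction task with a forecasting cutoff concrete, here is a minimal sketch in Python. It is not MIRAI's API: the `Event` type, the helper names, the country placeholders, and the relation labels are all hypothetical and chosen only for illustration. The sketch assumes an event is a dated (subject, relation, object) record, restricts the history to events strictly before the cutoff date, and uses a simple frequency count as a stand-in for the agent's reasoning.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Event:
    """Hypothetical structured event: who did what to whom, and when."""
    day: date
    subject: str   # placeholder actor code, not a real country code
    relation: str  # illustrative relation label, not a real event code
    obj: str


def history_before(events, cutoff, subject, obj):
    """Return events between the two actors dated strictly before the cutoff."""
    return [
        e for e in events
        if e.day < cutoff and e.subject == subject and e.obj == obj
    ]


def most_frequent_relation(events, cutoff, subject, obj):
    """Naive baseline: predict the most common past relation, or None."""
    counts = Counter(
        e.relation for e in history_before(events, cutoff, subject, obj)
    )
    return counts.most_common(1)[0][0] if counts else None


# Toy data, invented for this example only.
events = [
    Event(date(2023, 1, 5), "AAA", "consult", "BBB"),
    Event(date(2023, 2, 1), "AAA", "consult", "BBB"),
    Event(date(2023, 2, 10), "AAA", "threaten", "BBB"),
    Event(date(2023, 3, 1), "AAA", "threaten", "BBB"),
    Event(date(2023, 3, 2), "AAA", "threaten", "BBB"),
]

# Only events before the cutoff are visible to the forecaster.
prediction = most_frequent_relation(events, date(2023, 2, 15), "AAA", "BBB")
print(prediction)
```

The strict `e.day < cutoff` filter is the important design choice in this sketch: it keeps information dated on or after the cutoff out of the history, which is what a temporal forecasting setup needs in order to avoid leakage.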
Jul-1-2024
- Country:
- Africa
- Togo (0.04)
- Mayotte (0.04)
- Lesotho (0.04)
- Mali (0.04)
- Sierra Leone (0.04)
- Burkina Faso (0.04)
- Ethiopia (0.04)
- Sudan (0.04)
- Kenya (0.04)
- Cabo Verde (0.04)
- South Africa (0.04)
- Saint Helena, Ascension and Tristan da Cunha (0.04)
- Botswana (0.04)
- The Gambia (0.04)
- Nigeria (0.04)
- Central African Republic (0.04)
- Ghana (0.04)
- Eswatini (0.04)
- Namibia (0.04)
- Middle East
- Zimbabwe (0.04)
- Eritrea (0.04)
- Seychelles (0.04)
- Republic of the Congo (0.04)
- Mauritania (0.04)
- Mozambique (0.04)
- Uganda (0.04)
- Gabon (0.04)
- Western Sahara (0.04)
- Tanzania (0.04)
- South Sudan (0.04)
- Equatorial Guinea (0.04)
- Madagascar (0.04)
- Malawi (0.04)
- Liberia (0.04)
- São Tomé and Príncipe (0.04)
- Angola (0.04)
- Senegal (0.04)
- Cameroon (0.04)
- Burundi (0.04)
- Guinea-Bissau (0.04)
- Democratic Republic of the Congo (0.04)
- Mauritius (0.04)
- Comoros (0.04)
- Benin (0.04)
- French Southern and Antarctic Lands (0.04)
- Rwanda > Kigali
- Kigali (0.04)
- Zambia (0.04)
- Côte d'Ivoire (0.04)
- Antarctica
- Bouvet Island (0.04)
- French Southern Territories (0.04)
- French Southern and Antarctic Lands (0.04)
- Asia
- Pakistan (0.04)
- Brunei (0.04)
- Cambodia (0.04)
- Nepal (0.04)
- Mongolia (0.04)
- Malaysia (0.04)
- Macao (0.04)
- North Korea (0.14)
- Kazakhstan (0.04)
- Uzbekistan (0.04)
- Bangladesh (0.04)
- Bhutan (0.04)
- Azerbaijan (0.04)
- Vietnam (0.04)
- Japan (0.04)
- Indonesia (0.04)
- Laos (0.04)
- British Indian Ocean Territory (0.04)
- Maldives (0.04)
- Philippines (0.04)
- Russia (0.04)
- China
- Timor-Leste (0.04)
- Armenia (0.04)
- South Korea (0.04)
- Taiwan (0.04)
- Myanmar (0.04)
- Kyrgyzstan (0.04)
- Sri Lanka (0.04)
- Turkmenistan (0.04)
- Thailand (0.04)
- Singapore (0.04)
- Tajikistan (0.04)
- India (0.04)
- Afghanistan (0.04)
- Europe
- Belarus (0.04)
- North Macedonia (0.04)
- Hungary (0.04)
- Portugal (0.04)
- Moldova (0.04)
- Ireland (0.04)
- Kosovo (0.04)
- Gibraltar (0.04)
- Czechia (0.04)
- Sweden (0.04)
- Croatia (0.04)
- Ukraine (0.04)
- Belgium (0.04)
- Norway > Svalbard and Jan Mayen (0.04)
- Romania (0.04)
- Estonia (0.04)
- Lithuania (0.04)
- Isle of Man (0.04)
- Latvia (0.04)
- Holy See > Vatican City (0.04)
- Greece (0.04)
- San Marino (0.04)
- Russia (0.04)
- Italy (0.04)
- France (0.04)
- Serbia (0.04)
- Slovenia (0.04)
- Slovakia (0.04)
- United Kingdom > England
- Cambridgeshire > Cambridge (0.04)
- Monaco (0.04)
- Albania (0.04)
- Finland (0.04)
- Denmark (0.04)
- Switzerland (0.04)
- Iceland (0.04)
- Netherlands (0.04)
- Spain (0.04)
- Germany (0.04)
- Poland (0.04)
- Andorra (0.04)
- Liechtenstein (0.04)
- Bulgaria (0.04)
- Austria (0.04)
- Bosnia and Herzegovina (0.04)
- Faroe Islands (0.04)
- Montenegro (0.04)
- Indian Ocean (0.04)
- North America
- Canada > Ontario
- Toronto (0.04)
- Netherlands > Sint Eustatius (0.04)
- Sint Maarten (0.04)
- Turks and Caicos Islands (0.04)
- Cuba (0.04)
- Haiti (0.04)
- The Bahamas (0.04)
- Saint Lucia (0.04)
- El Salvador (0.04)
- United States
- California > Los Angeles County
- Los Angeles (0.14)
- Louisiana > Orleans Parish
- New Orleans (0.04)
- Jamaica (0.04)
- Mexico (0.04)
- Barbados (0.04)
- Guadeloupe (0.04)
- Dominica (0.04)
- Saint Barthélemy (0.04)
- Anguilla (0.04)
- Guatemala (0.04)
- Nicaragua (0.04)
- Saint Pierre and Miquelon > Miquelon-Langlade
- Miquelon (0.04)
- Belize (0.04)
- Honduras (0.04)
- Antigua and Barbuda (0.04)
- Dominican Republic (0.04)
- Greenland (0.04)
- Martinique (0.04)
- British Virgin Islands (0.04)
- Bermuda (0.04)
- Puerto Rico (0.04)
- Costa Rica (0.04)
- Saint Martin (0.04)
- Aruba (0.04)
- Panama (0.04)
- Cayman Islands (0.04)
- Montserrat (0.04)
- Trinidad and Tobago (0.04)
- Bonaire, Sint Eustatius and Saba (0.04)
- Curaçao (0.04)
- Oceania
- Australia
- American Samoa (0.04)
- Kiribati (0.04)
- New Zealand (0.04)
- Wallis and Futuna (0.04)
- Guam (0.04)
- Fiji (0.04)
- Cook Islands (0.04)
- Marshall Islands (0.04)
- Tuvalu (0.04)
- Samoa (0.04)
- Solomon Islands (0.04)
- Tonga (0.04)
- New Caledonia (0.04)
- Papua New Guinea (0.04)
- Nauru (0.04)
- Tokelau (0.04)
- Micronesia (0.04)
- Pitcairn (0.04)
- Palau (0.04)
- Northern Mariana Islands (0.04)
- French Polynesia (0.04)
- Niue (0.04)
- Vanuatu (0.04)
- South America
- Genre:
- Research Report > New Finding (0.45)
- Industry:
- Government
- Foreign Policy (1.00)
- Military (0.93)
- Information Technology (0.92)
- Law (1.00)
- Technology: