llm agent
Large Language Models Play StarCraft II: Benchmarks and A Chain of Summarization Approach
Weiyu Ma
With the continued advancement of Large Language Model (LLM) agents in reasoning, planning, and decision-making, benchmarks have become crucial for evaluating these skills. However, there is a notable gap in benchmarks for real-time strategic decision-making. StarCraft II (SC2), with its complex and dynamic nature, serves as an ideal setting for such evaluations. To this end, we have developed TextStarCraft II, a specialized environment for assessing LLMs in real-time strategic scenarios within SC2. To address the limitations of traditional Chain of Thought (CoT) methods, we introduce the Chain of Summarization (CoS) method, which enhances LLMs' capabilities in rapid and effective decision-making. Our key experiments included: 1. LLM Evaluation: We tested 10 LLMs in TextStarCraft II; most of them defeated the LV5 built-in AI, showcasing effective strategy skills.
- Asia > South Korea (0.14)
- Europe > Sweden > Stockholm > Stockholm (0.04)
- Asia > China > Jiangsu Province > Nanjing (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Overview (0.67)
- Leisure & Entertainment > Sports (1.00)
- Leisure & Entertainment > Games > Computer Games (1.00)
- Government > Military (1.00)
- North America > United States > Wisconsin > Dane County > Madison (0.04)
- North America > United States > Illinois > Cook County > Chicago (0.04)
- North America > United States > Illinois > Champaign County > Urbana (0.04)
- Information Technology > Security & Privacy (0.93)
- Health & Medicine (0.93)
Large Language Models as Urban Residents: An LLM Agent Framework for Personal Mobility Generation
This paper introduces a novel approach using Large Language Models (LLMs) integrated into an agent framework for flexible and effective personal mobility generation. LLMs overcome the limitations of previous models by effectively processing semantic data and offering versatility in modeling various tasks.
- North America > United States (0.14)
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.05)
- Asia > Japan > Honshū > Kansai > Osaka Prefecture > Osaka (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Retail (1.00)
- Consumer Products & Services > Restaurants (1.00)
- Information Technology (0.93)
- Oceania > New Zealand > North Island > Auckland Region > Auckland (0.04)
- North America > United States > Michigan (0.04)
- Asia > Singapore (0.04)
- Education > Educational Setting > Online (0.68)
- Information Technology > Services (0.46)
- Information Technology > Communications (0.94)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.74)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.48)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.46)
- Research Report > Experimental Study (1.00)
- Research Report > New Finding (0.93)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Asia > China > Shanghai > Shanghai (0.04)
- Oceania > Australia > New South Wales > Sydney (0.04)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- Information Technology (0.92)
- Leisure & Entertainment > Games (0.67)
- North America > United States (0.04)
- Africa > South Africa > Gauteng > Johannesburg (0.04)
- Asia > Middle East > Jordan (0.04)
- North America > United States > Illinois > Cook County > Chicago (0.04)
- North America > United States > Pennsylvania (0.04)
- North America > United States > Michigan (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Health & Medicine (1.00)
- Government (0.67)
- Leisure & Entertainment > Games (0.46)
- North America > United States (0.04)
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.04)
- Asia > China > Shanghai > Shanghai (0.04)
- Information Technology (0.93)
- Education > Educational Setting (0.46)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.98)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.98)