Supplementary material

Neural Information Processing Systems

This section contains supplementary material to support the main paper text. The contents include: A. videos illustrating our approach; B. additional walkthrough collection details to supplement Sec. 4 (simulators); C. alternate local state task formulations compared to ours in Sec.; D. additional attention visualization results to supplement Figure 1.


VideoGameBench: Can Vision-Language Models complete popular video games?

Zhang, Alex L., Griffiths, Thomas L., Narasimhan, Karthik R., Press, Ofir

arXiv.org Artificial Intelligence

Vision-language models (VLMs) have achieved strong results on coding and math benchmarks that are challenging for humans, yet their ability to perform tasks that come naturally to humans, such as perception, spatial navigation, and memory management, remains understudied. Real video games are crafted to be intuitive for humans to learn and master by leveraging innate inductive biases, making them an ideal testbed for evaluating such capabilities in VLMs. To this end, we introduce VideoGameBench, a benchmark consisting of 10 popular video games from the 1990s that VLMs directly interact with in real time. VideoGameBench challenges models to complete entire games with access to only raw visual inputs and a high-level description of objectives and controls, a significant departure from existing setups that rely on game-specific scaffolding and auxiliary information. We keep three of the games secret to encourage solutions that generalize to unseen environments. Our experiments show that frontier vision-language models struggle to progress beyond the beginning of each game. We find inference latency to be a major limitation of frontier models in the real-time setting; therefore, we introduce VideoGameBench Lite, a setting where the game pauses while waiting for the LM's next action. The best performing model, Gemini 2.5 Pro, completes only 0.48% of VideoGameBench and 1.6% of VideoGameBench Lite. We hope that the formalization of the human skills mentioned above into this benchmark motivates progress in these research directions.
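The latency effect the abstract describes can be illustrated with a toy simulation (this is not code from the paper; frame counts, the fixed per-action cost, and the function name are invented for illustration). In real time the game keeps advancing while the model is "thinking", so a slow model acts on only a fraction of the elapsed frames; in a Lite-style setting the world pauses during inference:

```python
def run_episode(inference_latency_frames, pause_on_think, total_frames=60):
    """Toy model of inference latency in a real-time game.

    If pause_on_think is False, the world advances by the model's
    inference latency before each action lands; if True, the world
    waits for the model (the "Lite" idea). Returns how many actions
    the agent gets to take over total_frames frames of game time.
    """
    frame, actions_taken = 0, 0
    while frame < total_frames:
        if not pause_on_think:
            frame += inference_latency_frames  # world moves on while we think
        if frame < total_frames:
            actions_taken += 1
            frame += 1  # the chosen action itself consumes one frame
    return actions_taken
```

With a 5-frame inference latency, the real-time agent acts on only one frame in six, while the paused agent acts on every frame, which is one way to see why pausing the game isolates reasoning quality from speed.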


MANGO: A Benchmark for Evaluating Mapping and Navigation Abilities of Large Language Models

Ding, Peng, Fang, Jiading, Li, Peng, Wang, Kangrui, Zhou, Xiaochen, Yu, Mo, Li, Jing, Walter, Matthew R., Mei, Hongyuan

arXiv.org Artificial Intelligence

Large language models such as ChatGPT and GPT-4 have recently achieved astonishing performance on a variety of natural language processing tasks. In this paper, we propose MANGO, a benchmark to evaluate their capabilities to perform text-based mapping and navigation. Our benchmark includes 53 mazes taken from a suite of text games: each maze is paired with a walkthrough that visits every location but does not cover all possible paths. The task is question-answering: for each maze, a large language model reads the walkthrough and answers hundreds of mapping and navigation questions such as "How should you go to Attic from West of House?" and "Where are we if we go north and east from Cellar?". Although these questions are easy for humans, it turns out that even GPT-4, the best language model to date, performs poorly at answering them. Further, our experiments suggest that a strong mapping and navigation ability would benefit large language models in performing relevant downstream tasks, such as playing text games. Our MANGO benchmark will facilitate future research on methods that improve the mapping and navigation capabilities of language models. We host our leaderboard, data, code, and evaluation program at https://mango.ttic.edu and https://github.com/oaklight/mango/.
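Both question types in the abstract reduce to operations on a directed graph built from the walkthrough. The sketch below is not MANGO's evaluation code; the room connections are invented for illustration (only the room names echo the abstract's examples), but it shows the two query shapes: following a direction sequence, and finding a route with breadth-first search:

```python
from collections import deque

# Hypothetical miniature walkthrough as (from_room, direction, to_room)
# triples; the layout is made up for this sketch.
MOVES = [
    ("West of House", "north", "North of House"),
    ("North of House", "east", "Behind House"),
    ("Behind House", "down", "Cellar"),
    ("Cellar", "north", "Troll Room"),
    ("Troll Room", "east", "Attic"),
]

def build_map(moves):
    """Collect the walkthrough's moves into room -> {direction: room}."""
    graph = {}
    for src, direction, dst in moves:
        graph.setdefault(src, {})[direction] = dst
    return graph

def follow(graph, start, directions):
    """'Where are we if we go north and east from Cellar?'-style queries."""
    here = start
    for d in directions:
        here = graph[here][d]
    return here

def route(graph, start, goal):
    """'How should you go to Attic from West of House?'-style queries (BFS)."""
    queue, seen = deque([(start, [])]), {start}
    while queue:
        room, path = queue.popleft()
        if room == goal:
            return path
        for d, nxt in graph.get(room, {}).items():
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [d]))
    return None  # unreachable with the observed moves
```

The benchmark's difficulty for LLMs is precisely that they must maintain this graph implicitly from prose, rather than being handed the explicit structure.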


Will GPT-4 Run DOOM?

de Wynter, Adrian

arXiv.org Artificial Intelligence

We show that GPT-4's reasoning and planning capabilities extend to the 1993 first-person shooter Doom. This large language model (LLM) is able to run and play the game with only a few instructions, plus a textual description--generated by the model itself from screenshots--about the state of the game being observed. We find that GPT-4 can play the game to a passable degree: it is able to manipulate doors, combat enemies, and perform pathing. More complex prompting strategies involving multiple model calls provide better results. While further work is required to enable the LLM to play the game as well as its classical, reinforcement learning-based counterparts, we note that GPT-4 required no training, leaning instead on its own reasoning and observational capabilities. We hope our work pushes the boundaries on intelligent, LLM-based agents in video games. We conclude by discussing the ethical implications of our work.


JECC: Commonsense Reasoning Tasks Derived from Interactive Fictions

Yu, Mo, Gu, Yi, Guo, Xiaoxiao, Feng, Yufei, Zhu, Xiaodan, Greenspan, Michael, Campbell, Murray, Gan, Chuang

arXiv.org Artificial Intelligence

Commonsense reasoning simulates the human ability to make presumptions about our physical world, and it is an essential cornerstone in building general AI systems. We propose a new commonsense reasoning dataset based on human Interactive Fiction (IF) gameplay walkthroughs, as human players demonstrate plentiful and diverse commonsense reasoning. The new dataset provides a natural mixture of various reasoning types and requires multi-hop reasoning. Moreover, the IF game-based construction procedure requires much less human intervention than previous ones. Different from existing benchmarks, our dataset focuses on the assessment of functional commonsense knowledge rules rather than factual knowledge. Hence, in order to achieve higher performance on our tasks, models need to effectively utilize such functional knowledge to infer the outcomes of actions, rather than relying solely on memorizing facts. Experiments show that the introduced dataset is challenging for previous machine reading models as well as new large language models, with a significant 20% performance gap compared to human experts.


Can Large Language Models Play Text Games Well? Current State-of-the-Art and Open Questions

Tsai, Chen Feng, Zhou, Xiaochen, Liu, Sierra S., Li, Jing, Yu, Mo, Mei, Hongyuan

arXiv.org Artificial Intelligence

Large language models (LLMs) such as ChatGPT and GPT-4 have recently demonstrated remarkable abilities in communicating with human users. In this technical report, we investigate their capacity to play text games, in which a player has to understand the environment and respond to situations by having dialogues with the game world. Our experiments show that ChatGPT performs competitively compared to all the existing systems but still exhibits a low level of intelligence. Specifically, ChatGPT cannot construct a world model by playing the game or even reading the game manual; it may fail to leverage the world knowledge that it already has; and it cannot infer the goal of each step as the game progresses.


Practical MLOps: Operationalizing Machine Learning Models (ISBN 9781098103019)

Gift, Noah, Deza, Alfredo

#artificialintelligence

The first few chapters cover the theory and practice of both DevOps and MLOps. One of the items covered is how to set up continuous integration and continuous delivery. Another critical topic is Kaizen, i.e., the idea of continuous improvement in everything. There are three chapters on cloud computing that cover AWS, Azure, and GCP. Alfredo, a developer advocate for Microsoft, is an ideal source of knowledge for MLOps on the Azure platform. Likewise, Noah has spent years getting students trained on cloud computing and working with the education arms of Google, AWS, and Azure.


11 Ways to Learn More Data Science

#artificialintelligence

I've been a teacher at many grade levels, and I own a tutoring center that serves kids from age 4 to 18. I've tutored hundreds of students myself over 10 years. I've spent a lot of time trying to teach concepts, to students, peers, friends, direct reports, you name it. I say this because there is one thing that I beg you to listen to, and it's the number one issue I've seen in students at all levels: We just don't know what we don't know. People aren't great at seeing where their own understanding has small gaps. For any topic, we have a few lines of knowledge that we can spout, but we just aren't aware of the edge cases that exist until we see them. We don't have all the knowledge of how every topic intersects with every related one, and many times, those answers are not easy to figure out. Therein lies why experience is valuable. There is so much about even the basic Data Science topics that we haven't yet come across.


5 Practical Data Science Projects That Will Help You Solve Real Business Problems for 2022 - KDnuggets

#artificialintelligence

Recommendation systems are algorithms with an objective to suggest the most relevant information to users, whether that be similar products on Amazon, similar TV shows on Netflix, or similar songs on Spotify. There are two main types of recommendation systems: collaborative filtering and content-based filtering. Recommendation systems are one of the most widely used and most practical data science applications. Not only that, but they also have one of the highest ROIs when it comes to data products. It's estimated that Amazon increased its sales by 29% in 2019, specifically due to its recommendation system. Likewise, Netflix claimed that its recommendation system was worth a staggering $1 billion in 2016! But what makes it so profitable? As I alluded to earlier, it's about one thing: relevancy. By providing users with more relevant products, shows, or songs, you're ultimately increasing their likelihood to purchase more and/or stay engaged longer.
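To make the collaborative-filtering idea concrete, here is a minimal sketch of user-based collaborative filtering: score items a user hasn't rated by the similarity-weighted ratings of other users. All the names and ratings below are invented toy data, and real systems use far more sophisticated models (matrix factorization, implicit feedback, and so on):

```python
from math import sqrt

# Toy user -> {item: rating} matrix; every value here is made up.
RATINGS = {
    "ana":  {"matrix": 5, "inception": 4, "up": 1},
    "ben":  {"matrix": 4, "inception": 5, "up": 2},
    "cara": {"matrix": 1, "up": 5, "frozen": 4},
}

def cosine(u, v):
    """Cosine similarity between two sparse rating vectors (dicts)."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    return dot / (sqrt(sum(x * x for x in u.values())) *
                  sqrt(sum(x * x for x in v.values())))

def recommend(user, ratings, k=1):
    """Rank items the user hasn't seen by similarity-weighted ratings."""
    scores = {}
    for other, their in ratings.items():
        if other == user:
            continue
        sim = cosine(ratings[user], their)
        for item, rating in their.items():
            if item not in ratings[user]:
                scores[item] = scores.get(item, 0.0) + sim * rating
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

Content-based filtering, the other family mentioned above, would instead compare item feature vectors (genre, tags, text) against a profile of what the user already liked, with no need for other users' ratings.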