voyager
MINDSTORES: Memory-Informed Neural Decision Synthesis for Task-Oriented Reinforcement in Embodied Systems
Chari, Anirudh, Reddy, Suraj, Tiwari, Aditya, Lian, Richard, Zhou, Brian
While large language models (LLMs) have shown promising capabilities as zero-shot planners for embodied agents, their inability to learn from experience and build persistent mental models limits their robustness in complex open-world environments like Minecraft. We introduce MINDSTORES, an experience-augmented planning framework that enables embodied agents to build and leverage mental models through natural interaction with their environment. Drawing inspiration from how humans construct and refine cognitive mental models, our approach extends existing zero-shot LLM planning by maintaining a database of past experiences that informs future planning iterations. The key innovation is representing accumulated experiences as natural language embeddings of (state, task, plan, outcome) tuples, which can then be efficiently retrieved and reasoned over by an LLM planner to generate insights and guide plan refinement for novel states and tasks. Through extensive experiments in MineDojo, a Minecraft simulation environment that exposes low-level controls to agents, we find that MINDSTORES learns and applies its knowledge significantly better than existing memory-based LLM planners while maintaining the flexibility and generalization benefits of zero-shot approaches, representing an important step toward more capable embodied AI systems that can learn continuously through natural experience.
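The retrieval step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Experience` and `ExperienceStore` names, the example records, and the bag-of-words `embed` function are assumptions made for illustration, and a real system would presumably use a learned sentence-embedding model and pass the retrieved tuples to an LLM planner.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class Experience:
    """One (state, task, plan, outcome) tuple, stored as natural language."""
    state: str
    task: str
    plan: str
    outcome: str

    def text(self) -> str:
        # Natural-language rendering of the tuple, used as the embedding input.
        return (f"state: {self.state} task: {self.task} "
                f"plan: {self.plan} outcome: {self.outcome}")


def embed(text: str) -> Counter:
    # ASSUMPTION: toy bag-of-words vector standing in for a real embedding model.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ExperienceStore:
    """Hypothetical in-memory experience database with similarity retrieval."""

    def __init__(self) -> None:
        self._items: list[tuple[Counter, Experience]] = []

    def add(self, exp: Experience) -> None:
        self._items.append((embed(exp.text()), exp))

    def retrieve(self, state: str, task: str, k: int = 2) -> list[Experience]:
        # Rank stored experiences by similarity to the new (state, task) query.
        query = embed(f"state: {state} task: {task}")
        ranked = sorted(self._items,
                        key=lambda item: cosine(query, item[0]),
                        reverse=True)
        return [exp for _, exp in ranked[:k]]


# Illustrative records only; they are not data from the paper.
store = ExperienceStore()
store.add(Experience("forest, empty inventory", "collect wood",
                     "punch oak tree", "success: got 4 oak logs"))
store.add(Experience("cave, wooden pickaxe", "mine iron ore",
                     "mine with wooden pickaxe",
                     "failure: iron ore needs stone pickaxe"))
store.add(Experience("plains, near sheep", "collect wool",
                     "attack sheep", "success: got 1 wool"))

hits = store.retrieve(state="cave, stone pickaxe", task="mine iron ore", k=1)
print(hits[0].outcome)
```

In this sketch the retrieved tuple carries a past failure, which is the kind of evidence a planner could use to revise its plan before acting in a similar state.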
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
- North America > United States > California > San Francisco County > San Francisco (0.14)
- North America > United States > Florida > Miami-Dade County > Miami (0.04)
- Research Report (0.83)
- Workflow (0.69)
- Leisure & Entertainment > Games > Computer Games (0.96)
- Leisure & Entertainment > Sports > Golf (0.93)
- Materials > Metals & Mining > Iron (0.69)
MindForge: Empowering Embodied Agents with Theory of Mind for Lifelong Collaborative Learning
Lică, Mircea, Shirekar, Ojas, Colle, Baptiste, Raman, Chirag
Contemporary embodied agents, such as Voyager in Minecraft, have demonstrated promising capabilities in open-ended individual learning. However, when powered with open large language models (LLMs), these agents often struggle with rudimentary tasks, even when fine-tuned on domain-specific knowledge. These advancements enable agents to reason about their own and others' mental states, empirically addressing two prevalent failure modes: false beliefs and faulty task executions. The development of generally capable agents marks a significant shift in advancing artificial intelligence, transitioning from assimilating data to generating novel knowledge through embodied interactions with open-ended environments (Kolve et al., 2017; Savva et al., 2019; Puig et al., 2018; Shridhar et al., 2020). Classical approaches leveraging reinforcement learning (Schulman et al., 2017; Hafner et al., 2023) and imitation learning (Zare et al., 2024) often struggle with generalization and exploration, as agents tend to converge on repetitive behaviors in static environments (Cobbe et al., 2019). To address these limitations, researchers have sought to emulate human-like lifelong learning capabilities, developing systems that can continuously acquire, update, and transfer knowledge over extended periods (Parisi et al., 2019; Wang et al., 2023b). The advent of large language models (LLMs) has accelerated this pursuit, enabling the development of agents such as Voyager (Wang et al., 2023a) that can apply internet-scale knowledge to continuously explore, plan, and acquire new skills in partially observable, open-ended environments such as Minecraft. Despite their promise, we argue that state-of-the-art lifelong learning agents like Voyager face a crucial limitation: they learn in isolation, neglecting a fundamental aspect of human intelligence--the social context.
So central is the social context to our existence that the Social Intelligence Hypothesis posits that our cognitive capabilities evolved primarily to navigate the complexities of social life (Humphrey, 1976; Dunbar, 1998). This isolated learning becomes particularly problematic when coupled with these agents' reliance on closed LLMs like GPT-4. Wang et al. (2023a) note that "VOYAGER requires [...]"
- Overview (0.46)
- Research Report (0.40)
- Health & Medicine (1.00)
- Leisure & Entertainment > Games > Computer Games (0.71)
- Education > Educational Setting > Continuing Education (0.54)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.50)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.34)
ADAM: An Embodied Causal Agent in Open-World Environments
In open-world environments like Minecraft, existing agents face challenges in continuously learning structured knowledge, particularly causality. These challenges stem from the opacity inherent in black-box models and an excessive reliance on prior knowledge during training, which impair their interpretability and generalization capability. To this end, we introduce ADAM, An emboDied causal Agent in Minecraft, that can autonomously navigate the open world, perceive multimodal contexts, learn causal world knowledge, and tackle complex tasks through lifelong learning. ADAM is empowered by four key components: 1) an interaction module, enabling the agent to execute actions while documenting the interaction processes; 2) a causal model module, tasked with constructing an ever-growing causal graph from scratch, which enhances interpretability and diminishes reliance on prior knowledge; 3) a controller module, comprising a planner, an actor, and a memory pool, which uses the learned causal graph to accomplish tasks; 4) a perception module, powered by multimodal large language models, which enables ADAM to perceive like a human player. Extensive experiments show that ADAM constructs an almost perfect causal graph from scratch, enabling efficient task decomposition and execution with strong interpretability. Notably, in our modified Minecraft games where no prior knowledge is available, ADAM maintains its performance and shows remarkable robustness and generalization capability. ADAM pioneers a novel paradigm that integrates causal methods and embodied agents in a synergistic manner. Our project page is at https://opencausalab.github.io/ADAM.
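How a learned causal graph can drive task decomposition is sketched below. This is a hedged illustration rather than ADAM's actual planner: the `decompose` function and the hand-written `causes` edges are assumptions, whereas the abstract describes a graph that the agent builds from scratch through interaction. The sketch also assumes the graph is acyclic.

```python
def decompose(goal: str, causes: dict[str, list[str]]) -> list[str]:
    """Order the subtasks needed for `goal` so prerequisites come first.

    `causes` maps each item to the items that must be obtained before it
    (a hypothetical stand-in for a learned causal graph; assumed acyclic).
    """
    order: list[str] = []
    seen: set[str] = set()

    def visit(item: str) -> None:
        if item in seen:
            return
        seen.add(item)
        for prerequisite in causes.get(item, []):
            visit(prerequisite)
        # Post-order append: an item is scheduled after all its prerequisites.
        order.append(item)

    visit(goal)
    return order


# Illustrative edges only, written by hand for this example.
causes = {
    "planks": ["log"],
    "stick": ["planks"],
    "wooden_pickaxe": ["planks", "stick"],
    "cobblestone": ["wooden_pickaxe"],
    "stone_pickaxe": ["cobblestone", "stick"],
}

plan = decompose("stone_pickaxe", causes)
print(plan)
# ['log', 'planks', 'stick', 'wooden_pickaxe', 'cobblestone', 'stone_pickaxe']
```

Because the plan is read directly off explicit graph edges, each step can be traced to a prerequisite relation, which is one way to interpret the abstract's claim about interpretability.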
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- Asia > China > Shanghai > Shanghai (0.04)
- North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.97)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.70)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.67)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.30)
The Embodied World Model Based on LLM with Visual Information and Prediction-Oriented Prompts
Haijima, Wakana, Nakakubo, Kou, Suzuki, Masahiro, Matsuo, Yutaka
In recent years, as machine learning has improved, particularly for vision and language understanding, research in embodied AI has also evolved. VOYAGER is a well-known LLM-based embodied AI that enables autonomous exploration in the Minecraft world, but it has issues such as underutilization of visual data and insufficient functionality as a world model. In this research, the possibility of utilizing visual data and the function of LLM as a world model were investigated with the aim of improving the performance of embodied AI. The experimental results revealed that LLM can extract necessary information from visual data, and the utilization of the information improves its performance as a world model. It was also suggested that devised prompts could bring out the LLM's function as a world model.
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
- Asia > Japan > Kyūshū & Okinawa > Kyūshū > Fukuoka Prefecture > Fukuoka (0.04)
- Leisure & Entertainment > Games > Computer Games (0.36)
- Materials > Metals & Mining > Gold (0.31)
AI can help shape society for the better – but humans and machines must work together
D Fox Harrell
One of the first images of AI I encountered was a white, spectral, hostile, disembodied head. It was in the computer game Neuromancer, programmed by Troy Miles and based on William Gibson's cyberpunk novel. Other people may have first encountered HAL 9000 from Stanley Kubrick's 2001: A Space Odyssey or Samantha from Spike Jonze's Her. Images from pop culture influence people's impressions of AI, but culture has an even more profound relationship to it. If there's one thing to take away from this article, it is the idea that AI systems are not objective machines, but instead based in human culture: our values, norms, preferences, and behaviours in society.
- North America > United States > Massachusetts (0.05)
- North America > United States > Illinois > Cook County > Chicago (0.05)
- North America > El Salvador (0.05)
- Media > Music (0.76)
- Leisure & Entertainment > Games > Computer Games (0.35)
They Plugged GPT-4 Into Minecraft--and Unearthed New Potential for AI
The technology that underpins ChatGPT has the potential to do much more than just talk. Linxi "Jim" Fan, an AI researcher at the chipmaker Nvidia, worked with some colleagues to devise a way to set the powerful language model GPT-4--the "brains" behind ChatGPT and a growing number of other apps and services--loose inside the blocky video game Minecraft. The Nvidia team, which included Anima Anandkumar, the company's director of machine learning and a professor at Caltech, created a Minecraft bot called Voyager that uses GPT-4 to solve problems inside the game. The language model generates objectives that help the agent explore the game, and code that improves the bot's skill at the game over time. Voyager doesn't play the game like a person, but it can read the state of the game directly, via an API.
. . . And the Computer Plays Along
A concert held at the Massachusetts Institute of Technology (MIT) in the fall to celebrate the opening of the university's new museum included a performer that was invisible to the audience but played a key role in forming the melodic sound: an artificial intelligence (AI) system that responded to the musicians and improvised in real time. In a piece from "Brain Opera 2.0," the system starts by growling to the trumpet, then finds pitches with the trombone, becomes melodic with the sax, and ultimately syncs with the instruments by the time everyone comes in, explains Tod Machover, a music and media professor at MIT and head of the MIT Media Lab, who served as composer/conductor of the two-night concert event. The "living, singing AI" system was designed by Manaswi Mishra, one of Machover's Ph.D. students. "We developed a machine learning-based model that could react to musician input in real time, and then 'fed' this model with a vast amount of music from many countries, styles, and historic periods, as well as with all kinds of human voices making every conceivable kind of vocal sound," Machover said. The system also drew from a vast library of percussive instruments and sounds from around the world to then improvise with the performers.
- North America > United States > New York (0.04)
- North America > United States > Illinois > Cook County > Chicago (0.04)
- North America > United States > California (0.04)
- Media > Music (1.00)
- Leisure & Entertainment (1.00)
TechScape: can AI really predict crime?
In 2011, the Los Angeles police department rolled out a novel approach to policing called Operation Laser. Laser – which stood for Los Angeles Strategic Extraction and Restoration – was the first predictive policing programme of its kind in the US, allowing the LAPD to use historical data to predict with laser precision (hence the name) where future crimes might be committed and who might commit them. But it was anything but precise. The programme used historical crime data like arrests, calls for service, field interview cards – which police filled out with identifying information every time they stopped someone regardless of the reason – and more to map out "problem areas" for officers to focus their efforts on or assign criminal risk scores to individuals. Information collected during these policing efforts was fed into computer software that further helped automate the department's crime-prediction efforts.
- North America > United States > California > Los Angeles County > Los Angeles (0.76)
- North America > United States > Texas (0.05)
- North America > United States > New York (0.05)
What rights does an evil sentient computer have on Star Trek?
This post contains major spoilers for season two, episode seven of 'Star Trek: Lower Decks.' Artificial intelligence has been baked into the Star Trek universe since the original series. Kirk and his crew occasionally faced off against computers gone amok, including Nomad, Landru and the M-5. The only way to defeat these digital villains was to outwit them using logic, which caused them to self-destruct. But in The Next Generation, the franchise became more interested in exploring the personhood of artificial beings like Data and his family, Voyager's holographic doctor or the exocomps. This week, Lower Decks dredges up the old-style megalomaniacal AI and asks, are you really sure about those rights?
Space, the final frontier for angry teens in 'Voyagers'
From writer-director Neil Burger ("Divergent") comes another young adult science-fiction tale, this one of a cruise ship in deep space full of restless teenagers under the supervision of a single adult. Some of the young people find out that the adult is keeping them drugged and docile and forcing them to reproduce artificially. Is that a recipe for YA trouble or what? Just when you thought you could not watch one more film of this kind, here is "Voyagers," a title that sounds enough like "Passengers" (2016) to put you off your spaceship-grown peas and carrots. The story is set in 2063 when Earth is ravaged, and scientists have searched for another planet to colonize.
- Media > Film (1.00)
- Leisure & Entertainment (1.00)
- Transportation (0.79)