One Subgoal at a Time: Zero-Shot Generalization to Arbitrary Linear Temporal Logic Requirements in Multi-Task Reinforcement Learning

Guo, Zijian, Işık, İlker, Ahmad, H. M. Sabbir, Li, Wenchao

arXiv.org Artificial Intelligence

Generalizing to complex and temporally extended task objectives and safety constraints remains a critical challenge in reinforcement learning (RL). Linear temporal logic (LTL) offers a unified formalism to specify such requirements, yet existing methods are limited in their ability to handle nested long-horizon tasks and safety constraints, and cannot identify situations when a subgoal is not satisfiable and an alternative should be sought. In this paper, we introduce GenZ-LTL, a method that enables zero-shot generalization to arbitrary LTL specifications. GenZ-LTL leverages the structure of Büchi automata to decompose an LTL task specification into sequences of reach-avoid subgoals. Contrary to the current state-of-the-art method that conditions on subgoal sequences, we show that it is more effective to achieve zero-shot generalization by solving these reach-avoid problems one subgoal at a time through proper safe RL formulations. In addition, we introduce a novel subgoal-induced observation reduction technique that can mitigate the exponential complexity of subgoal-state combinations under realistic assumptions. Empirical results show that GenZ-LTL substantially outperforms existing methods in zero-shot generalization to unseen LTL specifications.
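The reach-avoid decomposition described in the abstract can be made concrete with a minimal sketch. The automaton representation, state names, and label sets below are illustrative assumptions, not the paper's code: each Büchi automaton state offers transitions of the form (reach set, avoid set, successor), and "one subgoal at a time" means the agent only conditions on the options available from its current automaton state, not on whole subgoal sequences.

```python
# Minimal sketch of reach-avoid subgoal extraction from a Büchi automaton.
# Representation is an assumption: a dict mapping each automaton state to
# a list of (reach_labels, avoid_labels, next_state) transitions.
from typing import Dict, List, Tuple, FrozenSet

Transition = Tuple[FrozenSet[str], FrozenSet[str], str]
Buchi = Dict[str, List[Transition]]

def current_subgoals(automaton: Buchi, state: str) -> List[Transition]:
    """Reach-avoid subgoals available from the current automaton state."""
    return automaton.get(state, [])

def step(automaton: Buchi, state: str, observed: FrozenSet[str]) -> str:
    """Advance the automaton given the labels observed this environment step.

    Transitions whose avoid set is hit by the observation are skipped; a
    transition whose reach set is satisfied moves us to its successor.
    """
    for reach, avoid, nxt in automaton.get(state, []):
        if observed & avoid:
            continue  # observation violates this transition's avoid set
        if reach <= observed:
            return nxt
    return state  # no subgoal completed yet; remain in the current state

# Toy spec: "reach A while avoiding B, then reach C".
spec: Buchi = {
    "q0": [(frozenset({"A"}), frozenset({"B"}), "q1")],
    "q1": [(frozenset({"C"}), frozenset(), "q_acc")],
    "q_acc": [],
}
```

Under this sketch, a policy trained to solve a single reach-avoid problem can be re-invoked at each automaton state, which is the essence of handling one subgoal at a time.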


Why do objects have many names? A study on word informativeness in language use and lexical systems

Gualdoni, Eleonora, Boleda, Gemma

arXiv.org Artificial Intelligence

Human lexicons contain many different words that speakers can use to refer to the same object, e.g., "purple" or "magenta" for the same shade of color. On the one hand, studies on language use have explored how speakers adapt their referring expressions to successfully communicate in context, without focusing on properties of the lexical system. On the other hand, studies in language evolution have discussed how competing pressures for informativeness and simplicity shape lexical systems, without tackling in-context communication. We aim at bridging the gap between these traditions, and explore why a soft mapping between referents and words is a good solution for communication, by taking into account both in-context communication and the structure of the lexicon. We propose a simple measure of informativeness for words and lexical systems, grounded in a visual space, and analyze color naming data for English and Mandarin Chinese. We conclude that optimal lexical systems are those where multiple words can apply to the same referent, conveying different amounts of information. Such systems allow speakers to maximize communication accuracy and minimize the amount of information they convey when communicating about referents in contexts.


COMMA: A Communicative Multimodal Multi-Agent Benchmark

Ossowski, Timothy, Chen, Jixuan, Maqbool, Danyal, Cai, Zefan, Bradshaw, Tyler, Hu, Junjie

arXiv.org Artificial Intelligence

The rapid advances of multi-modal agents built on large foundation models have largely overlooked their potential for language-based communication between agents in collaborative tasks. This oversight presents a critical gap in understanding their effectiveness in real-world deployments, particularly when communicating with humans. Existing agentic benchmarks fail to address key aspects of inter-agent communication and collaboration, particularly in scenarios where agents have unequal access to information and must work together to achieve tasks beyond the scope of individual capabilities. To fill this gap, we introduce a novel benchmark designed to evaluate the collaborative performance of multimodal multi-agent systems through language communication. Our benchmark features a variety of scenarios, providing a comprehensive evaluation across four key categories of agentic capability in a communicative collaboration setting. By testing both agent-agent and agent-human collaborations using open-source and closed-source models, our findings reveal surprising weaknesses in state-of-the-art models, including proprietary models like GPT-4o. These models struggle to outperform even a simple random agent baseline in agent-agent collaboration and only surpass the random baseline when a human is involved.


DeepLTL: Learning to Efficiently Satisfy Complex LTL Specifications

Jackermeier, Mathias, Abate, Alessandro

arXiv.org Artificial Intelligence

Linear temporal logic (LTL) has recently been adopted as a powerful formalism for specifying complex, temporally extended tasks in reinforcement learning (RL). However, learning policies that efficiently satisfy arbitrary specifications not observed during training remains a challenging problem. Existing approaches suffer from several shortcomings: they are often only applicable to finite-horizon fragments of LTL, are restricted to suboptimal solutions, and do not adequately handle safety constraints. In this work, we propose a novel learning approach to address these concerns. Our method leverages the structure of Büchi automata, which explicitly represent the semantics of LTL specifications, to learn policies conditioned on sequences of truth assignments that lead to satisfying the desired formulae. Experiments in a variety of discrete and continuous domains demonstrate that our approach is able to zero-shot satisfy a wide range of finite- and infinite-horizon specifications, and outperforms existing methods in terms of both satisfaction probability and efficiency.

One of the fundamental challenges in artificial intelligence (AI) is to create agents capable of following arbitrary instructions. While significant research efforts have been devoted to designing reinforcement learning (RL) agents that can complete tasks expressed in natural language (Oh et al., 2017; Goyal et al., 2019; Luketina et al., 2019), recent years have witnessed increased interest in formal languages to specify tasks in RL (Andreas et al., 2017; Camacho et al., 2019; Jothimurugan et al., 2021). Formal specification languages offer several desirable properties over natural language, such as well-defined semantics and compositionality, allowing for the specification of unambiguous, structured tasks (Vaezipoor et al., 2021; León et al., 2022).
Recent works have furthermore shown that it is possible to automatically translate many natural language instructions into a relevant specification language, providing interpretable yet precise representations of tasks, which is especially important in safety-critical domains (León et al., 2021; Pan et al., 2023; Liu et al., 2023; Cohen et al., 2024). Linear temporal logic (LTL) (Pnueli, 1977) in particular has been adopted as a powerful formalism for instructing RL agents (Hasanbeig et al., 2018; Araki et al., 2021; Voloshin et al., 2023). LTL is an appealing specification language that allows for the definition of tasks in terms of high-level features of the environment.
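The idea of "sequences of truth assignments" can be illustrated with a small sketch of finite-trace semantics for a fragment of LTL. This is not DeepLTL's implementation; the trace representation and operator names below are assumptions made for illustration. A trace is a list of sets of atomic propositions, one set per time step.

```python
# Illustrative finite-trace semantics for a small LTL fragment.
# A trace is a list of truth assignments: each step is the set of
# atomic propositions that hold at that time.
from typing import List, Set

Trace = List[Set[str]]

def eventually(trace: Trace, prop: str) -> bool:
    """F prop: prop holds at some step of the trace."""
    return any(prop in step for step in trace)

def always(trace: Trace, prop: str) -> bool:
    """G prop: prop holds at every step of the trace."""
    return all(prop in step for step in trace)

def until(trace: Trace, p: str, q: str) -> bool:
    """p U q: q eventually holds, and p holds at every step before that."""
    for step in trace:
        if q in step:
            return True
        if p not in step:
            return False
    return False

# "Stay safe until the goal is reached": safe U goal.
trace: Trace = [{"safe"}, {"safe"}, {"safe", "goal"}]
```

A policy conditioned on such assignment sequences can, in principle, be queried zero-shot for a new formula by translating the formula into the truth assignments that satisfy it, which is the role the Büchi automaton plays in the approach described above.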


How Color is Represented and Viewed in Computer Vision

#artificialintelligence

The eye is a remarkable creation, able to perceive the color of an object in an aesthetically pleasing and harmonious way. Color models are important for digital visualization.


Soft robotic device stimulates muscles, sparks hope for ALS and MS patients

Engadget

Today, muscle atrophy is often unavoidable when you can't move due to severe injury, old age or diseases like amyotrophic lateral sclerosis (ALS) and multiple sclerosis (MS). However, Harvard researchers see hope in soft robotics that could someday stretch and contract the muscles of patients unable to do so themselves. The Harvard engineers tested a new mechanostimulation system on mice, successfully preventing or assisting in their recovery from muscle atrophy. The team implanted the "soft robotic device" on a mouse's hind limb, which they immobilized in a cast-like enclosure for around two weeks. While the control group's untreated muscles wasted away as expected, the actively stimulated muscles showed reduced degradation.


Google Brain wants creative AI to help humans make "a new kind of art"

#artificialintelligence

Machine-learning algorithms aren't likely to put painters or singer-songwriters out of work anytime soon, to judge from their body of work to date. But Google Brain is developing tools that pair artists with deep-learning tools to develop novel artwork together, said Douglas Eck, senior staff scientist at the search giant's artificial-intelligence research division, during the MIT Technology Review's EmTech Digital conference on Tuesday. He hopes the platform, called Magenta, will allow people to produce completely new kinds of music and art, in much the way that keyboards, drum machines, and cameras did. Eck said that Magenta could serve a role analogous to that of Les Paul, who helped develop the modern electric guitar. But Eck said they want to keep artists in the loop to push the boundaries of the new tool in interesting ways, like a Jimi Hendrix who flips it upside down, bends the strings, and distorts the sound.


Smells like team spirit: Getting 'art' out of artificial intelligence

#artificialintelligence

Well, now we do not have to only imagine it. A project called Lost Tapes of the 27 Club, focused on mental health in the music industry, recently released a song called Drowned in the Sun. It was touted as a never-heard-before Nirvana song. Except that this song was never written by Kurt Cobain or Nirvana, nor discovered in some old musty attic years later; it was written by an artificial intelligence (AI) engine. To be more precise, it was written by a neural network trained on the entire body of Nirvana's work.


Google's AI software used to create 'new' Nirvana song 'Drowned in the Sun'

Daily Mail - Science & tech

Fans of Nirvana may do a double-take when they hear 'Drowned in the Sun,' a new song created by artificial intelligence that simulates the songwriting of late grunge legend Kurt Cobain. Engineers fed Nirvana's back catalog to Google's AI program, Magenta, which analyzed it for recurring components and then developed an entirely new track. The voice on 'Drowned in the Sun' is 100 percent human, though--provided by Eric Hogan, lead singer of the Atlanta Nirvana cover band Nevermind. The song is just one release from The Lost Tapes of the 27 Club, a project developed by the nonprofit Over the Bridge, which spotlights mental health issues in the music industry. Other AI-generated 'lost' tracks have taken their cue from Jim Morrison, Jimi Hendrix and Amy Winehouse, who, like Cobain, died at age 27.


AI software creates "new" Nirvana song "Drowned in the Sun"

#artificialintelligence

The recently launched Lost Tapes of the 27 Club project uses AI software to create songs in the style of musicians who died at the age of 27. One of the featured tracks is called "Drowned in the Sun", and it comes pretty close to replicating a Nirvana song written by Kurt Cobain himself. With opening guitars starting out restrained before reaching a crescendo on the chorus, the track is reminiscent of Nirvana's signature hit, "Come as You Are". Its chorus sounds like something Cobain might have written, too, with lyrics like, "I don't care/ I feel as one, drowned in the sun." As explained in a Rolling Stone feature, Google's AI program Magenta was used to analyze the pioneering grunge band's music and create the instrumental track.