dfas
- Europe > Austria > Vienna (0.14)
- North America > United States > California > Alameda County > Berkeley (0.04)
- Europe > Greece (0.04)
- Asia > Middle East > Republic of Türkiye > Aksaray Province > Aksaray (0.04)
- Research Report > Experimental Study (1.00)
- Research Report > New Finding (0.93)
Automata-Conditioned Cooperative Multi-Agent Reinforcement Learning
Yalcinkaya, Beyazit, Vazquez-Chanlatte, Marcell, Shah, Ameesh, Krasowski, Hanna, Seshia, Sanjit A.
We study the problem of learning multi-task, multi-agent policies for cooperative, temporal objectives, under centralized training, decentralized execution. In this setting, using automata to represent tasks enables the decomposition of complex tasks into simpler sub-tasks that can be assigned to agents. However, existing approaches remain sample-inefficient and are limited to the single-task case. In this work, we present Automata-Conditioned Cooperative Multi-Agent Reinforcement Learning (ACC-MARL), a framework for learning task-conditioned, decentralized team policies. We identify the main challenges to ACC-MARL's feasibility in practice, propose solutions, and prove the correctness of our approach. We further show that the value functions of learned policies can be used to assign tasks optimally at test time. Experiments show emergent task-aware, multi-step coordination among agents, e.g., pressing a button to unlock a door, holding the door, and short-circuiting tasks.
- North America > United States > California > Alameda County > Berkeley (0.04)
- Asia > Japan > Shikoku > Ehime Prefecture > Matsuyama (0.04)
Hardness of Learning Regular Languages in the Next Symbol Prediction Setting
Bhattamishra, Satwik, Blunsom, Phil, Kanade, Varun
We study the learnability of languages in the Next Symbol Prediction (NSP) setting, where a learner receives only positive examples from a language together with, for every prefix, (i) whether the prefix itself is in the language and (ii) which next symbols can lead to an accepting string. This setting has been used in prior works to empirically analyze neural sequence models, and additionally, we observe that efficient algorithms for the NSP setting can be used to learn the (truncated) support of language models. We formalize the setting so as to make it amenable to PAC-learning analysis. While the setting provides a much richer set of labels than the conventional classification setting, we show that learning concept classes such as DFAs and Boolean formulas remains computationally hard. The proof is via a construction that makes almost all additional labels uninformative, yielding a reduction from the conventional learning problem to learning with NSP labels. Under cryptographic assumptions, the reduction implies that the problem of learning DFAs is computationally hard in the NSP setting.
- North America > United States (0.57)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Europe > Austria > Vienna (0.14)
- North America > United States > California > Alameda County > Berkeley (0.04)
- Europe > Greece (0.04)
- Asia > Middle East > Republic of Türkiye > Aksaray Province > Aksaray (0.04)
- Research Report > Experimental Study (1.00)
- Research Report > New Finding (0.93)
Provably Correct Automata Embeddings for Optimal Automata-Conditioned Reinforcement Learning
Yalcinkaya, Beyazit, Lauffer, Niklas, Vazquez-Chanlatte, Marcell, Seshia, Sanjit A.
Automata-conditioned reinforcement learning (RL) has given promising results for learning multi-task policies capable of performing temporally extended objectives given at runtime, done by pretraining and freezing automata embeddings prior to training the downstream policy. However, no theoretical guarantees were given. This work provides a theoretical framework for the automata-conditioned RL problem and shows that it is probably approximately correct learnable. We then present a technique for learning provably correct automata embeddings, guaranteeing optimal multi-task policy learning. Our experimental evaluation confirms these theoretical results.
Randomly Sampled Language Reasoning Problems Reveal Limits of LLMs
Gupta, Kavi, Sanders, Kate, Solar-Lezama, Armando
Can LLMs pick up language structure from examples? Evidence in prior work seems to indicate yes, as pretrained models repeatedly demonstrate the ability to adapt to new language structures and vocabularies. However, this line of research typically considers languages that are present within common pretraining datasets, or otherwise share notable similarities with these seen languages. In contrast, in this work we attempt to measure models' language understanding capacity while circumventing the risk of dataset recall. We parameterize large families of language tasks recognized by deterministic finite automata (DFAs), and can thus sample novel language reasoning problems to fairly evaulate LLMs regardless of training data. We find that, even in the strikingly simple setting of 3-state DFAs, LLMs underperform unparameterized ngram models on both language recognition and synthesis tasks. These results suggest that LLMs struggle to match the ability of basic language models in recognizing and reasoning over languages that are sufficiently distinct from the ones they see at training time, underscoring the distinction between learning individual languages and possessing a general theory of language.
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
- North America > United States > Maryland > Baltimore (0.04)
Compositional Automata Embeddings for Goal-Conditioned Reinforcement Learning
Yalcinkaya, Beyazit, Lauffer, Niklas, Vazquez-Chanlatte, Marcell, Seshia, Sanjit A.
Goal-conditioned reinforcement learning is a powerful way to control an AI agent's behavior at runtime. That said, popular goal representations, e.g., target states or natural language, are either limited to Markovian tasks or rely on ambiguous task semantics. We propose representing temporal goals using compositions of deterministic finite automata (cDFAs) and use cDFAs to guide RL agents. cDFAs balance the need for formal temporal semantics with ease of interpretation: if one can understand a flow chart, one can understand a cDFA. On the other hand, cDFAs form a countably infinite concept class with Boolean semantics, and subtle changes to the automaton can result in very different tasks, making them difficult to condition agent behavior on. To address this, we observe that all paths through a DFA correspond to a series of reach-avoid tasks and propose pre-training graph neural network embeddings on "reach-avoid derived" DFAs. Through empirical evaluation, we demonstrate that the proposed pre-training method enables zero-shot generalization to various cDFA task classes and accelerated policy specialization without the myopic suboptimality of hierarchical methods.
- Europe > Austria > Vienna (0.14)
- North America > United States > California > Alameda County > Berkeley (0.04)
- Europe > Greece (0.04)
- Asia > Middle East > Republic of Türkiye > Aksaray Province > Aksaray (0.04)
- Government > Regional Government > North America Government > United States Government (0.39)
- Government > Military (0.39)
Learning to Coordinate without Communication under Incomplete Information
Chen, Shenghui, Zhu, Shufang, De Giacomo, Giuseppe, Topcu, Ufuk
Achieving seamless coordination in cooperative games is a crucial challenge in artificial intelligence, particularly when players operate under incomplete information. A common strategy to mitigate this information asymmetry involves leveraging explicit communication. However, direct communication is not always feasible due to factors such as transmission loss. We explore how effective coordination can be achieved without verbal communication, relying solely on observing each other's actions. We demonstrate how an autonomous agent can learn to cooperate by interpreting its partner's actions, which are used to hint at its intents. Our approach involves developing an agent strategy by constructing deterministic finite automata for each possible action and integrating them into a non-Markovian finite-state transducer. This transducer represents a non-deterministic strategy for the agent that suggests actions to assist its partner during gameplay. Experimental results in a testbed called Gnomes at Night show that the learned no-communication coordination strategy achieves significantly higher success rates and requires fewer steps to complete the game compared to uncoordinated scenarios, performing almost as well as an oracle baseline with direct communication.
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
- North America > United States > Texas > Travis County > Austin (0.04)
- Europe > United Kingdom > England > Merseyside > Liverpool (0.04)
DeepDFA: Automata Learning through Neural Probabilistic Relaxations
Umili, Elena, Capobianco, Roberto
In this work, we introduce DeepDFA, a novel approach to identifying Deterministic Finite Automata (DFAs) from traces, harnessing a differentiable yet discrete model. Inspired by both the probabilistic relaxation of DFAs and Recurrent Neural Networks (RNNs), our model offers interpretability post-training, alongside reduced complexity and enhanced training efficiency compared to traditional RNNs. Moreover, by leveraging gradient-based optimization, our method surpasses combinatorial approaches in both scalability and noise resilience. Validation experiments conducted on target regular languages of varying size and complexity demonstrate that our approach is accurate, fast, and robust to noise in both the input symbols and the output labels of training data, integrating the strengths of both logical grammar induction and deep learning.
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > Michigan > Washtenaw County > Ann Arbor (0.04)
- North America > United States > Massachusetts (0.04)
- (6 more...)
DFAMiner: Mining minimal separating DFAs from labelled samples
Dell'Erba, Daniele, Li, Yong, Schewe, Sven
We propose DFAMiner, a passive learning tool for learning minimal separating deterministic finite automata (DFA) from a set of labelled samples. Separating automata are an interesting class of automata that occurs generally in regular model checking and has raised interest in foundational questions of parity game solving. We first propose a simple and linear-time algorithm that incrementally constructs a three-valued DFA (3DFA) from a set of labelled samples given in the usual lexicographical order. This 3DFA has accepting and rejecting states as well as don't-care states, so that it can exactly recognise the labelled examples. We then apply our tool to mining a minimal separating DFA for the labelled samples by minimising the constructed automata via a reduction to solving SAT problems. Empirical evaluation shows that our tool outperforms current state-of-the-art tools significantly on standard benchmarks for learning minimal separating DFAs from samples. Progress in the efficient construction of separating DFAs can also lead to finding the lower bound of parity game solving, where we show that DFAMiner can create optimal separating automata for simple languages with up to 7 colours. Future improvements might offer inroads to better data structures.
- North America > United States > Iowa > Story County > Ames (0.04)
- North America > United States > Washington > King County > Seattle (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- (13 more...)
- Research Report (0.50)
- Workflow (0.46)
- Government > Regional Government > North America Government > United States Government (0.93)
- Government > Military (0.83)