Markov Models
ABS: Enforcing Constraint Satisfaction On Generated Sequences Via Automata-Guided Beam Search
Collura, Vincenzo, Tit, Karim, Bussi, Laura, Giunchiglia, Eleonora, Cordy, Maxime
Sequence generation and prediction form a cornerstone of modern machine learning, with applications spanning natural language processing, program synthesis, and time-series forecasting. These tasks are typically modeled in an autoregressive fashion, where each token is generated conditional on the preceding ones, and beam search is commonly used to balance exploration and fluency during decoding. While deep learning models and Large Language Models (LLMs) excel at capturing statistical patterns in this setting, they remain ill-equipped to guarantee compliance with formal constraints. In this paper, we introduce ABS: a general and model-agnostic inference-time algorithm that guarantees compliance with any constraint that can be compiled into a Deterministic Finite Automaton (DFA), without requiring retraining. ABS leverages the DFA to guide a constrained variant of beam search: at each decoding step, transitions leading to violations are masked, while remaining paths are dynamically re-ranked according to both the model's probabilities and the automaton's acceptance structure. We formally prove that the resulting sequences are guaranteed to satisfy the given constraints, and we empirically demonstrate that ABS also improves output quality. We validate our approach on three distinct tasks: constrained image-stream classification, controlled text generation, and text infilling. In all settings, ABS achieves perfect constraint satisfaction, while outperforming or matching state-of-the-art baselines on standard quality metrics and efficiency.
Learning CNF formulas from uniform random solutions in the local lemma regime
Feng, Weiming, Yang, Xiongxin, Yu, Yixiao, Zhang, Yiyao
We study the problem of learning a $n$-variables $k$-CNF formula $Φ$ from its i.i.d. uniform random solutions, which is equivalent to learning a Boolean Markov random field (MRF) with $k$-wise hard constraints. Revisiting Valiant's algorithm (Commun. ACM'84), we show that it can exactly learn (1) $k$-CNFs with bounded clause intersection size under Lovász local lemma type conditions, from $O(\log n)$ samples; and (2) random $k$-CNFs near the satisfiability threshold, from $\widetilde{O}(n^{\exp(-\sqrt{k})})$ samples. These results significantly improve the previous $O(n^k)$ sample complexity. We further establish new information-theoretic lower bounds on sample complexity for both exact and approximate learning from i.i.d. uniform random solutions.
A new class of Markov random fields enabling lightweight sampling
Courbot, Jean-Baptiste, Gangloff, Hugo, Colicchio, Bruno
This work addresses the problem of efficient sampling of Markov random fields (MRF). The sampling of Potts or Ising MRF is most often based on Gibbs sampling, and is thus computationally expensive. We consider in this work how to circumvent this bottleneck through a link with Gaussian Markov Random fields. The latter can be sampled in several cost-effective ways, and we introduce a mapping from real-valued GMRF to discrete-valued MRF. The resulting new class of MRF benefits from a few theoretical properties that validate the new model. Numerical results show the drastic performance gain in terms of computational efficiency, as we sample at least 35x faster than Gibbs sampling using at least 37x less energy, all the while exhibiting empirical properties close to classical MRFs.
Diluting Restricted Boltzmann Machines
Recent advances in artificial intelligence have relied heavily on increasingly large neural networks, raising concerns about their computational and environmental costs. This paper investigates whether simpler, sparser networks can maintain strong performance by studying Restricted Boltzmann Machines (RBMs) under extreme pruning conditions. Inspired by the Lottery Ticket Hypothesis, we demonstrate that RBMs can achieve high-quality generative performance even when up to 80% of the connections are pruned before training, confirming that they contain viable sub-networks. However, our experiments reveal crucial limitations: trained networks cannot fully recover lost performance through retraining once additional pruning is applied. We identify a sharp transition above which the generative quality degrades abruptly when pruning disrupts a minimal core of essential connections. Moreover, re-trained networks remain constrained by the parameters originally learned performing worse than networks trained from scratch at equivalent sparsity levels. These results suggest that for sparse networks to work effectively, pruning should be implemented early in training rather than attempted afterwards. Our findings provide practical insights for the development of efficient neural architectures and highlight the persistent influence of initial conditions on network capabilities.
UniVLA: Learning to Act Anywhere with Task-centric Latent Actions
Bu, Qingwen, Yang, Yanting, Cai, Jisong, Gao, Shenyuan, Ren, Guanghui, Yao, Maoqing, Luo, Ping, Li, Hongyang
A generalist robot should perform effectively across various environments. However, most existing approaches heavily rely on scaling action-annotated data to enhance their capabilities. Consequently, they are often limited to single physical specification and struggle to learn transferable knowledge across different embodiments and environments. To confront these limitations, we propose UniVLA, a new framework for learning cross-embodiment vision-language-action (VLA) policies. Our key innovation is to derive task-centric action representations from videos with a latent action model. This enables us to exploit extensive data across a wide spectrum of embodiments and perspectives. To mitigate the effect of task-irrelevant dynamics, we incorporate language instructions and establish a latent action model within the DINO feature space. Learned from internet-scale videos, the generalist policy can be deployed to various robots through efficient latent action decoding. We obtain state-of-the-art results across multiple manipulation and navigation benchmarks, as well as real-robot deployments. UniVLA achieves superior performance over OpenVLA with less than 1/20 of pretraining compute and 1/10 of downstream data. Continuous performance improvements are observed as heterogeneous data, even including human videos, are incorporated into the training pipeline. The results underscore UniVLA's potential to facilitate scalable and efficient robot policy learning.
MARFT: Multi-Agent Reinforcement Fine-Tuning
Liao, Junwei, Wen, Muning, Wang, Jun, Zhang, Weinan
LLM-based Multi-Agent Systems have demonstrated remarkable capabilities in addressing complex, agentic tasks, from generating high-quality presentation slides to even conducting sophisticated scientific research. Meanwhile, RL has been widely recognized for its effectiveness in enhancing agent intelligence, but limited research has investigated the fine-tuning of LaMAS using foundational RL techniques. Moreover, the direct application of MARL methods to LaMAS introduces significant challenges, stemming from the unique characteristics and mechanisms inherent to LaMAS. To address these challenges, this article presents a comprehensive study of LLM-based MARL and proposes a novel paradigm termed Multi-Agent Reinforcement Fine-Tuning (MARFT). We introduce a brand-new MG called Flex-MG, which aligns with the LaMAS optimization in real-world applications and a universal algorithmic framework tailored specifically for LaMAS, outlining the conceptual foundations, key distinctions, and practical implementation strategies. We review the evolution from RL to RFT, setting the stage for a parallel analysis in the multi-agent domain. In the context of LaMAS, we elucidate critical differences between MARL and MARFT. These differences motivate a transition toward a LaMAS-oriented formulation of RFT. Central to this work is a robust and scalable MARFT framework. We detail the core algorithm and provide a complete, open-source implementation to facilitate adoption and further research. The latter sections of the paper explore real-world application perspectives and opening challenges in MARFT. By bridging theoretical underpinnings with practical methodologies, this work serves as a roadmap for researchers seeking to advance MARFT toward resilient and adaptive solutions in agentic systems. Our implementation of the proposed framework is publicly available at: https://github.com/jwliao-ai/MARFT.
Cross-fluctuation phase transitions reveal sampling dynamics in diffusion models
Ramachandran, Sai Niranjan, Lal, Manish Krishan, Sra, Suvrit
We analyse how the sampling dynamics of distributions evolve in score-based diffusion models using cross-fluctuations, a centered-moment statistic from statistical physics. Specifically, we show that starting from an unbiased isotropic normal distribution, samples undergo sharp, discrete transitions, eventually forming distinct events of a desired distribution while progressively revealing finer structure. As this process is reversible, these transitions also occur in reverse, where intermediate states progressively merge, tracing a path back to the initial distribution. We demonstrate that these transitions can be detected as discontinuities in $n^{\text{th}}$-order cross-fluctuations. For variance-preserving SDEs, we derive a closed-form for these cross-fluctuations that is efficiently computable for the reverse trajectory. We find that detecting these transitions directly boosts sampling efficiency, accelerates class-conditional and rare-class generation, and improves two zero-shot tasks--image classification and style transfer--without expensive grid search or retraining. We also show that this viewpoint unifies classical coupling and mixing from finite Markov chains with continuous dynamics while extending to stochastic SDEs and non Markovian samplers. Our framework therefore bridges discrete Markov chain theory, phase analysis, and modern generative modeling.
AERMANI-VLM: Structured Prompting and Reasoning for Aerial Manipulation with Vision Language Models
Mishra, Sarthak, Yadav, Rishabh Dev, Das, Avirup, Gupta, Saksham, Pan, Wei, Roy, Spandan
This reasoning-action loop continues until task completion, enabling the VLM to focus on semantic reasoning while delegating precise execution to robust controllers. The framework is evaluated in simulation and real-world experiments using a pretrained VLM, and comprehensive comparison and ablation studies are carried out to verify its performance. CLIPSeg [12] is used for prompt-based segmentation, maintaining a unified prompting pipeline from perception to reasoning. A. Additional Related W orks Aerial manipulation has progressed from vision-guided approaches relying on onboard cameras and artificial visual cues [13], to fully markerless grasping systems using onboard perception [14], and more recently end-effector-centric frameworks for versatile manipulation [15], yet all remain focused on execution rather than language-level reasoning. In parallel, VLAs [2]-[5] combine LLMbased planning [16], [17] with perceptual grounding from models such as CLIP [18], CLIPort [19], and LLaV A [20], but their end-to-end policies are data-intensive and prone to unsafe behaviors from ambiguous outputs, or adversarial prompts, motivating hybrid approaches where reasoning is decoupled from execution via modular skill primitives [21], [22]. For multirotors specifically, foundation model research has focused on mission planning [23], spatial reasoning [24], and direct control [25] which advances locomotion but does not extend to aerial manipulation, and it requires exploration coupled with grasping and placement [26]. In summary, control-focused aerial manipulation, reasoning-focused VLAs, and navigation-focused UA V -VLN each address parts of the problem, but none unify perception, reasoning, and execution for aerial manipulation. Together, these limitations motivate AERMANI-VLM, which unifies open-vocabulary perception, structured reasoning, and safe skill execution for aerial manipulation.
Predictive Auxiliary Learning for Belief-based Multi-Agent Systems
Huang, Qinwei, Wang, Stefan, Khan, Simon, Katz, Garrett, Qiu, Qinru
The performance of multi-agent reinforcement learning (MARL) in partially observable environments depends on effectively aggregating information from observations, communications, and reward signals. While most existing multi-agent systems primarily rely on rewards as the only feedback for policy training, our research shows that introducing auxiliary predictive tasks can significantly enhance learning efficiency and stability. We propose Belief-based Predictive Auxiliary Learning (BEPAL), a framework that incorporates auxiliary training objectives to support policy optimization. BEPAL follows the centralized training with decentralized execution paradigm. Each agent learns a belief model that predicts unobservable state information, such as other agents' rewards or motion directions, alongside its policy model. By enriching hidden state representations with information that does not directly contribute to immediate reward maximization, this auxiliary learning process stabilizes MARL training and improves overall performance. We evaluate BEPAL in the predator-prey environment and Google Research Football, where it achieves an average improvement of about 16 percent in performance metrics and demonstrates more stable convergence compared to baseline methods.
Experience-Driven Exploration for Efficient API-Free AI Agents
Tang, Chenwei, Xing, Jingyu, Liu, Xinyu, Wang, Zizhou, Du, Jiawei, Zhen, Liangli, Lv, Jiancheng
Most existing software lacks accessible Application Programming Interfaces (APIs), requiring agents to operate solely through pixel-based Graphical User Interfaces (GUIs). In this API-free setting, large language model (LLM)-based agents face severe efficiency bottlenecks: limited to local visual experiences, they make myopic decisions and rely on inefficient trial-and-error, hindering both skill acquisition and long-term planning. To address these challenges, we propose KG-Agent, an experience-driven learning framework that structures an agent's raw pixel-level interactions into a persistent State-Action Knowledge Graph (SA-KG). KG-Agent overcomes inefficient exploration by linking functionally similar but visually distinct GUI states, forming a rich neighborhood of experience that enables the agent to generalize from a diverse set of historical strategies. To support long-horizon reasoning, we design a hybrid intrinsic reward mechanism based on the graph topology, combining a state value reward for exploiting known high-value pathways with a novelty reward that encourages targeted exploration. This approach decouples strategic planning from pure discovery, allowing the agent to effectively value setup actions with delayed gratification. We evaluate KG-Agent in two complex, open-ended GUI-based decision-making environments (Civilization V and Slay the Spire), demonstrating significant improvements in exploration efficiency and strategic depth over the state-of-the-art methods.