AITopics

2502.00747

Country:

North America > United States (0.04)
Asia > Middle East > Jordan (0.04)
Asia > Japan (0.04)
Asia > China (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Consumer Products & Services > Restaurants (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.88)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.69)

arXiv.org Artificial Intelligence, Feb-1-2025

Enhancing Memory and Imagination Consistency in Diffusion-based World Models via Linear-Time Sequence Modeling

Lee, Jia-Hua, Lin, Bor-Jiun, Sun, Wei-Fang, Lee, Chun-Yi

World models are crucial for enabling agents to simulate and plan within environments, yet existing approaches struggle with long-term dependencies and inconsistent predictions. We introduce EDELINE, a novel framework that integrates diffusion models with linear-time state space models to enhance memory retention and temporal consistency. EDELINE employs a recurrent embedding module based on Mamba SSMs for processing unbounded sequences, a unified architecture for joint reward and termination prediction, and dynamic loss harmonization to balance multi-task learning. Our results across multiple benchmarks demonstrate EDELINE's superiority and robustness over prior baselines in long-horizon tasks.

artificial intelligence, deep learning, machine learning, (11 more...)

2502.00466

Country:

North America > United States > California > Santa Clara County > Santa Clara (0.04)
Asia > Taiwan > Taiwan Province > Taipei (0.04)
Asia > Middle East > Saudi Arabia > Northern Borders Province > Arar (0.04)

Genre: Research Report > New Finding (0.66)

Industry:

Health & Medicine > Consumer Health (0.71)
Leisure & Entertainment > Games > Computer Games (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.67)

arXiv.org Machine Learning, Feb-1-2025

Transition Transfer $Q$-Learning for Composite Markov Decision Processes

Chai, Jinhang, Chen, Elynn, Yang, Lin

To bridge the gap between empirical success and theoretical understanding in transfer reinforcement learning (RL), we study a principled approach with provable performance guarantees. We introduce a novel composite MDP framework where high-dimensional transition dynamics are modeled as the sum of a low-rank component representing shared structure and a sparse component capturing task-specific variations. This relaxes the common assumption of purely low-rank transition models, allowing for more realistic scenarios where tasks share core dynamics but maintain individual variations. We introduce UCB-TQL (Upper Confidence Bound Transfer Q-Learning), designed for transfer RL scenarios where multiple tasks share core linear MDP dynamics but diverge along sparse dimensions. When applying UCB-TQL to a target task after training on a source task with sufficient trajectories, we achieve a regret bound of $\tilde{O}(\sqrt{eH^5N})$ that scales independently of the ambient dimension. Here, $N$ represents the number of trajectories in the target task, while $e$ quantifies the sparse differences between tasks. This result demonstrates substantial improvement over single-task RL by effectively leveraging the structural similarities between tasks. Our theoretical analysis provides rigorous guarantees for how UCB-TQL simultaneously exploits shared dynamics while adapting to task-specific variations.
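
The "low-rank plus sparse" structure described in this abstract can be illustrated with a minimal sketch. This is not the paper's algorithm or its UCB-TQL procedure: the function name `composite_matrix`, the factor names `U` and `V`, and the dictionary encoding of the sparse part are all hypothetical choices made for illustration, and normalization into a valid transition kernel is deliberately left out.

```python
def composite_matrix(U, V, sparse_entries):
    """Return M = U V^T + S as a list of rows.

    U: rows x rank factor, V: cols x rank factor (the shared low-rank part).
    sparse_entries: {(i, j): value} holding the few task-specific corrections S.
    All names here are illustrative assumptions, not taken from the paper.
    """
    rows, cols, rank = len(U), len(V), len(U[0])
    # Shared structure: rank-r product U V^T.
    M = [
        [sum(U[i][k] * V[j][k] for k in range(rank)) for j in range(cols)]
        for i in range(rows)
    ]
    # Task-specific variation: add only the nonzero sparse entries.
    for (i, j), value in sparse_entries.items():
        M[i][j] += value
    return M


# Toy usage: a rank-1 shared component and a single sparse deviation.
U = [[1.0], [2.0]]
V = [[1.0], [1.0], [0.0]]
target = composite_matrix(U, V, {(0, 2): 0.5})
```

In this toy case the source and target tasks would differ in exactly one entry, which is the kind of quantity the abstract's sparsity parameter $e$ is meant to capture.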

composite markov decision process, machine learning, reinforcement learning, (2 more...)

2502.00534

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.73)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.40)

Kao, Ching-Chia, Yu, Chia-Mu, Lu, Chun-Shien, Chen, Chu-Song

Safety Alignment Depth in Large Language Models: A Markov Chain Perspective

arXiv.org Artificial Intelligence, Feb-1-2025

Large Language Models (LLMs) are increasingly adopted in high-stakes scenarios, yet their safety mechanisms often remain fragile. Simple jailbreak prompts or even benign fine-tuning can bypass these protocols, underscoring the need to understand where and how they fail. Recent findings suggest that vulnerabilities emerge when alignment is confined to only the initial output tokens. Unfortunately, even with the introduction of deep safety alignment, determining the optimal safety depth remains an unresolved challenge. By leveraging the equivalence between autoregressive language models and Markov chains, this paper offers the first theoretical result on how to identify the ideal depth for safety alignment, and demonstrates how permutation-based data augmentation can tighten these bounds. Crucially, we reveal a fundamental interaction between alignment depth and ensemble width, indicating that broader ensembles can compensate for shallower alignments. These insights provide a theoretical foundation for designing more robust, scalable safety strategies that complement existing alignment approaches, opening new avenues for research into safer, more reliable LLMs.
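
The Markov chain view mentioned in this abstract can be sketched in a few lines, under the assumption of a fixed context window of `k` tokens: each state is the tuple of the last `k` tokens, and emitting a token shifts the window by one. The helper `induced_markov_chain` and the two-token `toy_model` are hypothetical illustrations, not the construction used in the paper.

```python
from itertools import product


def induced_markov_chain(vocab, k, next_token_probs):
    """Build the first-order Markov chain on length-k contexts.

    next_token_probs(context) must return {token: probability} for the
    next token given a length-k tuple of tokens (an assumed interface).
    """
    chain = {}
    for state in product(vocab, repeat=k):
        row = {}
        for token, p in next_token_probs(state).items():
            if p > 0.0:
                # Emitting `token` drops the oldest token and appends the new one.
                nxt = state[1:] + (token,)
                row[nxt] = row.get(nxt, 0.0) + p
        chain[state] = row
    return chain


def toy_model(context):
    # Hypothetical toy model: repeat the last token with probability 0.75.
    last = context[-1]
    other = "b" if last == "a" else "a"
    return {last: 0.75, other: 0.25}


chain = induced_markov_chain(("a", "b"), 2, toy_model)
```

Each row of `chain` is a probability distribution over successor contexts, so standard Markov chain tools can in principle be applied to the induced process.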

large language model, machine learning, natural language, (14 more...)

2502.00669

Country: Asia > Taiwan (0.04)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.65)

Mehta, Prashant, Meyn, Sean

Functional role of synchronization: A mean-field control perspective

arXiv.org Machine Learning, Feb-1-2025

Our friend and mentor Peter Caines has, together with his colleagues, created new foundations for studying collective dynamics in complex systems. Of particular inspiration to us has been his pioneering work in mean-field games (MFGs) launched two decades ago [10, 24, 25], and the related field of mean-field control. Peter pointed the way to both formulate and solve the problem of collective dynamics arising in a large population of heterogeneous dynamical systems. In this paper we survey some elements of MFGs within the context of controlled coupled oscillators. We begin by introducing a model for a single oscillator: $d\theta(t) = (\omega + u(t))\,dt + \sigma\,d\xi(t), \mod 2\pi$ (1), where $\theta(t) \in [0, 2\pi)$ is the phase of the oscillator at time $t$, $\omega$ is the nominal frequency with units of radians-per-second, $\{\xi(t) : t \ge 0\}$ is a standard Wiener process, and $u(t)$ is a control signal whose interpretation depends on the context. Unless otherwise noted, the SDEs are interpreted in their Itô form.
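
The single-oscillator model in equation (1) can be simulated with a basic Euler-Maruyama discretization. The following is a minimal sketch under stated assumptions: the function name `simulate_oscillator`, the step size `dt`, the feedback signature `control(t, theta)`, and the fixed seed are illustrative choices that do not come from the paper.

```python
import math
import random


def simulate_oscillator(omega, sigma, control, theta0=0.0, dt=1e-3, steps=1000, seed=0):
    """Euler-Maruyama path of d(theta) = (omega + u) dt + sigma d(xi), mod 2*pi.

    control(t, theta) returns the control signal u; its form is an assumption here.
    """
    rng = random.Random(seed)
    two_pi = 2.0 * math.pi
    theta = theta0 % two_pi
    path = [theta]
    for step in range(steps):
        t = step * dt
        u = control(t, theta)
        # Wiener increment over a step of length dt is Gaussian with variance dt.
        d_xi = rng.gauss(0.0, math.sqrt(dt))
        theta = (theta + (omega + u) * dt + sigma * d_xi) % two_pi
        path.append(theta)
    return path


# Uncontrolled, noise-free oscillator: the phase advances at the nominal frequency.
deterministic = simulate_oscillator(omega=1.0, sigma=0.0, control=lambda t, th: 0.0)

# Uncontrolled noisy oscillator: the phase stays wrapped to the circle.
noisy = simulate_oscillator(omega=1.0, sigma=0.5, control=lambda t, th: 0.0)
```

With `sigma=0` and no control, the phase after one second of simulated time is approximately `omega` radians, which gives a quick sanity check on the discretization.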

artificial intelligence, machine learning, oscillator, (19 more...)

2502.0059

Country:

North America > United States > Florida > Alachua County > Gainesville (0.14)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Illinois > Champaign County > Urbana (0.04)

Genre: Research Report (0.64)

Industry:

Energy > Power Industry (1.00)
Energy > Renewable (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Game Theory (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Masinelli, Giulio, Rajani, Chang, Hoffmann, Patrik, Wasmer, Kilian, Atienza, David

Reinforcement Learning on Reconfigurable Hardware: Overcoming Material Variability in Laser Material Processing

arXiv.org Artificial Intelligence, Jan-31-2025

Ensuring consistent processing quality is challenging in laser processes due to varying material properties and surface conditions. Although some approaches have shown promise in solving this problem via automation, they often rely on predetermined targets or are limited to simulated environments. To address these shortcomings, we propose a novel real-time reinforcement learning approach for laser process control, implemented on a Field Programmable Gate Array to achieve real-time execution. Our experimental results from laser welding tests on stainless steel samples with a range of surface roughnesses validated the method's ability to adapt autonomously, without relying on reward engineering or prior setup information. Specifically, the algorithm learned the correct power profile for each unique surface characteristic, demonstrating significant improvements over hand-engineered optimal constant power strategies -- up to 23% better performance on rougher surfaces and 7% on mixed surfaces. This approach represents a significant advancement in automating and optimizing laser processes, with potential applications across multiple industries.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

2501.19102

Country:

Europe > Switzerland > Vaud > Lausanne (0.04)
North America > United States (0.04)
North America > Canada (0.04)
Europe > Spain (0.04)

Genre: Research Report (0.82)

Industry: Materials (0.35)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Lambrechts, Gaspard, Ernst, Damien, Mahajan, Aditya

A Theoretical Justification for Asymmetric Actor-Critic Algorithms

arXiv.org Machine Learning, Jan-31-2025

In reinforcement learning for partially observable environments, many successful algorithms were developed within the asymmetric learning paradigm. This paradigm leverages additional state information available at training time for faster learning. Although the proposed learning objectives are usually theoretically sound, these methods still lack a theoretical justification for their potential benefits. We propose such a justification for asymmetric actor-critic algorithms with linear function approximators by adapting a finite-time convergence analysis to this setting. The resulting finite-time bound reveals that the asymmetric critic eliminates an error term arising from aliasing in the agent state.

artificial intelligence, machine learning, reinforcement learning, (10 more...)

2501.19116

Country:

North America > Canada > Quebec > Montreal (0.04)
Europe > Belgium > Wallonia (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.47)

Rosset, Lorenzo, Netti, Roberto, Muntoni, Anna Paola, Weigt, Martin, Zamponi, Francesco

adabmDCA 2.0 -- a flexible but easy-to-use package for Direct Coupling Analysis

arXiv.org Artificial Intelligence, Jan-30-2025

In this methods article, we provide a flexible but easy-to-use implementation of Direct Coupling Analysis (DCA) based on Boltzmann machine learning, together with a tutorial on how to use it. The package \texttt{adabmDCA 2.0} is available in different programming languages (C++, Julia, Python) and is usable on different architectures (single-core and multi-core CPU, GPU) through a common front-end interface. In addition to several learning protocols for dense and sparse generative DCA models, it allows users to directly address common downstream tasks like residue-residue contact prediction, mutational-effect prediction, scoring of sequence libraries and generation of artificial sequences for sequence design. It is readily applicable to protein and RNA sequence data.

artificial intelligence, machine learning, sequence, (18 more...)

2501.18456

Country:

Europe > France > Île-de-France > Paris > Paris (0.04)
Europe > Italy > Piedmont > Turin Province > Turin (0.04)
Europe > Italy > Lazio > Rome (0.04)

Genre: Instructional Material > Course Syllabus & Notes (1.00)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.95)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.35)

Russo, Alessio, Metelli, Alberto Maria, Restelli, Marcello

Achieving $\widetilde{\mathcal{O}}(\sqrt{T})$ Regret in Average-Reward POMDPs with Known Observation Models

arXiv.org Machine Learning, Jan-30-2025

We tackle average-reward infinite-horizon POMDPs with an unknown transition model but a known observation model, a setting that has been previously addressed in two limiting ways: (i) frequentist methods relying on suboptimal stochastic policies having a minimum probability of choosing each action, and (ii) Bayesian approaches employing the optimal policy class but requiring strong assumptions about the consistency of employed estimators. Our work removes these limitations by proving convenient estimation guarantees for the transition model and introducing an optimistic algorithm that leverages the optimal class of deterministic belief-based policies. We introduce modifications ...

Reinforcement Learning (RL) (Sutton and Barto, 2018) tackles the sequential decision-making problem of an agent interacting with an unknown or partially known environment with the goal of maximizing the long-term sum of rewards. The RL agent should trade off between exploring the environment to learn its structure and exploiting the estimates to compute a policy that maximizes the reward. This problem has been successfully addressed in past works under the MDP formulation (Bartlett and Tewari, 2009; Jaksch et al., 2010; Zanette and Brunskill, 2019). MDPs assume full observability of the state space, but this assumption is often violated in many real-world scenarios such as robotics or finance, where only a partial observation of the environment is available. In this case, it is more appropriate to model the problem using Partially-Observable MDPs (Sondik, 1978).
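
Belief-based policies in a POMDP act on a posterior distribution over hidden states, which is maintained with the standard Bayes filter. The sketch below shows that textbook update, not the paper's algorithm: the function name `belief_update` and the nested-list layouts of `T` and `O` are assumptions, and in the setting described here `T` would be an estimate of the unknown transition model while `O` is the known observation model.

```python
def belief_update(belief, action, observation, T, O):
    """One step of the standard POMDP Bayes filter.

    belief[s]        : current probability of hidden state s
    T[action][s][s2] : probability of moving from s to s2 (assumed layout;
                       an estimate when the transition model is unknown)
    O[s2][obs]       : probability of observing obs in state s2 (known model)
    """
    n = len(belief)
    # Prediction step: push the belief through the transition model.
    predicted = [
        sum(belief[s] * T[action][s][s2] for s in range(n)) for s2 in range(n)
    ]
    # Correction step: reweight by the likelihood of the observation.
    unnormalized = [O[s2][observation] * predicted[s2] for s2 in range(n)]
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("observation has zero probability under the model")
    return [x / total for x in unnormalized]


# Toy usage: two states, one action that leaves the state unchanged.
T = [[[1.0, 0.0], [0.0, 1.0]]]
O = [[0.9, 0.1], [0.2, 0.8]]
posterior = belief_update([0.5, 0.5], action=0, observation=0, T=T, O=O)
```

In the toy case, observing the outcome that is more likely under state 0 shifts the belief toward state 0, from 0.5 to 9/11.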

artificial intelligence, machine learning, reinforcement learning, (18 more...)

2501.1879

Country:

North America > United States (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Italy > Lombardy > Milan (0.04)

Genre: Research Report (1.00)

Industry: Leisure & Entertainment > Sports (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

arXiv.org Artificial Intelligence, Jan-30-2025

Deceptive Sequential Decision-Making via Regularized Policy Optimization

Kim, Yerin, Benvenuti, Alexander, Chen, Bo, Karabag, Mustafa, Kulkarni, Abhishek, Bastian, Nathaniel D., Topcu, Ufuk, Hale, Matthew

Autonomous systems are increasingly expected to operate in the presence of adversaries, though an adversary may infer sensitive information simply by observing a system, without even needing to interact with it. Therefore, in this work we present a deceptive decision-making framework that not only conceals sensitive information, but in fact actively misleads adversaries about it. We model autonomous systems as Markov decision processes, and we consider adversaries that attempt to infer their reward functions using inverse reinforcement learning. To counter such efforts, we present two regularization strategies for policy synthesis problems that actively deceive an adversary about a system's underlying rewards. The first form of deception is "diversionary", and it leads an adversary to draw any false conclusion about what the system's reward function is. The second form of deception is "targeted", and it leads an adversary to draw a specific false conclusion about what the system's reward function is. We then show how each form of deception can be implemented in policy optimization problems, and we analytically bound the loss in total accumulated reward that is induced by deception. Next, we evaluate these developments in a multi-agent sequential decision-making problem with one real agent and multiple decoys. We show that diversionary deception can cause the adversary to believe that the most important agent is the least important, while attaining a total accumulated reward that is $98.83\%$ of its optimal, non-deceptive value. Similarly, we show that targeted deception can make any decoy appear to be the most important agent, while still attaining a total accumulated reward that is $99.25\%$ of its optimal, non-deceptive value.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

2501.18803

Country:

North America > United States > Texas > Travis County > Austin (0.14)
North America > United States > Illinois > Cook County > Chicago (0.04)
North America > United States > Georgia > Fulton County > Atlanta (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.64)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Military (0.93)
Government > Regional Government > North America Government > United States Government (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.50)