AITopics

2509.01257

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

arXiv.org Artificial Intelligence Oct-24-2025

Embodied Agents Meet Personalization: Investigating Challenges and Solutions Through the Lens of Memory Utilization

Kwon, Taeyoon, Choi, Dongwook, Kim, Hyojun, Kim, Sunghwan, Moon, Seungjun, Kwak, Beong-woo, Huang, Kuan-Hao, Yeo, Jinyoung

LLM-powered embodied agents have shown success on conventional object-rearrangement tasks, but providing personalized assistance that leverages user-specific knowledge from past interactions presents new challenges. We investigate these challenges through the lens of agents' memory utilization along two critical dimensions: object semantics (identifying objects based on personal meaning) and user patterns (recalling sequences from behavioral routines). To assess these capabilities, we construct MEMENTO, an end-to-end two-stage evaluation framework comprising single-memory and joint-memory tasks. Our experiments reveal that current agents can recall simple object semantics but struggle to apply sequential user patterns to planning. Through in-depth analysis, we identify two critical bottlenecks: information overload and coordination failures when handling multiple memories. Based on these findings, we explore memory architectural approaches to address these challenges. Given our observation that episodic memory provides both personalized knowledge and in-context learning benefits, we design a hierarchical knowledge graph-based user-profile memory module that separately manages personalized knowledge, achieving substantial improvements on both single and joint-memory tasks. Project website: https://connoriginal.github.io/MEMENTO

large language model, machine learning, natural language

2505.16348

Country: North America > United States (0.45)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.92)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine > Consumer Health (0.90)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.45)

arXiv.org Machine Learning Oct-24-2025

On Multiple Robustness of Proximal Dynamic Treatment Regimes

Gao, Yuanshan, Bai, Yang, Cui, Yifan

Dynamic treatment regimes are sequential decision rules that adapt treatment according to individual time-varying characteristics and outcomes to achieve optimal effects, with applications in precision medicine, personalized recommendations, and dynamic marketing. Estimating optimal dynamic treatment regimes via sequential randomized trials can face cost and ethical hurdles, often necessitating the use of historical observational data. In this work, we use the proximal causal inference framework to learn optimal dynamic treatment regimes when the unconfoundedness assumption fails. Our contributions are four-fold: (i) we propose three nonparametric identification methods for optimal dynamic treatment regimes; (ii) we establish the semiparametric efficiency bound for the value function of a given regime; (iii) we propose a (K+1)-robust method for learning optimal dynamic treatment regimes, where K is the number of stages; (iv) as a by-product for marginal structural models, we establish identification and estimation of counterfactual means under a static regime. Numerical experiments validate the efficiency and multiple robustness of our proposed methods.

artificial intelligence, estimator, machine learning

2510.20451

Country:

North America > United States (0.45)
Europe (0.27)

Genre:

Research Report > Experimental Study (0.87)
Research Report > Strength High (0.65)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.45)

arXiv.org Machine Learning Oct-23-2025

Benchmarking Probabilistic Time Series Forecasting Models on Neural Activity

Lu, Ziyu, Li, Anna J., Ladd, Alexander E., Matveev, Pascha, Deole, Aditya, Shea-Brown, Eric, Kutz, J. Nathan, Steinmetz, Nicholas A.

Neural activity forecasting is central to understanding neural systems and enabling closed-loop control. While deep learning has recently advanced the state-of-the-art in the time series forecasting literature, its application to neural activity forecasting remains limited. To bridge this gap, we systematically evaluated eight probabilistic deep learning models, including two foundation models, that have demonstrated strong performance on general forecasting benchmarks. We compared them against four classical statistical models and two baseline methods on spontaneous neural activity recorded from mouse cortex via widefield imaging. Across prediction horizons, several deep learning models consistently outperformed classical approaches, with the best model producing informative forecasts up to 1.5 seconds into the future. Our findings point toward future control applications and open new avenues for probing the intrinsic temporal structure of neural activity.

artificial intelligence, forecasting, machine learning

2510.18037

Country:

North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.05)
North America > United States > Washington > King County > Seattle (0.04)
Oceania > Australia > Victoria > Melbourne (0.04)

Genre: Research Report > New Finding (0.34)

Industry:

Health & Medicine > Therapeutic Area > Neurology (1.00)
Energy (1.00)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

arXiv.org Machine Learning Oct-23-2025

On the hardness of RL with Lookahead

Pla, Corentin, Richard, Hugo, Abeille, Marc, Merlis, Nadav, Perchet, Vianney

We study reinforcement learning (RL) with transition look-ahead, where the agent may observe which states would be visited upon playing any sequence of $\ell$ actions before deciding its course of action. While such predictive information can drastically improve the achievable performance, we show that using this information optimally comes at a potentially prohibitive computational cost. Specifically, we prove that optimal planning with one-step look-ahead ($\ell=1$) can be solved in polynomial time through a novel linear programming formulation. In contrast, for $\ell \geq 2$, the problem becomes NP-hard. Our results delineate a precise boundary between tractable and intractable cases for the problem of planning with transition look-ahead in reinforcement learning.

machine learning, reinforcement learning, transition look-ahead

2510.19372

Country:

North America > United States > New York (0.04)
North America > United States > New Jersey > Hudson County > Hoboken (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.68)

Giwa, Oluwaseyi, Mohsin, Muhammad Ahmed, Adesola, Folarin Jubril, Jamshed, Muhammad Ali

QPPG: Quantum-Preconditioned Policy Gradient for Link Adaptation in Rayleigh Fading Channels

Wireless communication over fading channels remains one of the fundamental challenges in modern networks. In particular, Rayleigh fading channels, which model rich-scattering non-line-of-sight environments, cause rapid and unpredictable fluctuations in signal strength that can significantly degrade throughput and reliability. To mitigate these effects, link adaptation techniques such as adaptive modulation and coding (AMC) and power control have been extensively studied as key enablers of efficient spectrum use [1], [2]. Early works on link adaptation for Rayleigh fading channels demonstrated how explicit channel estimation and threshold-based switching could improve throughput and maintain robustness under fading conditions [3]-[6]. Despite their success, these classical approaches rely on accurate channel estimation and fixed rules, and they often trade off average throughput against outage probability in a suboptimal manner [4]-[6]. Furthermore, as networks evolve toward 6G with denser topologies and stringent reliability demands, such schemes struggle to scale or adapt to system-level complexities [7], [8]. Recent works have explored deep reinforcement learning (DRL) and meta reinforcement learning (RL) for link adaptation and resource allocation, showing promising adaptability but still facing high sample complexity and training instability [9]-[12]. In this letter, we propose quantum-preconditioned policy gradient (QPPG), a natural actor-critic method for link adaptation ...

Oluwaseyi Giwa is with the African Institute for Mathematical Sciences, South Africa (e-mail: oluwaseyi@aims.ac.za). Muhammad Ahmed Mohsin is with Stanford University, Stanford, California, 94305, United States (e-mail: muahmed@stanford.edu).

link adaptation, machine learning, reinforcement learning

2506.15753

Country:

North America > United States > California > Santa Clara County > Stanford (0.24)
North America > United States > California > Santa Clara County > Palo Alto (0.24)

Genre: Research Report (0.40)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.78)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.49)

Castanyer, Roger Creus, Mohamed, Faisal, Castro, Pablo Samuel, Neary, Cyrus, Berseth, Glen

ARM-FM: Automated Reward Machines via Foundation Models for Compositional Reinforcement Learning

Reinforcement learning (RL) algorithms are highly sensitive to reward function specification, which remains a central challenge limiting their broad applicability. We present ARM-FM: Automated Reward Machines via Foundation Models, a framework for automated, compositional reward design in RL that leverages the high-level reasoning capabilities of foundation models (FMs). Reward machines (RMs) -- an automata-based formalism for reward specification -- are used as the mechanism for RL objective specification, and are automatically constructed via the use of FMs. The structured formalism of RMs yields effective task decompositions, while the use of FMs enables objective specifications in natural language. Concretely, we (i) use FMs to automatically generate RMs from natural language specifications; (ii) associate language embeddings with each RM automata-state to enable generalization across tasks; and (iii) provide empirical evidence of ARM-FM's effectiveness in a diverse suite of challenging environments, including evidence of zero-shot generalization.
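As a rough illustration of the reward-machine formalism mentioned in the abstract above, the following minimal sketch models a reward machine as a finite automaton whose transitions fire on high-level events and emit rewards. It is not ARM-FM's implementation: the `RewardMachine` class, the key-and-door task, the event names, and the reward values are all hypothetical, and the foundation-model generation step and language embeddings described in the abstract are not modeled here.

```python
from dataclasses import dataclass, field


@dataclass
class RewardMachine:
    """Hypothetical minimal reward machine: a finite automaton over events."""
    initial_state: str
    terminal_states: set
    # Maps (state, event) -> (next_state, reward).
    transitions: dict = field(default_factory=dict)

    def step(self, state, event):
        # Events with no matching transition leave the state unchanged
        # and yield zero reward.
        return self.transitions.get((state, event), (state, 0.0))

    def run(self, events):
        # Feed a sequence of events through the machine and accumulate reward,
        # stopping once a terminal state has been reached.
        state, total = self.initial_state, 0.0
        for event in events:
            if state in self.terminal_states:
                break
            state, reward = self.step(state, event)
            total += reward
        return state, total


# Assumed two-stage task: pick up a key, then open a door.
rm = RewardMachine(
    initial_state="u0",
    terminal_states={"u2"},
    transitions={
        ("u0", "got_key"): ("u1", 0.5),
        ("u1", "opened_door"): ("u2", 1.0),
    },
)

# Trying the door before holding the key earns nothing; order matters.
final_state, total_reward = rm.run(["opened_door", "got_key", "opened_door"])
```

The automaton states act as a memory of task progress, which is what lets a single reward specification decompose a task into ordered subgoals.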

large language model, machine learning, reinforcement learning

2510.14176

Country: North America > Canada (0.28)

Genre: Research Report > New Finding (0.93)

Industry: Education (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Test-time Prompt Intervention

Yang, Chenxu, Si, Qingyi, Dai, Mz, Yao, Dingyu, Zheng, Mingyu, Chen, Minghui, Lin, Zheng, Wang, Weiping

Test-time compute has led to remarkable success in the large language model (LLM) community, particularly for complex tasks, where longer chains of thought (CoTs) are generated to enhance reasoning capabilities. However, growing evidence reveals that such reasoning models often produce CoTs plagued by excessive redundancy, including unnecessary verification steps and repetitive reasoning shifts. The root cause lies in their post-training, which relies too heavily on outcome-reward paradigms because data for process-reward paradigms, which regulate intermediate reasoning steps, is difficult to construct at scale. To address this, we propose PI, a novel framework for Test-time Prompt Intervention. PI provides an interface to dynamically guide and regulate reasoning paths during inference through timely (When module) and proper (How module) interventions and post-intervention sampling (Which module). This allows human problem-solving expertise and cognitive science principles to be seamlessly integrated into LLMs' reasoning processes, enhancing controllability and interpretability. Extensive experiments across multiple models and datasets demonstrate that PI significantly shortens CoTs while reducing hallucination, yielding more concise and reliable reasoning.

large language model, machine learning, natural language

2508.02511

Country: North America > United States > California (0.29)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.45)

Neural Variational Dropout Processes

Jeon, Insu, Park, Youngjin, Kim, Gunhee

Learning to infer the conditional posterior model is a key step for robust meta-learning. This paper presents a new Bayesian meta-learning approach called Neural Variational Dropout Processes (NVDPs). NVDPs model the conditional posterior distribution based on a task-specific dropout; a low-rank product of Bernoulli experts meta-model is utilized for a memory-efficient mapping of dropout rates from a few observed contexts. It allows for a quick reconfiguration of a globally learned and shared neural network for new tasks in multi-task few-shot learning. In addition, NVDPs utilize a novel prior conditioned on the whole task data to optimize the conditional dropout posterior in the amortized variational inference. Surprisingly, this enables the robust approximation of task-specific dropout rates that can deal with a wide range of functional ambiguities and uncertainties. We compared the proposed method with other meta-learning approaches in the few-shot learning tasks such as 1D stochastic regression, image inpainting, and classification. The results show the excellent performance of NVDPs.

artificial intelligence, machine learning, posterior

2510.19425

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.67)

Risk Assessment of an Autonomous Underwater Snake Robot in Confined Operations

Sayed, Abdelrahman Sayed

The growing interest in ocean discovery imposes a need for inspection and intervention in confined and demanding environments. Eely's slender shape, in addition to its ability to change its body configurations, makes articulated underwater robots an adequate option for such environments. However, operation of Eely in such environments imposes demanding requirements on the system, as it must deal with uncertain and unstructured environments, extreme environmental conditions, and reduced navigational capabilities. This paper proposes a Bayesian approach to assess the risks of losing Eely during two mission scenarios. The goal of this work is to improve Eely's performance and the likelihood of mission success. Sensitivity analysis results are presented in order to demonstrate the causes having the highest impact on losing Eely.

artificial intelligence, bayesian inference, machine learning

2510.19415

Country: Europe (0.93)

Genre: Research Report (0.50)

Industry:

Energy (1.00)
Information Technology > Security & Privacy (0.66)
Government > Military (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)