AITopics

Neural Information Processing Systems Oct-3-2025, 00:03:50 GMT

Inverse Rational Control with Partially Observable Continuous Nonlinear Dynamics

However, animals often appear to behave suboptimally.

agent, internal model, model parameter, (13 more...)

Country:

North America > United States > Minnesota (0.04)
North America > United States > Texas > Harris County > Houston (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
(3 more...)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.94)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)
(3 more...)

Ryabchenko, Alexander, Mou, Wenlong

Reinforcement Learning with Action-Triggered Observations

arXiv.org Machine Learning Oct-3-2025

We study reinforcement learning problems where state observations are stochastically triggered by actions, a constraint common in many real-world applications. This framework is formulated as Action-Triggered Sporadically Traceable Markov Decision Processes (ATST-MDPs), where each action has a specified probability of triggering a state observation. We derive tailored Bellman optimality equations for this framework and introduce the action-sequence learning paradigm in which agents commit to executing a sequence of actions until the next observation arrives. Under the linear MDP assumption, value functions are shown to admit linear representations in an induced action-sequence feature map. Leveraging this structure, we propose off-policy estimators with statistical error guarantees for such feature maps and introduce ST-LSVI-UCB, a variant of LSVI-UCB adapted for action-triggered settings. ST-LSVI-UCB achieves regret $\widetilde{O}(\sqrt{Kd^3(1-\gamma)^{-3}})$, where $K$ is the number of episodes, $d$ the feature dimension, and $\gamma$ the discount factor (per-step episode non-termination probability). Crucially, this work establishes the theoretical foundation for learning with sporadic, action-triggered observations while demonstrating that efficient learning remains feasible under such observation constraints.
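
As a toy illustration of the action-sequence idea in this abstract, the sketch below simulates committing to a fixed action sequence until some action triggers an observation. It is not the paper's ST-LSVI-UCB algorithm. The function name `run_until_observation`, the `trigger_prob` dictionary, and the action names `probe` and `move` are hypothetical, and the sketch assumes each action triggers an observation independently with its own probability.

```python
import random


def run_until_observation(action_seq, trigger_prob, rng):
    """Execute committed actions in order until one triggers an observation.

    action_seq   -- committed sequence of action names (hypothetical)
    trigger_prob -- dict mapping each action to its observation probability
    rng          -- random.Random instance, passed in for reproducibility
    Returns (actions_executed, observed).
    """
    executed = []
    for action in action_seq:
        executed.append(action)
        # Assumption: each action independently triggers a state
        # observation with its own specified probability.
        if rng.random() < trigger_prob[action]:
            return executed, True
    # The sequence ran out before any observation arrived.
    return executed, False


if __name__ == "__main__":
    rng = random.Random(0)
    probs = {"probe": 1.0, "move": 0.0}
    print(run_until_observation(["move", "move", "probe", "move"], probs, rng))
```

With probabilities of exactly 0 and 1 the outcome is deterministic: the agent stops at the first `probe`. Intermediate probabilities give the sporadic observations the abstract describes.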

probability, proof, reinforcement learning, (16 more...)

2510.02149

Country:

North America > United States (0.14)
North America > Canada > Ontario > Toronto (0.14)
Europe > France > Hauts-de-France > Nord > Lille (0.04)
(2 more...)

Genre:

Research Report (0.81)
Workflow (0.55)

Industry: Health & Medicine (0.74)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.88)

Robust Blind Channel Estimation for Bursty Impulsive Noise with a Constrained EM Approach

Chen, Chin-Hung, Nikoloska, Ivana, van Houtum, Wim, Wu, Yan, Karanov, Boris, Alvarado, Alex

Impulsive noise (IN), commonly generated by power devices, can severely degrade the performance of high-sensitivity wireless receivers. Accurate channel state information (CSI) knowledge is essential for designing optimal maximum a posteriori detectors. This paper examines blind channel estimation methods based on the expectation-maximization (EM) algorithm tailored for scenarios impacted by bursty IN, which can be described by the Markov-Middleton model. We propose a constrained EM algorithm that exploits the trellis structure of the IN model and the transmitted binary phase shift keying (BPSK) symbols. By enforcing shared variance among specific trellis states and symmetry in the transition matrix, the proposed constrained EM algorithm adapted for the bursty IN channel converges almost twice as fast as the standard EM approach and achieves better estimation performance. We comprehensively evaluate the robustness of both standard and constrained EM estimators under different types of CSI uncertainties. The results indicate that the final estimates of both EM estimators are robust to mismatched Markov-Middleton model parameters. However, as the level of CSI uncertainty increases, the convergence rate decreases.

algorithm, artificial intelligence, machine learning, (14 more...)

doi: 10.1109/VTC2025-Spring65109.2025.11174888.

2504.03685

Country: Europe > Netherlands (0.14)

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.78)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.68)

Enhancing Personalized Multi-Turn Dialogue with Curiosity Reward

Wan, Yanming, Wu, Jiaxing, Abdulhai, Marwa, Shani, Lior, Jaques, Natasha

Effective conversational agents like large language models (LLMs) must personalize their interactions to adapt to user preferences, personalities, and attributes across diverse domains like education and healthcare. Current methods like Reinforcement Learning from Human Feedback (RLHF) often prioritize helpfulness and safety but fall short in fostering truly empathetic, adaptive, and personalized dialogues. Existing personalization approaches typically rely on extensive user history, limiting their effectiveness for new or context-limited users. To address these limitations, we propose leveraging a user model to incorporate a curiosity-based intrinsic reward into multi-turn RLHF. This novel reward mechanism encourages the LLM agent to actively infer user traits by optimizing conversations to improve its user model's accuracy. Consequently, the agent delivers more personalized interactions by learning more about the user. We demonstrate our method's effectiveness in two distinct domains: significantly improving personalization performance in a conversational recommendation task, and personalizing conversations for different learning styles in an educational setting. We show improved generalization capabilities compared to traditional multi-turn RLHF, all while maintaining conversation quality. Our method offers a promising solution for creating more personalized, adaptive, and engaging conversational agents.
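
The abstract does not give the exact form of the curiosity reward, so the sketch below assumes one plausible form for illustration only: the intrinsic reward for a turn is the increase in probability that a user model assigns to the user's true trait. The names `curiosity_reward`, `total_reward`, the `weight` parameter, and the trait labels are all hypothetical.

```python
def curiosity_reward(belief_before, belief_after, true_trait):
    """Assumed intrinsic reward: gain in belief mass on the true trait.

    belief_before / belief_after -- dicts mapping trait -> probability,
    i.e. the user model's belief before and after a dialogue turn.
    """
    return belief_after[true_trait] - belief_before[true_trait]


def total_reward(task_reward, belief_before, belief_after, true_trait, weight=0.5):
    """Combine an extrinsic task reward with the weighted intrinsic term."""
    bonus = curiosity_reward(belief_before, belief_after, true_trait)
    return task_reward + weight * bonus


if __name__ == "__main__":
    before = {"visual": 0.5, "auditory": 0.5}
    after = {"visual": 0.75, "auditory": 0.25}
    # A turn that sharpened the belief toward the true trait earns a bonus.
    print(total_reward(1.0, before, after, "visual"))
```

Under this assumed form, a turn that reveals nothing about the user earns no bonus, and a turn that misleads the user model is penalized.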

customer, large language model, machine learning, (19 more...)

2504.03206

Country: North America > United States (0.28)

Genre: Research Report > Experimental Study (1.00)

Industry:

Information Technology (0.93)
Education > Educational Setting (0.87)
Health & Medicine > Consumer Health (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

The Unreasonable Effectiveness of Scaling Agents for Computer Use

Gonzalez-Pumariega, Gonzalo, Tu, Vincent, Lee, Chih-Lun, Yang, Jiachen, Li, Ang, Wang, Xin Eric

Computer-use agents (CUAs) hold promise for automating everyday digital tasks, but their unreliability and high variance hinder their application to long-horizon, complex tasks. We introduce Behavior Best-of-N (bBoN), a method that scales over agents by generating multiple rollouts and selecting among them using behavior narratives that describe the agents' rollouts. It enables both wide exploration and principled trajectory selection, substantially improving robustness and success rates. On OSWorld, our bBoN scaling method establishes a new state of the art (SoTA) at 69.9%, significantly outperforming prior methods and approaching human-level performance at 72%, with comprehensive ablations validating key design choices. We further demonstrate strong generalization results to different operating systems on WindowsAgentArena and AndroidWorld. Crucially, our results highlight the unreasonable effectiveness of scaling CUAs, when you do it right: effective scaling requires structured trajectory understanding and selection, and bBoN provides a practical framework to achieve this.
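
The selection step described in this abstract can be sketched generically: produce several rollouts, summarize each as a narrative, and pick the rollout whose narrative scores best. This is a minimal sketch, not the authors' bBoN implementation. The function `behavior_best_of_n` and its `narrate` and `score` callables are hypothetical stand-ins for whatever narrative generator and judge a real system would use.

```python
def behavior_best_of_n(rollouts, narrate, score):
    """Select one rollout by scoring a narrative of each candidate.

    rollouts -- list of candidate trajectories (any representation)
    narrate  -- callable turning a rollout into a text narrative (hypothetical)
    score    -- callable mapping a narrative to a number (hypothetical judge)
    Returns (best_rollout, its_narrative); ties go to the earliest rollout.
    """
    if not rollouts:
        raise ValueError("need at least one rollout")
    narratives = [narrate(r) for r in rollouts]
    scores = [score(n) for n in narratives]
    best = max(range(len(rollouts)), key=lambda i: scores[i])
    return rollouts[best], narratives[best]


if __name__ == "__main__":
    # Toy stand-ins: a rollout is a list of action strings, and the judge
    # simply checks whether the narrative mentions saving the file.
    candidates = [
        ["open app", "click wrong menu"],
        ["open app", "type text", "save file"],
    ]
    chosen, story = behavior_best_of_n(
        candidates,
        narrate=lambda r: "; ".join(r),
        score=lambda n: 1.0 if "save file" in n else 0.0,
    )
    print(story)
```

Scoring narratives rather than raw trajectories keeps the judge's input short, which is one reason a narrative-based selector might be preferred over comparing full action logs.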

large language model, machine learning, natural language, (20 more...)

2510.0225

Country:

Asia (0.46)
North America (0.28)

Genre:

Workflow (0.93)
Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.34)

Acquah, Aidan, Chan, Shing, Doherty, Aiden

ActiNet: Activity intensity classification of wrist-worn accelerometers using self-supervised deep learning

The use of reliable and accurate human activity recognition (HAR) models on passively collected wrist-accelerometer data is essential in large-scale epidemiological studies that investigate the association between physical activity and health outcomes. While the use of self-supervised learning has generated considerable excitement in improving HAR, it remains unknown to what extent these models, coupled with hidden Markov models (HMMs), would make a tangible improvement to classification performance and the effect this may have on the predicted daily activity intensity compositions. Using 151 CAPTURE-24 participants' data, we trained the ActiNet model, a self-supervised, 18-layer, modified ResNet-V2 model, followed by hidden Markov model (HMM) smoothing to classify labels of activity intensity. The performance of this model, evaluated using 5-fold stratified group cross-validation, was then compared to a baseline random forest (RF) + HMM, established in existing literature. Differences in performance and classification outputs were compared with different subgroups of age and sex within the CAPTURE-24 population. The ActiNet model was able to distinguish labels of activity intensity with a mean macro F1 score of 0.82 and a mean Cohen's kappa score of 0.86. This exceeded the performance of the RF + HMM, trained and validated on the same dataset, with mean scores of 0.77 and 0.81, respectively. These findings were consistent across subgroups of age and sex. These findings encourage the use of ActiNet for the extraction of activity intensity labels from wrist-accelerometer data in future epidemiological studies.
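
To illustrate what HMM smoothing of classifier output can look like, the sketch below runs Viterbi decoding over per-window class probabilities. It is a generic sketch, not the ActiNet pipeline. The function name `viterbi_smooth`, the two-state example, and the transition values are hypothetical, and using classifier probabilities directly as emission scores is a simplifying assumption.

```python
import math


def _log(p):
    """Log that maps zero probability to negative infinity."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi_smooth(emission_probs, transition, prior):
    """Return the most likely state path for a sequence of windows.

    emission_probs -- list of per-window probability lists (classifier output,
                      used directly as emission scores: an assumption)
    transition     -- transition[i][j] = P(next state j | current state i)
    prior          -- initial state probabilities
    """
    if not emission_probs:
        return []
    n = len(prior)
    # Best log-score of any path ending in each state at the first window.
    score = [_log(prior[s]) + _log(emission_probs[0][s]) for s in range(n)]
    backptrs = []
    for probs in emission_probs[1:]:
        new_score, ptr = [], []
        for j in range(n):
            # Best predecessor state for landing in state j.
            i_best = max(range(n), key=lambda i: score[i] + _log(transition[i][j]))
            new_score.append(score[i_best] + _log(transition[i_best][j]) + _log(probs[j]))
            ptr.append(i_best)
        score = new_score
        backptrs.append(ptr)
    # Trace back from the best final state.
    state = max(range(n), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(backptrs):
        state = ptr[state]
        path.append(state)
    return path[::-1]


if __name__ == "__main__":
    sticky = [[0.9, 0.1], [0.1, 0.9]]
    windows = [[0.9, 0.1], [0.9, 0.1], [0.4, 0.6], [0.9, 0.1], [0.9, 0.1]]
    print(viterbi_smooth(windows, sticky, [0.5, 0.5]))
```

In the example, the third window's raw argmax is state 1, but the sticky transition matrix makes a one-window switch unlikely, so the smoothed path stays in state 0 throughout.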

activity intensity, artificial intelligence, machine learning, (15 more...)

2510.01712

Country: Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)

Genre: Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Consumer Health (0.70)
Health & Medicine > Epidemiology (0.68)
Health & Medicine > Therapeutic Area (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Fang, Djengo Cyun-Jyun, Ke, Tsung-Wei

Information Seeking for Robust Decision Making under Partial Observability

Explicit information seeking is essential to human problem-solving in practical environments characterized by incomplete information and noisy dynamics. When the true environmental state is not directly observable, humans seek information to update their internal dynamics and inform future decision-making. Although existing Large Language Model (LLM) planning agents have addressed observational uncertainty, they often overlook discrepancies between their internal dynamics and the actual environment. We introduce Information Seeking Decision Planner (InfoSeeker), an LLM decision-making framework that integrates task-oriented planning with information seeking to align internal dynamics and make optimal decisions under uncertainty in both agent observations and environmental dynamics. InfoSeeker prompts an LLM to actively gather information by planning actions to validate its understanding, detect environmental changes, or test hypotheses before generating or revising task-oriented plans. To evaluate InfoSeeker, we introduce a novel benchmark suite featuring partially observable environments with incomplete observations and uncertain dynamics. Experiments demonstrate that InfoSeeker achieves a 74% absolute performance gain over prior methods without sacrificing sample efficiency. Moreover, InfoSeeker generalizes across LLMs and outperforms baselines on established benchmarks such as robotic manipulation and web navigation. These findings underscore the importance of tightly integrating planning and information seeking for robust behavior in partially observable environments. The project page is available at https://infoseekerllm.github.io

information, large language model, machine learning, (16 more...)

2510.01531

Genre:

Workflow (0.94)
Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.47)

Sreedharan, Sarath, Sikes, Kelsey, Blanchard, Nathaniel, Mason, Lisa, Krishnaswamy, Nikhil, Zarestky, Jill

On the Role of Domain Experts in Creating Effective Tutoring Systems

The role that highly curated knowledge, provided by domain experts, could play in creating effective tutoring systems is often overlooked within the AI for education community. In this paper, we highlight this topic by discussing two ways such highly curated expert knowledge could help in creating novel educational systems. First, we will look at how one could use explainable AI (XAI) techniques to automatically create lessons. Most existing XAI methods are primarily aimed at debugging AI systems. However, we will discuss how one could use expert-specified rules about solving specific problems along with novel XAI techniques to automatically generate lessons that could be provided to learners. Secondly, we will see how an expert-specified curriculum for learning a target concept can help develop adaptive tutoring systems that can not only provide a better learning experience, but could also allow us to use more efficient algorithms to create these systems. Finally, we will highlight the importance of such methods using a case study of creating a tutoring system for pollinator identification, where such knowledge could easily be elicited from experts.

learner, machine learning, natural language, (18 more...)

doi: 10.1007/978-3-031-99261-2_5

2510.01432

Country: North America > United States (0.70)

Genre:

Research Report (1.00)
Instructional Material (0.66)

Industry: Education > Educational Technology > Educational Software > Computer Based Training (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Explanation & Argumentation (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.72)
Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (0.68)

Adaptive Federated Learning Defences via Trust-Aware Deep Q-Networks

Palit, Vedant

Federated learning is vulnerable to poisoning and backdoor attacks under partial observability. We formulate defence as a partially observable sequential decision problem and introduce a trust-aware Deep Q-Network that integrates multi-signal evidence into client trust updates while optimizing a long-horizon robustness-accuracy objective. On CIFAR-10, we (i) establish a baseline showing steadily improving accuracy, (ii) show through a Dirichlet sweep that increased client overlap consistently improves accuracy and reduces ASR with stable detection, and (iii) demonstrate in a signal-budget study that accuracy remains steady while ASR increases and ROC-AUC declines as observability is reduced, which highlights that sequential belief updates mitigate weaker signals. Finally, a comparison with random, linear-Q, and policy gradient controllers confirms that DQN achieves the best robustness-accuracy trade-off.

accuracy, machine learning, reinforcement learning, (18 more...)

2510.01261

Genre: Research Report (0.65)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)