AITopics

2406.00054

Country:

North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
Europe > France > Grand Est > Meurthe-et-Moselle > Nancy (0.04)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.55)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.47)

arXiv.org Artificial IntelligenceMay-29-2024

OMPO: A Unified Framework for RL under Policy and Dynamics Shifts

Luo, Yu, Ji, Tianying, Sun, Fuchun, Zhang, Jianwei, Xu, Huazhe, Zhan, Xianyuan

Training reinforcement learning policies using environment interaction data collected from varying policies or dynamics presents a fundamental challenge. Existing works often overlook the distribution discrepancies induced by policy or dynamics shifts, or rely on specialized algorithms with task priors, thus often resulting in suboptimal policy performances and high learning variances. In this paper, we identify a unified strategy for online RL policy learning under diverse settings of policy and dynamics shifts: transition occupancy matching. In light of this, we introduce a surrogate policy learning objective by considering the transition occupancy discrepancies and then cast it into a tractable min-max optimization problem through dual reformulation. Our method, dubbed Occupancy-Matching Policy Optimization (OMPO), features a specialized actor-critic structure equipped with a distribution discriminator and a small-size local buffer. We conduct extensive experiments based on the OpenAI Gym, Meta-World, and Panda Robots environments, encompassing policy shifts under stationary and nonstationary dynamics, as well as domain adaption. The results demonstrate that OMPO outperforms the specialized baselines from different categories in all settings. We also find that OMPO exhibits particularly strong performance when combined with domain randomization, highlighting its potential in RL-based robotics applications

dynamic shift, objective, ompo, (13 more...)

2405.1908

Country:

Europe > Austria > Vienna (0.14)
Asia > China > Shanghai > Shanghai (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.66)

Industry: Education > Educational Setting (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

arXiv.org Artificial IntelligenceMay-29-2024

Coupled Mamba: Enhanced Multi-modal Fusion with Coupled State Space Model

Li, Wenbing, Zhou, Hang, Yu, Junqing, Song, Zikai, Yang, Wei

The essence of multi-modal fusion lies in exploiting the complementary information inherent in diverse modalities. However, prevalent fusion methods rely on traditional neural architectures and are inadequately equipped to capture the dynamics of interactions across modalities, particularly in presence of complex intra- and inter-modality correlations. Recent advancements in State Space Models (SSMs), notably exemplified by the Mamba model, have emerged as promising contenders. Particularly, its state evolving process implies stronger modality fusion paradigm, making multi-modal fusion on SSMs an appealing direction. However, fusing multiple modalities is challenging for SSMs due to its hardware-aware parallelism designs. To this end, this paper proposes the Coupled SSM model, for coupling state chains of multiple modalities while maintaining independence of intra-modality state processes. Specifically, in our coupled scheme, we devise an inter-modal hidden states transition scheme, in which the current state is dependent on the states of its own chain and that of the neighbouring chains at the previous time-step. To fully comply with the hardware-aware parallelism, we devise an expedite coupled state transition scheme and derive its corresponding global convolution kernel for parallelism. Extensive experiments on CMU-MOSEI, CH-SIMS, CH-SIMSV2 through multi-domain input verify the effectiveness of our model compared to current state-of-the-art methods, improved F1-Score by 0.4\%, 0.9\%, and 2.3\% on the three datasets respectively, 49\% faster inference and 83.7\% GPU memory save. The results demonstrate that Coupled Mamba model is capable of enhanced multi-modal fusion.

fusion, mamba, modality, (15 more...)

2405.18014

Genre: Research Report > New Finding (0.66)

Industry: Information Technology (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.49)

Zheng, Zhong, Zhang, Haochen, Xue, Lingzhou

Federated Q-Learning with Reference-Advantage Decomposition: Almost Optimal Regret and Logarithmic Communication Cost

arXiv.org Machine LearningMay-29-2024

Federated reinforcement learning (FRL) is a distributed learning framework that combines the principles of reinforcement learning (RL) [1] and federated learning (FL) [2]. Focusing on sequential decision-making, FRL aims to learn an optimal policy through parallel explorations by multiple agents under the coordination of a central server. Often modeled as a Markov decision process (MDP), multiple agents independently interact with an initially unknown environment and collaboratively train their decision-making models with limited information exchange between the agents. This approach accelerates the learning process with low communication costs. Some model-based algorithms (e.g., [3]) and policy-based algorithms (e.g., [4]) have shown speedup with respect to the number of agents in terms of learning regret or convergence rate. Recent progress has been made in FRL algorithms based on model-free value-based approaches, which directly learn the value functions and the optimal policy without estimating the underlying model (e.g., [5]). However, most existing model-free federated algorithms do not actively update the exploration policies for local agents and fail to provide low regret. A comprehensive literature review is provided in Appendix A.

algorithm, communication cost, inequality, (12 more...)

arXiv.org Machine Learning

2405.18795

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > Pennsylvania (0.04)

Genre: Research Report (0.81)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.48)

Value Alignment and Trust in Human-Robot Interaction: Insights from Simulation and User Study

Bhat, Shreyas, Lyons, Joseph B., Shi, Cong, Yang, X. Jessie

With the advent of AI technologies, humans and robots are increasingly teaming up to perform collaborative tasks. To enable smooth and effective collaboration, the topic of value alignment (operationalized herein as the degree of dynamic goal alignment within a task) between the robot and the human is gaining increasing research attention. Prior literature on value alignment makes an inherent assumption that aligning the values of the robot with that of the human benefits the team. This assumption, however, has not been empirically verified. Moreover, prior literature does not account for human's trust in the robot when analyzing human-robot value alignment. Thus, a research gap needs to be bridged by answering two questions: How does alignment of values affect trust? Is it always beneficial to align the robot's values with that of the human? We present a simulation study and a human-subject study to answer these questions. Results from the simulation study show that alignment of values is important for trust when the overall risk level of the task is high. We also present an adaptive strategy for the robot that uses Inverse Reinforcement Learning (IRL) to match the values of the robot with those of the human during interaction. Our simulations suggest that such an adaptive strategy is able to maintain trust across the full spectrum of human values. We also present results from an empirical study that validate these findings from simulation. Results indicate that real-time personalized value alignment is beneficial to trust and perceived performance by the human when the robot does not have a good prior on the human's values.

doi 10, human-robot interaction, robot, (10 more...)

2405.18324

Country:

North America > United States > New York > New York County > New York City (0.14)
North America > United States > Michigan > Washtenaw County > Ann Arbor (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)
(11 more...)

Genre:

Research Report > New Finding (1.00)
Questionnaire & Opinion Survey (1.00)

Industry:

Information Technology (0.67)
Leisure & Entertainment > Games (0.46)
Health & Medicine (0.46)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (1.00)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.93)
(3 more...)

Faria, Gonçalo R. A., Agrawal, Sweta, Farinhas, António, Rei, Ricardo, de Souza, José G. C., Martins, André F. T.

QUEST: Quality-Aware Metropolis-Hastings Sampling for Machine Translation

An important challenge in machine translation (MT) is to generate high-quality and diverse translations. Prior work has shown that the estimated likelihood from the MT model correlates poorly with translation quality. In contrast, quality evaluation metrics (such as COMET or BLEURT) exhibit high correlations with human judgments, which has motivated their use as rerankers (such as quality-aware and minimum Bayes risk decoding). However, relying on a single translation with high estimated quality increases the chances of "gaming the metric''. In this paper, we address the problem of sampling a set of high-quality and diverse translations. We provide a simple and effective way to avoid over-reliance on noisy quality estimates by using them as the energy function of a Gibbs distribution. Instead of looking for a mode in the distribution, we generate multiple samples from high-density areas through the Metropolis-Hastings algorithm, a simple Markov chain Monte Carlo approach. The results show that our proposed method leads to high-quality and diverse outputs across multiple language pairs (English$\leftrightarrow${German, Russian}) with two strong decoder-only LLMs (Alma-7b, Tower-7b).

computational linguistic, hypothesis, translation, (13 more...)

2406.00049

Country:

Europe > Portugal > Lisbon > Lisbon (0.14)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
Asia > Singapore (0.05)
(13 more...)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.35)

World Models for General Surgical Grasping

Lin, Hongbin, Li, Bin, Wong, Chun Wai, Rojas, Juan, Chu, Xiangyu, Au, Kwok Wai Samuel

Intelligent vision control systems for surgical robots should adapt to unknown and diverse objects while being robust to system disturbances. Previous methods did not meet these requirements due to mainly relying on pose estimation and feature tracking. We propose a world-model-based deep reinforcement learning framework "Grasp Anything for Surgery" (GAS), that learns a pixel-level visuomotor policy for surgical grasping, enhancing both generality and robustness. In particular, a novel method is proposed to estimate the values and uncertainties of depth pixels for a rigid-link object's inaccurate region based on the empirical prior of the object's size; both depth and mask images of task objects are encoded to a single compact 3-channel image (size: 64x64x3) by dynamically zooming in the mask regions, minimizing the information loss. The learned controller's effectiveness is extensively evaluated in simulation and in a real robot. Our learned visuomotor policy handles: i) unseen objects, including 5 types of target grasping objects and a robot gripper, in unstructured real-world surgery environments, and ii) disturbances in perception and control. Note that we are the first work to achieve a unified surgical control system that grasps diverse surgical objects using different robot grippers on real robots in complex surgery scenes (average success rate: 69%). Our system also demonstrates significant robustness across 6 conditions including background variation, target disturbance, camera pose variation, kinematic control error, image noise, and re-grasping after the gripped target object drops from the gripper. Videos and codes can be found on our project page: https://linhongbin.github.io/gas/.

controller, disturbance, gripper, (15 more...)

2405.1794

Country:

Europe > Netherlands > South Holland > Delft (0.04)
Asia > China > Hong Kong (0.04)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Health Care Technology (0.67)
Health & Medicine > Surgery (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.62)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Individual Contributions as Intrinsic Exploration Scaffolds for Multi-agent Reinforcement Learning

Li, Xinran, Liu, Zifan, Chen, Shibo, Zhang, Jun

In multi-agent reinforcement learning (MARL), effective exploration is critical, especially in sparse reward environments. Although introducing global intrinsic rewards can foster exploration in such settings, it often complicates credit assignment among agents. To address this difficulty, we propose Individual Contributions as intrinsic Exploration Scaffolds (ICES), a novel approach to motivate exploration by assessing each agent's contribution from a global view. In particular, ICES constructs exploration scaffolds with Bayesian surprise, leveraging global transition information during centralized training. These scaffolds, used only in training, help to guide individual agents towards actions that significantly impact the global latent state transitions. Additionally, ICES separates exploration policies from exploitation policies, enabling the former to utilize privileged global information during training. Extensive experiments on cooperative benchmark tasks with sparse rewards, including Google Research Football (GRF) and StarCraft Multi-agent Challenge (SMAC), demonstrate that ICES exhibits superior exploration capabilities compared with baselines. The code is publicly available at https://github.com/LXXXXR/ICES.

agent, exploration, individual contribution, (11 more...)

2405.1811

Country:

North America > United States (0.14)
Europe > Austria > Vienna (0.14)
Asia > China > Hong Kong (0.04)

Genre:

Research Report > Promising Solution (0.34)
Overview > Innovation (0.34)

Industry: Leisure & Entertainment > Games (0.88)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Cayci, Semih, Eryilmaz, Atilla

Recurrent Natural Policy Gradient for POMDPs

arXiv.org Machine LearningMay-28-2024

In this paper, we study a natural policy gradient method based on recurrent neural networks (RNNs) for partially-observable Markov decision processes, whereby RNNs are used for policy parameterization and policy evaluation to address curse of dimensionality in non-Markovian reinforcement learning. We present finite-time and finite-width analyses for both the critic (recurrent temporal difference learning), and correspondingly-operated recurrent natural policy gradient method in the near-initialization regime. Our analysis demonstrates the efficiency of RNNs for problems with short-term memory with explicit bounds on the required network widths and sample complexity, and points out the challenges in the case of long-term dependencies.

neural network, pomdp, rec-td, (9 more...)

arXiv.org Machine Learning

2405.18221

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > Ohio > Franklin County > Columbus (0.04)
North America > United States > Illinois (0.04)
(2 more...)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Carroll, Micah, Foote, Davis, Siththaranjan, Anand, Russell, Stuart, Dragan, Anca

AI Alignment with Changing and Influenceable Reward Functions

arXiv.org Artificial IntelligenceMay-27-2024

Existing AI alignment approaches assume that preferences are static, which is unrealistic: our preferences change, and may even be influenced by our interactions with AI systems themselves. To clarify the consequences of incorrectly assuming static preferences, we introduce Dynamic Reward Markov Decision Processes (DR-MDPs), which explicitly model preference changes and the AI's influence on them. We show that despite its convenience, the static-preference assumption may undermine the soundness of existing alignment techniques, leading them to implicitly reward AI systems for influencing user preferences in ways users may not truly want. We then explore potential solutions. First, we offer a unifying perspective on how an agent's optimization horizon may partially help reduce undesirable AI influence. Then, we formalize different notions of AI alignment that account for preference change from the outset. Comparing the strengths and limitations of 8 such notions of alignment, we find that they all either err towards causing undesirable AI influence, or are overly risk-averse, suggesting that a straightforward solution to the problems of changing preferences may not exist. As there is no avoiding grappling with changing preferences in real-world settings, this makes it all the more important to handle these issues with care, balancing risks and capabilities. We hope our work can provide conceptual clarity and constitute a first step towards AI alignment practices which explicitly account for (and contend with) the changing and influenceable nature of human preferences.

incentive, objective, reward function, (15 more...)

2405.17713

Country:

North America > United States > New York > New York County > New York City (0.27)
North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > Austria > Vienna (0.14)
(19 more...)

Genre: Research Report > Promising Solution (0.34)

Industry:

Health & Medicine > Consumer Health (1.00)
Government (1.00)
Leisure & Entertainment > Games (0.67)
Education (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (0.93)
(2 more...)