AITopics | Sarawak

Safe Continual Domain Adaptation after Sim2Real Transfer of Reinforcement Learning Policies in Robotics

Josifovski, Josip, Gu, Shangding, Malmir, Mohammadhossein, Huang, Haoliang, Auddy, Sayantan, Navarro-Guerrero, Nicolás, Spanos, Costas, Knoll, Alois

arXiv.org Artificial Intelligence, Mar-13-2025

Domain randomization has emerged as a fundamental technique in reinforcement learning (RL) to facilitate the transfer of policies from simulation to real-world robotic applications. Many existing domain randomization approaches have been proposed to improve robustness and sim2real transfer. These approaches rely on wide randomization ranges to compensate for the unknown actual system parameters, leading to robust but inefficient real-world policies. In addition, the policies pretrained in the domain-randomized simulation are fixed after deployment due to the inherent instability of the optimization processes based on RL and the necessity of sampling exploitative but potentially unsafe actions on the real system. This limits the adaptability of the deployed policy to the inevitably changing system parameters or environment dynamics over time. We leverage safe RL and continual learning under domain-randomized simulation to address these limitations and enable safe deployment-time policy adaptation in real-world robot control. The experiments show that our method enables the policy to adapt and fit to the current domain distribution and environment dynamics of the real system while minimizing safety risks and avoiding issues like catastrophic forgetting of the general policy found in randomized simulation during the pretraining phase. Videos and supplementary material are available at https://safe-cda.github.io/.
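As a rough illustration of the domain randomization described in this abstract, the sketch below draws a fresh set of simulator parameters for every training episode. The parameter names, the numeric ranges, and the helpers `sample_domain` and `episode_domains` are hypothetical and are not taken from the paper; the wide ranges stand in for the "unknown actual system parameters" the abstract mentions.

```python
import random

# Hypothetical physical parameters and wide ranges (illustrative values only).
WIDE_RANGES = {
    "friction": (0.2, 1.5),
    "payload_kg": (0.0, 2.0),
    "motor_gain": (0.5, 1.5),
}


def sample_domain(ranges, rng):
    """Draw one simulated domain: a uniform sample for each parameter."""
    return {name: rng.uniform(low, high) for name, (low, high) in ranges.items()}


def episode_domains(num_episodes, ranges=WIDE_RANGES, seed=0):
    """Return one randomized parameter set per training episode."""
    rng = random.Random(seed)
    return [sample_domain(ranges, rng) for _ in range(num_episodes)]


if __name__ == "__main__":
    for params in episode_domains(3):
        print(params)
```

A policy trained across such wide ranges has to work for every sampled domain, which is the source of the robustness and of the inefficiency that the abstract points out.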

adaptation, learning, simulation, (14 more...)

arXiv:2503.10949

Country:

North America > United States > Nevada > Clark County > Las Vegas (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
(19 more...)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

Water Quality Data Imputation via A Fast Latent Factorization of Tensors with PID-based Optimizer

Liu, Qian, Wang, Lan, Yang, Bing, Wu, Hao

arXiv.org Artificial Intelligence, Mar-10-2025

Water quality data can provide substantial decision support for water resources utilization and pollution prevention. However, water quality data contain numerous missing values due to unavoidable factors like sensor failure, which leads to biased results in hydrological analysis and fails to support environmental governance decisions accurately. A Latent Factorization of Tensors (LFT) with Stochastic Gradient Descent (SGD) proves to be an efficient imputation method. However, a standard SGD-based LFT model commonly suffers from slow convergence that impairs its efficiency. To tackle this issue, this paper proposes a Fast Latent Factorization of Tensors (FLFT) model. It builds an adjusted instance error into SGD by leveraging a nonlinear PID controller to incorporate the past, current and future information of the prediction error, improving the convergence rate. Compared with state-of-the-art models on real-world datasets, the experimental results indicate that the FLFT model achieves a better convergence rate and higher accuracy.
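To make the idea of a PID-adjusted instance error concrete, here is a minimal sketch under stated assumptions. It uses a plain linear PID rule (the paper uses a nonlinear controller whose form is not given in this abstract), a rank-R CP-style prediction for one tensor entry, and no regularization. The gains `kp`, `ki`, `kd`, the learning rate, and all function names are illustrative choices, not the authors' settings.

```python
from dataclasses import dataclass


@dataclass
class PIDState:
    """Per-instance controller memory."""
    integral: float = 0.0  # running sum of past errors (I term)
    previous: float = 0.0  # error seen on the last visit (used by the D term)


def pid_adjusted_error(error, state, kp=1.0, ki=0.1, kd=0.1):
    """Combine current (P), accumulated (I) and trend (D) error information."""
    state.integral += error
    derivative = error - state.previous
    state.previous = error
    return kp * error + ki * state.integral + kd * derivative


def predict(a, b, c):
    """CP-style prediction for one entry from three latent factor vectors."""
    return sum(x * y * z for x, y, z in zip(a, b, c))


def sgd_step(value, a, b, c, state, lr=0.1):
    """One SGD update on a single observed entry, driven by the adjusted error."""
    error = value - predict(a, b, c)
    adjusted = pid_adjusted_error(error, state)
    for r in range(len(a)):
        a_r, b_r, c_r = a[r], b[r], c[r]  # keep old values for a simultaneous update
        a[r] += lr * adjusted * b_r * c_r
        b[r] += lr * adjusted * a_r * c_r
        c[r] += lr * adjusted * a_r * b_r
    return error


if __name__ == "__main__":
    a, b, c = [0.5, 0.5], [0.5, 0.5], [0.5, 0.5]
    state = PIDState()
    for _ in range(5):
        print(round(sgd_step(1.0, a, b, c, state), 4))
```

With `ki = kd = 0` and `kp = 1` the update reduces to ordinary SGD on the squared error, which makes the controller terms easy to compare against the baseline.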

factorization, ieee transaction, latent factorization, (11 more...)

arXiv:2503.06997

Country:

Asia > China > Chongqing Province > Chongqing (0.05)
Asia > Singapore (0.04)
Asia > China > Tianjin Province > Tianjin (0.04)
(11 more...)

Genre: Research Report (0.64)

Industry: Water & Waste Management > Water Management > Water Supplies & Services (1.00)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.60)

HIPPO-MAT: Decentralized Task Allocation Using GraphSAGE and Multi-Agent Deep Reinforcement Learning

Ratnabala, Lavanya, Peter, Robinroy, Fedoseev, Aleksey, Tsetserukou, Dzmitry

arXiv.org Artificial Intelligence, Mar-8-2025

This paper tackles decentralized continuous task allocation in heterogeneous multi-agent systems. We present a novel framework, HIPPO-MAT, that integrates graph neural networks (GNNs) employing a GraphSAGE architecture, which compute independent embeddings on each agent, with an Independent Proximal Policy Optimization (IPPO) approach for multi-agent deep reinforcement learning. In our system, unmanned aerial vehicles (UAVs) and unmanned ground vehicles (UGVs) share aggregated observation data via communication channels while independently processing these inputs to generate enriched state embeddings. This design enables dynamic, cost-optimal, conflict-aware task allocation in a 3D grid environment without the need for centralized coordination. A modified A* path planner is incorporated for efficient routing and collision avoidance. Simulation experiments demonstrate scalability with up to 30 agents, and preliminary real-world validation on JetBot ROS AI Robots, each running its model on a Jetson Nano and communicating through the ESP-NOW protocol using ESP32-S3, confirms the practical viability of the approach, which incorporates simultaneous localization and mapping (SLAM). Experimental results show that our method achieves a high 92.5% conflict-free success rate, with only a 16.49% performance gap compared to the centralized Hungarian method, while outperforming the heuristic decentralized baseline based on a greedy approach. Additionally, the framework exhibits scalability with up to 30 agents with allocation processing of 0.32 simulation step time and robustness in responding to dynamically generated tasks.

agent, allocation, task allocation, (14 more...)

arXiv:2503.07662

Country:

North America > United States > California > Los Angeles County > Long Beach (0.14)
North America > United States > Florida > Miami-Dade County > Miami (0.04)
North America > Canada > Quebec > Montreal (0.04)
(9 more...)

Genre: Research Report (0.82)

Industry:

Transportation (0.49)
Information Technology > Robotics & Automation (0.34)
Aerospace & Defense > Aircraft (0.34)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles > Drones (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Contextual bandits with entropy-based human feedback

Seraj, Raihan, Meng, Lili, Sylvain, Tristan

arXiv.org Artificial Intelligence, Feb-12-2025

In recent years, preference-based human feedback mechanisms have become essential for enhancing model performance across diverse applications, including conversational AI systems such as ChatGPT. However, existing approaches often neglect critical aspects, such as model uncertainty and the variability in feedback quality. To address these challenges, we introduce an entropy-based human feedback framework for contextual bandits, which dynamically balances exploration and exploitation by soliciting expert feedback only when model entropy exceeds a predefined threshold.

This work investigates how explicit human feedback can enhance contextual bandit (CB) performance. Building on successful integrations of human guidance in reinforcement learning (Christiano et al., 2017; MacGlashan et al., 2017) and conversational AI (Achiam et al., 2023), we distinguish two primary feedback paradigms: (1) action-based feedback, where experts directly prescribe optimal actions for specific contexts (Osa et al., 2018; Li et al., 2023), and (2) preference-based feedback, where humans compare pairs of learner-generated actions to express relative preferences (Christiano et al., 2017; Saha et al., 2023). While action-based methods require precise expert knowledge, we focus on preference feedback for its practical advantages in scalable data collection.
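The rule of soliciting expert feedback only when model entropy exceeds a threshold can be sketched as follows. This is a minimal illustration, assuming the policy exposes one score per action and that these scores are turned into probabilities with a softmax; the function names and the threshold value are hypothetical and not taken from the paper.

```python
import math


def softmax(scores):
    """Turn per-action scores into a probability distribution (numerically stable)."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats; zero-probability actions contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def should_query_expert(scores, threshold):
    """Ask for human feedback only when the policy is uncertain enough."""
    return entropy(softmax(scores)) > threshold


if __name__ == "__main__":
    print(should_query_expert([0.0, 0.0, 0.0, 0.0], threshold=1.0))   # uncertain policy
    print(should_query_expert([10.0, 0.0, 0.0, 0.0], threshold=1.0))  # confident policy
```

For four actions the entropy ranges from 0 (one action certain) to ln 4 (uniform), so the threshold sets how often the expert is consulted.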

bandit, machine learning, reinforcement learning, (15 more...)

arXiv:2502.08759

Country:

North America > Canada > Quebec > Montreal (0.14)
North America > United States (0.04)
Asia > Malaysia > Sarawak > Kuching (0.04)

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.89)

MAGNNET: Multi-Agent Graph Neural Network-based Efficient Task Allocation for Autonomous Vehicles with Deep Reinforcement Learning

Ratnabala, Lavanya, Fedoseev, Aleksey, Peter, Robinroy, Tsetserukou, Dzmitry

arXiv.org Artificial Intelligence, Feb-4-2025

This paper addresses the challenge of decentralized task allocation within heterogeneous multi-agent systems operating under communication constraints. We introduce a novel framework that integrates graph neural networks (GNNs) with a centralized training and decentralized execution (CTDE) paradigm, further enhanced by a tailored Proximal Policy Optimization (PPO) algorithm for multi-agent deep reinforcement learning (MARL). Our approach enables unmanned aerial vehicles (UAVs) and unmanned ground vehicles (UGVs) to allocate tasks dynamically and efficiently in a 3D grid environment without central coordination. The framework minimizes total travel time while simultaneously avoiding conflicts in task assignments. For cost calculation and routing, we employ reservation-based A* and R* path planners. Experimental results show that our method achieves a high 92.5% conflict-free success rate, with only a 7.49% performance gap compared to the centralized Hungarian method, while outperforming the heuristic decentralized baseline based on a greedy approach. Additionally, the framework scales to up to 20 agents with an allocation processing time of 2.8 s and is robust in responding to dynamically generated tasks, underscoring its potential for real-world applications in complex multi-agent scenarios.
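To show what a performance gap relative to an optimal centralized assignment means, the sketch below compares a greedy allocation with the optimal one on a tiny travel-time matrix. It is an illustration only: the optimum is found by brute force over permutations, which yields the same total cost as the Hungarian method on small inputs but is not the Hungarian algorithm itself, and the cost values and function names are made up.

```python
from itertools import permutations


def total_cost(cost, assignment):
    """Sum of travel times when agent i takes task assignment[i]."""
    return sum(cost[i][task] for i, task in enumerate(assignment))


def optimal_assignment(cost):
    """Exhaustive search for the minimum-cost one-to-one assignment (small n only)."""
    n = len(cost)
    best = min(permutations(range(n)), key=lambda p: total_cost(cost, p))
    return list(best), total_cost(cost, best)


def greedy_assignment(cost):
    """Each agent in turn takes its cheapest task that is still free."""
    taken = set()
    assignment = []
    for row in cost:
        task = min((j for j in range(len(row)) if j not in taken), key=lambda j: row[j])
        taken.add(task)
        assignment.append(task)
    return assignment, total_cost(cost, assignment)


def performance_gap(total, optimal_total):
    """Relative excess cost over the optimum, in percent."""
    return 100.0 * (total - optimal_total) / optimal_total


if __name__ == "__main__":
    cost = [[1, 2, 9], [1, 8, 9], [9, 3, 4]]  # hypothetical travel times
    _, best = optimal_assignment(cost)
    _, greedy = greedy_assignment(cost)
    print(best, greedy, round(performance_gap(greedy, best), 2))
```

Both assignments here are conflict-free because every task is taken by exactly one agent; they differ only in total cost.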

artificial intelligence, machine learning, reinforcement learning, (15 more...)

arXiv:2502.02311

Country:

North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
North America > United States > Florida > Miami-Dade County > Miami (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)
(9 more...)

Genre: Research Report (0.82)

Industry:

Information Technology > Robotics & Automation (0.34)
Aerospace & Defense > Aircraft (0.34)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles > Drones (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Hierarchical Multi-Graphs Learning for Robust Group Re-Identification

Liu, Ruiqi, Liu, Xingyu, Xu, Xiaohao, Zhang, Yixuan, Ge, Yongxin, Weng, Lubin

arXiv.org Artificial Intelligence, Dec-24-2024

Group Re-identification (G-ReID) faces greater complexity than individual Re-identification (ReID) due to challenges like mutual occlusion, dynamic member interactions, and evolving group structures. Prior graph-based approaches have aimed to capture these dynamics by modeling the group as a single topological structure. However, these methods struggle to generalize across diverse group compositions, as they fail to fully represent the multifaceted relationships within the group. In this study, we introduce a Hierarchical Multi-Graphs Learning (HMGL) framework to address these challenges. Our approach models the group as a collection of multi-relational graphs, leveraging both explicit features (such as occlusion, appearance, and foreground information) and implicit dependencies between members. This hierarchical representation, encoded via a Multi-Graphs Neural Network (MGNN), allows us to resolve ambiguities in member relationships, particularly in complex, densely populated scenes. To further enhance matching accuracy, we propose a Multi-Scale Matching (MSM) algorithm, which mitigates issues of member information ambiguity and sensitivity to hard samples, improving robustness in challenging scenarios. Our method achieves state-of-the-art performance on two standard benchmarks, CSG and RoadGroup, with Rank-1/mAP scores of 95.3%/94.4% and 93.9%/95.4%, respectively. These results mark notable improvements of 1.7% and 2.5% in Rank-1 accuracy over existing approaches.

graph, information, matching, (13 more...)

arXiv:2412.18766

Country:

Asia > China > Chongqing Province > Chongqing (0.04)
North America > United States > Michigan > Washtenaw County > Ann Arbor (0.04)
North America > Canada > Quebec > Capitale-Nationale Region > Québec (0.04)
(4 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Fast Semi-supervised Learning on Large Graphs: An Improved Green-function Method

Nie, Feiping, Song, Yitao, Chang, Wei, Wang, Rong, Li, Xuelong

arXiv.org Artificial Intelligence, Nov-3-2024

In graph-based semi-supervised learning, the Green-function method is a classical method that works by computing the Green's function in the graph space. However, when applied to large graphs, especially sparse ones, this method performs unstably and unsatisfactorily. We analyze it in detail and propose a novel method from the perspective of optimization. On fully connected graphs, the method is equivalent to the Green-function method and can be seen as another interpretation with physical meaning, while on non-fully connected graphs, it helps to explain why the Green-function method breaks down on large sparse graphs. To resolve this problem, we propose a workable approach to improve our proposed method. Unlike the original method, our improved method can also apply two accelerating techniques, Gaussian Elimination and Anchored Graphs, to become more efficient on large graphs. Finally, extensive experiments support our conclusions and demonstrate the efficiency, accuracy, and stability of our improved Green's function method.

anchor point, graph, green-function method, (12 more...)

arXiv:2411.01792

Country:

Asia > China > Shaanxi Province > Xi'an (0.05)
Asia > Philippines (0.04)
Asia > Middle East > Jordan (0.04)
Asia > Malaysia > Sarawak > Kuching (0.04)

Genre: Research Report > Promising Solution (0.54)

Industry: Education (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (0.85)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.71)

Network scaling and scale-driven loss balancing for intelligent poroelastography

Xu, Yang, Pourahmadian, Fatemeh

arXiv.org Artificial Intelligence, Oct-27-2024

A deep learning framework is developed for multiscale characterization of poroelastic media from full waveform data which is known as poroelastography. Special attention is paid to heterogeneous environments whose multiphase properties may drastically change across several scales. Described in space-frequency, the data takes the form of focal solid displacement and pore pressure fields in various neighborhoods furnished either by reconstruction from remote data or direct measurements depending on the application. The objective is to simultaneously recover the six hydromechanical properties germane to Biot equations and their spatial distribution in a robust and efficient manner. Two major challenges impede direct application of existing state-of-the-art techniques for this purpose: (i) the sought-for properties belong to vastly different and potentially uncertain scales, and (ii) the loss function is multi-objective and multi-scale (both in terms of its individual components and the total loss). To help bridge the gap, we propose the idea of "network scaling" where the neural property maps are constructed by unit shape functions composed into a scaling layer. In this model, the unknown network parameters (weights and biases) remain of O(1) during training. This forms the basis for explicit scaling of the loss components and their derivatives with respect to the network parameters. Thereby, we propose the physics-based "dynamic scaling" approach for adaptive loss balancing. The idea is first presented in a generic form for multi-physics and multi-scale PDE systems, and then applied through a set of numerical experiments to poroelastography. The results are presented along with reconstructions by way of gradient normalization (GradNorm) and Softmax adaptive weights (SoftAdapt) for loss balancing. A comparative analysis of the methods and corresponding results is provided.

artificial intelligence, machine learning, reconstruction, (19 more...)

arXiv:2411.08886

Country:

North America > United States > Colorado (0.28)
Asia > Malaysia > Sarawak (0.28)

Genre: Research Report > New Finding (0.66)

Industry:

Energy > Renewable > Geothermal (1.00)
Energy > Oil & Gas > Upstream (1.00)
Health & Medicine > Therapeutic Area (0.93)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.86)

Robust Thompson Sampling Algorithms Against Reward Poisoning Attacks

Xu, Yinglun, Wang, Zhiwei, Singh, Gagandeep

arXiv.org Artificial Intelligence, Oct-25-2024

Thompson sampling is one of the most popular learning algorithms for online sequential decision-making problems and has rich real-world applications. However, current Thompson sampling algorithms are limited by the assumption that the rewards received are uncorrupted, which may not be true in real-world applications where adversarial reward poisoning exists. To make Thompson sampling more reliable, we want to make it robust against adversarial reward poisoning. The main challenge is that one can no longer compute the actual posteriors for the true reward, as the agent can only observe the rewards after corruption. In this work, we solve this problem by computing pseudo-posteriors that are less likely to be manipulated by the attack. We propose robust algorithms based on Thompson sampling for the popular stochastic and contextual linear bandit settings in both cases where the agent is aware or unaware of the budget of the attacker. We theoretically show that our algorithms guarantee near-optimal regret under any attack strategy.
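For readers unfamiliar with the baseline being hardened here, the following is a minimal sketch of standard (non-robust) Thompson sampling for a Bernoulli bandit with Beta posteriors. It assumes uncorrupted rewards, which is exactly the assumption the abstract questions, and it does not implement the paper's pseudo-posteriors. The arm means, round count, and function name are illustrative.

```python
import random


def thompson_sampling(true_means, rounds, seed=0):
    """Run Beta-Bernoulli Thompson sampling and return the pull count per arm."""
    rng = random.Random(seed)
    k = len(true_means)
    successes = [0] * k  # observed rewards of 1 per arm
    failures = [0] * k   # observed rewards of 0 per arm
    pulls = [0] * k
    for _ in range(rounds):
        # Sample a plausible mean for each arm from its Beta(1 + s, 1 + f) posterior.
        samples = [rng.betavariate(successes[a] + 1, failures[a] + 1) for a in range(k)]
        arm = max(range(k), key=lambda a: samples[a])
        # The environment returns a Bernoulli reward; no corruption is modeled here.
        reward = 1 if rng.random() < true_means[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1
    return pulls


if __name__ == "__main__":
    print(thompson_sampling([0.2, 0.8], rounds=2000))
```

An attacker who perturbs the observed `reward` values would shift these posteriors directly, which is why the abstract argues that the true posterior can no longer be computed under reward poisoning.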

data mining, machine learning, reinforcement learning, (20 more...)

arXiv:2410.19705

Country:

North America > United States > Illinois > Champaign County > Urbana (0.04)
Asia > Malaysia > Sarawak > Kuching (0.04)

Genre: Research Report > New Finding (0.46)

Industry:

Information Technology > Security & Privacy (0.49)
Government > Military (0.49)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Data Science > Data Mining > Big Data (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.34)

Towards Differentiable Multilevel Optimization: A Gradient-Based Approach

Gu, Yuntian, Chen, Xuzheng

arXiv.org Artificial Intelligence, Oct-15-2024

Multilevel optimization has gained renewed interest in machine learning due to its promise in applications such as hyperparameter tuning and continual learning. However, existing methods struggle with the inherent difficulty of efficiently handling the nested structure. This paper introduces a novel gradient-based approach for multilevel optimization that overcomes these limitations by leveraging a hierarchically structured decomposition of the full gradient and employing advanced propagation techniques. Extending to n-level scenarios, our method significantly reduces computational complexity while improving both solution accuracy and convergence speed. We demonstrate the effectiveness of our approach through numerical experiments, comparing it with existing methods across several benchmarks. The results show a notable improvement in solution accuracy. To the best of our knowledge, this is one of the first algorithms to provide a general version of implicit differentiation with both theoretical guarantees and superior empirical performance.
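The implicit differentiation that this abstract generalizes can be illustrated on the simplest nested case, a two-level problem with scalar variables. The sketch below is a toy example built on the standard implicit function theorem, not the paper's n-level algorithm; the objectives, step size, and function names are assumptions chosen so the result can be checked by hand. The inner problem is g(x, y) = 0.5 (y - 2x)^2 and the outer objective is f(x, y) = (y - 1)^2 + 0.5 x^2.

```python
def inner_solution(x):
    """Minimizer of the inner objective g(x, y) = 0.5 * (y - 2x)^2 over y."""
    return 2.0 * x


def hypergradient(x):
    """Total derivative of f(x, y*(x)) via the implicit function theorem."""
    y = inner_solution(x)
    f_x = x                # partial derivative of f with respect to x
    f_y = 2.0 * (y - 1.0)  # partial derivative of f with respect to y
    g_yy = 1.0             # second derivative of g with respect to y
    g_yx = -2.0            # mixed second derivative of g
    dy_dx = -g_yx / g_yy   # sensitivity of the inner solution to x
    return f_x + f_y * dy_dx


def outer_objective(x):
    """Outer objective evaluated at the inner optimum."""
    y = inner_solution(x)
    return (y - 1.0) ** 2 + 0.5 * x ** 2


def minimize_outer(x0, lr=0.05, steps=200):
    """Plain gradient descent on x using the hypergradient."""
    x = x0
    for _ in range(steps):
        x -= lr * hypergradient(x)
    return x


if __name__ == "__main__":
    print(hypergradient(1.0), minimize_outer(0.0))
```

Substituting y = 2x gives F(x) = (2x - 1)^2 + 0.5 x^2 with derivative 9x - 4, so the hypergradient at x = 1 is 5 and the minimizer is x = 4/9, matching what the code computes.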

artificial intelligence, machine learning, optimization, (15 more...)

arXiv:2410.11312

Country:

North America > United States > District of Columbia > Washington (0.04)
Asia > Malaysia > Sarawak > Kuching (0.04)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.69)
