AITopics

artificial intelligence, international conference on machine learning, reinforcement learning, (10 more...)

Learning by interaction is the key to skill acquisition for most living organisms; its formal counterpart is Reinforcement Learning (RL). RL is efficient at finding optimal policies that endow complex systems with sophisticated behavior. All paradigms of RL utilize a system model for finding the optimal policy. Modeling dynamics can be done by formulating a mathematical model or by system identification. Dynamic models are usually exposed to aleatoric and epistemic uncertainties that can make the actual model deviate from the one acquired and cause the RL algorithm to exhibit erroneous behavior. Accordingly, the RL process becomes sensitive to operating conditions and changes in model parameters and loses its generality. To address these problems, intensive system identification is needed for each system, even if the structure of the model dynamics is the same, as a slight deviation in the model parameters can render the model useless in RL. An oracle that can adaptively predict the rest of the trajectory regardless of the uncertainties can help resolve the issue. This work presents a framework that facilitates system identification for different instances of the same dynamics class by learning, with variational inference, a probability distribution over the dynamics conditioned on observed data, and shows its reliability in robustly solving different instances of control problems with the same model in model-based RL with maximum sample efficiency.

doi: 10.1109/ACIRS52449.2021.9519314

2103.0885

Country: Africa > Middle East > Egypt (0.15)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.89)

Shah, Muhammad Sa'ood, Jeewa, Asad

Limitations of Scalarisation in MORL: A Comparative Study in Discrete Environments

Scalarisation functions are widely employed in MORL algorithms to enable intelligent decision-making. However, these functions often struggle to approximate the Pareto front accurately, making them poorly suited to complex, uncertain environments. This study examines selected Multi-Objective Reinforcement Learning (MORL) algorithms across MORL environments with discrete action and observation spaces. We aim to investigate further the limitations associated with scalarisation approaches for decision-making in multi-objective settings. Specifically, we use an outer-loop multi-policy methodology to assess the performance of a seminal single-policy MORL algorithm, MO Q-Learning, implemented with linear and Chebyshev scalarisation functions. In addition, we explore a pioneering inner-loop multi-policy algorithm, Pareto Q-Learning, which offers a more robust alternative. Our findings reveal that the performance of the scalarisation functions is highly dependent on the environment and the shape of the Pareto front. These functions often fail to retain the solutions uncovered during learning and favour finding solutions in certain regions of the solution space. Moreover, finding the appropriate weight configurations to sample the entire Pareto front is complex, limiting their applicability in uncertain settings. In contrast, inner-loop multi-policy algorithms may provide a more sustainable and generalizable approach and potentially facilitate intelligent decision-making in dynamic and uncertain environments.
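The dependence on the shape of the Pareto front can be illustrated with a small sketch. The three value vectors below are hypothetical and not taken from the study, and the greedy selection stands in for a full MO Q-Learning loop. Candidate C is Pareto-optimal but sits in a concave region of the front, so no linear weighting selects it, while a weighted Chebyshev distance to an assumed utopian point does.

```python
import numpy as np

# Hypothetical value vectors for three candidate policies (two objectives, maximised).
# C is Pareto-optimal but lies below the line joining A and B (concave region).
candidates = np.array([
    [1.0, 0.0],   # A
    [0.0, 1.0],   # B
    [0.4, 0.4],   # C
])

def linear_choice(values, weights):
    """Index of the candidate maximising the weighted sum of objectives."""
    return int(np.argmax(values @ weights))

def chebyshev_choice(values, weights, utopia):
    """Index of the candidate minimising the weighted Chebyshev distance to `utopia`."""
    distances = np.max(weights * np.abs(utopia - values), axis=1)
    return int(np.argmin(distances))

# Sweep the weight on objective 1 from 0 to 1 in steps of 0.1.
sweep = [np.array([w, 1.0 - w]) for w in np.linspace(0.0, 1.0, 11)]
linear_picks = {linear_choice(candidates, w) for w in sweep}

utopia = np.array([1.0, 1.0])  # assumed reference point: best value per objective
cheb_pick = chebyshev_choice(candidates, np.array([0.5, 0.5]), utopia)

print("linear picks:", sorted(linear_picks))            # only A and B (indices 0 and 1)
print("chebyshev pick with equal weights:", cheb_pick)  # C (index 2)
```

The weighted sum of A or B is always at least 0.5, which exceeds the 0.4 that C scores under any normalised weights, so a weight sweep alone cannot recover C.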

artificial intelligence, machine learning, reinforcement learning, (14 more...)

2511.16476

Country: Africa (0.14)

Genre: Research Report > New Finding (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Caunhye, Ali Murtaza, Jeewa, Asad

A Comparison Between Decision Transformers and Traditional Offline Reinforcement Learning Algorithms

The field of Offline Reinforcement Learning (RL) aims to derive effective policies from pre-collected datasets without active environment interaction. While traditional offline RL algorithms like Conservative Q-Learning (CQL) and Implicit Q-Learning (IQL) have shown promise, they often face challenges in balancing exploration and exploitation, especially in environments with varying reward densities. The recently proposed Decision Transformer (DT) approach, which reframes offline RL as a sequence modelling problem, has demonstrated impressive results across various benchmarks. This paper presents a comparative study evaluating the performance of DT against traditional offline RL algorithms in dense and sparse reward settings for the ANT continuous control environment. Our research investigates how these algorithms perform when faced with different reward structures, examining their ability to learn effective policies and generalize across varying levels of feedback. Through empirical analysis in the ANT environment, we found that DTs showed less sensitivity to varying reward density compared to other methods and particularly excelled with medium-expert datasets in sparse reward scenarios. In contrast, traditional value-based methods like IQL showed improved performance in dense reward settings with high-quality data, while CQL offered balanced performance across different data qualities. Additionally, DTs exhibited lower variance in performance but required significantly more computational resources compared to traditional approaches. These findings suggest that sequence modelling approaches may be more suitable for scenarios with uncertain reward structures or mixed-quality data, while value-based methods remain competitive in settings with dense rewards and high-quality demonstrations.
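As a rough illustration of the sequence-modelling view, a Decision Transformer conditions each action on the return-to-go, the sum of rewards from the current timestep to the end of the episode. The sketch below computes that quantity for a dense and a sparse reward sequence. The reward values and the way the sparse variant is built (the whole return paid at the final step) are assumptions for illustration, not the construction used in the paper.

```python
import numpy as np

def returns_to_go(rewards):
    """Undiscounted sum of rewards from each timestep to the end of the episode."""
    rewards = np.asarray(rewards, dtype=float)
    # Reverse, accumulate, reverse back: entry t holds rewards[t] + ... + rewards[-1].
    return np.cumsum(rewards[::-1])[::-1]

dense = [1.0, 1.0, 1.0, 1.0]    # hypothetical per-step rewards
sparse = [0.0, 0.0, 0.0, 4.0]   # same total, delivered only at the last step

print(returns_to_go(dense))
print(returns_to_go(sparse))
```

Both sequences start from the same return-to-go, so the conditioning signal at the first step is identical even though the per-step feedback differs, which is one plausible reading of why the sequence model is less sensitive to reward density.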

artificial intelligence, machine learning, reinforcement learning, (15 more...)

2511.16475

Country: Africa (0.14)

Genre: Research Report > New Finding (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Zhang, Jingru, Moradi, Saed, Saha, Ashirbani

Externally Validated Multi-Task Learning via Consistency Regularization Using Differentiable BI-RADS Features for Breast Ultrasound Tumor Segmentation

Multi-task learning can suffer from destructive task interference, where jointly trained models underperform single-task baselines and limit generalization. To improve generalization performance in breast ultrasound-based tumor segmentation via multi-task learning, we propose a novel consistency regularization approach that mitigates destructive interference between segmentation and classification. The consistency regularization approach is composed of differentiable BI-RADS-inspired morphological features. We validated this approach by training all models on the BrEaST dataset (Poland) and evaluating them on three external datasets: UDIAT (Spain), BUSI (Egypt), and BUS-UCLM (Spain). Our comprehensive analysis demonstrates statistically significant (p<0.001) improvements in generalization for the segmentation task with the proposed multi-task approach versus the baseline on UDIAT, BUSI, and BUS-UCLM (Dice coefficient = 0.81 vs 0.59, 0.66 vs 0.56, and 0.69 vs 0.49, respectively). The proposed approach also achieves state-of-the-art segmentation performance under rigorous external validation on the UDIAT dataset.
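For readers unfamiliar with the reported metric, the Dice coefficient between a predicted and a reference binary mask is twice the overlap divided by the total number of foreground pixels in both masks. The sketch below is a generic NumPy version with made-up masks; the function name and the smoothing term `eps` are assumptions and do not reflect the authors' evaluation code.

```python
import numpy as np

def dice_coefficient(pred, target, eps=1e-7):
    """Dice = 2 * |pred AND target| / (|pred| + |target|) for binary masks."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    intersection = np.logical_and(pred, target).sum()
    # eps keeps the ratio defined when both masks are empty (result is then 1.0).
    return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)

# Hypothetical 4x4 masks: 4 predicted pixels, 4 reference pixels, 2 shared.
pred_mask = np.array([[1, 1, 0, 0],
                      [1, 1, 0, 0],
                      [0, 0, 0, 0],
                      [0, 0, 0, 0]])
true_mask = np.array([[0, 1, 1, 0],
                      [0, 1, 1, 0],
                      [0, 0, 0, 0],
                      [0, 0, 0, 0]])

print(round(dice_coefficient(pred_mask, true_mask), 3))  # 2 * 2 / (4 + 4) = 0.5
```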

artificial intelligence, classification, machine learning, (15 more...)

2511.15968

Country:

Europe > Spain (0.45)
Africa > Middle East > Egypt (0.24)
North America > Canada > Ontario > Hamilton (0.15)

Genre: Research Report > Experimental Study (0.35)

Industry:

Health & Medicine > Diagnostic Medicine > Imaging (0.95)
Health & Medicine > Therapeutic Area (0.70)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (0.47)

Neural Information Processing Systems, Nov-20-2025, 23:23:58 GMT

A Likelihood-Free Inference Framework for Population Genetic Data using Exchangeable Neural Networks

Jeffrey Chan, Valerio Perrone, Jeffrey Spence, Paul Jenkins, Sara Mathieson, Yun Song

An explosion of high-throughput DNA sequencing in the past decade has led to a surge of interest in population-scale inference with whole-genome data.

artificial intelligence, machine learning, neural network, (18 more...)

Country:

North America > United States > California > Alameda County > Berkeley (0.04)
North America > United States > Utah (0.04)
North America > Canada > Quebec > Montreal (0.04)
(2 more...)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Neural Information Processing Systems, Nov-20-2025, 21:28:56 GMT

Third-order Smoothness Helps: Faster Stochastic Optimization Algorithms for Finding Local Minima

Yaodong Yu, Pan Xu, Quanquan Gu

In this paper, we aim to design efficient stochastic optimization algorithms that can find an approximate local minimum of (1.1), i.e., an (,

algorithm, artificial intelligence, machine learning, (15 more...)

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.29)
Asia > Middle East > Jordan (0.05)
North America > United States > Virginia > Albemarle County > Charlottesville (0.04)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)

Neural Information Processing Systems, Nov-20-2025, 21:28:37 GMT

Algorithmic Regularization in Learning Deep Homogeneous Models: Layers are Automatically Balanced

Simon S. Du, Wei Hu, Jason D. Lee

We study the implicit regularization imposed by gradient descent for learning multi-layer homogeneous functions including feed-forward fully connected and convolutional deep neural networks with linear, ReLU or Leaky ReLU activation. We rigorously prove that gradient flow (i.e.

artificial intelligence, machine learning, neural network, (14 more...)

Country:

North America > United States > California (0.14)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > Canada (0.04)
(2 more...)

Genre: Research Report (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Filip Hanzely, Konstantin Mishchenko, Peter Richtarik

SEGA: Variance Reduction via Gradient Sketching

Neural Information Processing Systems, Nov-20-2025, 21:23:59 GMT

We propose a randomized first-order optimization method, SEGA (SkEtched GrAdient), which progressively throughout its iterations builds a variance-reduced estimate of the gradient from random linear measurements (sketches) of the gradient. In each iteration, SEGA updates the current estimate of the gradient through a sketch-and-project operation using the information provided by the latest sketch, and this is subsequently used to compute an unbiased estimate of the true gradient through a random relaxation procedure. This unbiased estimate is then used to perform a gradient step. Unlike standard subspace descent methods, such as coordinate descent, SEGA can be used for optimization problems with a non-separable proximal term. We provide a general convergence analysis and prove linear convergence for strongly convex objectives. In the special case of coordinate sketches, SEGA can be enhanced with various techniques such as importance sampling, minibatching and acceleration, and its rate is up to a small constant factor identical to the best-known rate of coordinate descent.
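The coordinate-sketch special case can be sketched in a few lines. The version below is a simplified illustration under stated assumptions: coordinates are sampled uniformly, the proximal term is omitted, the objective is a toy diagonal quadratic, and the step size `alpha` is an illustrative choice rather than a value from the paper. The function names are hypothetical.

```python
import numpy as np

def sega_direction(h, grad_i, i, n):
    """One coordinate-sketch update.

    h      : current estimate of the gradient
    grad_i : partial derivative at the current point along coordinate i
    Returns (updated estimate h, unbiased gradient estimate g).
    """
    delta = grad_i - h[i]
    h_new = h.copy()
    h_new[i] = grad_i      # sketch-and-project: overwrite the measured coordinate
    g = h.copy()
    g[i] += n * delta      # random relaxation: averaging g over uniform i gives the gradient
    return h_new, g

def run_sega(a, b, steps=3000, alpha=0.02, seed=0):
    """Minimise f(x) = 0.5 * sum(a * x**2) - b @ x (toy strongly convex quadratic)."""
    rng = np.random.default_rng(seed)
    n = a.size
    x = np.zeros(n)
    h = np.zeros(n)
    for _ in range(steps):
        i = rng.integers(n)              # uniform coordinate sketch
        grad_i = a[i] * x[i] - b[i]      # only one partial derivative is observed
        h, g = sega_direction(h, grad_i, i, n)
        x = x - alpha * g                # gradient step with the unbiased estimate
    return x

a = np.linspace(1.0, 2.0, 5)
b = np.ones(5)
x_hat = run_sega(a, b)
print("distance to minimiser:", np.linalg.norm(x_hat - b / a))
```

Each iteration touches a single partial derivative, yet the estimate `g` averages to the full gradient over the choice of coordinate, which is what allows a full-vector step (and, in the paper's setting, a non-separable proximal step) to follow.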

artificial intelligence, machine learning, optimization problem, (14 more...)