AITopics | drm

Collaborating Authors

drm

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Dual Risk Minimization: Towards Next-Level Robustness in Fine-tuning Zero-Shot Models

Neural Information Processing SystemsFeb-15-2026, 23:58:35 GMT

To remedy this, most robust fine-tuning methods aim to preserve the pre-trained features.

large language model, machine learning, natural language, (21 more...)

Neural Information Processing Systems

Country:

North America (0.14)
Asia > China > Hong Kong (0.04)
Europe (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)

Add feedback

BINDER: Instantly Adaptive Mobile Manipulation with Open-Vocabulary Commands

Cho, Seongwon, Ahn, Daechul, Shin, Donghyun, Choi, Hyeonbeom, Kim, San, Choi, Jonghyun

arXiv.org Artificial IntelligenceDec-1-2025

Open-vocabulary mobile manipulation (OVMM) requires robots to follow language instructions, navigate, and manipulate while updating their world representation under dynamic environmental changes. However, most prior approaches update their world representation only at discrete update points such as navigation targets, waypoints, or the end of an action step, leaving robots blind between updates and causing cascading failures: overlooked objects, late error detection, and delayed replanning. To address this limitation, we propose BINDER (Bridging INstant and DEliberative Reasoning), a dual process framework that decouples strategic planning from continuous environment monitoring. Specifically, BINDER integrates a Deliberative Response Module (DRM, a multimodal LLM for task planning) with an Instant Response Module (IRM, a VideoLLM for continuous monitoring). The two modules play complementary roles: the DRM performs strategic planning with structured 3D scene updates and guides what the IRM attends to, while the IRM analyzes video streams to update memory, correct ongoing actions, and trigger replanning when necessary. Through this bidirectional coordination, the modules address the trade off between maintaining awareness and avoiding costly updates, enabling robust adaptation under dynamic conditions. Evaluated in three real world environments with dynamic object placement, BINDER achieves substantially higher success and efficiency than SoTA baselines, demonstrating its effectiveness for real world deployment.

artificial intelligence, large language model, natural language, (20 more...)

arXiv.org Artificial Intelligence

2511.22364

Genre: Research Report > New Finding (0.46)

Industry: Energy (0.47)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)

Add feedback

Dual-Arm Whole-Body Motion Planning: Leveraging Overlapping Kinematic Chains

Cheng, Richard, Werner, Peter, Matl, Carolyn

arXiv.org Artificial IntelligenceNov-13-2025

Abstract-- High degree-of-freedom dual-arm robots are becoming increasingly common due to their morphology enabling them to operate effectively in human environments. However, motion planning in real-time within unknown, changing environments remains a challenge for such robots due to the high dimensionality of the configuration space and the complex collision-avoidance constraints that must be obeyed. In this work, we propose a novel way to alleviate the curse of dimensionality by leveraging the structure imposed by shared joints (e.g. First, we build two dynamic roadmaps (DRM) for each kinematic chain (i.e. Then, we show that we can leverage this structure to efficiently search through the composition of the two roadmaps and largely sidestep the curse of dimensionality. Finally, we run several experiments in a real-world grocery store with this motion planner on a 19 DoF mobile manipulation robot executing a grocery fulfillment task, achieving 0.4s average planning times with 99.9% success rate across more than 2000 motion plans.

artificial intelligence, collision, motion planning, (17 more...)

arXiv.org Artificial Intelligence

2511.08778

Genre: Research Report (0.64)

Industry: Retail (0.36)

Technology: Information Technology > Artificial Intelligence > Robots > Robot Planning & Action (1.00)

Add feedback

Mixture of Experts Approaches in Dense Retrieval Tasks

Sokli, Effrosyni, Kasela, Pranav, Peikos, Georgios, Pasi, Gabriella

arXiv.org Artificial IntelligenceOct-20-2025

Dense Retrieval Models (DRMs) are a prominent development in Information Retrieval (IR). A key challenge with these neural Transformer-based models is that they often struggle to generalize beyond the specific tasks and domains they were trained on. To address this challenge, prior research in IR incorporated the Mixture-of-Experts (MoE) framework within each Transformer layer of a DRM, which, though effective, substantially increased the number of additional parameters. In this paper, we propose a more efficient design, which introduces a single MoE block (SB-MoE) after the final Transformer layer. To assess the retrieval effectiveness of SB-MoE, we perform an empirical evaluation across three IR tasks. Our experiments involve two evaluation setups, aiming to assess both in-domain effectiveness and the model's zero-shot generalizability. In the first setup, we fine-tune SB-MoE with four different underlying DRMs on seven IR benchmarks and evaluate them on their respective test sets. In the second setup, we fine-tune SB-MoE on MSMARCO and perform zero-shot evaluation on thirteen BEIR datasets. Additionally, we perform further experiments to analyze the model's dependency on its hyperparameters (i.e., the number of employed and activated experts) and investigate how this variation affects SB-MoE's performance. The obtained results show that SB-MoE is particularly effective for DRMs with lightweight base models, such as TinyBERT and BERT-Small, consistently exceeding standard model fine-tuning across benchmarks. For DRMs with more parameters, such as BERT-Base and Contriever, our model requires a larger number of training samples to achieve improved retrieval performance. Our code is available online at: https://github.com/FaySokli/SB-MoE.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2510.15683

Country: Europe > Austria (0.28)

Genre: Research Report > New Finding (0.88)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

Add feedback

From to : Multidimensional Supervision of Reasoning Process for LLM Optimization

Wang, Beining, Su, Weihang, Tian, Hongtao, Yang, Tao, Zhou, Yujia, Yao, Ting, Ai, Qingyao, Liu, Yiqun

arXiv.org Artificial IntelligenceOct-14-2025

Improving the multi-step reasoning ability of Large Language Models (LLMs) is a critical yet challenging task. The dominant paradigm, outcome-supervised reinforcement learning (RL VR), rewards only correct final answers, often propagating flawed reasoning and suffering from sparse reward signals. While process-level reward models (PRMs) provide denser, step-by-step feedback, they lack general-izability and interpretability, requiring task-specific segmentation of the reasoning process. To this end, we propose the Dimension-level Reward Model (DRM), a new supervision framework that bridges the gap between these two approaches. DRM evaluates the quality of a reasoning process along three fundamental, complementary, and interpretable dimensions: Confidence for uncertainty calibration, Relevance for semantic alignment, and Coherence for logical consistency. Together, these dimensions capture aspects beyond final answer correctness and enable interpretable assessment without requiring ground truth answers. Experimental results show that DRM provides effective supervision signals, guides the optimization of LLMs and enhances their reasoning ability. In particular, DRM-supervised training achieves consistent gains on both in-distribution and out-of-distribution open-domain tasks, including mathematics, question answering, code execution, and puzzles. Our findings demonstrate that multidimensional supervision of the reasoning process can improve the generalized reasoning ability of LLMs beyond the training distribution. Enhancing the reasoning ability of Large Language Models (LLMs) to perform complex and multi-step reasoning remains a central challenge in their development (Zhang et al., 2025b; Xu et al., 2025). The dominant paradigm for enhancement relies on Reinforcement Learning with V erifiable Rewards (RL VR) (Shao et al., 2024; Y ang et al., 2024; Luo et al., 2024). RL VR provides supervision at the outcome level, assigning a positive reward only if the final answer is correct. However, this reward mechanism has fundamental limitations. First, answer supervision overlooks the quality of the reasoning process (Y u et al., 2025a).

large language model, machine learning, reasoning process, (21 more...)

arXiv.org Artificial Intelligence

2510.11457

Genre: Research Report > New Finding (1.00)

Industry: Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

7972f3735e104a54715922aa416fde1b-Paper-Conference.pdf

Neural Information Processing SystemsOct-10-2025, 06:47:50 GMT

concept description, core feature, fine-tuning, (15 more...)

Neural Information Processing Systems

Country:

North America (0.14)
Asia > China > Hong Kong (0.04)
Europe (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.98)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)

Add feedback

Towards Reliable Retrieval in RAG Systems for Large Legal Datasets

Reuter, Markus, Lingenberg, Tobias, Liepiņa, Rūta, Lagioia, Francesca, Lippi, Marco, Sartor, Giovanni, Passerini, Andrea, Sayin, Burcu

arXiv.org Artificial IntelligenceOct-9-2025

Retrieval-Augmented Generation (RAG) is a promising approach to mitigate hallucinations in Large Language Models (LLMs) for legal applications, but its reliability is critically dependent on the accuracy of the retrieval step. This is particularly challenging in the legal domain, where large databases of structurally similar documents often cause retrieval systems to fail. In this paper, we address this challenge by first identifying and quantifying a critical failure mode we term Document-Level Retrieval Mismatch (DRM), where the retriever selects information from entirely incorrect source documents. To mitigate DRM, we investigate a simple and computationally efficient technique which we refer to as Summary-Augmented Chunking (SAC). This method enhances each text chunk with a document-level synthetic summary, thereby injecting crucial global context that would otherwise be lost during a standard chunking process. Our experiments on a diverse set of legal information retrieval tasks show that SAC greatly reduces DRM and, consequently, also improves text-level retrieval precision and recall. Interestingly, we find that a generic summarization strategy outperforms an approach that incorporates legal expert domain knowledge to target specific legal elements. Our work provides evidence that this practical, scalable, and easily integrable technique enhances the reliability of RAG systems when applied to large-scale legal document datasets.

information, large language model, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2510.06999

Country: Europe (0.68)

Genre: Research Report > New Finding (0.93)

Industry:

Law (1.00)
Information Technology > Security & Privacy (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.89)

Add feedback

Deceptive Risk Minimization: Out-of-Distribution Generalization by Deceiving Distribution Shift Detectors

Majumdar, Anirudha

arXiv.org Artificial IntelligenceSep-16-2025

This paper proposes deception as a mechanism for out-of-distribution (OOD) generalization: by learning data representations that make training data appear independent and identically distributed (iid) to an observer, we can identify stable features that eliminate spurious correlations and generalize to unseen domains. We refer to this principle as deceptive risk minimization (DRM) and instantiate it with a practical differentiable objective that simultaneously learns features that eliminate distribution shifts from the perspective of a detector based on conformal martingales while minimizing a task-specific loss. In contrast to domain adaptation or prior invariant representation learning methods, DRM does not require access to test data or a partitioning of training data into a finite number of data-generating domains. We demonstrate the efficacy of DRM on numerical experiments with concept shift and a simulated imitation learning setting with covariate shift in environments that a robot is deployed in.

artificial intelligence, distribution shift, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2509.12081

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.46)

Add feedback

Policy Newton methods for Distortion Riskmetrics

Pachal, Soumen, Maniyar, Mizhaan Prajit, A, Prashanth L.

arXiv.org Artificial IntelligenceAug-12-2025

We consider the problem of risk-sensitive control in a reinforcement learning (RL) framework. In particular, we aim to find a risk-optimal policy by maximizing the distortion riskmetric (DRM) of the discounted reward in a finite horizon Markov decision process (MDP). DRMs are a rich class of risk measures that include several well-known risk measures as special cases. We derive a policy Hessian theorem for the DRM objective using the likelihood ratio method. Using this result, we propose a natural DRM Hessian estimator from sample trajectories of the underlying MDP. Next, we present a cubic-regularized policy Newton algorithm for solving this problem in an on-policy RL setting using estimates of the DRM gradient and Hessian. Our proposed algorithm is shown to converge to an $ε$-second-order stationary point ($ε$-SOSP) of the DRM objective, and this guarantee ensures the escaping of saddle points. The sample complexity of our algorithms to find an $ ε$-SOSP is $\mathcal{O}(ε^{-3.5})$. Our experiments validate the theoretical findings. To the best of our knowledge, our is the first work to present convergence to an $ε$-SOSP of a risk-sensitive objective, while existing works in the literature have either shown convergence to a first-order stationary point of a risk-sensitive objective, or a SOSP of a risk-neutral one.

algorithm, artificial intelligence, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2508.07249

Genre: Research Report (0.64)

Industry: Banking & Finance (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

Add feedback

Superfast Configuration-Space Convex Set Computation on GPUs for Online Motion Planning

Werner, Peter, Cheng, Richard, Stewart, Tom, Tedrake, Russ, Rus, Daniela

arXiv.org Artificial IntelligenceApr-16-2025

In this work, we leverage GPUs to construct probabilistically collision-free convex sets in robot configuration space on the fly. This extends the use of modern motion planning algorithms that leverage such representations to changing environments. These planners rapidly and reliably optimize high-quality trajectories, without the burden of challenging nonconvex collision-avoidance constraints. We present an algorithm that inflates collision-free piecewise linear paths into sequences of convex sets (SCS) that are probabilistically collision-free using massive parallelism. We then integrate this algorithm into a motion planning pipeline, which leverages dynamic roadmaps to rapidly find one or multiple collision-free paths, and inflates them. We then optimize the trajectory through the probabilistically collision-free sets, simultaneously using the candidate trajectory to detect and remove collisions from the sets. We demonstrate the efficacy of our approach on a simulation benchmark and a KUKA iiwa 7 robot manipulator with perception in the loop. On our benchmark, our approach runs 17.1 times faster and yields a 27.9% increase in reliability over the nonlinear trajectory optimization baseline, while still producing high-quality motion plans.

artificial intelligence, collision, trajectory, (17 more...)

arXiv.org Artificial Intelligence

2504.10783

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Robots > Robot Planning & Action (1.00)

Add feedback