Multimodal Prompt Optimization: Why Not Leverage Multiple Modalities for MLLMs
Choi, Yumin, Kim, Dongki, Baek, Jinheon, Hwang, Sung Ju
Large Language Models (LLMs) have shown remarkable success, and their multimodal expansions (MLLMs) further unlock capabilities spanning images, videos, and other modalities beyond text. However, despite this shift, prompt optimization approaches, designed to reduce the burden of manual prompt crafting while maximizing performance, remain confined to text, ultimately limiting the full potential of MLLMs. Motivated by this gap, we introduce the new problem of multimodal prompt optimization, which expands the prior definition of prompt optimization to the multimodal space defined by the pairs of textual and non-textual prompts. To tackle this problem, we then propose the Multimodal Prompt Optimizer (MPO), a unified framework that not only performs the joint optimization of multimodal prompts through alignment-preserving updates but also guides the selection process of candidate prompts by leveraging earlier evaluations as priors in a Bayesian-based selection strategy. Through extensive experiments across diverse modalities that go beyond text, such as images, videos, and even molecules, we demonstrate that MPO outperforms leading text-only optimization methods, establishing multimodal prompt optimization as a crucial step to realizing the potential of MLLMs.
- North America > United States > California (0.04)
- Asia > Middle East > Jordan (0.04)
- Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
- Health & Medicine > Diagnostic Medicine > Imaging (0.92)
- North America > Canada > Alberta (0.14)
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.05)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.68)
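The abstract's Bayesian-based selection strategy — using earlier evaluations as priors when choosing which candidate prompt to try next — could look roughly like the following sketch. All names are hypothetical and the exact posterior model is an assumption; the paper's actual MPO strategy may differ.

```python
import random

class CandidatePrompt:
    """A multimodal prompt candidate with a Beta prior over its success rate."""
    def __init__(self, name, alpha=1.0, beta=1.0):
        self.name = name
        self.alpha = alpha  # pseudo-count of successful evaluations
        self.beta = beta    # pseudo-count of failed evaluations

    def update(self, success):
        # Fold a new evaluation outcome into the posterior.
        if success:
            self.alpha += 1
        else:
            self.beta += 1

def thompson_select(candidates, rng):
    """Pick the candidate whose sampled success rate is highest
    (Thompson sampling over Beta posteriors)."""
    return max(candidates, key=lambda c: rng.betavariate(c.alpha, c.beta))
```

Under this kind of scheme, candidates that evaluated well early acquire concentrated posteriors and are selected more often, while untried candidates keep wide posteriors and still get explored.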
Physics Informed Constrained Learning of Dynamics from Static Data
Dang, Pengtao, Guo, Tingbo, Fishel, Melissa, Lin, Guang, Wu, Wenzhuo, Cao, Sha, Zhang, Chi
A physics-informed neural network (PINN) models the dynamics of a system by integrating the governing physical laws into the architecture of a neural network. By enforcing physical laws as constraints, PINN overcomes challenges with data scarcity and potentially high dimensionality. Existing PINN frameworks rely on fully observed time-course data, the acquisition of which could be prohibitive for many systems. In this study, we developed a new PINN learning paradigm, namely Constrained Learning, that enables the approximation of first-order derivatives or motions using non-time-course or partially observed data. Computational principles and a general mathematical formulation of Constrained Learning were developed. We further introduced MPOCtrL (Message Passing Optimization-based Constrained Learning), an optimization approach tailored for the Constrained Learning framework that strives to balance the fitting of physical models and observed data. Its code is available at https://github.com/ptdang1001/MPOCtrL. Experiments on synthetic and real-world data demonstrated that MPOCtrL can effectively detect the nonlinear dependency between observed data and the underlying physical properties of the system. In particular, on the task of metabolic flux analysis, MPOCtrL outperforms all existing data-driven flux estimators.
- North America > United States > Oregon (0.04)
- North America > United States > Indiana (0.04)
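The "balance the fitting of physical models and observed data" idea the abstract describes can be illustrated with a minimal loss sketch. The steady-state flux-balance constraint (S v = 0) is an assumption drawn from the metabolic-flux setting the abstract mentions; the paper's actual formulation and message-passing optimizer are not reproduced here.

```python
import numpy as np

def constrained_loss(v_pred, y_obs, S, lam=1.0):
    """Data-fit term plus a physics penalty enforcing steady state S v = 0.

    v_pred : (n_fluxes,) predicted fluxes
    y_obs  : (n_fluxes,) observed static proxies (e.g., expression-derived)
    S      : (n_mets, n_fluxes) stoichiometric matrix
    lam    : trade-off between data fit and physics consistency
    """
    data_term = np.mean((v_pred - y_obs) ** 2)      # fit the static observations
    physics_term = np.mean((S @ v_pred) ** 2)       # penalize flux imbalance
    return data_term + lam * physics_term
```

Because the physics term involves only the predicted quantities, no time-course derivatives are needed: the constraint itself supplies the dynamical information that the static data lacks.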
MPO: An Efficient Post-Processing Framework for Mixing Diverse Preference Alignment
Wang, Tianze, Gui, Dongnan, Hu, Yifan, Lin, Shuhang, Zhang, Linjun
Reinforcement Learning from Human Feedback (RLHF) has shown promise in aligning large language models (LLMs). Yet its reliance on a single reward model often overlooks the diversity of human preferences. Recent approaches address this limitation by leveraging multi-dimensional feedback to fine-tune corresponding reward models and train LLMs using reinforcement learning. However, the process is costly and unstable, especially given the competing and heterogeneous nature of human preferences. In this paper, we propose Mixing Preference Optimization (MPO), a post-processing framework for aggregating single-objective policies as an alternative to both multi-objective RLHF (MORLHF) and MaxMin-RLHF. MPO avoids alignment from scratch. Instead, it log-linearly combines existing policies into a unified one, with the weight of each policy computed via batch stochastic mirror descent. Empirical results demonstrate that MPO achieves balanced performance across diverse preferences, outperforming or matching existing models with significantly reduced computational costs.
- Europe > Switzerland > Zürich > Zürich (0.14)
- North America > United States > Florida > Hillsborough County > University (0.04)
- North America > United States > Oregon > Multnomah County > Portland (0.04)
- (2 more...)
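The log-linear combination the abstract describes has a simple closed form over any shared set of responses: log pi(y) = sum_i w_i log pi_i(y) - log Z. A minimal sketch, with the mixing weights taken as given (the paper computes them via batch stochastic mirror descent, which is omitted here):

```python
import math

def mix_policies(logps, weights):
    """Log-linearly combine per-policy log-probabilities over a shared
    set of responses, then renormalize.

    logps   : list of dicts {response: log-prob}, one per base policy
    weights : mixing weights (nonnegative, summing to 1)
    """
    responses = logps[0].keys()
    # Weighted sum of log-probs = log of the geometric mixture, unnormalized.
    unnorm = {y: sum(w * lp[y] for w, lp in zip(weights, logps)) for y in responses}
    log_z = math.log(sum(math.exp(v) for v in unnorm.values()))
    return {y: v - log_z for y, v in unnorm.items()}
```

Setting a weight to 1 recovers that base policy exactly, so the mixture interpolates between the single-objective policies rather than retraining anything from scratch.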
AMPO: Active Multi-Preference Optimization
Gupta, Taneesh, Madhavan, Rahul, Zhang, Xuchao, Bansal, Chetan, Rajmohan, Saravan
Multi-preference optimization enriches language-model alignment beyond pairwise preferences by contrasting entire sets of helpful and undesired responses, thereby enabling richer training signals for large language models. During self-play alignment, these models often produce numerous candidate answers per query, rendering it computationally infeasible to include all responses in the training objective. In this work, we propose $\textit{Active Multi-Preference Optimization}$ (AMPO), a novel approach that combines on-policy generation, a multi-preference group-contrastive loss, and active subset selection. Specifically, we score and embed large candidate pools of responses and then select a small, yet informative, subset that covers reward extremes and distinct semantic clusters for preference optimization. Our contrastive training scheme is capable of identifying not only the best and worst answers but also subtle, underexplored modes that are crucial for robust alignment. Theoretically, we provide guarantees for expected reward maximization using our active selection method, and empirically, AMPO achieves state-of-the-art results on $\textit{AlpacaEval}$ using Llama 8B.
- Research Report (1.00)
- Overview (0.87)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.89)
- (2 more...)
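The active subset selection step — "covers reward extremes and distinct semantic clusters" — can be sketched with a max-min (farthest-point) heuristic over candidate embeddings. This is an illustrative stand-in, not AMPO's actual selection rule:

```python
import numpy as np

def select_subset(rewards, embeddings, k):
    """Pick k candidate indices: the reward extremes first, then greedily
    add the candidate whose embedding is farthest from everything chosen
    so far, so the subset spans both reward extremes and distinct modes."""
    chosen = [int(np.argmax(rewards)), int(np.argmin(rewards))]
    while len(chosen) < k:
        # Distance from every candidate to its nearest already-chosen one.
        dists = np.linalg.norm(embeddings[:, None] - embeddings[chosen][None], axis=-1)
        min_d = dists.min(axis=1)
        min_d[chosen] = -1.0  # never re-pick a chosen candidate
        chosen.append(int(min_d.argmax()))
    return chosen
```

Greedy max-min selection keeps the subset small while still touching semantically distant responses, which is the property the contrastive loss needs to see "subtle, underexplored modes" and not just the best and worst answers.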
Quantum Large Language Models via Tensor Network Disentanglers
Aizpurua, Borja, Jahromi, Saeed S., Singh, Sukhbinder, Orus, Roman
We propose a method to enhance the performance of Large Language Models (LLMs) by integrating quantum computing and quantum-inspired techniques. Specifically, our approach involves replacing the weight matrices in the Self-Attention and Multi-layer Perceptron layers with a combination of two variational quantum circuits and a quantum-inspired tensor network, such as a Matrix Product Operator (MPO). This substitution enables the reproduction of classical LLM functionality by decomposing weight matrices through the application of tensor network disentanglers and MPOs, leveraging well-established tensor network techniques. By incorporating more complex and deeper quantum circuits, along with increasing the bond dimensions of the MPOs, our method captures additional correlations within the quantum-enhanced LLM, leading to improved accuracy beyond classical models while maintaining low memory overhead.
- North America > Canada > Ontario > Toronto (0.04)
- Europe > Spain > Basque Country > Biscay Province > Bilbao (0.04)
- Africa > Central African Republic > Ombella-M'Poko > Bimbo (0.04)
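Decomposing a weight matrix into a Matrix Product Operator, as this abstract describes, is standard tensor-network machinery: reshape the matrix into a high-order tensor and split it site by site with truncated SVDs, capping the bond dimension. A minimal two-site sketch (the disentangler circuits and the quantum parts of the method are not modeled here):

```python
import numpy as np

def matrix_to_mpo(W, dims_in, dims_out, chi):
    """Split a (prod(dims_in), prod(dims_out)) matrix into an MPO whose
    site tensors have shape (left_bond, d_in, d_out, right_bond),
    truncating every bond to at most chi singular values."""
    n = len(dims_in)
    T = W.reshape(*dims_in, *dims_out)
    # Interleave input/output legs: (i1, o1, i2, o2, ...).
    perm = [k for pair in zip(range(n), range(n, 2 * n)) for k in pair]
    T = T.transpose(perm)
    tensors, left = [], 1
    for k in range(n - 1):
        d_in, d_out = dims_in[k], dims_out[k]
        M = T.reshape(left * d_in * d_out, -1)
        U, s, Vh = np.linalg.svd(M, full_matrices=False)
        r = min(chi, len(s))  # truncate the bond dimension
        tensors.append(U[:, :r].reshape(left, d_in, d_out, r))
        T = s[:r, None] * Vh[:r]  # carry the remainder to the next site
        left = r
    tensors.append(T.reshape(left, dims_in[-1], dims_out[-1], 1))
    return tensors
```

With chi at least the matrix rank the decomposition is exact; smaller chi trades accuracy for the low memory overhead the abstract highlights, and raising chi recovers the extra correlations.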