Multimodal Prompt Optimization: Why Not Leverage Multiple Modalities for MLLMs
Choi, Yumin, Kim, Dongki, Baek, Jinheon, Hwang, Sung Ju
Large Language Models (LLMs) have shown remarkable success, and their multimodal expansions (MLLMs) further unlock capabilities spanning images, videos, and other modalities beyond text. However, despite this shift, prompt optimization approaches, designed to reduce the burden of manual prompt crafting while maximizing performance, remain confined to text, ultimately limiting the full potential of MLLMs. Motivated by this gap, we introduce the new problem of multimodal prompt optimization, which expands the prior definition of prompt optimization to the multimodal space defined by the pairs of textual and non-textual prompts. To tackle this problem, we then propose the Multimodal Prompt Optimizer (MPO), a unified framework that not only performs the joint optimization of multimodal prompts through alignment-preserving updates but also guides the selection process of candidate prompts by leveraging earlier evaluations as priors in a Bayesian-based selection strategy. Through extensive experiments across diverse modalities that go beyond text, such as images, videos, and even molecules, we demonstrate that MPO outperforms leading text-only optimization methods, establishing multimodal prompt optimization as a crucial step to realizing the potential of MLLMs.
- North America > United States > California (0.04)
- Asia > Middle East > Jordan (0.04)
- Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
- Health & Medicine > Diagnostic Medicine > Imaging (0.92)
- North America > Canada > Alberta (0.14)
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.05)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.68)
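The abstract's Bayesian-based selection strategy — using earlier evaluations as priors when choosing which candidate prompt to try next — could look roughly like the following sketch. All names are hypothetical and the exact posterior model is an assumption; the paper's actual MPO strategy may differ.

```python
import random

class CandidatePrompt:
    """A multimodal prompt candidate with a Beta prior over its success rate."""
    def __init__(self, name, alpha=1.0, beta=1.0):
        self.name = name
        self.alpha = alpha  # pseudo-count of successful evaluations
        self.beta = beta    # pseudo-count of failed evaluations

    def update(self, success):
        # Fold a new evaluation outcome into the posterior.
        if success:
            self.alpha += 1
        else:
            self.beta += 1

def thompson_select(candidates, rng):
    """Pick the candidate whose sampled success rate is highest
    (Thompson sampling over Beta posteriors)."""
    return max(candidates, key=lambda c: rng.betavariate(c.alpha, c.beta))
```

Under this kind of scheme, candidates that evaluated well early acquire concentrated posteriors and are selected more often, while untried candidates keep wide posteriors and still get explored.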
Physics Informed Constrained Learning of Dynamics from Static Data
Dang, Pengtao, Guo, Tingbo, Fishel, Melissa, Lin, Guang, Wu, Wenzhuo, Cao, Sha, Zhang, Chi
A physics-informed neural network (PINN) models the dynamics of a system by integrating the governing physical laws into the architecture of a neural network. By enforcing physical laws as constraints, PINN overcomes challenges with data scarcity and potentially high dimensionality. Existing PINN frameworks rely on fully observed time-course data, the acquisition of which could be prohibitive for many systems. In this study, we developed a new PINN learning paradigm, namely Constrained Learning, that enables the approximation of first-order derivatives or motions using non-time-course or partially observed data. Computational principles and a general mathematical formulation of Constrained Learning were developed. We further introduced MPOCtrL (Message Passing Optimization-based Constrained Learning), an optimization approach tailored for the Constrained Learning framework that strives to balance the fitting of physical models and observed data. Its code is available at https://github.com/ptdang1001/MPOCtrL. Experiments on synthetic and real-world data demonstrated that MPOCtrL can effectively detect the nonlinear dependency between observed data and the underlying physical properties of the system. In particular, on the task of metabolic flux analysis, MPOCtrL outperforms all existing data-driven flux estimators.
- North America > United States > Oregon (0.04)
- North America > United States > Indiana (0.04)
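The "balance the fitting of physical models and observed data" idea the abstract describes can be illustrated with a minimal loss sketch. The steady-state flux-balance constraint (S v = 0) is an assumption drawn from the metabolic-flux setting the abstract mentions; the paper's actual formulation and message-passing optimizer are not reproduced here.

```python
import numpy as np

def constrained_loss(v_pred, y_obs, S, lam=1.0):
    """Data-fit term plus a physics penalty enforcing steady state S v = 0.

    v_pred : (n_fluxes,) predicted fluxes
    y_obs  : (n_fluxes,) observed static proxies (e.g., expression-derived)
    S      : (n_mets, n_fluxes) stoichiometric matrix
    lam    : trade-off between data fit and physics consistency
    """
    data_term = np.mean((v_pred - y_obs) ** 2)      # fit the static observations
    physics_term = np.mean((S @ v_pred) ** 2)       # penalize flux imbalance
    return data_term + lam * physics_term
```

Because the physics term involves only the predicted quantities, no time-course derivatives are needed: the constraint itself supplies the dynamical information that the static data lacks.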
MPO: An Efficient Post-Processing Framework for Mixing Diverse Preference Alignment
Wang, Tianze, Gui, Dongnan, Hu, Yifan, Lin, Shuhang, Zhang, Linjun
Reinforcement Learning from Human Feedback (RLHF) has shown promise in aligning large language models (LLMs). Yet its reliance on a single reward model often overlooks the diversity of human preferences. Recent approaches address this limitation by leveraging multi-dimensional feedback to fine-tune corresponding reward models and train LLMs using reinforcement learning. However, the process is costly and unstable, especially given the competing and heterogeneous nature of human preferences. In this paper, we propose Mixing Preference Optimization (MPO), a post-processing framework for aggregating single-objective policies as an alternative to both multi-objective RLHF (MORLHF) and MaxMin-RLHF. MPO avoids alignment from scratch. Instead, it log-linearly combines existing policies into a unified one, with the weight of each policy computed via batch stochastic mirror descent. Empirical results demonstrate that MPO achieves balanced performance across diverse preferences, outperforming or matching existing models with significantly reduced computational costs.
- Europe > Switzerland > Zürich > Zürich (0.14)
- North America > United States > Florida > Hillsborough County > University (0.04)
- North America > United States > Oregon > Multnomah County > Portland (0.04)
- (2 more...)
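The log-linear combination the abstract describes has a simple closed form over any shared set of responses: log pi(y) = sum_i w_i log pi_i(y) - log Z. A minimal sketch, with the mixing weights taken as given (the paper computes them via batch stochastic mirror descent, which is omitted here):

```python
import math

def mix_policies(logps, weights):
    """Log-linearly combine per-policy log-probabilities over a shared
    set of responses, then renormalize.

    logps   : list of dicts {response: log-prob}, one per base policy
    weights : mixing weights (nonnegative, summing to 1)
    """
    responses = logps[0].keys()
    # Weighted sum of log-probs = log of the geometric mixture, unnormalized.
    unnorm = {y: sum(w * lp[y] for w, lp in zip(weights, logps)) for y in responses}
    log_z = math.log(sum(math.exp(v) for v in unnorm.values()))
    return {y: v - log_z for y, v in unnorm.items()}
```

Setting a weight to 1 recovers that base policy exactly, so the mixture interpolates between the single-objective policies rather than retraining anything from scratch.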
AMPO: Active Multi-Preference Optimization
Gupta, Taneesh, Madhavan, Rahul, Zhang, Xuchao, Bansal, Chetan, Rajmohan, Saravan
Multi-preference optimization enriches language-model alignment beyond pairwise preferences by contrasting entire sets of helpful and undesired responses, thereby enabling richer training signals for large language models. During self-play alignment, these models often produce numerous candidate answers per query, rendering it computationally infeasible to include all responses in the training objective. In this work, we propose $\textit{Active Multi-Preference Optimization}$ (AMPO), a novel approach that combines on-policy generation, a multi-preference group-contrastive loss, and active subset selection. Specifically, we score and embed large candidate pools of responses and then select a small, yet informative, subset that covers reward extremes and distinct semantic clusters for preference optimization. Our contrastive training scheme is capable of identifying not only the best and worst answers but also subtle, underexplored modes that are crucial for robust alignment. Theoretically, we provide guarantees for expected reward maximization using our active selection method, and empirically, AMPO achieves state-of-the-art results on $\textit{AlpacaEval}$ using Llama 8B.
- Research Report (1.00)
- Overview (0.87)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.89)
- (2 more...)
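The active subset selection step — "covers reward extremes and distinct semantic clusters" — can be sketched with a max-min (farthest-point) heuristic over candidate embeddings. This is an illustrative stand-in, not AMPO's actual selection rule:

```python
import numpy as np

def select_subset(rewards, embeddings, k):
    """Pick k candidate indices: the reward extremes first, then greedily
    add the candidate whose embedding is farthest from everything chosen
    so far, so the subset spans both reward extremes and distinct modes."""
    chosen = [int(np.argmax(rewards)), int(np.argmin(rewards))]
    while len(chosen) < k:
        # Distance from every candidate to its nearest already-chosen one.
        dists = np.linalg.norm(embeddings[:, None] - embeddings[chosen][None], axis=-1)
        min_d = dists.min(axis=1)
        min_d[chosen] = -1.0  # never re-pick a chosen candidate
        chosen.append(int(min_d.argmax()))
    return chosen
```

Greedy max-min selection keeps the subset small while still touching semantically distant responses, which is the property the contrastive loss needs to see "subtle, underexplored modes" and not just the best and worst answers.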
Quantum Large Language Models via Tensor Network Disentanglers
Aizpurua, Borja, Jahromi, Saeed S., Singh, Sukhbinder, Orus, Roman
We propose a method to enhance the performance of Large Language Models (LLMs) by integrating quantum computing and quantum-inspired techniques. Specifically, our approach involves replacing the weight matrices in the Self-Attention and Multi-layer Perceptron layers with a combination of two variational quantum circuits and a quantum-inspired tensor network, such as a Matrix Product Operator (MPO). This substitution enables the reproduction of classical LLM functionality by decomposing weight matrices through the application of tensor network disentanglers and MPOs, leveraging well-established tensor network techniques. By incorporating more complex and deeper quantum circuits, along with increasing the bond dimensions of the MPOs, our method captures additional correlations within the quantum-enhanced LLM, leading to improved accuracy beyond classical models while maintaining low memory overhead.
- North America > Canada > Ontario > Toronto (0.04)
- Europe > Spain > Basque Country > Biscay Province > Bilbao (0.04)
- Africa > Central African Republic > Ombella-M'Poko > Bimbo (0.04)
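Decomposing a weight matrix into a Matrix Product Operator, as this abstract describes, is standard tensor-network machinery: reshape the matrix into a high-order tensor and split it site by site with truncated SVDs, capping the bond dimension. A minimal two-site sketch (the disentangler circuits and the quantum parts of the method are not modeled here):

```python
import numpy as np

def matrix_to_mpo(W, dims_in, dims_out, chi):
    """Split a (prod(dims_in), prod(dims_out)) matrix into an MPO whose
    site tensors have shape (left_bond, d_in, d_out, right_bond),
    truncating every bond to at most chi singular values."""
    n = len(dims_in)
    T = W.reshape(*dims_in, *dims_out)
    # Interleave input/output legs: (i1, o1, i2, o2, ...).
    perm = [k for pair in zip(range(n), range(n, 2 * n)) for k in pair]
    T = T.transpose(perm)
    tensors, left = [], 1
    for k in range(n - 1):
        d_in, d_out = dims_in[k], dims_out[k]
        M = T.reshape(left * d_in * d_out, -1)
        U, s, Vh = np.linalg.svd(M, full_matrices=False)
        r = min(chi, len(s))  # truncate the bond dimension
        tensors.append(U[:, :r].reshape(left, d_in, d_out, r))
        T = s[:r, None] * Vh[:r]  # carry the remainder to the next site
        left = r
    tensors.append(T.reshape(left, dims_in[-1], dims_out[-1], 1))
    return tensors
```

With chi at least the matrix rank the decomposition is exact; smaller chi trades accuracy for the low memory overhead the abstract highlights, and raising chi recovers the extra correlations.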