AITopics | preference alignment

Collaborating Authors

preference alignment

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Mission Impossible: A Statistical Perspective on Jailbreaking LLMs

Neural Information Processing SystemsMar-20-2026, 00:18:39 GMT

Large language models (LLMs) are trained on a deluge of text data with limited quality control. As a result, LLMs can exhibit unintended or even harmful behaviours, such as leaking information, fake news or hate speech. Countermeasures, commonly referred to as preference alignment, include fine-tuning the pretrained LLMs with carefully crafted text examples of desired behaviour. Even then, empirical evidence shows preference aligned LLMs can be enticed to harmful behaviour. This so called jailbreaking of LLMs is typically achieved by adversarially modifying the input prompt to the LLM. Our paper provides theoretical insights into the phenomenon of preference alignment and jailbreaking from a statistical perspective. Under our framework, we first show that pretrained LLMs will mimic harmful behaviour if present in the training corpus.

artificial intelligence, large language model, natural language, (10 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

Preference Alignment with Flow Matching Minu Kim 1 Y ongsik Lee

Neural Information Processing SystemsFeb-11-2026, 15:46:10 GMT

We provide theoretical insights that support our method's alignment with standard preference alignment objectives.

artificial intelligence, machine learning, natural language, (20 more...)

Neural Information Processing Systems

Country:

Europe > Monaco (0.04)
Europe > Greece > Attica > Athens (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > Experimental Study (1.00)

Industry: Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Communications (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.71)

Add feedback

PrefGen: Multimodal Preference Learning for Preference-Conditioned Image Generation

Mo, Wenyi, Zhang, Tianyu, Bai, Yalong, Han, Ligong, Ba, Ying, Metaxas, Dimitris N.

arXiv.org Artificial IntelligenceDec-9-2025

Preference-conditioned image generation seeks to adapt generative models to individual users, producing outputs that reflect personal aesthetic choices beyond the given textual prompt. Despite recent progress, existing approaches either fail to capture nuanced user preferences or lack effective mechanisms to encode personalized visual signals. In this work, we propose a multimodal framework that leverages multimodal large language models (MLLMs) to extract rich user representations and inject them into diffusion-based image generation. We train the MLLM with a preference-oriented visual question answering task to capture fine-grained semantic cues. To isolate preference-relevant features, we introduce two complementary probing tasks: inter-user discrimination to distinguish between different users, and intra-user discrimination to separate liked from disliked content. To ensure compatibility with diffusion text encoders, we design a maximum mean discrepancy-based alignment loss that bridges the modality gap while preserving multimodal structure. The resulting embeddings are used to condition the generator, enabling faithful adherence to both prompts and user preferences. Extensive experiments demonstrate that our method substantially outperforms strong baselines in both image quality and preference alignment, highlighting the effectiveness of representation extraction and alignment for personalized generation.

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2512.0602

Country: North America > United States (1.00)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Aligning Generative Music AI with Human Preferences: Methods and Challenges

Herremans, Dorien, Roy, Abhinaba

arXiv.org Artificial IntelligenceNov-20-2025

Recent advances in generative AI for music have achieved remarkable fidelity and stylistic diversity, yet these systems often fail to align with nuanced human preferences due to the specific loss functions they use. This paper advocates for the systematic application of preference alignment techniques to music generation, addressing the fundamental gap between computational optimization and human musical appreciation. Drawing on recent breakthroughs including MusicRL's large-scale preference learning, multi-preference alignment frameworks like diffusion-based preference optimization in DiffRhythm+, and inference-time optimization techniques like Text2midi-InferAlign, we discuss how these techniques can address music's unique challenges: temporal coherence, harmonic consistency, and subjective quality assessment. We identify key research challenges including scalability to long-form compositions, reliability amongst others in preference modelling. Looking forward, we envision preference-aligned music generation enabling transformative applications in interactive composition tools and personalized music services. This work calls for sustained interdisciplinary research combining advances in machine learning, music-theory to create music AI systems that truly serve human creative and experiential needs.

artificial intelligence, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2511.15038

Country: Asia (0.14)

Genre:

Research Report (0.64)
Overview > Innovation (0.34)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Deployable Vision-driven UAV River Navigation via Human-in-the-loop Preference Alignment

Wang, Zihan, Li, Jianwen, Wu, Li-Fan, Mahmoudian, Nina

arXiv.org Artificial IntelligenceNov-4-2025

Rivers are critical corridors for environmental monitoring and disaster response, where Unmanned Aerial Vehicles (UAVs) guided by vision-driven policies can provide fast, low-cost coverage. However, deployment exposes simulation-trained policies with distribution shift and safety risks and requires efficient adaptation from limited human interventions. We study human-in-the-loop (HITL) learning with a conservative overseer who vetoes unsafe or inefficient actions and provides statewise preferences by comparing the agent's proposal with a corrective override. We introduce Statewise Hybrid Preference Alignment for Robotics (SPAR-H), which fuses direct preference optimization on policy logits with a reward-based pathway that trains an immediate-reward estimator from the same preferences and updates the policy using a trust-region surrogate. With five HITL rollouts collected from a fixed novice policy, SPAR-H achieves the highest final episodic reward and the lowest variance across initial conditions among tested methods. The learned reward model aligns with human-preferred actions and elevates nearby non-intervened choices, supporting stable propagation of improvements. We benchmark SPAR-H against imitation learning (IL), direct preference variants, and evaluative reinforcement learning (RL) in the HITL setting, and demonstrate real-world feasibility of continual preference alignment for UAV river following. Overall, dual statewise preferences empirically provide a practical route to data-efficient online adaptation in riverine navigation.

intervention, machine learning, reinforcement learning, (13 more...)

arXiv.org Artificial Intelligence

2511.01083

Country: North America > United States > Indiana (0.28)

Genre: Research Report (0.40)

Industry: Information Technology > Robotics & Automation (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.69)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles > Drones (0.48)

Add feedback

Sample By Step, Optimize By Chunk: Chunk-Level GRPO For Text-to-Image Generation

Luo, Yifu, Du, Penghui, Li, Bo, Du, Sinan, Zhang, Tiantian, Chang, Yongzhe, Wu, Kai, Gai, Kun, Wang, Xueqian

arXiv.org Artificial IntelligenceOct-27-2025

Group Relative Policy Optimization (GRPO) has shown strong potential for flow-matching-based text-to-image (T2I) generation, but it faces two key limitations: inaccurate advantage attribution, and the neglect of temporal dynamics of generation. In this work, we argue that shifting the optimization paradigm from the step level to the chunk level can effectively alleviate these issues. Building on this idea, we propose Chunk-GRPO, the first chunk-level GRPO-based approach for T2I generation. The insight is to group consecutive steps into coherent 'chunk's that capture the intrinsic temporal dynamics of flow matching, and to optimize policies at the chunk level. In addition, we introduce an optional weighted sampling strategy to further enhance performance. Extensive experiments show that ChunkGRPO achieves superior results in both preference alignment and image quality, highlighting the promise of chunk-level optimization for GRPO-based methods.

arxiv preprint arxiv, large language model, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2510.21583

Country: Asia > China (0.28)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.65)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.46)

Add feedback

ImageGem: In-the-wild Generative Image Interaction Dataset for Generative Model Personalization

Guo, Yuanhe, Xie, Linxi, Chen, Zhuoran, Yu, Kangrui, Po, Ryan, Yang, Guandao, Wetztein, Gordon, Wen, Hongyi

arXiv.org Artificial IntelligenceOct-22-2025

W e introduce ImageGem, a dataset for studying generative models that understand fine-grained individual preferences. W e posit that a key challenge hindering the development of such a generative model is the lack of in-the-wild and fine-grained user preference annotations. Our dataset features real-world interaction data from 57K users, who collectively have built 242K customized LoRAs, written 3M text prompts, and created 5M generated images. With user preference annotations from our dataset, we were able to train better preference alignment models. In addition, leveraging individual user preference, we investigated the performance of retrieval models and a vision-language model on personalized image retrieval and generative model recommendation. Finally, we propose an end-to-end framework for editing customized diffusion models in a latent weight space to align with individual user preferences. Our results demonstrate that the ImageGem dataset enables, for the first time, a new paradigm for generative model personalization.

artificial intelligence, dataset, natural language, (19 more...)

arXiv.org Artificial Intelligence

2510.18433

Genre: Research Report > New Finding (0.86)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Natural Language > Generation (1.00)

Add feedback

Preference Alignment with Flow Matching Minu Kim 1 Y ongsik Lee

Neural Information Processing SystemsOct-10-2025, 00:05:00 GMT

We provide theoretical insights that support our method's alignment with standard preference alignment objectives.

dataset, experiment, pfm, (16 more...)

Neural Information Processing Systems

Country:

Europe > Monaco (0.04)
Europe > Greece > Attica > Athens (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > Experimental Study (1.00)

Industry: Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Communications (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.71)

Add feedback

OrthAlign: Orthogonal Subspace Decomposition for Non-Interfering Multi-Objective Alignment

Lin, Liang, Xu, Zhihao, Dong, Junhao, Zhao, Jian, Yuan, Yuchen, Zhang, Guibin, Yu, Miao, Zhang, Yiming, Yao, Zhengtao, Yi, Huahui, Liu, Dongrui, Li, Xinfeng, Wang, Kun

arXiv.org Artificial IntelligenceOct-1-2025

Large language model (LLM) alignment faces a critical dilemma when addressing multiple human preferences: improvements in one dimension frequently come at the expense of others, creating unavoidable trade-offs between competing objectives like helpfulness and harmlessness. While prior work mainly focuses on constraint-based optimization algorithms and data selection strategies to mitigate conflicts, these approaches overlook the fundamental issue of resolving conflicts directly at the parameter level. In this paper, we present OrthAlign, an innovative approach that pioneers a new paradigm by leveraging orthogonal subspace decomposition to fundamentally resolve gradient-level conflicts in multi-objective preference alignment. OrthAlign strategically decomposes parameter update spaces into orthogonal subspaces, ensuring that optimization toward different preferences occurs in mathematically non-interfering directions. Building upon this, we provide theoretical guarantees demonstrating that when parameter increments satisfy both orthogonal subspace constraints and spectral norm bounds, the resulting updates exhibit linear Lipschitz growth rather than exponential instability, ensuring stable convergence across all preference dimensions. Extensive experiments show that: I. OrthAlign achieves maximum single-preference improvements ranging from 34.61% to 50.89% after multiple-objective alignment across helpful, harmless, and truthful dimensions. II. With an average overall reward improvement of 13.96%.

arxiv preprint arxiv, large language model, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2509.2461

Country: Asia (0.29)

Genre: Research Report > New Finding (0.46)

Industry:

Information Technology (0.68)
Law Enforcement & Public Safety (0.46)
Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

Add feedback

Toward Preference-aligned Large Language Models via Residual-based Model Steering

La Cava, Lucio, Tagarelli, Andrea

arXiv.org Artificial IntelligenceSep-30-2025

Preference alignment is a critical step in making Large Language Models (LLMs) useful and aligned with (human) preferences. Existing approaches such as Reinforcement Learning from Human Feedback or Direct Preference Optimization typically require curated data and expensive optimization over billions of parameters, and eventually lead to persistent task-specific models. In this work, we introduce Preference alignment of Large Language Models via Residual Steering (PaLRS), a training-free method that exploits preference signals encoded in the residual streams of LLMs. From as few as one hundred preference pairs, PaLRS extracts lightweight, plug-and-play steering vectors that can be applied at inference time to push models toward preferred behaviors. We evaluate PaLRS on various small-to-medium-scale open-source LLMs, showing that PaLRS-aligned models achieve consistent gains on mathematical reasoning and code generation benchmarks while preserving baseline general-purpose performance. Moreover, when compared to DPO-aligned models, they perform better with huge time savings. Our findings highlight that PaLRS offers an effective, much more efficient and flexible alternative to standard preference optimization pipelines, offering a training-free, plug-and-play mechanism for alignment with minimal data.

large language model, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2509.23982

Country:

Asia (0.68)
North America > United States (0.29)
Europe > Italy (0.28)
Europe > Austria > Vienna (0.14)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback