 preference intensity


What Makes LLMs Effective Sequential Recommenders? A Study on Preference Intensity and Temporal Context

arXiv.org Artificial Intelligence

Sequential recommendation systems aspire to profile users by interpreting their interaction histories, echoing how humans make decisions by weighing experience, relative preference strength, and situational relevance. Yet, existing large language model (LLM)-based recommenders often fall short of mimicking the flexible, context-aware decision strategies humans exhibit, neglecting the structured, dynamic, and context-aware mechanisms fundamental to human behaviors. To bridge this gap, we propose RecPO, a preference optimization framework that models structured feedback and contextual delay to emulate human-like prioritization in sequential recommendation. RecPO exploits adaptive reward margins based on inferred preference hierarchies and temporal signals, enabling the model to favor immediately relevant items and to distinguish between varying degrees of preference and aversion. Extensive experiments across five real-world datasets demonstrate that RecPO not only yields performance gains over state-of-the-art baselines, but also mirrors key characteristics of human decision-making: favoring timely satisfaction, maintaining coherent preferences, and exercising discernment under shifting contexts.
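
The idea of an adaptive reward margin can be illustrated with a small sketch. The abstract does not state RecPO's objective, so everything below is an assumption made for illustration: a generic margin-based pairwise preference loss, plus a hypothetical `adaptive_margin` function in which the margin grows with the gap between two feedback scores and shrinks as the delay before the preferred item grows. The names `adaptive_margin`, `margin_preference_loss`, `scale`, and `beta` are not taken from the paper.

```python
import math


def adaptive_margin(score_chosen, score_rejected, delay, scale=1.0):
    """Hypothetical margin (an assumption, not the RecPO formula).

    Larger when the preference gap is wide, smaller when the chosen
    item is only relevant after a long delay.
    """
    gap = score_chosen - score_rejected
    return scale * gap / (1.0 + delay)


def margin_preference_loss(logratio_chosen, logratio_rejected, margin, beta=1.0):
    """Generic pairwise loss: -log(sigmoid(beta * (chosen - rejected) - margin))."""
    z = beta * (logratio_chosen - logratio_rejected) - margin
    # Numerically stable form of -log(sigmoid(z)).
    if z > 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))


# A strongly preferred, immediately relevant item demands a wider margin
# than the same pair when the preferred item is consumed much later.
near = adaptive_margin(score_chosen=5, score_rejected=1, delay=0)
far = adaptive_margin(score_chosen=5, score_rejected=1, delay=3)
loss_near = margin_preference_loss(1.0, 0.0, near)
loss_far = margin_preference_loss(1.0, 0.0, far)
```

With the same model log-ratios, the pair with the larger margin incurs the larger loss, so under this sketch the model is pushed harder to separate items when the preference is strong and immediate.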


Draw Your Mind: Personalized Generation via Condition-Level Modeling in Text-to-Image Diffusion Models

arXiv.org Artificial Intelligence

Personalized generation in T2I diffusion models aims to naturally incorporate individual user preferences into the generation process with minimal user intervention. However, existing studies primarily rely on prompt-level modeling with large-scale models, often leading to inaccurate personalization due to the limited input token capacity of T2I diffusion models. To address these limitations, we propose DrUM, a novel method that integrates user profiling with a transformer-based adapter to enable personalized generation through condition-level modeling in the latent space. DrUM demonstrates strong performance on large-scale datasets and seamlessly integrates with open-source text encoders, making it compatible with widely used foundation T2I models without requiring additional fine-tuning.


MOSLIM: Align with diverse preferences in prompts through reward classification

arXiv.org Artificial Intelligence

The multi-objective alignment of Large Language Models (LLMs) is essential for ensuring foundational models conform to diverse human preferences. Current research in this field typically involves either multiple policies or multiple reward models customized for various preferences, or the need to train a preference-specific supervised fine-tuning (SFT) model. In this work, we introduce a novel multi-objective alignment method, MOSLIM, which utilizes a single reward model and policy model to address diverse objectives. MOSLIM provides a flexible way to control these objectives through prompting and does not require preference training during the SFT phase, allowing thousands of off-the-shelf models to be directly utilized within this training framework. MOSLIM leverages a multi-head reward model that classifies question-answer pairs instead of scoring them, and then optimizes the policy model with a scalar reward derived from a mapping function that converts the reward model's classification results into reward scores. We demonstrate the efficacy of our proposed method across several multi-objective benchmarks and conduct ablation studies on various reward model sizes and policy optimization methods. The MOSLIM method outperforms current multi-objective approaches in most results while requiring significantly fewer GPU computing resources compared with existing policy optimization methods. While large language models (LLMs) have been widely adopted across various domains, generating text that aligns with human preferences has become a prominent area of research. Stiennon et al. (2020) introduced the concept of learning from human feedback to better align model behavior with human preferences, specifically aiming to produce summaries that are more preferred by human annotators.
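
To make the step from classification to a scalar reward concrete, here is a minimal sketch. The abstract does not specify MOSLIM's mapping function, so the rule below is an assumption: each reward head returns class probabilities for one objective, the prompt names a desired class per objective, and the reward is the weighted average probability assigned to the requested classes. The function name `scalar_reward`, the objective names, and the `weights` parameter are all hypothetical.

```python
def scalar_reward(head_probs, targets, weights=None):
    """Map per-objective class probabilities to one scalar reward.

    head_probs: dict mapping objective name -> list of class probabilities
    targets:    dict mapping objective name -> class index requested in the prompt
    weights:    optional dict mapping objective name -> importance (default 1.0)

    This mapping is an illustrative assumption, not the paper's function.
    """
    weights = weights or {}
    total, norm = 0.0, 0.0
    for objective, target_class in targets.items():
        w = weights.get(objective, 1.0)
        total += w * head_probs[objective][target_class]
        norm += w
    return total / norm


# Two heads, each a binary classifier over (low, high) for one objective.
probs = {"helpfulness": [0.1, 0.9], "harmlessness": [0.5, 0.5]}
prompt_targets = {"helpfulness": 1, "harmlessness": 0}
reward = scalar_reward(probs, prompt_targets)
```

Because the targets come from the prompt, the same reward heads can score different preference combinations without retraining, which is the kind of prompt-level control the abstract describes.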


Improving One-class Recommendation with Multi-tasking on Various Preference Intensities

arXiv.org Artificial Intelligence

In general, implicit feedback is easier to obtain than explicit feedback. Thus, making recommendations with only implicit feedback is indispensable. This type of problem is referred to as one-class recommendation [6]. Several approaches have been proposed to solve one-class recommendation problems. For example, model-based methods [2, 7] aim to learn a vector representation for each user and item and apply some kernel, such as the inner product in matrix factorization (MF) [5], to measure similarity. On the other hand, graph-based methods [11] construct a user-item bipartite graph from historical interactions and use random walks on it to explore user interests and make recommendations. In recent years, hybrid approaches [4, 10] combining model-based and graph-based methods have been developed. They explore high-order relationships on the bipartite graph and encode this information into learned entity representations, resulting in remarkable improvements in one-class recommendation tasks.
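
The model-based approach described above, where similarity is the inner product of learned user and item vectors, can be sketched as follows. The vectors here are made-up toy values rather than learned embeddings, and the helper names `score` and `recommend` are hypothetical.

```python
def score(user_vec, item_vec):
    """Inner-product similarity between a user vector and an item vector."""
    return sum(u * i for u, i in zip(user_vec, item_vec))


def recommend(user_vec, item_vecs, seen, k=2):
    """Rank unseen items by inner-product score and return the top k names."""
    candidates = {
        item: score(user_vec, vec)
        for item, vec in item_vecs.items()
        if item not in seen
    }
    return sorted(candidates, key=candidates.get, reverse=True)[:k]


# Toy 2-dimensional embeddings (illustrative values only).
user = [1.0, 0.0]
items = {"a": [0.9, 0.1], "b": [0.2, 0.8], "c": [0.5, 0.5]}
top = recommend(user, items, seen={"a"})
```

In the one-class setting only positive interactions are observed, so the already-seen items are excluded and the remaining items are ranked purely by predicted affinity.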