AITopics | user type

Enhancing Personalized Multi-Turn Dialogue with Curiosity Reward

Neural Information Processing Systems, Jun-23-2026, 00:16:01 GMT

Effective conversational agents like large language models (LLMs) must personalize their interactions to adapt to user preferences, personalities, and attributes across diverse domains like education and healthcare. Current methods like Reinforcement Learning from Human Feedback (RLHF) often prioritize helpfulness and safety but fall short in fostering truly empathetic, adaptive, and personalized dialogues. Existing personalization approaches typically rely on extensive user history, limiting their effectiveness for new or context-limited users. To address these limitations, we propose leveraging a user model to incorporate a curiosity-based intrinsic reward into multi-turn RLHF. This novel reward mechanism encourages the LLM agent to actively infer user traits by optimizing conversations to improve its user model's accuracy. Consequently, the agent delivers more personalized interactions by learning more about the user. We demonstrate our method's effectiveness in two distinct domains: significantly improving personalization performance in a conversational recommendation task, and personalizing conversations for different learning styles in an educational setting. We show improved generalization capabilities compared to traditional multi-turn RLHF, all while maintaining conversation quality. Our method offers a promising solution for creating more personalized, adaptive, and engaging conversational agents.
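
The abstract does not give the reward formula, so the following is only a minimal sketch of one plausible reading: the intrinsic reward at each turn is the gain in probability that a user model assigns to the true user type. The function name `curiosity_rewards`, the belief lists, and the two-type setup are assumptions made for illustration, not the authors' implementation.

```python
from typing import Sequence


def curiosity_rewards(beliefs: Sequence[Sequence[float]], true_type: int) -> list[float]:
    """Per-turn intrinsic reward (illustrative assumption).

    beliefs[t] is the user model's probability distribution over user
    types after turn t. The reward for a turn is how much the probability
    of the true type increased during that turn.
    """
    accuracy = [belief[true_type] for belief in beliefs]
    return [after - before for before, after in zip(accuracy, accuracy[1:])]


# Hypothetical belief trajectory over two user types across two turns.
beliefs = [
    [0.5, 0.5],      # before the conversation: uninformed
    [0.75, 0.25],    # after turn 1
    [0.875, 0.125],  # after turn 2
]
rewards = curiosity_rewards(beliefs, true_type=0)
print(rewards)  # [0.25, 0.125]
```

Under this reading the rewards telescope: their sum equals the total improvement in the user model's accuracy over the conversation, so the agent is rewarded for turns that reveal something about the user rather than for conversation length.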

customer, large language model, machine learning, (20 more...)

Neural Information Processing Systems

Country: North America > United States (0.45)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.67)

Industry:

Information Technology (1.00)
Health & Medicine (1.00)
Education > Educational Setting (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

The Burden of Interactive Alignment with Inconsistent Preferences

Neural Information Processing Systems, Jun-22-2026, 01:36:18 GMT

From media platforms to chatbots, algorithms shape how people interact, learn, and discover information. Such interactions between users and an algorithm often unfold over multiple steps, during which strategic users can guide the algorithm to better align with their true interests by selectively engaging with content. However, users frequently exhibit inconsistent preferences: they may spend considerable time on content that offers little long-term value, inadvertently signaling that such content is desirable. Focusing on the user side, this raises a key question: what does it take for such users to align the algorithm with their true interests? To investigate these dynamics, we model the user's decision process as split between a rational "system 2" that decides whether to engage and an impulsive "system 1" that determines how long engagement lasts.

artificial intelligence, machine learning, natural language, (21 more...)

Neural Information Processing Systems

Genre: Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Game Theory (0.94)
Information Technology > Human Computer Interaction (0.93)

Direct Alignment with Heterogeneous Preferences

Neural Information Processing Systems, Jun-17-2026, 22:10:18 GMT

Alignment with human preferences is commonly framed using a universal reward function, even though human preferences are inherently heterogeneous. We formalize this heterogeneity by introducing user types and examine the limits of the homogeneity assumption. We show that aligning to heterogeneous preferences with a single policy is best achieved using the average reward across user types. However, this requires additional information about annotators. We examine improvements under different information settings, focusing on direct alignment methods. We find that minimal information can yield first-order improvements, while full feedback from each user type leads to consistent learning of the optimal policy. Surprisingly, however, no sample-efficient consistent direct loss exists in this latter setting. These results reveal a fundamental tension between consistency and sample efficiency in direct policy alignment.
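
The claim about averaging rewards across user types can be illustrated with a toy example. This is a sketch under stated assumptions: the reward table, the uniform weighting over types, and the helper name `average_reward` are hypothetical and are not taken from the paper.

```python
def average_reward(rewards_by_type, weights=None):
    """Average each response's reward across user types.

    rewards_by_type[k][j] is the reward user type k assigns to response j.
    weights defaults to a uniform distribution over types (an assumption).
    """
    n_types = len(rewards_by_type)
    if weights is None:
        weights = [1.0 / n_types] * n_types
    n_responses = len(rewards_by_type[0])
    return [
        sum(w * r[j] for w, r in zip(weights, rewards_by_type))
        for j in range(n_responses)
    ]


# Two hypothetical user types scoring three candidate responses.
rewards_by_type = [
    [1.0, 0.0, 0.75],  # type A strongly prefers response 0
    [0.0, 1.0, 0.75],  # type B strongly prefers response 1
]
avg = average_reward(rewards_by_type)
best = max(range(len(avg)), key=avg.__getitem__)
print(avg, best)  # [0.5, 0.5, 0.75] 2
```

In this toy setting neither type's favorite response wins. A single policy serving both types does best with the compromise response 2, which is the kind of outcome a universal reward fitted to pooled annotations would not necessarily recover.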

arxiv preprint arxiv, machine learning, natural language, (19 more...)

Neural Information Processing Systems

Country:

North America > United States (0.45)
Europe (0.45)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.67)
Questionnaire & Opinion Survey (0.67)

Industry: Government (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

LOTUS: A Leaderboard for Detailed Image Captioning from Quality to Societal Bias and User Preferences

Hirota, Yusuke, Li, Boyi, Hachiuma, Ryo, Wu, Yueh-Hua, Ivanovic, Boris, Nakashima, Yuta, Pavone, Marco, Choi, Yejin, Wang, Yu-Chiang Frank, Yang, Chao-Han Huck

arXiv.org Artificial Intelligence, Dec-2-2025

Large Vision-Language Models (LVLMs) have transformed image captioning, shifting from concise captions to detailed descriptions. We introduce LOTUS, a leaderboard for evaluating detailed captions, addressing three main gaps in existing evaluations: lack of standardized criteria, bias-aware assessments, and user preference considerations. LOTUS comprehensively evaluates various aspects, including caption quality (e.g., alignment, descriptiveness), risks (e.g., hallucination), and societal biases (e.g., gender bias) while enabling preference-oriented evaluations by tailoring criteria to diverse user preferences. Our analysis of recent LVLMs reveals no single model excels across all criteria, while correlations emerge between caption detail and bias risks. Preference-oriented evaluations demonstrate that optimal model selection depends on user priorities.

caption, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

doi: 10.18653/v1/2025.acl-industry.22

2507.19362

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.82)

The Burden of Interactive Alignment with Inconsistent Preferences

Shirali, Ali

arXiv.org Artificial Intelligence, Oct-21-2025

From media platforms to chatbots, algorithms shape how people interact, learn, and discover information. Such interactions between users and an algorithm often unfold over multiple steps, during which strategic users can guide the algorithm to better align with their true interests by selectively engaging with content. However, users frequently exhibit inconsistent preferences: they may spend considerable time on content that offers little long-term value, inadvertently signaling that such content is desirable. Focusing on the user side, this raises a key question: what does it take for such users to align the algorithm with their true interests? To investigate these dynamics, we model the user's decision process as split between a rational system 2 that decides whether to engage and an impulsive system 1 that determines how long engagement lasts. We then study a multi-leader, single-follower extensive Stackelberg game, where users, specifically system 2, lead by committing to engagement strategies and the algorithm best-responds based on observed interactions. We define the burden of alignment as the minimum horizon over which users must optimize to effectively steer the algorithm. We show that a critical horizon exists: users who are sufficiently foresighted can achieve alignment, while those who are not are instead aligned to the algorithm's objective. This critical horizon can be long, imposing a substantial burden. However, even a small, costly signal (e.g., an extra click) can significantly reduce it. Overall, our framework explains how users with inconsistent preferences can align an engagement-driven algorithm with their interests in a Stackelberg equilibrium, highlighting both the challenges and potential remedies for achieving alignment.

artificial intelligence, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2510.16368

Genre: Research Report > Experimental Study (1.00)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Human Computer Interaction (0.93)

Enhancing Personalized Multi-Turn Dialogue with Curiosity Reward

Wan, Yanming, Wu, Jiaxing, Abdulhai, Marwa, Shani, Lior, Jaques, Natasha

arXiv.org Artificial Intelligence, Oct-3-2025

Effective conversational agents like large language models (LLMs) must personalize their interactions to adapt to user preferences, personalities, and attributes across diverse domains like education and healthcare. Current methods like Reinforcement Learning from Human Feedback (RLHF) often prioritize helpfulness and safety but fall short in fostering truly empathetic, adaptive, and personalized dialogues. Existing personalization approaches typically rely on extensive user history, limiting their effectiveness for new or context-limited users. To address these limitations, we propose leveraging a user model to incorporate a curiosity-based intrinsic reward into multi-turn RLHF. This novel reward mechanism encourages the LLM agent to actively infer user traits by optimizing conversations to improve its user model's accuracy. Consequently, the agent delivers more personalized interactions by learning more about the user. We demonstrate our method's effectiveness in two distinct domains: significantly improving personalization performance in a conversational recommendation task, and personalizing conversations for different learning styles in an educational setting. We show improved generalization capabilities compared to traditional multi-turn RLHF, all while maintaining conversation quality. Our method offers a promising solution for creating more personalized, adaptive, and engaging conversational agents.

customer, large language model, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2504.03206

Country: North America > United States (0.28)

Genre: Research Report > Experimental Study (1.00)

Industry:

Information Technology (0.93)
Education > Educational Setting (0.87)
Health & Medicine > Consumer Health (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Human-Robo-advisor collaboration in decision-making: Evidence from a multiphase mixed methods experimental study

Mahmud, Hasan, Islam, Najmul, Krishnan, Satish

arXiv.org Artificial Intelligence, Oct-3-2025

Robo-advisors (RAs) are cost-effective, bias-resistant alternatives to human financial advisors, yet adoption remains limited. While prior research has examined user interactions with RAs, less is known about how individuals interpret RA roles and integrate their advice into decision-making. To address this gap, this study employs a multiphase mixed methods design integrating a behavioral experiment (N = 334), thematic analysis, and follow-up quantitative testing. Findings suggest that people tend to rely on RAs, with reliance shaped by information about RA performance and the framing of advice as gains or losses. Thematic analysis reveals three RA roles in decision-making and four user types, each reflecting distinct patterns of advice integration. In addition, a 2 x 2 typology categorizes antecedents of acceptance into enablers and inhibitors at both the individual and algorithmic levels. By combining behavioral, interpretive, and confirmatory evidence, this study advances understanding of human-RA collaboration and provides actionable insights for designing more trustworthy and adaptive RA systems.

algorithm, artificial intelligence, machine learning, (18 more...)

arXiv.org Artificial Intelligence

doi: 10.1016/j.dss.2025.114541

2510.02153

Country:

North America > United States (0.28)
Europe (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Banking & Finance > Trading (1.00)
Banking & Finance > Financial Services (1.00)
Health & Medicine > Therapeutic Area (0.68)

Technology:

Information Technology > Human Computer Interaction (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.85)

From latent factors to language: a user study on LLM-generated explanations for an inherently interpretable matrix-based recommender system

Manderlier, Maxime, Lecron, Fabian, Thanh, Olivier Vu, Gillis, Nicolas

arXiv.org Artificial Intelligence, Oct-2-2025

We investigate whether large language models (LLMs) can generate effective, user-facing explanations from a mathematically interpretable recommendation model. The model is based on constrained matrix factorization, where user types are explicitly represented and predicted item scores share the same scale as observed ratings, making the model's internal representations and predicted scores directly interpretable. This structure is translated into natural language explanations using carefully designed LLM prompts. Many works in explainable AI rely on automatic evaluation metrics, which often fail to capture users' actual needs and perceptions. In contrast, we adopt a user-centered approach: we conduct a study with 326 participants who assessed the quality of the explanations across five key dimensions (transparency, effectiveness, persuasion, trust, and satisfaction), as well as the recommendations themselves. To evaluate how different explanation strategies are perceived, we generate multiple explanation types from the same underlying model, varying the input information provided to the LLM. Our analysis reveals that all explanation types are generally well received, with moderate statistical differences between strategies. User comments further underscore how participants react to each type of explanation, offering complementary insights beyond the quantitative results.

explanation, large language model, machine learning, (21 more...)

arXiv.org Artificial Intelligence

2509.1898

Country: Europe (0.28)

Genre:

Research Report > New Finding (1.00)
Questionnaire & Opinion Survey (1.00)
Research Report > Experimental Study (0.93)

Industry:

Media > Film (1.00)
Leisure & Entertainment (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Churn-Aware Recommendation Planning under Aggregated Preference Feedback

Keinan, Gur, Ben-Porat, Omer

arXiv.org Artificial Intelligence, Jul-8-2025

We study a sequential decision-making problem motivated by recent regulatory and technological shifts that limit access to individual user data in recommender systems (RSs), leaving only population-level preference information. This privacy-aware setting poses fundamental challenges in planning under uncertainty: Effective personalization requires exploration to infer user preferences, yet unsatisfactory recommendations risk immediate user churn. To address this, we introduce the Rec-APC model, in which an anonymous user is drawn from a known prior over latent user types (e.g., personas or clusters), and the decision-maker sequentially selects items to recommend. Feedback is binary -- positive responses refine the posterior via Bayesian updates, while negative responses result in the termination of the session. We prove that optimal policies converge to pure exploitation in finite time and propose a branch-and-bound algorithm to efficiently compute them. Experiments on synthetic and MovieLens data confirm rapid convergence and demonstrate that our method outperforms the POMDP solver SARSOP, particularly when the number of user types is large or comparable to the number of content categories. Our results highlight the applicability of this approach and inspire new ways to improve decision-making under the constraints imposed by aggregated preference data.
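
The feedback mechanism described above (a known prior over latent user types, a Bayesian update on a positive response, and session termination on a negative one) can be sketched in a few lines. The like-probability matrix `like_prob` and the function name `posterior_after_like` are assumptions for illustration. The abstract does not specify how responses depend on user type, and this is not the paper's branch-and-bound algorithm.

```python
def posterior_after_like(prior, like_prob, item):
    """Bayesian update after a positive response to `item`.

    prior[t] is the current belief that the user has type t.
    like_prob[t][item] is the assumed probability that type t likes the item.
    Returns the posterior over types and the probability of a positive
    response, i.e. the chance the session continues rather than ends in churn.
    """
    unnormalized = [p * like_prob[t][item] for t, p in enumerate(prior)]
    p_like = sum(unnormalized)
    if p_like == 0:
        raise ValueError("a positive response is impossible under this belief")
    return [u / p_like for u in unnormalized], p_like


# Hypothetical setup: two user types, two items.
prior = [0.5, 0.5]
like_prob = [
    [0.75, 0.25],  # type 0 mostly likes item 0
    [0.25, 0.75],  # type 1 mostly likes item 1
]
posterior, p_continue = posterior_after_like(prior, like_prob, item=0)
print(posterior, p_continue)  # [0.75, 0.25] 0.5
```

The example shows the tension the abstract points to: recommending item 0 sharpens the belief about the user's type, but under the uniform prior it also carries a 0.5 probability of a negative response, which would end the session.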

artificial intelligence, machine learning, user type, (19 more...)

arXiv.org Artificial Intelligence

2507.04513

Country:

North America > United States > New York > New York County > New York City (0.04)
Asia > Middle East > Israel (0.04)
North America > United States > California (0.04)
Asia > Middle East > Iran > Tehran Province > Tehran (0.04)

Genre: Research Report > New Finding (0.34)

Industry: Information Technology > Security & Privacy (1.00)

Multi-User Beamforming with Deep Reinforcement Learning in Sensing-Aided Communication

Wang, Xiyu, Berardinelli, Gilberto, Cheng, Hei Victor, Popovski, Petar, Adeogun, Ramoni

arXiv.org Artificial Intelligence, May-12-2025

Mobile users are prone to experience beam failure due to beam drifting in millimeter wave (mmWave) communications. Sensing can help alleviate beam drifting with timely beam changes and low overhead since it does not need user feedback. This work studies the problem of optimizing sensing-aided communication by dynamically managing beams allocated to mobile users. A multi-beam scheme is introduced, which allocates multiple beams to the users that need an update on the angle of departure (AoD) estimates and a single beam to the users whose AoD estimates already satisfy the required precision. A deep reinforcement learning (DRL) assisted method is developed to optimize the beam allocation policy, relying only upon the sensing echoes. For comparison, a heuristic AoD-based method using an approximated Cramér-Rao lower bound (CRLB) for allocation is also presented. Both methods require neither user feedback nor prior state evolution information. Results show that the DRL-assisted method achieves a considerable throughput gain over both the conventional beam sweeping method and the AoD-based method, and that it is robust to different user speeds.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

arXiv.org Artificial Intelligence

2505.05956

Country:

Europe > Italy > Lazio > Rome (0.04)
Europe > Denmark > North Jutland > Aalborg (0.04)

Genre: Research Report (0.84)

Industry:

Telecommunications (0.54)
Information Technology (0.54)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
