AITopics | interaction history

Collaborating Authors

interaction history

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Evaluating Long-Context Reasoning in LLM-Based WebAgents

Chung, Andy, Zhang, Yichi, Lin, Kaixiang, Rawal, Aditya, Gao, Qiaozi, Chai, Joyce

arXiv.org Artificial IntelligenceDec-5-2025

As large language model (LLM)-based agents become increasingly integrated into daily digital interactions, their ability to reason across long interaction histories becomes crucial for providing personalized and contextually aware assistance. However, the performance of these agents in long context scenarios, particularly for action-taking WebAgents operating in realistic web environments, remains largely unexplored. This paper introduces a benchmark for evaluating long context reasoning capabilities of WebAgents through sequentially dependent subtasks that require retrieval and application of information from extended interaction histories. We develop a novel evaluation framework that simulates multi-session user interactions by injecting irrelevant task trajectories between dependent subtasks, creating contexts ranging from 25,000 to 150,000 tokens. Through extensive evaluation of four popular models, Claude-3.7, GPT-4.1, Llama 4, and o4-mini, we observe a dramatic performance degradation as context length increases, with success rates dropping from 40-50\% in baseline conditions to less than 10\% in long context scenarios. Our detailed error analysis reveals that agents primarily fail due to getting stuck in loops and losing track of original task objectives. We further propose an implicit RAG approach that provides modest improvements by generating task-relevant summaries, though fundamental limitations in long context reasoning persist. These findings highlight critical challenges for deploying WebAgents in realistic, long-term user interaction scenarios and provide insights for developing more robust agent architectures capable of maintaining coherent task execution across extended contexts.

information, large language model, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2512.04307

Country: North America > United States > California (0.46)

Genre:

Workflow (0.93)
Research Report > New Finding (0.93)

Industry:

Media (1.00)
Consumer Products & Services (1.00)
Transportation (0.93)
Leisure & Entertainment > Sports > Basketball (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.87)

Add feedback

Mem-PAL: Towards Memory-based Personalized Dialogue Assistants for Long-term User-Agent Interaction

Huang, Zhaopei, Dai, Qifeng, Wu, Guozheng, Wu, Xiaopeng, Chen, Kehan, Yu, Chuan, Li, Xubin, Ge, Tiezheng, Wang, Wenxuan, Jin, Qin

arXiv.org Artificial IntelligenceNov-27-2025

With the rise of smart personal devices, service-oriented human-agent interactions have become increasingly prevalent. This trend highlights the need for personalized dialogue assistants that can understand user-specific traits to accurately interpret requirements and tailor responses to individual preferences. However, existing approaches often overlook the complexities of long-term interactions and fail to capture users' subjective characteristics. To address these gaps, we present PAL-Bench, a new benchmark designed to evaluate the personalization capabilities of service-oriented assistants in long-term user-agent interactions. In the absence of available real-world data, we develop a multi-step LLM-based synthesis pipeline, which is further verified and refined by human annotators. This process yields PAL-Set, the first Chinese dataset comprising multi-session user logs and dialogue histories, which serves as the foundation for PAL-Bench. Furthermore, to improve personalized service-oriented interactions, we propose H$^2$Memory, a hierarchical and heterogeneous memory framework that incorporates retrieval-augmented generation to improve personalized response generation. Comprehensive experiments on both our PAL-Bench and an external dataset demonstrate the effectiveness of the proposed memory framework.

large language model, machine learning, natural language, (22 more...)

arXiv.org Artificial Intelligence

2511.1341

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.81)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Add feedback

GMTRouter: Personalized LLM Router over Multi-turn User Interactions

Xie, Encheng, Sun, Yihang, Feng, Tao, You, Jiaxuan

arXiv.org Artificial IntelligenceNov-13-2025

Large Language Model (LLM) routing has demonstrated strong capability in balancing response quality with computational cost. As users exhibit diverse preferences, personalization has attracted increasing attention in LLM routing, since even identical queries may require different models to generate responses tailored to individual needs. However, existing approaches are not fully personalized and often fail to capture the complex interactions between specific users and LLMs. Moreover, user preference data is typically scarce, noisy, and inconsistent in format, which limits the effectiveness of methods that rely solely on user-specific data. To address these challenges, we propose GMTRouter, which represents multi-turn user-LLM interactions as a heterogeneous graph with four node types: user, LLM, query, and response, thereby preserving the rich relational structure of the interaction. Through a tailored message-passing mechanism, GMTRouter learns to capture user preferences from few-shot data within a lightweight inductive graph learning framework, enabling effective personalization. Extensive experiments demonstrate that GMTRouter consistently outperforms strong baselines, achieving 0.9 to 21.6 percent higher accuracy and 0.006 to 0.309 higher AUC across multiple datasets. More importantly, we demonstrate that GMTRouter can adapt to new users and evolving preferences using only few-shot data, without extensive fine-tuning. The code for GMTRouter is publicly available at https://github.com/ulab-uiuc/GMTRouter.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2511.0859

Country: North America > United States (0.28)

Genre: Research Report > New Finding (0.46)

Industry: Information Technology (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

Add feedback

Know Me, Respond to Me: Benchmarking LLMs for Dynamic User Profiling and Personalized Responses at Scale

Jiang, Bowen, Hao, Zhuoqun, Cho, Young-Min, Li, Bryan, Yuan, Yuan, Chen, Sihao, Ungar, Lyle, Taylor, Camillo J., Roth, Dan

arXiv.org Artificial IntelligenceOct-28-2025

Large Language Models (LLMs) have emerged as personalized assistants for users across a wide range of tasks -- from offering writing support to delivering tailored recommendations or consultations. Over time, the interaction history between a user and an LLM can provide extensive information about an individual's traits and preferences. However, open questions remain on how well LLMs today can effectively leverage such history to (1) internalize the user's inherent traits and preferences, (2) track how the user profiling and preferences evolve over time, and (3) generate personalized responses accordingly in new scenarios. In this work, we introduce the PERSONAMEM benchmark. PERSONAMEM features curated user profiles with over 180 simulated user-LLM interaction histories, each containing up to 60 sessions of multi-turn conversations across 15 real-world tasks that require personalization. Given an in-situ user query, i.e. query issued by the user from the first-person perspective, we evaluate LLM chatbots' ability to identify the most suitable response according to the current state of the user's profile. We observe that current LLMs still struggle to recognize the dynamic evolution in users' profiles over time through direct prompting approaches. As a consequence, LLMs often fail to deliver responses that align with users' current situations and preferences, with frontier models such as GPT-4.1, o4-mini, GPT-4.5, o1, or Gemini-2.0 achieving only around 50% overall accuracy, suggesting room for improvement. We hope that PERSONAMEM, along with the user profile and conversation simulation pipeline, can facilitate future research in the development of truly user-aware chatbots. Code and data are available at github.com/bowen-upenn/PersonaMem.

arxiv preprint arxiv, large language model, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2504.14225

Country: North America > United States (0.67)

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

TAMI: Taming Heterogeneity in Temporal Interactions for Temporal Graph Link Prediction

Yu, Zhongyi, Wu, Jianqiu, Wu, Zhenghao, Zhong, Shuhan, Su, Weifeng, Lee, Chul-Ho, Zhuo, Weipeng

arXiv.org Artificial IntelligenceOct-28-2025

Temporal graph link prediction aims to predict future interactions between nodes in a graph based on their historical interactions, which are encoded in node embeddings. We observe that heterogeneity naturally appears in temporal interactions, e.g., a few node pairs can make most interaction events, and interaction events happen at varying intervals. This leads to the problems of ineffective temporal information encoding and forgetting of past interactions for a pair of nodes that interact intermittently for their link prediction. Existing methods, however, do not consider such heterogeneity in their learning process, and thus their learned temporal node embeddings are less effective, especially when predicting the links for infrequently interacting node pairs. To cope with the heterogeneity, we propose a novel framework called TAMI, which contains two effective components, namely log time encoding function (LTE) and link history aggregation (LHA). LTE better encodes the temporal information through transforming interaction intervals into more balanced ones, and LHA prevents the historical interactions for each target node pair from being forgotten. State-of-the-art temporal graph neural networks can be seamlessly and readily integrated into TAMI to improve their effectiveness. Experiment results on 13 classic datasets and three newest temporal graph benchmark (TGB) datasets show that TAMI consistently improves the link prediction performance of the underlying models in both transductive and inductive settings. Our code is available at https://github.com/Alleinx/TAMI_temporal_graph.

data mining, machine learning, natural language, (22 more...)

arXiv.org Artificial Intelligence

2510.23577

Country: Asia > China (0.67)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Government (0.92)
Information Technology (0.67)
Education > Educational Setting > Online (0.46)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(2 more...)

Add feedback

Enhance Large Language Models as Recommendation Systems with Collaborative Filtering

Yang, Zhisheng, Xu, Xiaofei, Deng, Ke, Li, Li

arXiv.org Artificial IntelligenceOct-20-2025

As powerful tools in Natural Language Processing (NLP), Large Language Models (LLMs) have been leveraged for crafting recommendations to achieve precise alignment with user preferences and elevate the quality of the recommendations. The existing approaches implement both non-tuning and tuning strategies. Compared to following the tuning strategy, the approaches following the non-tuning strategy avoid the relatively costly, time-consuming, and expertise-requiring process of further training pre-trained LLMs on task-specific datasets, but they suffer the issue of not having the task-specific business or local enterprise knowledge. To the best of our knowledge, none of the existing approaches following the non-tuning strategy explicitly integrates collaborative filtering, one of the most successful recommendation techniques. This study aims to fill the gap by proposing critique-based LLMs as recommendation systems (Critic-LLM-RS). For our purpose, we train a separate machine-learning model called Critic that implements collaborative filtering for recommendations by learning from the interactions between many users and items. The Critic provides critiques to LLMs to significantly refine the recommendations. Extensive experiments have verified the effectiveness of Critic-LLM-RS on real datasets.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2510.15647

Genre: Research Report (0.82)

Industry:

Media > Film (0.68)
Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.30)

Add feedback

ERA: Transforming VLMs into Embodied Agents via Embodied Prior Learning and Online Reinforcement Learning

Chen, Hanyang, Zhao, Mark, Yang, Rui, Ma, Qinwei, Yang, Ke, Yao, Jiarui, Wang, Kangrui, Bai, Hao, Wang, Zhenhailong, Pan, Rui, Zhang, Mengchao, Barreiros, Jose, Onol, Aykut, Zhai, ChengXiang, Ji, Heng, Li, Manling, Zhang, Huan, Zhang, Tong

arXiv.org Artificial IntelligenceOct-15-2025

Recent advances in embodied AI highlight the potential of vision language models (VLMs) as agents capable of perception, reasoning, and interaction in complex environments. However, top-performing systems rely on large-scale models that are costly to deploy, while smaller VLMs lack the necessary knowledge and skills to succeed. To bridge this gap, we present \textit{Embodied Reasoning Agent (ERA)}, a two-stage framework that integrates prior knowledge learning and online reinforcement learning (RL). The first stage, \textit{Embodied Prior Learning}, distills foundational knowledge from three types of data: (1) Trajectory-Augmented Priors, which enrich existing trajectory data with structured reasoning generated by stronger models; (2) Environment-Anchored Priors, which provide in-environment knowledge and grounding supervision; and (3) External Knowledge Priors, which transfer general knowledge from out-of-environment datasets. In the second stage, we develop an online RL pipeline that builds on these priors to further enhance agent performance. To overcome the inherent challenges in agent RL, including long horizons, sparse rewards, and training instability, we introduce three key designs: self-summarization for context management, dense reward shaping, and turn-level policy optimization. Extensive experiments on both high-level planning (EB-ALFRED) and low-level control (EB-Manipulation) tasks demonstrate that ERA-3B surpasses both prompting-based large models and previous training-based baselines. Specifically, it achieves overall improvements of 8.4\% on EB-ALFRED and 19.4\% on EB-Manipulation over GPT-4o, and exhibits strong generalization to unseen tasks. Overall, ERA offers a practical path toward scalable embodied intelligence, providing methodological insights for future embodied AI systems.

large language model, machine learning, reinforcement learning, (18 more...)

arXiv.org Artificial Intelligence

2510.12693

Genre:

Research Report > New Finding (1.00)
Instructional Material > Online (0.62)

Industry: Education (0.92)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(4 more...)

Add feedback

Phys2Real: Fusing VLM Priors with Interactive Online Adaptation for Uncertainty-Aware Sim-to-Real Manipulation

Wang, Maggie, Tian, Stephen, Swann, Aiden, Shorinwa, Ola, Wu, Jiajun, Schwager, Mac

arXiv.org Artificial IntelligenceOct-14-2025

Phys2Real is a real-to-sim-to-real pipeline for robotic manipulation that combines VLM-based physical parameter estimation with interaction-based adaptation through uncertainty-aware fusion. It comprises three stages: (I) real-to-sim: object reconstruction from segmented Gaussian Splats into simulation-ready meshes, (II) policy learning: reinforcement learning of policies conditioned on physical parameters such as the center of mass (CoM) of an object, and (III) sim-to-real transfer: uncertainty-aware fusion of VLM priors and interaction-based estimates for online adaptation. Abstract-- Learning robotic manipulation policies directly in the real world can be expensive and time-consuming. While reinforcement learning (RL) policies trained in simulation present a scalable alternative, effective sim-to-real transfer remains challenging, particularly for tasks that require precise dynamics. T o address this, we propose Phys2Real, a real-to-sim-to-real RL pipeline that combines vision-language model (VLM)-inferred physical parameter estimates with interactive adaptation through uncertainty-aware fusion. Our approach consists of three core components: (1) high-fidelity geometric reconstruction with 3D Gaussian splatting, (2) VLM-inferred prior distributions over physical parameters, and (3) online physical parameter estimation from interaction data. On planar pushing tasks of a T - block with varying center of mass (CoM) and a hammer with an off-center mass distribution, Phys2Real achieves substantial improvements over a domain randomization baseline: 100% vs 79% success rate for the bottom-weighted T -block, 57% vs 23% in the challenging top-weighted T -block, and 15% faster average task completion for hammer pushing. Ablation studies indicate that the combination of VLM and interaction information is essential for success. Deploying robotic manipulation policies trained in simulation to the real world remains a fundamental challenge, especially for tasks requiring fine-grained physical dynamics. Robots must adapt to varying object properties such as friction, mass distribution, and compliance, which significantly affect manipulation outcomes but are difficult to model precisely. While learning from demonstrations has shown significant promise, it often lacks the physical grounding and reasoning needed to adapt to novel objects.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

arXiv.org Artificial Intelligence

2510.11689

Country: North America > United States (0.46)

Genre: Research Report > New Finding (0.93)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.44)

Add feedback

Temporal User Profiling with LLMs: Balancing Short-Term and Long-Term Preferences for Recommendations

Sabouri, Milad, Mansoury, Masoud, Lin, Kun, Mobasher, Bamshad

arXiv.org Artificial IntelligenceAug-13-2025

Accurately modeling user preferences is crucial for improving the performance of content-based recommender systems. Existing approaches often rely on simplistic user profiling methods, such as averaging or concatenating item embeddings, which fail to capture the nuanced nature of user preference dynamics, particularly the interactions between long-term and short-term preferences. In this work, we propose LLM-driven Temporal User Profiling (LLM-TUP), a novel method for user profiling that explicitly models short-term and long-term preferences by leveraging interaction timestamps and generating natural language representations of user histories using a large language model (LLM). These representations are encoded into high-dimensional embeddings using a pre-trained BERT model, and an attention mechanism is applied to dynamically fuse the short-term and long-term embeddings into a comprehensive user profile. Experimental results on real-world datasets demonstrate that LLM-TUP achieves substantial improvements over several baselines, underscoring the effectiveness of our temporally aware user-profiling approach and the use of semantically rich user profiles, generated by LLMs, for personalized content-based recommendation.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2508.08454

Country: North America > United States > Minnesota (0.28)

Genre: Research Report > Promising Solution (0.34)

Industry: Leisure & Entertainment (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

LettinGo: Explore User Profile Generation for Recommendation System

Wang, Lu, Zhang, Di, Yang, Fangkai, Zhao, Pu, Liu, Jianfeng, Zhan, Yuefeng, Sun, Hao, Lin, Qingwei, Deng, Weiwei, Zhang, Dongmei, Sun, Feng, Zhang, Qi

arXiv.org Artificial IntelligenceJun-24-2025

User profiling is pivotal for recommendation systems, as it transforms raw user interaction data into concise and structured representations that drive personalized recommendations. While traditional embedding-based profiles lack interpretability and adaptability, recent advances with large language models (LLMs) enable text-based profiles that are semantically richer and more transparent. However, existing methods often adhere to fixed formats that limit their ability to capture the full diversity of user behaviors. In this paper, we introduce LettinGo, a novel framework for generating diverse and adaptive user profiles. By leveraging the expressive power of LLMs and incorporating direct feedback from downstream recommendation tasks, our approach avoids the rigid constraints imposed by supervised fine-tuning (SFT). Instead, we employ Direct Preference Optimization (DPO) to align the profile generator with task-specific performance, ensuring that the profiles remain adaptive and effective. LettinGo operates in three stages: (1) exploring diverse user profiles via multiple LLMs, (2) evaluating profile quality based on their impact in recommendation systems, and (3) aligning the profile generation through pairwise preference data derived from task performance. Experimental results demonstrate that our framework significantly enhances recommendation accuracy, flexibility, and contextual awareness. This work enhances profile generation as a key innovation for next-generation recommendation systems.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2506.18309

Country:

North America > Canada (0.16)
Asia > China (0.16)

Genre:

Research Report > New Finding (0.88)
Research Report > Promising Solution (0.67)

Industry: Information Technology (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback