Personal Assistant Systems
A Comprehensive Survey on Self-Supervised Learning for Recommendation
Ren, Xubin, Wei, Wei, Xia, Lianghao, Huang, Chao
Recommender systems play a crucial role in tackling the challenge of information overload by delivering personalized recommendations based on individual user preferences. Deep learning techniques, such as RNNs, GNNs, and Transformer architectures, have significantly propelled the advancement of recommender systems by enhancing their comprehension of user behaviors and preferences. However, supervised learning methods encounter challenges in real-life scenarios due to data sparsity, resulting in limitations in their ability to learn representations effectively. To address this, self-supervised learning (SSL) techniques have emerged as a solution, leveraging inherent data structures to generate supervision signals without relying solely on labeled data. By leveraging unlabeled data and extracting meaningful representations, recommender systems utilizing SSL can make accurate predictions and recommendations even when confronted with data sparsity. In this paper, we provide a comprehensive review of self-supervised learning frameworks designed for recommender systems, encompassing a thorough analysis of over 170 papers. We conduct an exploration of nine distinct scenarios, enabling a comprehensive understanding of SSL-enhanced recommenders in different contexts. For each domain, we elaborate on different self-supervised learning paradigms, namely contrastive learning, generative learning, and adversarial learning, so as to present technical details of how SSL enhances recommender systems in various contexts. We consistently maintain the related open-source materials at https://github.com/HKUDS/Awesome-SSLRec-Papers.
PMG : Personalized Multimodal Generation with Large Language Models
Shen, Xiaoteng, Zhang, Rui, Zhao, Xiaoyan, Zhu, Jieming, Xiao, Xi
The emergence of large language models (LLMs) has revolutionized the capabilities of text comprehension and generation. Multi-modal generation attracts great attention from both the industry and academia, but there is little work on personalized generation, which has important applications such as recommender systems. This paper proposes the first method for personalized multimodal generation using LLMs, showcases its applications and validates its performance via an extensive experimental study on two datasets. The proposed method, Personalized Multimodal Generation (PMG for short) first converts user behaviors (e.g., clicks in recommender systems or conversations with a virtual assistant) into natural language to facilitate LLM understanding and extract user preference descriptions. Such user preferences are then fed into a generator, such as a multimodal LLM or diffusion model, to produce personalized content. To capture user preferences comprehensively and accurately, we propose to let the LLM output a combination of explicit keywords and implicit embeddings to represent user preferences. Then the combination of keywords and embeddings are used as prompts to condition the generator. We optimize a weighted sum of the accuracy and preference scores so that the generated content has a good balance between them. Compared to a baseline method without personalization, PMG has a significant improvement on personalization for up to 8% in terms of LPIPS while retaining the accuracy of generation.
A Survey of Route Recommendations: Methods, Applications, and Opportunities
Zhang, Shiming, Luo, Zhipeng, Yang, Li, Teng, Fei, Li, Tianrui
Nowadays, with advanced information technologies deployed citywide, large data volumes and powerful computational resources are intelligentizing modern city development. As an important part of intelligent transportation, route recommendation and its applications are widely used, directly influencing citizens` travel habits. Developing smart and efficient travel routes based on big data (possibly multi-modal) has become a central challenge in route recommendation research. Our survey offers a comprehensive review of route recommendation work based on urban computing. It is organized by the following three parts: 1) Methodology-wise. We categorize a large volume of traditional machine learning and modern deep learning methods. Also, we discuss their historical relations and reveal the edge-cutting progress. 2) Application\-wise. We present numerous novel applications related to route commendation within urban computing scenarios. 3) We discuss current problems and challenges and envision several promising research directions. We believe that this survey can help relevant researchers quickly familiarize themselves with the current state of route recommendation research and then direct them to future research trends.
Do Sentence Transformers Learn Quasi-Geospatial Concepts from General Text?
Ilyankou, Ilya, Lipani, Aldo, Cavazzi, Stefano, Gao, Xiaowei, Haworth, James
Sentence transformers are language models designed to perform semantic search. This study investigates the capacity of sentence transformers, fine-tuned on general question-answering datasets for asymmetric semantic search, to associate descriptions of human-generated routes across Great Britain with queries often used to describe hiking experiences. We find that sentence transformers have some zero-shot capabilities to understand quasi-geospatial concepts, such as route types and difficulty, suggesting their potential utility for routing recommendation systems.
EulerFormer: Sequential User Behavior Modeling with Complex Vector Attention
Tian, Zhen, Zhao, Wayne Xin, Zhang, Changwang, Zhao, Xin, Ma, Zhongrui, Wen, Ji-Rong
To capture user preference, transformer models have been widely applied to model sequential user behavior data. The core of transformer architecture lies in the self-attention mechanism, which computes the pairwise attention scores in a sequence. Due to the permutation-equivariant nature, positional encoding is used to enhance the attention between token representations. In this setting, the pairwise attention scores can be derived by both semantic difference and positional difference. However, prior studies often model the two kinds of difference measurements in different ways, which potentially limits the expressive capacity of sequence modeling. To address this issue, this paper proposes a novel transformer variant with complex vector attention, named EulerFormer, which provides a unified theoretical framework to formulate both semantic difference and positional difference. The EulerFormer involves two key technical improvements. First, it employs a new transformation function for efficiently transforming the sequence tokens into polar-form complex vectors using Euler's formula, enabling the unified modeling of both semantic and positional information in a complex rotation form.Secondly, it develops a differential rotation mechanism, where the semantic rotation angles can be controlled by an adaptation function, enabling the adaptive integration of the semantic and positional information according to the semantic contexts.Furthermore, a phase contrastive learning task is proposed to improve the isotropy of contextual representations in EulerFormer. Our theoretical framework possesses a high degree of completeness and generality. It is more robust to semantic variations and possesses moresuperior theoretical properties in principle. Extensive experiments conducted on four public datasets demonstrate the effectiveness and efficiency of our approach.
Navigating the new era of commerce: Exploring the relationship between anthropomorphism in voice assistants and user safety perception
Image created by Guillermo Calahorra-Candao using ChatGPT. Prompt: "create an image of a person conversing with a virtual voice assistant (just like Alexa, Siri, or Google), while simultaneously wondering if the assistant might actually be human". In an era where technology continuously reshapes our daily interactions, the rise of virtual voice assistants (VAs) like Alexa, Google Home, and Siri represents a significant leap. Originally designed to enhance smartphone usability, these VAs have transcended their initial purpose, finding their way into various consumer devices and altering the user experience landscape. However, despite their widespread integration, a notable reluctance in adopting voice shopping persists, primarily due to concerns regarding safety.
Navigating the Evaluation Funnel to Optimize Iteration Speed for Recommender Systems
Schultzberg, Claire, Ottens, Brammert
Over the last decades has emerged a rich literature on the evaluation of recommendation systems. However, less is written about how to efficiently combine different evaluation methods from this rich field into a single efficient evaluation funnel. In this paper we aim to build intuition for how to choose evaluation methods, by presenting a novel framework that simplifies the reasoning around the evaluation funnel for a recommendation system. Our contribution is twofold. First we present our framework for how to decompose the definition of success to construct efficient evaluation funnels, focusing on how to identify and discard non-successful iterations quickly. We show that decomposing the definition of success into smaller necessary criteria for success enables early identification of non-successful ideas. Second, we give an overview of the most common and useful evaluation methods, discuss their pros and cons, and how they fit into, and complement each other in, the evaluation process. We go through so-called offline and online evaluation methods such as counterfactual logging, validation, verification, A/B testing, and interleaving. The paper concludes with some general discussion and advice on how to design an efficient evaluation process for recommender systems.
A Recommender System for NFT Collectibles with Item Feature
Choi, Minjoo, Kim, Seonmi, Kim, Yejin, Lee, Youngbin, Hong, Joohwan, Lee, Yongjae
Recommender systems have been actively studied and applied in various domains to deal with information overload. Although there are numerous studies on recommender systems for movies, music, and e-commerce, comparatively less attention has been paid to the recommender system for NFTs despite the continuous growth of the NFT market. This paper presents a recommender system for NFTs that utilizes a variety of data sources, from NFT transaction records to external item features, to generate precise recommendations that cater to individual preferences. We develop a data-efficient graph-based recommender system to efficiently capture the complex relationship between each item and users and generate node(item) embeddings which incorporate both node feature information and graph structure. Furthermore, we exploit inputs beyond user-item interactions, such as image feature, text feature, and price feature. Numerical experiments verify the performance of the graph-based recommender system improves significantly after utilizing all types of item features as side information, thereby outperforming all other baselines.
Affective-NLI: Towards Accurate and Interpretable Personality Recognition in Conversation
Wen, Zhiyuan, Cao, Jiannong, Yang, Yu, Yang, Ruosong, Liu, Shuaiqi
Personality Recognition in Conversation (PRC) aims to identify the personality traits of speakers through textual dialogue content. It is essential for providing personalized services in various applications of Human-Computer Interaction (HCI), such as AI-based mental therapy and companion robots for the elderly. Most recent studies analyze the dialog content for personality classification yet overlook two major concerns that hinder their performance. First, crucial implicit factors contained in conversation, such as emotions that reflect the speakers' personalities are ignored. Second, only focusing on the input dialog content disregards the semantic understanding of personality itself, which reduces the interpretability of the results. In this paper, we propose Affective Natural Language Inference (Affective-NLI) for accurate and interpretable PRC. To utilize affectivity within dialog content for accurate personality recognition, we fine-tuned a pre-trained language model specifically for emotion recognition in conversations, facilitating real-time affective annotations for utterances. For interpretability of recognition results, we formulate personality recognition as an NLI problem by determining whether the textual description of personality labels is entailed by the dialog content. Extensive experiments on two daily conversation datasets suggest that Affective-NLI significantly outperforms (by 6%-7%) state-of-the-art approaches. Additionally, our Flow experiment demonstrates that Affective-NLI can accurately recognize the speaker's personality in the early stages of conversations by surpassing state-of-the-art methods with 22%-34%.