Personal Assistant Systems
ComPO: Community Preferences for Language Model Personalization
Kumar, Sachin, Park, Chan Young, Tsvetkov, Yulia, Smith, Noah A., Hajishirzi, Hannaneh
Conventional algorithms for training language models (LMs) with human feedback rely on preferences that are assumed to account for an "average" user, disregarding subjectivity and finer-grained variations. Recent studies have raised concerns that aggregating such diverse and often contradictory human feedback to finetune models results in generic models that generate outputs not preferred by many user groups, as they tend to average out styles and norms. To address this issue, we draw inspiration from recommendation systems and propose ComPO, a method to personalize preference optimization in LMs by contextualizing the probability distribution of model outputs with the preference provider. Focusing on group-level preferences rather than individuals, we collect and release ComPRed, a question answering dataset with community-level preferences from Reddit. This dataset facilitates studying diversity in preferences without incurring privacy concerns associated with individual feedback. Our experiments reveal that conditioning language models on a community identifier (i.e., subreddit name) during preference tuning substantially enhances model performance. Conversely, replacing this context with random subreddit identifiers significantly diminishes performance, highlighting the effectiveness of our approach in tailoring responses to communities' preferences.
Yeah, Un, Oh: Continuous and Real-time Backchannel Prediction with Fine-tuning of Voice Activity Projection
Inoue, Koji, Lala, Divesh, Skantze, Gabriel, Kawahara, Tatsuya
In human conversations, short backchannel utterances such as "yeah" and "oh" play a crucial role in facilitating smooth and engaging dialogue. These backchannels signal attentiveness and understanding without interrupting the speaker, making their accurate prediction essential for creating more natural conversational agents. This paper proposes a novel method for real-time, continuous backchannel prediction using a fine-tuned Voice Activity Projection (VAP) model. While existing approaches have relied on turn-based or artificially balanced datasets, our approach predicts both the timing and type of backchannels in a continuous and frame-wise manner on unbalanced, real-world datasets. We first pre-train the VAP model on a general dialogue corpus to capture conversational dynamics and then fine-tune it on a specialized dataset focused on backchannel behavior. Experimental results demonstrate that our model outperforms baseline methods in both timing and type prediction tasks, achieving robust performance in real-time environments. This research offers a promising step toward more responsive and human-like dialogue systems, with implications for interactive spoken dialogue applications such as virtual assistants and robots.
LightFusionRec: Lightweight Transformers-Based Cross-Domain Recommendation Model
Kharidia, Vansh, Paprunia, Dhruvi, Kanikar, Prashasti
This paper presents LightFusionRec, a novel lightweight cross-domain recommendation system that integrates DistilBERT for textual feature extraction and FastText for genre embedding. Important issues in recommendation systems, such as data sparsity, computational efficiency, and cold start issues, are addressed in methodology. LightFusionRec uses a small amount of information to produce precise and contextually relevant recommendations for many media formats by fusing genre vector embedding with natural language processing algorithms. Tests conducted on extensive movie and book datasets show notable enhancements in suggestion quality when compared to conventional methods. Because of its lightweight design, the model can be used for a variety of purposes and allows for ondevice inference. LightFusionRec is a noteworthy development in cross-domain recommendation systems, providing accurate and scalable recommendations to improve user experience on digital content platforms.
Romance scams on the rise as Americans look to dating apps for love: 5 tips to protect yourself
After losing her husband, "Beatrice" turned to an online dating site for seniors during the COVID-19 pandemic. She quickly matched with and fell hard for a person she thought was a 66-year-old Spanish lumberjack who looked uncannily like her husband. "I was missing not having him here to talk about, you know, what was going on in the world and everything," Beatrice, who asked that her real name not be used, told Homeland Security Investigations (HSI). "So, somebody suggested to go online through a dating serviceโฆ and this guy's pictures show up and he's just, you know, no George Clooney, nothing gorgeous, but in fact, he had a resemblance to my husband." The man spent about four months texting and calling the woman before he felt he had gained her trust โ then, he began asking her to wire him money.
Performance-Driven QUBO for Recommender Systems on Quantum Annealers
Niu, Jiayang, Li, Jie, Deng, Ke, Sanderson, Mark, Ren, Yongli
We propose Counterfactual Analysis Quadratic Unconstrained Binary Optimization (CAQUBO) to solve QUBO problems for feature selection in recommender systems. CAQUBO leverages counterfactual analysis to measure the impact of individual features and feature combinations on model performance and employs the measurements to construct the coefficient matrix for a quantum annealer to select the optimal feature combinations for recommender systems, thereby improving their final recommendation performance. By establishing explicit connections between features and the recommendation performance, the proposed approach demonstrates superior performance compared to the state-of-the-art quantum annealing methods. Extensive experiments indicate that integrating quantum computing with counterfactual analysis holds great promise for addressing these challenges.
Dreaming User Multimodal Representation Guided by The Platonic Representation Hypothesis for Micro-Video Recommendation
Lin, Chengzhi, Lin, Hezheng, Liu, Shuchang, Ruan, Cangguang, Xu, LingJing, Yang, Dezhao, Wang, Chuyuan, Liu, Yongqi
The proliferation of online micro-video platforms has underscored the necessity for advanced recommender systems to mitigate information overload and deliver tailored content. Despite advancements, accurately and promptly capturing dynamic user interests remains a formidable challenge. Inspired by the Platonic Representation Hypothesis, which posits that different data modalities converge towards a shared statistical model of reality, we introduce DreamUMM (Dreaming User Multi-Modal Representation), a novel approach leveraging user historical behaviors to create real-time user representation in a multimoda space. DreamUMM employs a closed-form solution correlating user video preferences with multimodal similarity, hypothesizing that user interests can be effectively represented in a unified multimodal space. Additionally, we propose Candidate-DreamUMM for scenarios lacking recent user behavior data, inferring interests from candidate videos alone. Extensive online A/B tests demonstrate significant improvements in user engagement metrics, including active days and play count. The successful deployment of DreamUMM in two micro-video platforms with hundreds of millions of daily active users, illustrates its practical efficacy and scalability in personalized micro-video content delivery. Our work contributes to the ongoing exploration of representational convergence by providing empirical evidence supporting the potential for user interest representations to reside in a multimodal space.
A Recommendation Model Utilizing Separation Embedding and Self-Attention for Feature Mining
Liu, Wenyi, Wang, Rui, Luo, Yuanshuai, Wei, Jianjun, Zhao, Zihao, Huang, Junming
With the explosive growth of Internet data, users are facing the problem of information overload, which makes it a challenge to efficiently obtain the required resources. Recommendation systems have emerged in this context. By filtering massive amounts of information, they provide users with content that meets their needs, playing a key role in scenarios such as advertising recommendation and product recommendation. However, traditional click-through rate prediction and TOP-K recommendation mechanisms are gradually unable to meet the recommendations needs in modern life scenarios due to high computational complexity, large memory consumption, long feature selection time, and insufficient feature interaction. This paper proposes a recommendations system model based on a separation embedding cross-network. The model uses an embedding neural network layer to transform sparse feature vectors into dense embedding vectors, and can independently perform feature cross operations on different dimensions, thereby improving the accuracy and depth of feature mining. Experimental results show that the model shows stronger adaptability and higher prediction accuracy in processing complex data sets, effectively solving the problems existing in existing models.
Incorporating Group Prior into Variational Inference for Tail-User Behavior Modeling in CTR Prediction
Xu, Han, Pan, Taoxing, Liu, Zhiqiang, Xu, Xiaoxiao, Hu, Lantao
User behavior modeling -- which aims to extract user interests from behavioral data -- has shown great power in Click-through rate (CTR) prediction, a key component in recommendation systems. Recently, attention-based algorithms have become a promising direction, as attention mechanisms emphasize the relevant interactions from rich behaviors. However, the methods struggle to capture the preferences of tail users with sparse interaction histories. To address the problem, we propose a novel variational inference approach, namely Group Prior Sampler Variational Inference (GPSVI), which introduces group preferences as priors to refine latent user interests for tail users. In GPSVI, the extent of adjustments depends on the estimated uncertainty of individual preference modeling. In addition, We further enhance the expressive power of variational inference by a volume-preserving flow. An appealing property of the GPSVI method is its ability to revert to traditional attention for head users with rich behavioral data while consistently enhancing performance for long-tail users with sparse behaviors. Rigorous analysis and extensive experiments demonstrate that GPSVI consistently improves the performance of tail users. Moreover, online A/B testing on a large-scale real-world recommender system further confirms the effectiveness of our proposed approach.
Crafting Tomorrow: The Influence of Design Choices on Fresh Content in Social Media Recommendation
Saket, Srijan, Agarwal, Mohit, Mehrotra, Rishabh
The rise in popularity of social media platforms, has resulted in millions of new, content pieces being created every day. This surge in content creation underscores the need to pay attention to our design choices as they can greatly impact how long content remains relevant. In today's landscape where regularly recommending new content is crucial, particularly in the absence of detailed information, a variety of factors such as UI features, algorithms and system settings contribute to shaping the journey of content across the platform. While previous research has focused on how new content affects users' experiences, this study takes a different approach by analyzing these decisions considering the content itself. Through a series of carefully crafted experiments we explore how seemingly small decisions can influence the longevity of content, measured by metrics like Content Progression (CVP) and Content Survival (CSR). We also emphasize the importance of recognizing the stages that content goes through underscoring the need to tailor strategies for each stage as a one size fits all approach may not be effective. Additionally we argue for a departure from traditional experimental setups in the study of content lifecycles, to avoid potential misunderstandings while proposing advanced techniques, to achieve greater precision and accuracy in the evaluation process.
Multi-modal clothing recommendation model based on large model and VAE enhancement
Huang, Bingjie, Lu, Qingyi, Huang, Shuaishuai, Wang, Xue-she, Yang, Haowei
This contrasts with traditional models that process text in a single direction, and it has been widely demonstrated that BERT effectively captures contextual and semantic relationships in text, thereby providing a more comprehensive understanding of context. The embedding components of BERT include word embeddings, segment embeddings, and position embeddings. In essence, word embeddings map each word individually into a vector within a high-dimensional space. The segment embeddings allow BERT to differentiate and process single texts or pairs of texts, thereby enabling the understanding of semantic information at the sentence level. The position embeddings provide sequential information to the structure, allowing the model to mark the position of words in a sentence, which aids in further processing at a higher level. Finally, the CLS token at the beginning of the input sequence represents the final hidden state in the embedding vector, which is commonly used as the representation of the entire input sequence.