Personal Assistant Systems
Recommendations from Sparse Comparison Data: Provably Fast Convergence for Nonconvex Matrix Factorization
Sankagiri, Suryanarayana, Etesami, Jalal, Grossglauser, Matthias
This paper provides a theoretical analysis of a new learning problem for recommender systems where users provide feedback by comparing pairs of items instead of rating them individually. We assume that comparisons stem from latent user and item features, which reduces the task of predicting preferences to learning these features from comparison data. Similar to the classical matrix factorization problem, the main challenge in this learning task is that the resulting loss function is nonconvex. Our analysis shows that the loss function exhibits (restricted) strong convexity near the true solution, which ensures gradient-based methods converge exponentially, given an appropriate warm start. Importantly, this result holds in a sparse data regime, where each user compares only a few pairs of items. Our main technical contribution is to extend certain concentration inequalities commonly used in matrix completion to our model. Our work demonstrates that learning personalized recommendations from comparison data is computationally and statistically efficient.
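To make the setup concrete, here is a minimal sketch of learning latent features from pairwise comparisons by gradient descent. It is not the paper's algorithm or exact model. The logistic link on the margin u·(v_i − v_j), the problem sizes, the noise level of the warm start, and the step size are all assumptions chosen for illustration.

```python
import math
import random

def score(U, V, u, i, j):
    """Predicted preference margin of user u for item i over item j."""
    return sum(a * (b - c) for a, b, c in zip(U[u], V[i], V[j]))

def avg_loss(U, V, data):
    """Average logistic loss; y = 1 means the user preferred i over j."""
    total = 0.0
    for u, i, j, y in data:
        s = score(U, V, u, i, j)
        z = s if y == 1 else -s
        total += math.log1p(math.exp(-z))
    return total / len(data)

def gradient_step(U, V, data, lr):
    """One full-batch gradient step on both factor matrices (in place)."""
    gU = [[0.0] * len(row) for row in U]
    gV = [[0.0] * len(row) for row in V]
    for u, i, j, y in data:
        p = 1.0 / (1.0 + math.exp(-score(U, V, u, i, j)))
        err = p - y  # derivative of the logistic loss w.r.t. the margin
        for k in range(len(U[u])):
            gU[u][k] += err * (V[i][k] - V[j][k])
            gV[i][k] += err * U[u][k]
            gV[j][k] -= err * U[u][k]
    for M, G in ((U, gU), (V, gV)):
        for r in range(len(M)):
            for k in range(len(M[r])):
                M[r][k] -= lr * G[r][k] / len(data)

# Assumed toy sizes; the paper's regime and constants are not reproduced here.
rng = random.Random(0)
n_users, n_items, rank, pairs_per_user = 30, 20, 2, 5
U_true = [[rng.gauss(0, 1) for _ in range(rank)] for _ in range(n_users)]
V_true = [[rng.gauss(0, 1) for _ in range(rank)] for _ in range(n_items)]

# Sparse data: each user compares only a few random pairs of items.
data = []
for u in range(n_users):
    for _ in range(pairs_per_user):
        i, j = rng.sample(range(n_items), 2)
        p = 1.0 / (1.0 + math.exp(-score(U_true, V_true, u, i, j)))
        data.append((u, i, j, 1 if rng.random() < p else 0))

# Warm start (assumption): the true factors perturbed by small noise.
U = [[x + 0.3 * rng.gauss(0, 1) for x in row] for row in U_true]
V = [[x + 0.3 * rng.gauss(0, 1) for x in row] for row in V_true]

loss_before = avg_loss(U, V, data)
for _ in range(300):
    gradient_step(U, V, data, lr=0.5)
loss_after = avg_loss(U, V, data)
print(f"loss before: {loss_before:.4f}, after: {loss_after:.4f}")
```

The loss is nonconvex in (U, V) jointly because the margin is a product of the two factor matrices, which is why the initialization near the true solution matters in the abstract's result.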
Exploring Rewriting Approaches for Different Conversational Tasks
Tanjim, Md Mehrab, Rossi, Ryan A., Rimer, Mike, Chen, Xiang, Kim, Sungchul, Muppala, Vaishnavi, Yu, Tong, Hu, Zhengmian, Sinha, Ritwik, Zhang, Wei, Burhanuddin, Iftikhar Ahamath, Dernoncourt, Franck
Conversational assistants often require a question rewriting algorithm that leverages a subset of past interactions to provide a more meaningful (accurate) answer to the user's question or request. However, the exact rewriting approach may often depend on the use case and application-specific tasks supported by the conversational assistant, among other constraints. In this paper, we systematically investigate two different approaches, denoted as rewriting and fusion, on two fundamentally different generation tasks, including a text-to-text generation task and a multimodal generative task that takes as input text and generates a visualization or data table that answers the user's question. Our results indicate that the specific rewriting or fusion approach highly depends on the underlying use case and generative task. In particular, we find that for a conversational question-answering assistant, the query rewriting approach performs best, whereas for a data analysis assistant that generates visualizations and data tables based on the user's conversation with the assistant, the fusion approach works best. Notably, we explore two datasets for the data analysis assistant use case, for short and long conversations, and we find that query fusion always performs better, whereas for the conversational text-based question-answering, the query rewrite approach performs best.
GOD model: Privacy Preserved AI School for Personal Assistant
PIN AI Team, Sun, Bill, Guo, Gavin, Peng, Regan, Zhang, Boliang, Wang, Shouqiao, Florescu, Laura, Wang, Xi, Crapis, Davide, Wu, Ben
Personal AI assistants (e.g., Apple Intelligence, Meta AI) offer proactive recommendations that simplify everyday tasks, but their reliance on sensitive user data raises concerns about privacy and trust. To address these challenges, we introduce the Guardian of Data (GOD), a secure, privacy-preserving framework for training and evaluating AI assistants directly on-device. Unlike traditional benchmarks, the GOD model measures how well assistants can anticipate user needs, such as suggesting gifts, while protecting user data and autonomy. Functioning like an AI school, it addresses the cold start problem by simulating user queries and employing a curriculum-based approach to refine the performance of each assistant. Running within a Trusted Execution Environment (TEE), it safeguards user data while applying reinforcement and imitation learning to refine AI recommendations. A token-based incentive system encourages users to share data securely, creating a data flywheel that drives continuous improvement. Specifically, users mine with their data, and the mining rate is determined by GOD's evaluation of how well their AI assistant understands them across categories such as shopping, social interactions, productivity, trading, and Web3. By integrating privacy, personalization, and trust, the GOD model provides a scalable, responsible path for advancing personal AI assistants. For community collaboration, part of the framework is open-sourced at https://github.com/PIN-AI/God-Model.
Amazon's Souped-Up Alexa Arrives Next Month
Amazon's new and improved version of Alexa is here, and it's called Alexa+. The next-gen upgrade is more conversational, can execute complex tasks, and is much more personalized. While the rollout starts next month on select Echo Show devices, Amazon claims it'll eventually be available on every Alexa-powered device the company has shipped. It'll cost $20 per month but will be free for Amazon Prime customers. Here's everything you need to know about Amazon's new and improved virtual assistant.
Alexa+ is a smarter, more conversational AI version of Amazon's digital assistant
Following years of development, Amazon's next-generation digital assistant is ready for public use. The model powering Alexa+ can detect tone and mood and respond accordingly, with a completely new voice that sounds more natural. Moreover, it's only necessary to say "Alexa" once to wake the assistant; it will then follow the conversation. Panos Panay said Alexa+ has contextual awareness, with the ability to "remember" earlier parts of a conversation.
How to follow Amazon's Alexa event today
It's sort of out of character for Amazon to be hosting a devices event in February, as opposed to its usual Fall launch. But this morning (February 26) at 10am ET, the company is holding a presentation in New York City. As it's done in the past, Amazon won't be livestreaming this event, and you won't be able to watch Panos Panay and his colleagues present to members of the media. Don't worry about FOMO, though. Engadget will be attending and liveblogging the event, so if you follow our updates it'll almost feel like you're right there with us! We'll have commentary and contextualization on the announcements, as well as the in-person vibes and quality of snacks.
Image Fusion for Cross-Domain Sequential Recommendation
Wu, Wangyu, Song, Siqi, Qiu, Xianglin, Huang, Xiaowei, Ma, Fei, Xiao, Jimin
Cross-Domain Sequential Recommendation (CDSR) aims to predict future user interactions based on historical interactions across multiple domains. The key challenge in CDSR is effectively capturing cross-domain user preferences by fully leveraging both intra-sequence and inter-sequence item interactions. In this paper, we propose a novel method, Image Fusion for Cross-Domain Sequential Recommendation (IFCDSR), which incorporates item image information to better capture visual preferences. Our approach integrates a frozen CLIP model to generate image embeddings, enriching original item embeddings with visual data from both intra-sequence and inter-sequence interactions. Additionally, we employ a multiple attention layer to capture cross-domain interests, enabling joint learning of single-domain and cross-domain user preferences. To validate the effectiveness of IFCDSR, we re-partitioned four e-commerce datasets and conducted extensive experiments. Results demonstrate that IFCDSR significantly outperforms existing methods.
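The two ingredients named in the abstract, enriching item embeddings with image embeddings and attending over a mixed-domain sequence, can be illustrated with a small sketch. This is not the IFCDSR architecture: the weighted-sum fusion, the single softmax attention head, the `weight` parameter, and the toy 3-dimensional vectors are all assumptions. In practice the image vectors would come from a frozen CLIP encoder and would likely need a projection to the item embedding size.

```python
import math

def fuse(item_emb, image_emb, weight=0.5):
    """Blend a learned item embedding with a precomputed image embedding.

    Assumes both vectors already share the same dimension.
    """
    return [(1 - weight) * a + weight * b for a, b in zip(item_emb, image_emb)]

def attention_pool(query, keys):
    """Scaled dot-product attention: weight each key by similarity to the query."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    top = max(scores)  # subtract the max for a numerically stable softmax
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    pooled = [sum(w * key[d] for w, key in zip(weights, keys))
              for d in range(len(query))]
    return pooled, weights

# Hypothetical interaction sequence spanning two domains.
sequence = [
    {"domain": "books", "item": [0.2, 0.1, 0.0], "image": [0.0, 0.3, 0.1]},
    {"domain": "movies", "item": [0.5, 0.4, 0.1], "image": [0.4, 0.2, 0.3]},
    {"domain": "books", "item": [0.1, 0.0, 0.6], "image": [0.2, 0.1, 0.5]},
]

fused = [fuse(step["item"], step["image"]) for step in sequence]
query = fused[-1]  # use the most recent interaction as the query
pooled, weights = attention_pool(query, fused)
print("attention weights:", [round(w, 3) for w in weights])
```

Restricting `keys` to a single domain would give a single-domain preference vector, while passing the full sequence, as here, gives a cross-domain one.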
Multiview graph dual-attention deep learning and contrastive learning for multi-criteria recommender systems
Forouzandeh, Saman, Krivitsky, Pavel N., Chandra, Rohitash
Recommender systems leveraging deep learning models have been crucial for assisting users in selecting items aligned with their preferences and interests. However, a significant challenge persists in single-criteria recommender systems, which often overlook the diverse attributes of items that Multi-Criteria Recommender Systems (MCRS) address. Existing MCRS approaches use a shared embedding vector for multi-criteria item ratings but have struggled to capture the nuanced relationships between users and items based on specific criteria. In this study, we present a novel representation for MCRS based on a multi-edge bipartite graph, where each edge represents one criterion rating of an item by a user, together with Multiview Dual Graph Attention Networks (MDGAT). Employing MDGAT is beneficial and important for adequately considering all relations between users and items, given the presence of both local (criterion-based) and global (multi-criteria) relations. Additionally, we define anchor points in each view based on similarity and employ local and global contrastive learning to distinguish between positive and negative samples across each view and the entire graph. We evaluate our method on two real-world datasets and assess its performance based on item rating predictions. The results demonstrate that our method achieves higher accuracy compared to the baseline method for predicting item ratings on the same datasets. MDGAT effectively captures the local and global impact of neighbours and the similarity between nodes.
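The multi-edge bipartite graph can be pictured as a set of parallel user-item edges, one per criterion. The sketch below shows one possible in-memory layout, with a per-criterion (local) view and a collapsed (global) view. The sample ratings, the function names, and the choice to average parallel edges for the global view are assumptions for illustration, not details taken from the paper.

```python
from collections import defaultdict

# Hypothetical multi-criteria ratings: (user, item, criterion, rating).
ratings = [
    ("u1", "hotel_a", "cleanliness", 4.0),
    ("u1", "hotel_a", "location", 2.0),
    ("u2", "hotel_a", "cleanliness", 5.0),
    ("u2", "hotel_b", "location", 3.0),
]

def build_local_views(ratings):
    """Return one bipartite adjacency map per criterion: view[user][item] = rating."""
    views = defaultdict(lambda: defaultdict(dict))
    for user, item, criterion, rating in ratings:
        views[criterion][user][item] = rating
    return views

def build_global_view(ratings):
    """Collapse parallel edges into one edge per (user, item) by averaging."""
    grouped = defaultdict(list)
    for user, item, _criterion, rating in ratings:
        grouped[(user, item)].append(rating)
    return {pair: sum(vals) / len(vals) for pair, vals in grouped.items()}

local_views = build_local_views(ratings)
global_view = build_global_view(ratings)

print(sorted(local_views))                # criteria that have a view
print(global_view[("u1", "hotel_a")])     # mean of the two parallel edges
```

A graph attention layer would then aggregate neighbours within each local view and across the global view; that part is omitted here.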
Apple iPhone's voice-to-text feature periodically shows 'Trump' when user says 'racist'
Apple's iPhone voice-to-text feature is sparking controversy after a viral TikTok video showed a user speaking the word "racist," which at first showed up as "Trump" before switching back to "racist." Fox News Digital was able to replicate the issue multiple times. The voice-to-text dictation feature was observed briefly flashing "Trump" when a user said "racist" before it quickly changed back to "racist" – just like in the viral TikTok video. However, "Trump" did not appear every time a user said "racist." The voice-to-text feature also wrote words like "reinhold" and "you" when a user said "racist."
AgentSociety Challenge: Designing LLM Agents for User Modeling and Recommendation on Web Platforms
Yan, Yuwei, Shang, Yu, Zeng, Qingbin, Li, Yu, Zhao, Keyu, Zheng, Zhiheng, Ning, Xuefei, Wu, Tianji, Yan, Shengen, Wang, Yu, Xu, Fengli, Li, Yong
The AgentSociety Challenge is the first competition in the Web Conference that aims to explore the potential of Large Language Model (LLM) agents in modeling user behavior and enhancing recommender systems on web platforms. The Challenge consists of two tracks: the User Modeling Track and the Recommendation Track. Participants are tasked to utilize a combined dataset from Yelp, Amazon, and Goodreads, along with an interactive environment simulator, to develop innovative LLM agents. The Challenge has attracted 295 teams across the globe and received over 1,400 submissions in total over the course of 37 official competition days. The participants have achieved 21.9% and 20.3% performance improvement for Track 1 and Track 2 in the Development Phase, and 9.1% and 15.9% in the Final Phase, representing a significant accomplishment. This paper discusses the detailed designs of the Challenge, analyzes the outcomes, and highlights the most successful LLM agent designs. To support further research and development, we have open-sourced the benchmark environment at https://tsinghua-fib-lab.github.io/AgentSocietyChallenge.