 Personal Assistant Systems


Engadget Podcast: MacBook Air M4 review, Apple delays smarter Siri

Engadget

In this episode, we dive into Devindra's review of the excellent M4-equipped MacBook Air (and briefly chat about the new Mac Studio). We also discuss Apple's surprise announcement that it's delaying its smarter, AI-infused Siri, which may not arrive until next year. Did Apple over-promise last year, or is it wise to hold off on advanced AI features until they're ready? After all, Apple doesn't want a fiasco like Microsoft's Recall announcement. Listen below or subscribe on your podcast app of choice.


REGEN: A Dataset and Benchmarks with Natural Language Critiques and Narratives

arXiv.org Artificial Intelligence

This paper introduces a novel dataset REGEN (Reviews Enhanced with GEnerative Narratives), designed to benchmark the conversational capabilities of recommender Large Language Models (LLMs), addressing the limitations of existing datasets that primarily focus on sequential item prediction. REGEN extends the Amazon Product Reviews dataset by inpainting two key natural language features: (1) user critiques, representing user "steering" queries that lead to the selection of a subsequent item, and (2) narratives, rich textual outputs associated with each recommended item taking into account prior context. The narratives include product endorsements, purchase explanations, and summaries of user preferences. Further, we establish an end-to-end modeling benchmark for the task of conversational recommendation, where models are trained to generate both recommendations and corresponding narratives conditioned on user history (items and critiques). For this joint task, we introduce a modeling framework LUMEN (LLM-based Unified Multi-task Model with Critiques, Recommendations, and Narratives) which uses an LLM as a backbone for critiquing, retrieval and generation. We also evaluate the dataset's quality using standard auto-rating techniques and benchmark it by training both traditional and LLM-based recommender models. Our results demonstrate that incorporating critiques enhances recommendation quality by enabling the recommender to learn language understanding and integrate it with recommendation signals. Furthermore, LLMs trained on our dataset effectively generate both recommendations and contextual narratives, achieving performance comparable to state-of-the-art recommenders and language models.


Apple reportedly plans to add a live-translation feature to AirPods

Engadget

AirPods are arguably Apple's most popular post-iPhone product, and it sounds like the company has plans to make them even more essential. Bloomberg reports that Apple is adding a live-translate feature to AirPods later this year as part of an upcoming software update. The feature sounds like it would work in a similar way to the translation feature on the Pixel Buds, only without the need to ask Google Assistant or, in this case, Siri to start listening for a specific language first. Apple's feature would reportedly automatically detect that something other than your native language is being spoken, and start converting what you're hearing into a language you understand. Pixel Buds have had live translation since 2020, one of the few abilities that make Google's earbuds superior to Apple's.


Microsoft is making a Copilot AI assistant for gamers, but it's not clear what it does

Engadget

Microsoft just announced the pending availability of Copilot for Gaming, an AI-powered assistant that's being advertised as a novel way to help players get better at their favorite titles. The company says it will accompany people through games, offering tips, guides and useful information along the way. Microsoft boasts it can also help folks download and launch games, solving the eternal problem of, uh, pressing a button. Copilot for Gaming will be available as part of the Xbox mobile app, so it's being positioned as a second-screen type of thing. Soon you'll be able to turn to it for everything from game setup, to tips for finally beating a tough level, wherever you play on Xbox.


Revisiting the Apple Watch SE in 2025 left me with a long list of update requests

Engadget

As you know, your Apple Watch SE is not new. The second generation came out in September 2022 alongside the Series 8 and the first iteration of the Ultra. You've given the iPhone, all models of the iPad, AirPods, MacBooks and both the flagship and premium smartwatches updates since then -- but not the budget smartwatch. Last month, my editors asked me to see how the Watch SE stacks up in 2025 and I was happy to oblige. I love getting my hands on novel tech, analyzing, evaluating and experiencing a device (then giving it back when I'm done so I don't have to accumulate more stuff).


Harmonizing Large Language Models with Collaborative Behavioral Signals for Conversational Recommendation

arXiv.org Artificial Intelligence

Conversational recommendation frameworks have gained prominence as a dynamic paradigm for delivering personalized suggestions via interactive dialogues. The incorporation of advanced language understanding techniques has substantially improved the dialogue fluency of such systems. However, while modern language models demonstrate strong proficiency in interpreting user preferences articulated through natural conversation, they frequently encounter challenges in effectively utilizing collective behavioral patterns - a crucial element for generating relevant suggestions. To mitigate this limitation, this work presents a novel probabilistic framework that synergizes behavioral patterns with conversational interactions through latent preference modeling. The proposed method establishes a dual-channel alignment mechanism where implicit preference representations learned from collective user interactions serve as a connecting mechanism between behavioral data and linguistic expressions. Specifically, the framework first derives latent preference representations through established collaborative filtering techniques, then employs these representations to jointly refine both the linguistic preference expressions and behavioral patterns through an adaptive fusion process. Comprehensive evaluations across multiple benchmark datasets demonstrate the superior performance of the proposed approach compared to various state-of-the-art baseline methods, particularly in aligning conversational interactions with collaborative behavioral signals.


Towards Robust Model Evolution with Algorithmic Recourse

arXiv.org Artificial Intelligence

Algorithmic Recourse is a way for users to modify their attributes to align with a model's expectations, thereby improving their outcomes after receiving unfavorable decisions. In real-world scenarios, users often need to strategically adjust their attributes to compete for limited resources. However, such strategic behavior induces users to "game" algorithms, causing model collapse due to distribution shifts. These shifts arise from user competition, resource constraints, and adaptive user responses. While prior research on Algorithmic Recourse has explored its effects on both systems and users, the impact of resource constraints and competition over time remains underexplored. In this work, we develop a general framework to model user strategic behaviors and their interactions with decision-making systems under resource constraints and competitive dynamics. Through theoretical analysis and empirical evaluation, we identify three key phenomena that arise consistently in both synthetic and real-world datasets: escalating decision boundaries, non-robust model predictions, and inequitable recourse actions. Finally, we discuss the broader social implications of these findings and present two algorithmic strategies aimed at mitigating these challenges.


Towards Next-Generation Recommender Systems: A Benchmark for Personalized Recommendation Assistant with LLMs

arXiv.org Artificial Intelligence

Recommender systems (RecSys) are widely used across various modern digital platforms and have garnered significant attention. Traditional recommender systems usually focus only on fixed and simple recommendation scenarios, making it difficult to generalize to new and unseen recommendation tasks in an interactive paradigm. Recently, the advancement of large language models (LLMs) has revolutionized the foundational architecture of RecSys, driving their evolution into more intelligent and interactive personalized recommendation assistants. However, most existing studies rely on fixed task-specific prompt templates to generate recommendations and evaluate the performance of personalized assistants, which limits comprehensive assessment of their capabilities. This is because commonly used datasets lack high-quality textual user queries that reflect real-world recommendation scenarios, making them unsuitable for evaluating LLM-based personalized recommendation assistants. To address this gap, we introduce RecBench+, a new dataset benchmark designed to assess LLMs' ability to handle intricate user recommendation needs in the era of LLMs. RecBench+ encompasses a diverse set of queries that span both hard conditions and soft preferences, with varying difficulty levels. We evaluated commonly used LLMs on RecBench+ and uncovered the following findings: 1) LLMs demonstrate preliminary abilities to act as recommendation assistants, 2) LLMs are better at handling queries with explicitly stated conditions, while facing challenges with queries that require reasoning or contain misleading information. Our dataset has been released at https://github.com/jiani-huang/RecBench.git.


$38 buys you an AI-powered personal assistant for life

Popular Science

It's 2025, and we don't have flying cars or talking robots yet. Fortunately, we do have some cool apps courtesy of artificial intelligence, and DeskSense is one of them. DeskSense provides a personal assistant at your beck and call, and you can currently score a lifetime subscription to their basic plan for just $38--the best price on the web--right here for a limited time. Admit it, you've always wondered what it would be like to be a rich billionaire with a fleet of employees. While you may not have a Jeeves to bring you coffee, now DeskSense can craft that important email for you or translate content in seconds.


Could a dating app for games help smaller developers?

BBC News

One of the experts involved in Ludocene is veteran US games journalist Brian Crecente. He set up gaming websites Kotaku and Polygon, led video games coverage at Rolling Stone and Variety, and now runs a consultancy business. He says there's currently "a perfect storm for not knowing what to play" thanks to the reliance on search engine optimisation (SEO) and automatic algorithms. "There's just so much stuff," he says. "It's very hard to discover what it is you might like and you might miss out on some hidden gems." A lot has been written about layoffs and studio closures in the video games industry, but Brian points out that many websites and magazines dedicated to it have also closed.