AITopics | Personal Assistant Systems

Personal Assistant Systems

Choosing a Proxy Metric from Past Experiments

Tripuraneni, Nilesh, Richardson, Lee, D'Amour, Alexander, Soriano, Jacopo, Yadlowsky, Steve

arXiv.org Machine Learning, Sep-14-2023

In many randomized experiments, the treatment effect of the long-term metric (i.e., the primary outcome of interest) is difficult or infeasible to measure. Such long-term metrics are often slow to react to changes and sufficiently noisy that they are challenging to estimate faithfully in short-horizon experiments. A common alternative is to measure several short-term proxy metrics in the hope that they closely track the long-term metric, so that they can be used to guide decision-making in the near term. We introduce a new statistical framework to both define and construct an optimal proxy metric for use in a homogeneous population of randomized experiments. Our procedure first reduces the construction of an optimal proxy metric in a given experiment to a portfolio optimization problem which depends on the true latent treatment effects and noise level of the experiment under consideration. We then denoise the observed treatment effects of the long-term metric and a set of proxies in a historical corpus of randomized experiments to extract estimates of the latent treatment effects for use in the optimization problem. One key insight derived from our approach is that the optimal proxy metric for a given experiment is not fixed a priori; rather, it should depend on the sample size (or effective noise level) of the randomized experiment for which it is deployed. To instantiate and evaluate our framework, we employ our methodology in a large corpus of randomized experiments from an industrial recommendation system and construct proxy metrics that perform favorably relative to several baselines.
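The claim that the best proxy depends on sample size can be illustrated with a small sketch. This is not the paper's procedure. It assumes a simplified setting in which the observed proxy effects equal latent effects plus independent noise whose variance shrinks as 1/n, and it picks the linear combination of two proxies that maximizes correlation with the latent long-term effect, which gives weights proportional to (latent covariance + noise covariance)^-1 times the cross-covariance. All numbers and the names `LATENT_COV`, `CROSS_COV`, `UNIT_NOISE_VAR`, and `proxy_weights` are hypothetical.

```python
# Hypothetical inputs for two proxy metrics (not taken from the paper).
LATENT_COV = [[1.0, 0.3], [0.3, 1.0]]  # covariance of the latent proxy treatment effects
CROSS_COV = [0.8, 0.4]                 # covariance of each latent proxy effect with the long-term effect
UNIT_NOISE_VAR = [100.0, 1.0]          # per-unit sampling variance: proxy 0 is noisy, proxy 1 is precise


def proxy_weights(n):
    """Weights (normalized to sum to 1) that maximize the correlation between
    the noisy weighted proxy and the latent long-term effect at sample size n."""
    # Covariance of the observed proxy effects: latent part plus noise / n.
    a = LATENT_COV[0][0] + UNIT_NOISE_VAR[0] / n
    b = LATENT_COV[0][1]
    d = LATENT_COV[1][1] + UNIT_NOISE_VAR[1] / n
    det = a * d - b * b
    # Solve the 2x2 linear system (observed covariance) * w = CROSS_COV.
    w0 = (d * CROSS_COV[0] - b * CROSS_COV[1]) / det
    w1 = (a * CROSS_COV[1] - b * CROSS_COV[0]) / det
    total = w0 + w1
    return [w0 / total, w1 / total]


for n in (10, 1_000, 1_000_000):
    print(n, [round(w, 3) for w in proxy_weights(n)])
```

In this toy setup, small experiments put most of the weight on the precise but weakly related proxy, and large experiments shift the weight toward the noisy but strongly related one.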

artificial intelligence, machine learning, proxy metric, (17 more...)

arXiv.org Machine Learning

2309.07893

Country:

Oceania > Australia > New South Wales > Sydney (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre:

Research Report > Strength High (1.00)
Research Report > Experimental Study (1.00)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (0.46)

Is Rotten Tomatoes Certified Rotten?

Slate, Sep-13-2023, 09:00:00 GMT

This week, Stephen and Dana are joined by guest host Kat Chow, journalist and author of the 2021 memoir Seeing Ghosts. The panel begins by wading through HELL, Chris Fleming's new hour-long comedy special that's both puzzling and delightfully goofy. Then, the three consider Astrakan, a deeply dark and unsettling first feature from director David Depesseville, and attempt to parse through the film's (intentionally?) Finally, they conclude by discussing Rotten Tomatoes, the widely used critical review aggregation site and subject of the recent Vulture exposé by Lane Brown, "The Decomposition of Rotten Tomatoes," which details a "gaming of the system" by Hollywood PR teams. In the exclusive Slate Plus segment, the panel dives into the 2023 U.S. Open, specifically the effect of extreme heat on gameplay and how the sport will need to contend with climate change going forward.

rotten tomatoe certified rotten

Slate

Industry:

Media (1.00)
Leisure & Entertainment (1.00)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.89)

Apple Watch Series 9 can handle Siri requests without your iPhone

Engadget, Sep-12-2023, 17:19:20 GMT

It's September, which means the air is thick with the promise of fall, school is back in session, and Apple just revealed a new Apple Watch. This year, at its annual fall event, the company is showing off the Apple Watch Series 9. The Series 9 features a new processor, the S9 chip, and a quad-core neural engine, which promises 18-hour battery life and overall performance boosts. On the software side, watchOS 10 is poised to be the biggest UI overhaul in Apple Watch history, with a renewed focus on widgets, and a slew of app and input updates. The Series 9 is available to order today and it's due to hit the market on September 22.

apple watch series 9, series 9, watchos 10, (9 more...)

Engadget

Technology:

Information Technology > Hardware (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.53)
Information Technology > Communications > Mobile (0.52)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.43)

RoDia: A New Dataset for Romanian Dialect Identification from Speech

Rotaru, Codrut, Ristea, Nicolae-Catalin, Ionescu, Radu Tudor

arXiv.org Artificial Intelligence, Sep-12-2023

Dialect identification is a critical task in speech processing and language technology, enhancing various applications such as speech recognition, speaker verification, and many others. While most research studies have been dedicated to dialect identification in widely spoken languages, limited attention has been given to dialect identification in low-resource languages, such as Romanian. To address this research gap, we introduce RoDia, the first dataset for Romanian dialect identification from speech. The RoDia dataset includes a varied compilation of speech samples from five distinct regions of Romania, covering both urban and rural environments, totaling 2 hours of manually annotated speech data. Along with our dataset, we introduce a set of competitive models to be used as baselines for future research. The top scoring model achieves a macro F1 score of 59.83% and a micro F1 score of 62.08%, indicating that the task is challenging. We thus believe that RoDia is a valuable resource that will stimulate research aiming to address the challenges of Romanian dialect identification. We publicly release our dataset and code at https://github.com/codrut2/RoDia.
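The abstract reports both a macro and a micro F1 score. As a reminder of how the two differ, here is a minimal sketch on hypothetical labels (not RoDia data). The helper name `f1_scores` is an assumption for illustration. Macro F1 averages the per-class F1 scores, so rare classes count as much as frequent ones. Micro F1 pools all decisions, and for single-label classification it equals accuracy.

```python
from collections import Counter


def f1_scores(y_true, y_pred):
    """Return (macro_f1, micro_f1) for single-label multi-class predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    # Macro: unweighted mean of per-class F1.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    # Micro: F1 computed from counts pooled over all classes.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return macro, micro


# Hypothetical toy labels with an imbalanced class distribution.
y_true = ["A", "A", "A", "A", "B", "C"]
y_pred = ["A", "A", "A", "B", "B", "A"]
macro_f1, micro_f1 = f1_scores(y_true, y_pred)
print(f"macro F1 = {macro_f1:.4f}, micro F1 = {micro_f1:.4f}")
```

Here the missed rare class "C" pulls the macro score well below the micro score, which is the usual pattern when a model struggles on under-represented classes.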

dialect, identification, proceedings, (12 more...)

arXiv.org Artificial Intelligence

2309.03378

Country:

Europe > Romania > București - Ilfov Development Region > Municipality of Bucharest > Bucharest (0.05)
South America > Argentina (0.04)
North America > United States > Maine (0.04)
(4 more...)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.94)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.94)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

A Co-design Study for Multi-Stakeholder Job Recommender System Explanations

Schellingerhout, Roan, Barile, Francesco, Tintarev, Nava

arXiv.org Artificial Intelligence, Sep-11-2023

Recent legislation proposals have significantly increased the demand for eXplainable Artificial Intelligence (XAI) in many businesses, especially in so-called 'high-risk' domains, such as recruitment. Within recruitment, AI has become commonplace, mainly in the form of job recommender systems (JRSs), which try to match candidates to vacancies, and vice versa. However, common XAI techniques often fall short in this domain due to the different levels and types of expertise of the individuals involved, making explanations difficult to generalize. To determine the explanation preferences of the different stakeholder types (candidates, recruiters, and companies), we created and validated a semi-structured interview guide. Using grounded theory, we structurally analyzed the results of these interviews and found that different stakeholder types indeed have strongly differing explanation preferences. Candidates indicated a preference for brief, textual explanations that allow them to quickly judge potential matches. On the other hand, hiring managers preferred visual graph-based explanations that provide a more technical and comprehensive overview at a glance. Recruiters found more exhaustive textual explanations preferable, as those provided them with more talking points to convince both parties of the match. Based on these findings, we describe guidelines on how to design an explanation interface that fulfills the requirements of all three stakeholder types. Furthermore, we provide the validated interview guide, which can assist future research in determining the explanation preferences of different stakeholder types.

explanation, stakeholder, textual explanation, (16 more...)

arXiv.org Artificial Intelligence

2309.05507

Country: Europe > Netherlands > Limburg > Maastricht (0.04)

Genre:

Questionnaire & Opinion Survey (1.00)
Personal > Interview (1.00)
Research Report > New Finding (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Natural Language > Explanation & Argumentation (0.86)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Would You Rather Stay Home Alone or Online Date?: A Game for Single Women

The New Yorker, Sep-10-2023, 10:00:00 GMT

Would you rather spend a quiet evening by yourself, reading an awful book with a contrived plot and cringy dialogue . . . Would you rather go for a solo walk and get attacked by hissing Canada geese in heat . . . Would you rather go to a dog park on your own, receive weird looks from dog owners because you have no dog, and get your leg humped by three muddy puppies who smell like pee . . . Would you rather sit at home alone on a Saturday night and binge-watch "The Great British Bake Off" while on a strict no-carb, no-sugar diet . . . Would you rather go to a coffee shop by yourself and sit next to someone who starts loudly conducting a phone interview . . .

home alone, online date, single woman, (2 more...)

The New Yorker

Country: North America > Canada (0.26)

Industry:

Education > Health & Safety > School Nutrition (0.57)
Leisure & Entertainment (0.37)
Consumer Products & Services > Restaurants (0.37)

Technology:

Information Technology > Communications > Social Media (0.76)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.33)

This wireless heads up display for your car is more than $150 off now

PCWorld, Sep-8-2023, 08:00:00 GMT

We're past the days of MapQuest, but people still spend a dangerous amount of time looking at their phones while driving. You need a better solution, and this 9″ Wireless Heads Up Car Display has you covered. Compatible with Apple CarPlay, Android Auto, and wireless compatible mirror linking functions, this display helps you navigate, control music playback, and manage calls via Siri or Google Assistant on a safer dashboard display that you won't have to look into your lap to use. The intuitive tool installs easily on your dashboard via a self-adhesive bracket that doesn't alter your stereo setup. Then, it gives you optimal visibility day and night with automatic brightness adjustment while 4Ω 3W speakers ensure you can hear your music and voice instructions easily.

apple carplay, car display, wireless head

PCWorld

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.64)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.64)
Information Technology > Communications > Mobile (0.49)
Information Technology > Human Computer Interaction > Interfaces (0.40)

Offline Recommender System Evaluation under Unobserved Confounding

Jeunen, Olivier, London, Ben

arXiv.org Machine Learning, Sep-8-2023

Off-Policy Estimation (OPE) methods allow us to learn and evaluate decision-making policies from logged data. This makes them an attractive choice for the offline evaluation of recommender systems, and several recent works have reported successful adoption of OPE methods to this end. An important assumption that makes this work is the absence of unobserved confounders: random variables that influence both actions and rewards at data collection time. Because the data collection policy is typically under the practitioner's control, the unconfoundedness assumption is often left implicit, and its violations are rarely dealt with in the existing literature. This work aims to highlight the problems that arise when performing off-policy estimation in the presence of unobserved confounders, specifically focusing on a recommendation use-case. We focus on policy-based estimators, where the logging propensities are learned from logged data. We characterise the statistical bias that arises due to confounding, and show how existing diagnostics are unable to uncover such cases. Because the bias depends directly on the true and unobserved logging propensities, it is non-identifiable. As the unconfoundedness assumption is famously untestable, this becomes especially problematic. This paper emphasises this common, yet often overlooked issue. Through synthetic data, we empirically show how naïve propensity estimation under confounding can lead to severely biased metric estimates that are allowed to fly under the radar. We aim to cultivate an awareness among researchers and practitioners of this important problem, and touch upon potential research directions towards mitigating its effects.
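The bias described above can be reproduced in a tiny, fully hypothetical setting. The sketch below is not the paper's experiment. It assumes one binary unobserved confounder U and a binary action, and it computes exact expectations of the inverse propensity scoring (IPS) estimator rather than sampling. All probabilities, rewards, and function names are assumptions chosen for illustration.

```python
# Hypothetical toy problem: binary unobserved confounder U and binary action A.
P_U = {0: 0.5, 1: 0.5}      # P(U = u)
LOGGING = {0: 0.2, 1: 0.8}  # P(A = 1 | U = u) under the logging policy
REWARD = {(0, 0): 0.5, (0, 1): 0.5, (1, 0): 0.1, (1, 1): 0.9}  # E[R | A = a, U = u]


def logging_prob(a, u):
    return LOGGING[u] if a == 1 else 1.0 - LOGGING[u]


def target_prob(a):
    # Target policy to evaluate: always play action 1.
    return 1.0 if a == 1 else 0.0


def true_value():
    """Expected reward of the target policy."""
    return sum(P_U[u] * target_prob(a) * REWARD[(a, u)] for u in P_U for a in (0, 1))


def expected_ips(propensity):
    """Exact expectation of the IPS estimate when the importance weights
    are target_prob(a) / propensity(a, u)."""
    total = 0.0
    for u in P_U:
        for a in (0, 1):
            weight = target_prob(a) / propensity(a, u)
            total += P_U[u] * logging_prob(a, u) * weight * REWARD[(a, u)]
    return total


def marginal_propensity(a, u):
    # What a propensity model that cannot see U converges to: P(A = a).
    return sum(P_U[v] * logging_prob(a, v) for v in P_U)


print("true value:                ", true_value())
print("IPS, true propensities:    ", expected_ips(logging_prob))
print("IPS, marginal propensities:", expected_ips(marginal_propensity))
```

With the true propensities the estimator is unbiased in this toy problem, while the propensities learned without access to U overestimate the policy value, because action 1 was logged more often in exactly the contexts where it pays off.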

data mining, machine learning, propensity, (14 more...)

arXiv.org Machine Learning

2309.04222

Country:

Asia > Singapore (0.06)
North America > United States > New York > New York County > New York City (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.73)
Information Technology > Data Science > Data Mining (0.68)

VideolandGPT: A User Study on a Conversational Recommender System

Granada, Mateo Gutierrez, Zilbershtein, Dina, Odijk, Daan, Barile, Francesco

arXiv.org Artificial Intelligence, Sep-7-2023

This paper investigates how large language models (LLMs) can enhance recommender systems, with a specific focus on Conversational Recommender Systems that leverage user preferences and personalised candidate selections from existing ranking models. We introduce VideolandGPT, a recommender system for a Video-on-Demand (VOD) platform, Videoland, which uses ChatGPT to select from a predetermined set of contents, considering the additional context indicated by users' interactions with a chat interface. We evaluate ranking metrics, user experience, and fairness of recommendations, comparing a personalised and a non-personalised version of the system, in a between-subject user study. Our results indicate that the personalised version outperforms the non-personalised in terms of accuracy and general user satisfaction, while both versions increase the visibility of items which are not in the top of the recommendation lists. However, both versions present inconsistent behavior in terms of fairness, as the system may generate recommendations which are not available on Videoland.

participant, recommendation, recommender system, (14 more...)

arXiv.org Artificial Intelligence

2309.03645

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > New York > New York County > New York City (0.04)
Europe > Netherlands > Limburg > Maastricht (0.04)
Asia > Singapore (0.04)

Genre:

Research Report > New Finding (1.00)
Questionnaire & Opinion Survey (1.00)
Research Report > Experimental Study (0.93)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

RecFusion: A Binomial Diffusion Process for 1D Data for Recommendation

Bénédict, Gabriel, Jeunen, Olivier, Papa, Samuele, Bhargav, Samarth, Odijk, Daan, de Rijke, Maarten

arXiv.org Artificial Intelligence, Sep-7-2023

In this paper we propose RecFusion, which comprises a set of diffusion models for recommendation. Unlike image data, which contain spatial correlations, a user-item interaction matrix, commonly utilized in recommendation, lacks spatial relationships between users and items. We formulate diffusion on a 1D vector and propose binomial diffusion, which explicitly models binary user-item interactions with a Bernoulli process. We show that RecFusion approaches the performance of complex VAE baselines on the core recommendation setting (top-n recommendation for binary non-sequential feedback) and the most common datasets (MovieLens and Netflix). Our proposed diffusion models that are specialized for 1D and/or binary setups have implications beyond recommendation systems, such as in the medical domain with MRI and CT scans.
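A Bernoulli forward (noising) process on a binary interaction vector can be sketched as follows. This is an illustration of one standard binomial diffusion kernel, in which each entry is resampled as Bernoulli((1 - beta_t) * x_{t-1} + 0.5 * beta_t). Whether RecFusion uses exactly this kernel and noise schedule is an assumption here, and the function names and the constant schedule are hypothetical.

```python
import random


def forward_marginal(x0, betas):
    """Closed-form P(x_T = 1 | x_0) for the assumed Bernoulli kernel."""
    alpha_bar = 1.0
    for beta in betas:
        alpha_bar *= 1.0 - beta
    return alpha_bar * x0 + 0.5 * (1.0 - alpha_bar)


def forward_marginal_stepwise(x0, betas):
    """Same probability, obtained by propagating the mean one step at a time."""
    p = float(x0)
    for beta in betas:
        p = (1.0 - beta) * p + 0.5 * beta
    return p


def sample_forward(x0_vector, betas, rng):
    """Draw one noisy binary vector by applying the kernel step by step."""
    x = list(x0_vector)
    for beta in betas:
        x = [1 if rng.random() < (1.0 - beta) * xi + 0.5 * beta else 0 for xi in x]
    return x


betas = [0.1] * 50          # hypothetical constant noise schedule
user_row = [1, 0, 0, 1, 0]  # hypothetical binary user-item interactions
noisy_row = sample_forward(user_row, betas, random.Random(0))
print("P(x_T = 1 | x_0 = 1):", forward_marginal(1, betas))
print("P(x_T = 1 | x_0 = 0):", forward_marginal(0, betas))
print("one noisy sample:    ", noisy_row)
```

After enough steps both marginals approach 0.5, so the original interactions are almost fully erased. A reverse model would then be trained to undo this corruption.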

diffusion model, proceedings, recommendation, (11 more...)

arXiv.org Artificial Intelligence

2306.08947

Country:

North America > Canada > Ontario > Toronto (0.14)
Europe > Netherlands > North Holland > Amsterdam (0.04)
North America > United States > Georgia > Fulton County > Atlanta (0.04)
(3 more...)

Genre: Research Report (0.82)

Industry:

Media > Film (0.34)
Information Technology (0.34)
Health & Medicine > Diagnostic Medicine > Imaging (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
