Personal Assistant Systems
On (Normalised) Discounted Cumulative Gain as an Off-Policy Evaluation Metric for Top-$n$ Recommendation
Jeunen, Olivier, Potapov, Ivan, Ustimenko, Aleksei
Approaches to recommendation are typically evaluated in one of two ways: (1) via a (simulated) online experiment, often seen as the gold standard, or (2) via some offline evaluation procedure, where the goal is to approximate the outcome of an online experiment. Several offline evaluation metrics have been adopted in the literature, inspired by ranking metrics prevalent in the field of Information Retrieval. (Normalised) Discounted Cumulative Gain (nDCG) is one such metric that has seen widespread adoption in empirical studies, and higher (n)DCG values have been used to present new methods as the state-of-the-art in top-$n$ recommendation for many years. Our work takes a critical look at this approach, and investigates when we can expect such metrics to approximate the gold standard outcome of an online experiment. We formally present the assumptions that are necessary to consider DCG an unbiased estimator of online reward and provide a derivation for this metric from first principles, highlighting where we deviate from its traditional uses in IR. Importantly, we show that normalising the metric renders it inconsistent, in that even when DCG is unbiased, ranking competing methods by their normalised DCG can invert their relative order. Through a correlation analysis between off- and on-line experiments conducted on a large-scale recommendation platform, we show that our unbiased DCG estimates strongly correlate with online reward, even when some of the metric's inherent assumptions are violated. This statement no longer holds for its normalised variant, suggesting that nDCG's practical utility may be limited.
Bill Gates says AI is 'pretty dumb' now, but predicts everyone will have robot 'agents' within 5 years
The world of gaming is being rocked by an AI controversy that could upend the multi-billion dollar industry. Microsoft co-founder Bill Gates had a bold prediction for the future of artificial intelligence, arguing that every person will soon have a robot "agent" acting on their behalf. "In the near future, anyone who's online will be able to have a personal assistant powered by artificial intelligence that's far beyond today's technology," Gates said, according to report in Fortune. They're proactive -- capable of making suggestions before you ask for them." Gates comments come as AI technology continues to develop rapidly, with new platforms such as OpenAI's ChatGPT gaining mainstream popularity over the last year. While Gates acknowledged the "software is still pretty dumb" as of 2023, that reality will "change completely" within the next five years. The billionaire tech entrepreneur argued that basically everyone will have a personal assistant adept at carrying out seemingly any task, citing the potential for the technology to plan entire vacations for its users. "When asked, it will recommend things to do based on your interests and propensity for adventure, and it will book reservations at the types of restaurants you would enjoy," Gates said. "If you want this kind of deeply personalized planning today, you need to pay a travel agent and spend time telling them what you want." The Microsoft co-founder argued the technology will have wide-ranging uses to make life easier, carrying out more complex tasks than users of current voice assistants are used to. "If your friend just had surgery, your agent will offer to send flowers and be able to order them for you," Gates said. "If you tell it you'd like to catch up with your old college roommate, it will work with their agent to find a time to get together, and just before you arrive, it will remind you that their oldest child just started college at the local university." While the technology Gates envisions may make people think of the widely held assistants many currently hold in their pocket, such as Apple's Siri, AI assistants will be capable of much more. "Bill Gates is talking about Natural Language Processing (NLP) as the key to these improved AI assistants," Christopher Alexander, the Chief Analytics Officer of Pioneer Development Group, told Fox News Digital. "NLP enabled assistants differ from Siri because NLP is actually a coding language.
CoRE-CoG: Conversational Recommendation of Entities using Constrained Generation
Srivastava, Harshvardhan, Pruthi, Kanav, Chakrabarti, Soumen, Mausam, null
End-to-end conversational recommendation systems (CRS) generate responses by leveraging both dialog history and a knowledge base (KB). A CRS mainly faces three key challenges: (1) at each turn, it must decide if recommending a KB entity is appropriate; if so, it must identify the most relevant KB entity to recommend; and finally, it must recommend the entity in a fluent utterance that is consistent with the conversation history. Recent CRSs do not pay sufficient attention to these desiderata, often generating unfluent responses or not recommending (relevant) entities at the right turn. We introduce a new CRS we call CoRE-CoG. CoRE-CoG addresses the limitations in prior systems by implementing (1) a recommendation trigger that decides if the system utterance should include an entity, (2) a type pruning module that improves the relevance of recommended entities, and (3) a novel constrained response generator to make recommendations while maintaining fluency. Together, these modules ensure simultaneous accurate recommendation decisions and fluent system utterances. Experiments with recent benchmarks show the superiority particularly on conditional generation sub-tasks with close to 10 F1 and 4 Recall@1 percent points gain over baselines.
Step by Step to Fairness: Attributing Societal Bias in Task-oriented Dialogue Systems
Su, Hsuan, Qian, Rebecca, Sankar, Chinnadhurai, Shayandeh, Shahin, Chen, Shang-Tse, Lee, Hung-yi, Bikel, Daniel M.
Recent works have shown considerable improvements in task-oriented dialogue (TOD) systems by utilizing pretrained large language models (LLMs) in an end-to-end manner. However, the biased behavior of each component in a TOD system and the error propagation issue in the end-to-end framework can lead to seriously biased TOD responses. Existing works of fairness only focus on the total bias of a system. In this paper, we propose a diagnosis method to attribute bias to each component of a TOD system. With the proposed attribution method, we can gain a deeper understanding of the sources of bias. Additionally, researchers can mitigate biased model behavior at a more granular level. We conduct experiments to attribute the TOD system's bias toward three demographic axes: gender, age, and race. Experimental results show that the bias of a TOD system usually comes from the response generation model.
Visualization for Recommendation Explainability: A Survey and New Perspectives
Chatti, Mohamed Amine, Guesmi, Mouadh, Muslim, Arham
Providing system-generated explanations for recommendations represents an important step towards transparent and trustworthy recommender systems. Explainable recommender systems provide a human-understandable rationale for their outputs. Over the last two decades, explainable recommendation has attracted much attention in the recommender systems research community. This paper aims to provide a comprehensive review of research efforts on visual explanation in recommender systems. More concretely, we systematically review the literature on explanations in recommender systems based on four dimensions, namely explanation goal, explanation scope, explanation style, and explanation format. Recognizing the importance of visualization, we approach the recommender system literature from the angle of explanatory visualizations, that is using visualizations as a display style of explanation. As a result, we derive a set of guidelines that might be constructive for designing explanatory visualizations in recommender systems and identify perspectives for future work in this field. The aim of this review is to help recommendation researchers and practitioners better understand the potential of visually explainable recommendation research and to support them in the systematic design of visual explanations in current and future recommender systems.
Your smart assistant is listening, but does that impact the ads you see?
FOX News' Eben Brown reports on AI going mainstream in healthcare, which doctors say has the potential to create stronger relationships with patients. Think of everything you do online -- and in real life, too -- that says something about who you are. Your likes, clicks, hobbies and activities all add to the wealth of data points companies already have on you. How is that data used? Let's take a deep look at how they use your conversations to create profiles.
Anticipating User Needs: Insights from Design Fiction on Conversational Agents for Computational Thinking
Penney, Jacob, Pimentel, João Felipe, Steinmacher, Igor, Gerosa, Marco A.
Computational thinking, and by extension, computer programming, is notoriously challenging to learn. Conversational agents and generative artificial intelligence (genAI) have the potential to facilitate this learning process by offering personalized guidance, interactive learning experiences, and code generation. However, current genAI-based chatbots focus on professional developers and may not adequately consider educational needs. Involving educators in conceiving educational tools is critical for ensuring usefulness and usability. We enlisted \numParticipants{} instructors to engage in design fiction sessions in which we elicited abilities such a conversational agent supported by genAI should display. Participants envisioned a conversational agent that guides students stepwise through exercises, tuning its method of guidance with an awareness of the educational background, skills and deficits, and learning preferences. The insights obtained in this paper can guide future implementations of tutoring conversational agents oriented toward teaching computational thinking and computer programming.
Alleviating Behavior Data Imbalance for Multi-Behavior Graph Collaborative Filtering
Zhang, Yijie, Bei, Yuanchen, Yang, Shiqi, Chen, Hao, Li, Zhiqing, Chen, Lijia, Huang, Feiran
Graph collaborative filtering, which learns user and item representations through message propagation over the user-item interaction graph, has been shown to effectively enhance recommendation performance. However, most current graph collaborative filtering models mainly construct the interaction graph on a single behavior domain (e.g. click), even though users exhibit various types of behaviors on real-world platforms, including actions like click, cart, and purchase. Furthermore, due to variations in user engagement, there exists an imbalance in the scale of different types of behaviors. For instance, users may click and view multiple items but only make selective purchases from a small subset of them. How to alleviate the behavior imbalance problem and utilize information from the multiple behavior graphs concurrently to improve the target behavior conversion (e.g. purchase) remains underexplored. To this end, we propose IMGCF, a simple but effective model to alleviate behavior data imbalance for multi-behavior graph collaborative filtering. Specifically, IMGCF utilizes a multi-task learning framework for collaborative filtering on multi-behavior graphs. Then, to mitigate the data imbalance issue, IMGCF improves representation learning on the sparse behavior by leveraging representations learned from the behavior domain with abundant data volumes. Experiments on two widely-used multi-behavior datasets demonstrate the effectiveness of IMGCF.
Faithful Path Language Modelling for Explainable Recommendation over Knowledge Graph
Balloccu, Giacomo, Boratto, Ludovico, Cancedda, Christian, Fenu, Gianni, Marras, Mirko
Path reasoning methods over knowledge graphs have gained popularity for their potential to improve transparency in recommender systems. However, the resulting models still rely on pre-trained knowledge graph embeddings, fail to fully exploit the interdependence between entities and relations in the KG for recommendation, and may generate inaccurate explanations. In this paper, we introduce PEARLM, a novel approach that efficiently captures user behaviour and product-side knowledge through language modelling. With our approach, knowledge graph embeddings are directly learned from paths over the KG by the language model, which also unifies entities and relations in the same optimisation space. Constraints on the sequence decoding additionally guarantee path faithfulness with respect to the KG. Experiments on two datasets show the effectiveness of our approach compared to state-of-the-art baselines. Source code and datasets: AVAILABLE AFTER GETTING ACCEPTED.
i-Razor: A Differentiable Neural Input Razor for Feature Selection and Dimension Search in DNN-Based Recommender Systems
Yao, Yao, Liu, Bin, He, Haoxun, Sheng, Dakui, Wang, Ke, Xiao, Li, Cao, Huanhuan
Input features play a crucial role in DNN-based recommender systems with thousands of categorical and continuous fields from users, items, contexts, and interactions. Noisy features and inappropriate embedding dimension assignments can deteriorate the performance of recommender systems and introduce unnecessary complexity in model training and online serving. Optimizing the input configuration of DNN models, including feature selection and embedding dimension assignment, has become one of the essential topics in feature engineering. However, in existing industrial practices, feature selection and dimension search are optimized sequentially, i.e., feature selection is performed first, followed by dimension search to determine the optimal dimension size for each selected feature. Such a sequential optimization mechanism increases training costs and risks generating suboptimal input configurations. To address this problem, we propose a differentiable neural input razor (i-Razor) that enables joint optimization of feature selection and dimension search. Concretely, we introduce an end-to-end differentiable model to learn the relative importance of different embedding regions of each feature. Furthermore, a flexible pruning algorithm is proposed to achieve feature filtering and dimension derivation simultaneously. Extensive experiments on two large-scale public datasets in the Click-Through-Rate (CTR) prediction task demonstrate the efficacy and superiority of i-Razor in balancing model complexity and performance.