 Personal Assistant Systems


Navigating User Experience of ChatGPT-based Conversational Recommender Systems: The Effects of Prompt Guidance and Recommendation Domain

arXiv.org Artificial Intelligence

Conversational recommender systems (CRS) enable users to articulate their preferences and provide feedback through natural language. With the advent of large language models (LLMs), the potential to enhance user engagement with CRS and augment the recommendation process with LLM-generated content has received increasing attention. However, the efficacy of LLM-powered CRS is contingent upon the use of prompts, and the subjective perception of recommendation quality can differ across various recommendation domains. Therefore, we have developed a ChatGPT-based CRS to investigate the impact of these two factors, prompt guidance (PG) and recommendation domain (RD), on the overall user experience of the system. We conducted an online empirical study (N = 100) by employing a mixed-method approach that utilized a between-subjects design for the variable of PG (with vs. without) and a within-subjects design for RD (book recommendations vs. job recommendations). The findings reveal that PG can substantially enhance the system's explainability, adaptability, perceived ease of use, and transparency. Moreover, users are inclined to perceive a greater sense of novelty and demonstrate a higher propensity to engage with and try recommended items in the context of book recommendations as opposed to job recommendations. Furthermore, the influence of PG on certain user experience metrics and interactive behaviors appears to be modulated by the recommendation domain, as evidenced by the interaction effects between the two examined factors. This work contributes to the user-centered evaluation of ChatGPT-based CRS by investigating two prominent factors and offers practical design guidance.


Imagery as Inquiry: Exploring A Multimodal Dataset for Conversational Recommendation

arXiv.org Artificial Intelligence

We introduce a multimodal dataset where users express preferences through images. These images encompass a broad spectrum of visual expressions ranging from landscapes to artistic depictions. Users request recommendations for books or music that evoke similar feelings to those captured in the images, and recommendations are endorsed by the community through upvotes. This dataset supports two recommendation tasks: title generation and multiple-choice selection. Our experiments with large foundation models reveal their limitations in these tasks. In particular, vision-language models show no significant advantage over language-only counterparts that use descriptions, which we hypothesize is due to underutilized visual capabilities. To better harness these abilities, we propose chain-of-imagery prompting, which results in notable improvements. We release our code and datasets.


Microsoft unveils Copilot for Teams

Engadget

At this year's Build event, Microsoft announced Team Copilot, and as you can probably guess from its name, it's a variant of the company's AI tool that can cater to the needs of a group of users. It expands Copilot's abilities beyond that of a personal assistant, so that it can serve a whole team, a department or even an entire organization, the company said in its announcement. The new tool was designed to take on time-consuming tasks to free up personnel, such as managing meeting agendas and taking down minutes that group members can tweak as needed. The new Copilot for Teams can also serve as a meeting moderator by summarizing important information for latecomers (or for reference after the fact) and answering questions. Finally, it can create and assign tasks in Planner, track their deadlines, and notify team members if they need to contribute to or review a certain task.


Microsoft teams up with Khan Academy to make the Khanmigo AI teaching assistant free

Engadget

Microsoft and non-profit educational organization Khan Academy have formed a partnership that will allow all K-12 educators in the US to access the pilot version of Khanmigo for Teachers at no cost. Khanmigo is an AI-powered teaching assistant that can help teachers find ways to make lessons more fun and engaging. The tool can also quickly create lesson plans and suggest student groups for team activities. Khan Academy says Khanmigo can save teachers an average of five working hours every week. The service previously cost educators $4 a month, but Khan Academy has dropped those fees since its Microsoft partnership allows it to use the Azure OpenAI Service to power Khanmigo for free.


Amazon Echo Hub review: Alexa's affordable smart-home dashboard

The Guardian

Amazon's latest Alexa device feels like the missing piece in making a home fully smart and acts as a hub for controlling lights, doors, cameras, timers and heating. The Echo Hub arrives ready to be the touchscreen controller for your smart home, and is a cut-price option for a device that usually has to be either professionally installed, costing thousands, or a DIY job that requires more than a little expertise. Able to be wall-mounted or placed on a stand, the Echo Hub costs £170 (€200/$180) and acts as a clock and digital photo frame when idle, displaying a range of stock shots or pulling snaps from your prerequisite Amazon account or Facebook on its 8in LCD screen. When woken up, it is filled with buttons and widgets for controlling things around the home. A list of rooms on the left lets you see every device connected to Alexa, while a row of buttons at the bottom gives you quick access to categories of things, such as security devices, cameras, thermostats and lights. Routines can be programmed and turned on, such as dimming the lights in the evening or opening the curtains in the morning.


GotFunding: A grant recommendation system based on scientific articles

arXiv.org Artificial Intelligence

Obtaining funding is an important part of becoming a successful scientist. Junior faculty spend a great deal of time finding the agencies and programs that best match their research profile. But what factors influence the best publication-grant matching? Some universities might employ pre-award personnel to understand these factors, but not all institutions can afford to hire them. Historical records of publications funded by grants can help us understand the matching process and also help us develop recommendation systems to automate it. In this work, we present GotFunding (Grant recOmmendaTion based on past FUNDING), a recommendation system trained on the National Institutes of Health's (NIH) grant-publication records. Our system achieves high performance (NDCG@1 = 0.945) by casting the problem as learning to rank. By analyzing the features that make predictions effective, we find that the ranking treats as most important 1) the year difference between the publication and the grant, 2) the amount of information provided in the publication, and 3) the relevance of the publication to the grant. We discuss future improvements of the system and an online tool for scientists to try.
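For readers unfamiliar with the NDCG@1 figure quoted above, the following is a minimal sketch of how NDCG@k is commonly computed for a ranked list. It assumes the linear-gain form of DCG with a log2 position discount; the abstract does not state which variant GotFunding uses, and the function names and relevance lists here are hypothetical illustrations, not the authors' code or data.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k items of one ranked list.

    `relevances` holds the relevance label of each item in ranked order.
    Assumption: linear gain, log2(rank + 1) discount with 1-based ranks.
    """
    return sum(rel / math.log2(pos + 2) for pos, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k):
    """DCG@k normalized by the DCG@k of the ideal (sorted) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


def mean_ndcg_at_k(ranked_lists, k):
    """Average NDCG@k over several queries (here, several publications)."""
    return sum(ndcg_at_k(rels, k) for rels in ranked_lists) / len(ranked_lists)


# Hypothetical toy data: one list per publication, 1 marks the true funding grant.
toy_rankings = [[1, 0, 0], [0, 1, 0], [1, 0, 0]]
print(mean_ndcg_at_k(toy_rankings, k=1))
```

With binary relevance and a single correct grant per publication, NDCG@1 reduces to the fraction of publications whose true grant is ranked first.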


Backpropagation-Free Multi-modal On-Device Model Adaptation via Cloud-Device Collaboration

arXiv.org Artificial Intelligence

These devices serve as data collection powerhouses, continuously amassing vast repositories of personalized multi-modal data, which can include a wide array of input modalities such as text, images and videos. The potential locked within this continuously arriving trove of multi-modal data is immense, promising to unlock high-quality, tailored, device-aware services for individual users. Despite this promise, personalized device services require analyzing the dynamic nature of the multi-modal data that underscore users' intentions. The prevailing artificial intelligence (AI) systems, primarily trained and deployed in cloud-based environments, face a profound challenge in adapting to dynamic device data when a static cloud model is used for all individual users, mainly due to the distribution shift between the cloud and device data, as shown in Figure 1. In other words, high-quality personalized service requires AI systems to undergo continual refinement and adaptation to accommodate the evolving landscape of personalized multi-modal data. Intuitively, one straightforward adaptation strategy is to fine-tune the cloud model on the device's multi-modal data, which can partially alleviate the cloud-device data distribution shift when modeling users' intentions. Nevertheless, we contend that the fine-tuning-adaptation (FTA) paradigm may not satisfactorily resolve device model personalization, for reasons that can be summarized as two key aspects: (1) Undesirable Annotation.


RecGPT: Generative Pre-training for Text-based Recommendation

arXiv.org Artificial Intelligence

We present the first domain-adapted and fully-trained large language model, RecGPT-7B, and its instruction-following variant, RecGPT-7B-Instruct, for text-based recommendation. Experimental results on rating prediction and sequential recommendation tasks show that our model, RecGPT-7B-Instruct, outperforms previous strong baselines. We are releasing our RecGPT models as well as their pre-training and fine-tuning datasets to facilitate future research and downstream applications in text-based recommendation. Public "huggingface" links to our RecGPT models and datasets are available at: https://github.com/VinAIResearch/RecGPT


An Aligning and Training Framework for Multimodal Recommendations

arXiv.org Artificial Intelligence

With the development of multimedia applications, multimodal recommendations play an essential role, as they can leverage rich contexts beyond user and item interactions. Existing methods mainly use them to help learn ID features; however, there exist semantic gaps among multimodal content features and ID features. Directly using multimodal information as an auxiliary would lead to misalignment in items' and users' representations. In this paper, we first systematically investigate the misalignment issue in multimodal recommendations, and propose a solution named AlignRec. In AlignRec, the recommendation objective is decomposed into three alignments, namely alignment within contents, alignment between content and categorical ID, and alignment between users and items. Each alignment is characterized by a distinct objective function. To effectively train AlignRec, we propose starting from pre-training the first alignment to obtain unified multimodal features and subsequently training the following two alignments together. As it is essential to analyze whether each multimodal feature helps in training, we design three new classes of metrics to evaluate intermediate performance. Our extensive experiments on three real-world datasets consistently verify the superiority of AlignRec compared to nine baselines. We also find that the multimodal features generated by our framework are better than currently used ones, which are to be open-sourced.


Multi-domain Knowledge Graph Collaborative Pre-training and Prompt Tuning for Diverse Downstream Tasks

arXiv.org Artificial Intelligence

Knowledge graphs (KGs) provide reliable external knowledge for a wide variety of AI tasks in the form of structured triples. Knowledge graph pre-training (KGP) aims to pre-train neural networks on large-scale KGs and provide unified interfaces to enhance different downstream tasks, which is a key direction for KG management, maintenance, and applications. Existing works often focus on pure research questions in open domains, or they are not open source due to data security and privacy concerns in real scenarios. Meanwhile, existing studies have not explored the training efficiency and transferability of KGP models in depth. To address these problems, we propose a framework, MuDoK, to achieve multi-domain collaborative pre-training and efficient prefix prompt tuning to serve diverse downstream tasks like recommendation and text understanding. Our design is a plug-and-play prompt learning approach that can be flexibly adapted to different downstream task backbones. In response to the lack of open-source benchmarks, we constructed a new multi-domain KGP benchmark called KPI with two large-scale KGs and six different sub-domain tasks to evaluate our method and open-sourced it for subsequent research. We evaluated our approach on the constructed KPI benchmark using diverse backbone models in heterogeneous downstream tasks. The experimental results show that our framework brings significant performance gains and demonstrate its generality, efficiency, and transferability.