 Personal Assistant Systems


A New Type of Foundation Model Based on Recordings of People's Emotions and Physiology

arXiv.org Artificial Intelligence

Foundation models have had a big impact in recent years and billions of dollars are being invested in them in the current AI boom. The more popular ones, such as ChatGPT, are trained on large amounts of data from the Internet, and then reinforcement learning, RAG, prompt engineering and cognitive modelling are used to fine-tune and augment their behavior. This technology has been used to create models of individual people, such as Caryn Marjorie. However, these chatbots are not based on people's actual emotional and physiological responses to their environment, so they are, at best, surface-level approximations of the characters they are imitating. This paper describes how a new type of foundation model - a first-person foundation model - could be created from recordings of what a person sees and hears as well as their emotional and physiological reactions to these stimuli. A first-person foundation model would map environmental stimuli to a person's emotional and physiological states, and map a person's emotional and physiological states to their behavior. First-person foundation models have many exciting applications, including a new type of recommendation engine, personal assistants, generative adversarial networks, dating and recruitment. To obtain training data for a first-person foundation model, we have developed a recording rig that captures what the wearer is seeing and hearing as well as their emotional and physiological states. This novel source of data could help to address the shortage of new data for building the next generation of foundation models.


A Culturally-Aware Tool for Crowdworkers: Leveraging Chronemics to Support Diverse Work Styles

arXiv.org Artificial Intelligence

This issue usually stems from the assumption that crowdworkers are a homogeneous group [56], neglecting their diverse cultural backgrounds [90]. Moreover, a notable trend in design has emerged advocating for minimizing cultural impact in work interfaces, aiming for global uniformity in their design rather than customizing these systems to accommodate cultural nuances [133, 134, 193]. Consequently, many work interfaces have strived for uniform standards, and have ignored worker diversity [76, 84, 88]. However, interfaces often reflect the cultural biases of their designers [18], inadvertently embedding their cultural norms [146, 150, 177]. This can lead to designs that unintentionally require "outside workers" to adapt or modify their behaviors [126, 177], potentially hindering their success and effectiveness in their jobs [24, 60, 64, 85]. A solution can be to create culturally aware tools for crowdworkers, yet research into integrating culture theory into such designs remains limited [108, 118, 163]. Further research is crucial to assess these systems' effectiveness and their potential benefits for crowdworkers from varied cultural backgrounds. To address this knowledge gap, we focus on designing a tool that aims to enhance crowdworkers' experiences by incorporating cultural considerations.


Need of AI in Modern Education: in the Eyes of Explainable AI (xAI)

arXiv.org Artificial Intelligence

Modern Education is not Modern without AI. However, AI's complex nature makes understanding and fixing problems challenging. Research worldwide shows that a parent's income greatly influences a child's education. This led us to explore how AI, especially complex models, makes important decisions using Explainable AI tools. Our research uncovered many complexities linked to parental income and offered reasonable explanations for these decisions. However, we also found biases in AI that go against what we want from AI in education: clear transparency and equal access for everyone. These biases can impact families and children's schooling, highlighting the need for better AI solutions that offer fair opportunities to all. This chapter tries to shed light on the complex ways AI operates, especially concerning biases. These are the foundational steps towards better educational policies, which include using AI in ways that are more reliable, accountable, and beneficial for everyone involved.


Prometheus Chatbot: Knowledge Graph Collaborative Large Language Model for Computer Components Recommendation

arXiv.org Artificial Intelligence

Knowledge graphs (KGs) are essential in applications such as network alignment, question-answering, and recommender systems (RSs) since they offer structured relational data that facilitate the inference of indirect relationships. However, the development of KG-based RSs capable of processing user inputs in natural language faces significant challenges. Firstly, natural language processing units must effectively handle the ambiguity and variability in human language to interpret user intents accurately. Secondly, the system must precisely identify and link entities, like product names, to their corresponding nodes in KGs. To overcome these challenges, supported by Lenovo, we developed a novel chatbot called "Prometheus," which integrates a KG with a large language model (LLM), specifically designed for recommending computer components. This chatbot can accurately decode user requests and deliver personalized recommendations derived from KGs, ensuring precise comprehension and response to their computer setup needs.
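The entity-linking challenge named in this abstract can be illustrated with a minimal sketch. It is not the Prometheus method: it assumes a hypothetical in-memory index `KG_NODES` from lowercase product names to made-up node identifiers, and it uses plain string similarity from Python's standard library where a real system would likely use an LLM or a learned linker.

```python
from difflib import get_close_matches

# Hypothetical index: surface form of a product name -> KG node id (illustrative only).
KG_NODES = {
    "geforce rtx 4090": "gpu:rtx4090",
    "ryzen 9 7950x": "cpu:ryzen9-7950x",
    "core i9-13900k": "cpu:i9-13900k",
}

def link_entity(mention, cutoff=0.8):
    """Return the KG node id whose name best matches the mention, or None."""
    # Normalise case and whitespace before comparing.
    key = mention.strip().lower()
    # get_close_matches returns at most n names with similarity >= cutoff, best first.
    matches = get_close_matches(key, list(KG_NODES), n=1, cutoff=cutoff)
    return KG_NODES[matches[0]] if matches else None
```

A misspelled mention such as "Gefroce RTX 4090" still resolves to the GPU node under this cutoff, while an unrelated word resolves to `None`. The cutoff trades recall against the risk of linking to the wrong node.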


RevGNN: Negative Sampling Enhanced Contrastive Graph Learning for Academic Reviewer Recommendation

arXiv.org Artificial Intelligence

Acquiring reviewers for academic submissions is a challenging recommendation scenario. Recent graph learning-driven models have made remarkable progress in the field of recommendation, but their performance in the academic reviewer recommendation task may suffer from a significant false negative issue. This arises from the assumption that unobserved edges represent negative samples. In fact, the mechanism of anonymous review results in inadequate exposure of interactions between reviewers and submissions, leading to a higher number of unobserved interactions compared to those caused by reviewers declining to participate. Therefore, investigating how to better comprehend the negative labeling of unobserved interactions in academic reviewer recommendations is a significant challenge. This study aims to tackle the ambiguous nature of unobserved interactions in academic reviewer recommendations. Specifically, we propose an unsupervised Pseudo Neg-Label strategy to enhance graph contrastive learning (GCL) for recommending reviewers for academic submissions, which we call RevGNN. RevGNN utilizes a two-stage encoder structure that encodes both scientific knowledge and behavior using Pseudo Neg-Label to approximate review preference. Extensive experiments on three real-world datasets demonstrate that RevGNN outperforms all baselines across four metrics. Additionally, detailed further analyses confirm the effectiveness of each component in RevGNN.
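The false-negative issue described in this abstract can be made concrete with a small sketch. It does not reproduce RevGNN's Pseudo Neg-Label strategy, whose details are not given here. It shows a simpler stand-in, similarity-thresholded negative sampling, under the assumption that reviewers and submissions already have embedding vectors; the function names and the threshold value are hypothetical.

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def pseudo_negatives(reviewer_vec, submission_vecs, observed, threshold=0.2):
    """Pick unobserved submissions that are dissimilar enough to count as negatives.

    Unobserved submissions that look similar to the reviewer are left unlabeled,
    since they may simply never have been exposed to that reviewer.
    """
    return [
        sid
        for sid, vec in submission_vecs.items()
        if sid not in observed and cosine(reviewer_vec, vec) < threshold
    ]
```

Treating every unobserved edge as a negative would label both unobserved submissions in the test below as negatives. The thresholded version keeps only the one that is clearly off-topic for the reviewer.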


A Unified Graph Transformer for Overcoming Isolations in Multi-modal Recommendation

arXiv.org Artificial Intelligence

With the rapid development of online multimedia services, especially in e-commerce platforms, there is a pressing need for personalised recommendation systems that can effectively encode the diverse multi-modal content associated with each item. However, we argue that existing multi-modal recommender systems typically use isolated processes for both feature extraction and modality modelling. Such isolated processes can harm the recommendation performance. First, an isolated extraction process underestimates the importance of effective feature extraction in multi-modal recommendations, potentially incorporating non-relevant information, which is harmful to item representations. Second, an isolated modality modelling process produces disjointed embeddings for item modalities due to the individual processing of each modality, which leads to a suboptimal fusion of user/item representations for user preference prediction. We hypothesise that the use of a unified model for addressing both aforementioned isolated processes will enable the consistent extraction and cohesive fusion of joint multi-modal features, thereby enhancing the effectiveness of multi-modal recommender systems. In this paper, we propose a novel model, called Unified Multi-modal Graph Transformer (UGT), which first leverages a multi-way transformer to extract aligned multi-modal features from raw data for top-k recommendation. Subsequently, we build a unified graph neural network in our UGT model to jointly fuse the user/item representations with their corresponding multi-modal features. Using the graph transformer architecture of our UGT model, we show that the UGT model can achieve significant effectiveness gains, especially when jointly optimised with the commonly-used multi-modal recommendation losses.


EXIT: An EXplicit Interest Transfer Framework for Cross-Domain Recommendation

arXiv.org Artificial Intelligence

Cross-domain recommendation has attracted substantial interest in industrial apps such as Meituan, which serves multiple business domains via knowledge transfer and meets the diverse interests of users. However, existing methods typically follow an implicit modeling paradigm that blends the knowledge from both the source and target domains, and design intricate network structures to share learned embeddings or patterns between domains to improve recommendation accuracy. Since the transfer of interest signals is unsupervised, these implicit paradigms often struggle with the negative transfer resulting from differences in service functions and presentation forms across different domains. In this paper, we propose a simple and effective EXplicit Interest Transfer framework named EXIT to address the stated challenge. Specifically, we propose a novel label combination approach that enables the model to directly learn beneficial source domain interests through supervised learning, while excluding inappropriate interest signals. Moreover, we introduce a scene selector network to model the interest transfer intensity under fine-grained scenes. Offline experiments conducted on the industrial production dataset and online A/B tests validate the superiority and effectiveness of our proposed framework. Without complex network structures or training processes, EXIT can be easily deployed in the industrial recommendation system. EXIT has been successfully deployed in the online homepage recommendation system of Meituan App, serving the main traffic.


Business and Regulatory Responses to Artificial Intelligence: Dynamic Regulation, Innovation Ecosystems and the Strategic Management of Disruptive Technology

arXiv.org Artificial Intelligence

Identifying and then implementing an effective response to disruptive new AI technologies is enormously challenging for any business looking to integrate AI into its operations, as well as for regulators looking to leverage AI-related innovation as a mechanism for achieving regional economic growth. These business and regulatory challenges are particularly significant given the broad reach of AI, as well as the multiple uncertainties surrounding such technologies and their future development and effects. This article identifies two promising strategies for meeting the AI challenge, focusing on the example of Fintech. The first is dynamic regulation, in the form of regulatory sandboxes and other regulatory approaches that aim to provide a space for responsible AI-related innovation. An empirical study provides preliminary evidence to suggest that jurisdictions that adopt a more proactive approach to Fintech regulation can attract greater investment. The second strategy relates to so-called innovation ecosystems. It is argued that such ecosystems are most effective when they afford opportunities for creative partnerships between well-established corporations and AI-focused startups and that this aspect of a successful innovation ecosystem is often overlooked in the existing discussion. The article suggests that these two strategies are interconnected, in that greater investment is an important element in both fostering and signaling a well-functioning innovation ecosystem and that a well-functioning ecosystem will, in turn, attract more funding. The resulting synergies between these strategies can, therefore, provide a jurisdiction with a competitive edge in becoming a regional hub for AI-related activity.


Do We Really Need Graph Convolution During Training? Light Post-Training Graph-ODE for Efficient Recommendation

arXiv.org Artificial Intelligence

The efficiency and scalability of graph convolution networks (GCNs) in training recommender systems (RecSys) have been persistent concerns, hindering their deployment in real-world applications. This paper presents a critical examination of the necessity of graph convolutions during the training phase and introduces an innovative alternative: the Light Post-Training Graph Ordinary-Differential-Equation (LightGODE). Our investigation reveals that the benefits of GCNs are more pronounced during testing rather than training. Motivated by this, LightGODE utilizes a novel post-training graph convolution method that bypasses the computation-intensive message passing of GCNs and employs a non-parametric continuous graph ordinary-differential-equation (ODE) to dynamically model node representations. This approach drastically reduces training time while achieving fine-grained post-training graph convolution to avoid the distortion of the original training embedding space, termed the embedding discrepancy issue. We validate our model across several real-world datasets of different scales, demonstrating that LightGODE not only outperforms GCN-based models in terms of efficiency and effectiveness but also significantly mitigates the embedding discrepancy commonly associated with deeper graph convolution layers. Our LightGODE challenges the prevailing paradigms in RecSys training and suggests re-evaluating the role of graph convolutions, potentially guiding future developments of efficient large-scale graph-based RecSys.
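The idea of applying graph convolution only after training can be sketched as follows. This is an illustration under stated assumptions, not LightGODE's formulation: it assumes the embeddings were trained without any message passing, takes the ODE to be dE/dt = Â E with Â the symmetrically normalised adjacency matrix, and integrates it with a fixed-step Euler scheme. The function names, the dense-list representation, and the choice of solver are all hypothetical.

```python
from math import sqrt

def normalize_adjacency(adj):
    """Symmetric normalisation D^{-1/2} A D^{-1/2} of a dense adjacency matrix."""
    deg = [sum(row) for row in adj]
    n = len(adj)
    return [
        [adj[i][j] / sqrt(deg[i] * deg[j]) if adj[i][j] else 0.0 for j in range(n)]
        for i in range(n)
    ]

def matmul(a, b):
    """Dense matrix product of two list-of-lists matrices."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]

def post_training_graph_ode(emb, norm_adj, t_end=1.0, steps=10):
    """Euler-integrate dE/dt = norm_adj @ E from t = 0 to t_end.

    emb holds embeddings trained without graph convolution; no parameters
    are learned here, so the step is purely a post-processing pass.
    """
    h = t_end / steps
    state = [row[:] for row in emb]
    for _ in range(steps):
        deriv = matmul(norm_adj, state)
        state = [
            [s + h * d for s, d in zip(s_row, d_row)]
            for s_row, d_row in zip(state, deriv)
        ]
    return state
```

In this sketch the integration time `t_end` plays the role that the number of layers plays in a discrete GCN: a larger value propagates information further over the graph, and because time is continuous it can be tuned more finely than a layer count.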


Conversational AI Multi-Agent Interoperability, Universal Open APIs for Agentic Natural Language Multimodal Communications

arXiv.org Artificial Intelligence

This paper analyses Conversational AI multi-agent interoperability frameworks and describes the novel architecture proposed by the Open Voice Interoperability initiative (Linux Foundation AI and DATA), also known briefly as OVON (Open Voice Network). The new approach is illustrated, along with the main components, delineating the key benefits and use cases for deploying standard multi-modal AI agency (or agentic AI) communications. Beginning with Universal APIs based on Natural Language, the framework establishes and enables interoperable interactions among diverse Conversational AI agents, including chatbots, voicebots, videobots, and human agents. Furthermore, a new Discovery specification framework is introduced, designed to efficiently look up agents providing specific services and to obtain accurate information about these services through a standard Manifest publication, accessible via an extended set of Natural Language-based APIs. The main purpose of this contribution is to significantly enhance the capabilities and scalability of AI interactions across various platforms. The novel architecture for interoperable Conversational AI assistants is designed to generalize, being replicable and accessible via open repositories.