Personal Assistant Systems


Feature Fusion Revisited: Multimodal CTR Prediction for MMCTR Challenge

arXiv.org Artificial Intelligence

With the rapid advancement of Multimodal Large Language Models (MLLMs), an increasing number of researchers are exploring their application in recommendation systems. However, the high latency associated with large models presents a significant challenge for such use cases. The EReL@MIR workshop provided a valuable opportunity to experiment with various approaches aimed at improving the efficiency of multimodal representation learning for information retrieval tasks. As part of the competition's requirements, participants were mandated to submit a technical report detailing their methodologies and findings. Our team was honored to receive the award for Task 2 - Winner (Multimodal CTR Prediction). In this technical report, we present our methods and key findings. Additionally, we propose several directions for future work, particularly focusing on how to effectively integrate recommendation signals into multimodal representations. The codebase for our implementation is publicly available at: https://github.com/Lattice-zjj/MMCTR_Code, and the trained model weights can be accessed at: https://huggingface.co/FireFlyCourageous/MMCTR_DIN_MicroLens_1M_x1.


FROG: Effective Friend Recommendation in Online Games via Modality-aware User Preferences

arXiv.org Artificial Intelligence

Due to the convenience of mobile devices, online games have become an important part of user entertainment, creating a demand for friend recommendation in online games. However, none of the existing approaches can effectively combine multi-modal user features (e.g., images and texts) with the structural information in the friendship graph, due to the following limitations: (1) some of them ignore the high-order structural proximity between users, (2) some fail to learn the pairwise relevance between users at the modality-specific level, and (3) some cannot capture both the local and global user preferences on different modalities. To address these issues, in this paper we propose an end-to-end model, FROG, that better models user preferences for potential friends. Comprehensive experiments on both offline evaluation and online deployment at Tencent have demonstrated the superiority of FROG over existing approaches.


Optimal Sequential Recommendations: Exploiting User and Item Structure

arXiv.org Machine Learning

Given the importance of these recommendation algorithms, it makes sense to try to design optimal ones. A basic criterion for optimality, one that captures the first-order experience of users in a recommendation system, is to maximize the proportion of recommendations that are liked, similar to [11, 23]. The goal of this paper is to gain insight into the design of recommendation algorithms by finding a statistically optimal algorithm within the context of a natural model for recommendation systems. One of our findings is that the best way to obtain information about users and items in order to make good recommendations depends on the time horizon and its relation to various system parameters, including the number of users, the diversity of users, and the richness of the items; there are a number of operating regimes depending on these parameters. It goes without saying that the nature of any insight obtained is intertwined with the choice of model. We use the same model as [11], closely related to those studied in [10, 12]. The model is different from those in other papers on the topic; we now motivate its key features.


Combating the Bucket Effect: Multi-Knowledge Alignment for Medication Recommendation

arXiv.org Artificial Intelligence

Xiang Li a, Haixu Ma a, Guanyong Wu a, Shi Mu a, Chen Li a, and Shunpan Liang a,b
a School of Information Science and Engineering, Yanshan University, Qin Huangdao, 066004, China
b Xinjiang College Of Science & Technology, Korla, 841000, China

Keywords: Medication Recommendation, Molecular Representation Learning

Abstract: Medication recommendation is crucial in healthcare, offering effective treatments based on patients' electronic health records (EHR). Previous studies show that integrating more medication-related knowledge improves medication representation accuracy. However, not all medications encompass multiple types of knowledge data simultaneously. For instance, some medications provide only textual descriptions without structured data. This imbalance in data availability limits the performance of existing models, a challenge we term the "bucket effect" in medication recommendation. To fill this gap, we introduce a cross-modal medication encoder capable of seamlessly aligning data from different modalities and propose a medication recommendation framework to integrate Multiple types of Knowledge, named MKMed. Then, we combine the multi-knowledge medication representations with patient records for recommendations. Extensive experiments on the MIMIC-III and MIMIC-IV datasets demonstrate that MKMed mitigates the "bucket effect" in data, and significantly outperforms state-of-the-art baselines in recommendation accuracy and safety.

1. Introduction

Given the increasing demand for healthcare resources, there is a growing emphasis on AI-based medical systems. Medication recommendations (Shang, Xiao, Ma, Li and Sun, 2019; Wu, Qiu, Jiang, Qi and Wu, 2022; Li, Liang, Hou and Ma, 2024a), as a key area, aim to integrate clinical knowledge with patient electronic health records (EHR), enhancing the accuracy, safety, and efficiency of clinical decision-making for patients. Existing methods can be divided into two categories. The first category focuses on exploring the complex relationships between multiple medical events, optimizing patient representation by constructing complex networks (Le, Tran and Venkatesh, 2018; Jin, Yang, Sun, Liu, Qu and Tong, 2018; Zheng, Wang, Xu, Shen, Qin, Huai, Liu and Chen, 2021). For example, RAREMed (Zhao, Jing, Feng, Wu, Gao and He, 2024) focuses on the connections between rare events and others.


FashionM3: Multimodal, Multitask, and Multiround Fashion Assistant based on Unified Vision-Language Model

arXiv.org Artificial Intelligence

Fashion styling and personalized recommendations are pivotal in modern retail, contributing substantial economic value in the fashion industry. With the advent of vision-language models (VLM), new opportunities have emerged to enhance retailing through natural language and visual interactions. This work proposes FashionM3, a multimodal, multitask, and multiround fashion assistant, built upon a VLM fine-tuned for fashion-specific tasks. It helps users discover satisfying outfits by offering multiple capabilities including personalized recommendation, alternative suggestion, product image generation, and virtual try-on simulation. Fine-tuned on the novel FashionRec dataset, comprising 331,124 multimodal dialogue samples across basic, personalized, and alternative recommendation tasks, FashionM3 delivers contextually personalized suggestions with iterative refinement through multiround interactions. Quantitative and qualitative evaluations, alongside user studies, demonstrate FashionM3's superior performance in recommendation effectiveness and practical value as a fashion assistant.


How to manage Siri Suggestions on your iPhone

Popular Science

If you've been keeping up with the recent discussion around how best to use Signal and how to keep journalists out of your private chats about national security matters, you'll know that White House officials have been blaming an iPhone feature called Siri Suggestions for adding unauthorized members to a private group chat. Siri Suggestions works on iPhones (and iPads and Macs) to give you contextually aware assistance when you need it. The feature might make suggestions about who to invite to events based on previous events, for example, or give you prompts for searches on your device, based on what you've searched for at certain times in the past. In the case of the White House Signal chat blunders, it appears Siri made a contact suggestion based on details included in an email. However, the contact details in the email weren't those of the email sender but those of someone mentioned in the message, which is where the confusion arose. As with many modern-day AI tools, Siri Suggestions lets you choose where to draw the line on how much assistance to get.


Google is dropping support for its oldest Nest Learning Thermostats

PCWorld

Google just announced that it will soon drop support for the first- and second-generation Nest Learning Thermostats. The devices won't stop working completely, but remote access is going away, as are software updates and compatibility with the Google Home app. The older Nest Learning Thermostats that are losing support include the second-generation units for the U.S., released in 2014, as well as the European version of the second-gen thermostat, which also went on sale in 2014. The original Nest Learning Thermostat, which was released only in the U.S., landed in 2011. Google says it will drop support for the thermostats starting October 25, 2025. Besides no longer receiving software updates, the older Nest Learning Thermostats will lose Nest and Google Home app support, meaning no more out-of-home control.


A Desideratum for Conversational Agents: Capabilities, Challenges, and Future Directions

arXiv.org Artificial Intelligence

Recent advances in Large Language Models (LLMs) have propelled conversational AI from traditional dialogue systems into sophisticated agents capable of autonomous actions, contextual awareness, and multi-turn interactions with users. Yet, fundamental questions about their capabilities, limitations, and paths forward remain open. This survey paper presents a desideratum for next-generation Conversational Agents - what has been achieved, what challenges persist, and what must be done for more scalable systems that approach human-level intelligence. To that end, we systematically analyze LLM-driven Conversational Agents by organizing their capabilities into three primary dimensions: (i) Reasoning - logical, systematic thinking inspired by human intelligence for decision making, (ii) Monitor - encompassing self-awareness and user interaction monitoring, and (iii) Control - focusing on tool utilization and policy following. Building upon this, we introduce a novel taxonomy by classifying recent work on Conversational Agents around our proposed desideratum. We identify critical research gaps and outline key directions, including realistic evaluations, long-term multi-turn reasoning skills, self-evolution capabilities, collaborative and multi-agent task completion, personalization, and proactivity. This work aims to provide a structured foundation, highlight existing limitations, and offer insights into potential future research directions for Conversational Agents, ultimately advancing progress toward Artificial General Intelligence (AGI). We maintain a curated repository of papers at: https://github.com/emrecanacikgoz/awesome-conversational-agents.


Fake paramedic jailed for Tinder date rapes

BBC News

The jury previously heard that when questioned by police, Kadolski said he would not be able to pin someone down as he had been "sexually abused as a child". He told officers when interviewed: "I'm not the best with empathy or sympathy." Michael Cohen, for the defendant, said "others are representing Mr Kadolski in an application for permission to appeal" against the convictions. The East of England Ambulance Service said Kadolski was "immediately" suspended when it was alerted to his arrest. A spokesperson said: "We are appalled at the crimes that Jamie Kadolski has been sentenced for today. Our thoughts are with the victims and all those affected by these horrific crimes."


MMHCL: Multi-Modal Hypergraph Contrastive Learning for Recommendation

arXiv.org Artificial Intelligence

The burgeoning presence of multimodal content-sharing platforms propels the development of personalized recommender systems. Previous works usually suffer from data sparsity and cold-start problems, and may fail to adequately explore semantic user-product associations from multimodal data. To address these issues, we propose a novel Multi-Modal Hypergraph Contrastive Learning (MMHCL) framework for user recommendation. For a comprehensive information exploration from user-product relations, we construct two hypergraphs, i.e. a user-to-user (u2u) hypergraph and an item-to-item (i2i) hypergraph, to mine shared preferences among users and intricate multimodal semantic resemblance among items, respectively. This process yields denser second-order semantics that are fused with first-order user-item interaction as complementary to alleviate the data sparsity issue. Then, we design a contrastive feature enhancement paradigm by applying synergistic contrastive learning. By maximizing/minimizing the mutual information between second-order (e.g. shared preference pattern for users) and first-order (information of selected items for users) embeddings of the same/different users and items, the feature distinguishability can be effectively enhanced. Compared with using sparse primary user-item interaction only, our MMHCL obtains denser second-order hypergraphs and excavates more abundant shared attributes to explore the user-product associations, which to a certain extent alleviates the problems of data sparsity and cold-start. Extensive experiments have comprehensively demonstrated the effectiveness of our method. Our code is publicly available at: https://github.com/Xu107/MMHCL.
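
The contrastive step described in the MMHCL abstract can be illustrated with a small sketch. The abstract does not say which mutual-information estimator is used, so this example assumes InfoNCE, a commonly used lower bound on mutual information; the function name, the temperature value, and the random embeddings are all hypothetical and are not taken from the MMHCL code. Row i of the first-order view and row i of the second-order view are treated as a positive pair (the same user or item), and all other rows act as negatives.

```python
import numpy as np


def info_nce(first_order: np.ndarray, second_order: np.ndarray,
             temperature: float = 0.2) -> float:
    """InfoNCE loss between two views of the same entities (assumed estimator).

    Row i of `first_order` and row i of `second_order` form a positive pair;
    every other row of `second_order` serves as a negative for row i.
    """
    # L2-normalise so that dot products are cosine similarities.
    a = first_order / np.linalg.norm(first_order, axis=1, keepdims=True)
    b = second_order / np.linalg.norm(second_order, axis=1, keepdims=True)

    logits = a @ b.T / temperature                 # (n, n) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # Positive pairs sit on the diagonal; minimising this loss pulls them
    # together and pushes the off-diagonal (different-entity) pairs apart.
    return float(-np.mean(np.diag(log_prob)))


# Hypothetical embeddings for 8 users with 16 dimensions each.
rng = np.random.default_rng(0)
users_first = rng.normal(size=(8, 16))                   # first-order view
aligned = users_first + 0.01 * rng.normal(size=(8, 16))  # agreeing second-order view
unrelated = rng.normal(size=(8, 16))                     # unrelated second-order view

loss_aligned = info_nce(users_first, aligned)
loss_unrelated = info_nce(users_first, unrelated)
print(loss_aligned < loss_unrelated)
```

When the two views of each user agree, the loss is lower than when the second-order view is unrelated, which is the behaviour a contrastive objective of this kind rewards during training.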