Collaborating Authors

 Wang, Mei


Face-Human-Bench: A Comprehensive Benchmark of Face and Human Understanding for Multi-modal Assistants

arXiv.org Artificial Intelligence

Faces and humans are crucial elements in social interaction and are widely included in everyday photos and videos. Therefore, a deep understanding of faces and humans will enable multi-modal assistants to achieve improved response quality and broadened application scope. Currently, the multi-modal assistant community lacks a comprehensive and scientific evaluation of face and human understanding abilities. In this paper, we first propose a hierarchical ability taxonomy that includes three levels of abilities. Then, based on this taxonomy, we collect images and annotations from publicly available datasets in the face and human community and build a semi-automatic data pipeline to produce problems for the new benchmark. Finally, the obtained Face-Human-Bench comprises a development set with 900 problems and a test set with 1800 problems, supporting both English and Chinese. We conduct evaluations over 25 mainstream multi-modal large language models (MLLMs) with our Face-Human-Bench, focusing on the correlation between abilities, the impact of the relative position of targets on performance, and the impact of Chain of Thought (CoT) prompting on performance. Moreover, inspired by multi-modal agents, we also explore which abilities of MLLMs need to be supplemented by specialist models.


Depth Map Denoising Network and Lightweight Fusion Network for Enhanced 3D Face Recognition

arXiv.org Artificial Intelligence

With the increasing availability of consumer depth sensors, 3D face recognition (FR) has attracted growing attention. However, the data acquired by these sensors are often coarse and noisy, making them impractical to use directly. In this paper, we introduce a Depth Map Denoising Network (DMDNet) based on the Denoising Implicit Image Function (DIIF) to reduce noise and enhance the quality of facial depth images for low-quality 3D FR. After generating clean depth faces using DMDNet, we further design a recognition network called the Lightweight Depth and Normal Fusion Network (LDNFNet), which incorporates a multi-branch fusion block to learn unique and complementary features across modalities such as depth and normal images. Comprehensive experiments conducted on four distinct low-quality databases demonstrate the effectiveness and robustness of our proposed methods. Furthermore, when combining DMDNet and LDNFNet, we achieve state-of-the-art results on the Lock3DFace database.


Impact of Medical Data Imprecision on Learning Results

arXiv.org Artificial Intelligence

Test data measured by medical instruments often carry imprecise ranges that include the true values, and the true values themselves are unobtainable in virtually all cases. Most learning algorithms, however, carry out arithmetic calculations that are subject to this uncertainty, both when learning models and when applying the learned models, e.g., for prediction. In this paper, we initiate a study on the impact of imprecision on prediction results in a healthcare application where a pre-trained model is used to predict the future state of hyperthyroidism for patients. We formulate a model for data imprecision. Using parameters to control the degree of imprecision, this model can generate imprecise samples for comparison experiments. Further, a group of measures is defined to evaluate the different impacts quantitatively. More specifically, we define statistics that measure inconsistent predictions for individual patients. We perform experimental evaluations to compare prediction results based on the data from the original dataset with those based on the corresponding data generated from the proposed imprecision model, using a long short-term memory (LSTM) network. The results on a real-world hyperthyroidism dataset provide insights into how small imprecisions can cause large ranges of predicted results, which could lead to mislabeling and inappropriate actions (treatment or no treatment) for individual patients.
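The idea of generating imprecise samples and measuring inconsistent predictions per patient can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the uniform relative-width perturbation, the `toy_predict` threshold rule standing in for the pre-trained model, and the `inconsistency_rate` statistic.

```python
import random

def make_imprecise(sample, rel_width, rng):
    """Draw each value uniformly from [v*(1-rel_width), v*(1+rel_width)].

    `rel_width` is a hypothetical parameter controlling the degree of imprecision.
    """
    return [v * (1 + rng.uniform(-rel_width, rel_width)) for v in sample]

def toy_predict(sample, threshold=4.0):
    """Hypothetical stand-in for a pre-trained model: label 1 if the mean exceeds the threshold."""
    return 1 if sum(sample) / len(sample) > threshold else 0

def inconsistency_rate(samples, rel_width, n_draws=200, seed=0):
    """Fraction of patients whose predicted label changes for at least one perturbed draw."""
    rng = random.Random(seed)
    flipped = 0
    for sample in samples:
        base = toy_predict(sample)
        if any(toy_predict(make_imprecise(sample, rel_width, rng)) != base
               for _ in range(n_draws)):
            flipped += 1
    return flipped / len(samples)

# Three made-up measurement sequences; only the first sits near the decision threshold.
patients = [[3.9, 4.0, 4.05], [2.0, 2.1, 1.9], [6.0, 6.2, 5.9]]
rate_exact = inconsistency_rate(patients, rel_width=0.0)
rate_noisy = inconsistency_rate(patients, rel_width=0.05)
```

In this toy setup, only a patient whose measurements lie close to the decision boundary can change label under small perturbations, which mirrors the abstract's point that small imprecisions matter most for individual borderline cases.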


Time-weighted Attentional Session-Aware Recommender System

arXiv.org Machine Learning

Session-based Recurrent Neural Networks (RNNs) are gaining popularity for recommendation tasks, due to the high autocorrelation of users' behavior in the latest session and the effectiveness of RNNs at capturing sequence order information. However, most existing session-based RNN recommender systems still focus solely on the short-term interactions within a single session and completely discard all the other long-term data across different sessions. Meanwhile, traditional Collaborative Filtering (CF) methods include much advanced research on long-term dependencies, which shows great value to be explored and exploited in deep learning (DL) models. Therefore, in this paper, we propose ASARS, a novel framework that effectively imports the temporal dynamics methodology of CF into session-based RNN systems in DL, such that the temporal information can act as scalable weights through a parallel attentional network. Specifically, we first conduct an extensive data analysis to show the distribution and importance of such temporal interaction data both within sessions and across sessions. Then, our ASARS framework introduces two novel models: (1) an inter-session temporal dynamic model that captures long-term user interactions for RNN recommender systems, in which we integrate the time changes into the session RNN and add user preferences as model drift; and (2) a novel triangle parallel attention network that enhances the original RNN model by incorporating time information. This triangle parallel network is also specially designed to realize data augmentation in a sequence-to-scalar RNN architecture, and thus it can be trained very efficiently. Our extensive experiments on four real datasets from different domains demonstrate the effectiveness and large improvement of ASARS for personalized recommendation.
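One generic way to let temporal information act as attention weights is to penalize each past interaction's relevance score by its age before applying a softmax. The Python sketch below shows that idea only; it is not the ASARS architecture, and the linear age penalty, the `decay` parameter, and the function names are assumptions made for illustration.

```python
import math

def time_weighted_attention(scores, ages, decay=0.1):
    """Softmax over relevance scores penalized by interaction age.

    scores: relevance of each past interaction (hypothetical values)
    ages:   time elapsed since each interaction, e.g. in days
    decay:  assumed strength of the age penalty
    """
    logits = [s - decay * a for s, a in zip(scores, ages)]
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def weighted_sum(weights, vectors):
    """Combine per-interaction vectors into one context vector."""
    dim = len(vectors[0])
    return [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]

# Three past interactions with equal relevance but different ages.
weights = time_weighted_attention(scores=[1.0, 1.0, 1.0], ages=[30.0, 7.0, 1.0])
context = weighted_sum(weights, [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
```

With equal relevance scores, the most recent interaction receives the largest weight, and setting `decay=0` recovers ordinary softmax attention that ignores time.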