FaceDig: Automated tool for placing landmarks on facial portraits for geometric morphometrics users

arXiv.org Artificial Intelligence

Landmark digitization is essential in geometric morphometrics, enabling the quantification of biological shapes, such as facial structures, for in-depth morphological analysis. Traditional landmarking, which identifies specific anatomical points, can be complemented by semilandmarks when precise locations are challenging to define. However, manual placement of numerous landmarks is time-consuming and prone to human error, leading to inconsistencies across studies. To address this, we introduce FaceDig, an AI-powered tool designed to automate landmark placement with human-level precision, focusing on anatomically sound facial points. FaceDig is open-source and integrates seamlessly with analytical platforms like R and Python. It was trained using one of the largest and most ethnically diverse face datasets, applying a landmark configuration optimized for 2D en face photographs. Our results demonstrate that FaceDig provides reliable landmark coordinates, comparable to those placed manually by experts. The tool's output is compatible with the widely used TpsDig2 software, facilitating adoption and ensuring consistency across studies. Users are advised to work with standardized facial images and visually inspect the results for potential corrections. Despite the growing preference for 3D morphometrics, 2D facial photographs remain valuable due to their cultural and practical significance. Future enhancements to FaceDig will include support for profile views, further expanding its utility. By offering a standardized approach to landmark placement, FaceDig promotes reproducibility in facial morphology research and provides a robust alternative to existing 2D tools.
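
Because the output is described as compatible with TpsDig2, landmark coordinates can in principle be read from a TPS-style text file with a few lines of code. The sketch below is a minimal illustration, assuming the common TPS layout (an `LM=n` line followed by n lines of `x y` coordinates, then optional `KEY=value` lines such as `IMAGE=`). The function name `parse_tps` and the sample values are hypothetical and are not taken from FaceDig.

```python
def parse_tps(text):
    """Parse TPS-style text into a list of specimens.

    Each specimen is a dict with 'landmarks' (list of (x, y) floats)
    and 'meta' (remaining KEY=value lines such as IMAGE, ID, SCALE).
    """
    specimens = []
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    i = 0
    while i < len(lines):
        key, _, value = lines[i].partition("=")
        if key.upper() == "LM":
            n = int(value)  # number of landmark rows that follow
            coords = [
                tuple(float(v) for v in ln.split()[:2])
                for ln in lines[i + 1 : i + 1 + n]
            ]
            specimens.append({"landmarks": coords, "meta": {}})
            i += 1 + n
        else:
            # Attach metadata lines to the most recent specimen.
            if specimens and "=" in lines[i]:
                specimens[-1]["meta"][key.upper()] = value
            i += 1
    return specimens


# Hypothetical sample with three landmarks for one image.
sample = """LM=3
10.0 20.0
30.5 40.5
50.0 60.0
IMAGE=face_001.jpg
ID=0
"""
specimens = parse_tps(sample)
```

Visual inspection of the parsed coordinates against the image, as the abstract advises, remains necessary; the parser only reads what the file contains.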


Investigating Large Language Models for Complex Word Identification in Multilingual and Multidomain Setups

arXiv.org Artificial Intelligence

Complex Word Identification (CWI) is an essential step in the lexical simplification task and has recently become a task on its own. Some variations of this binary classification task have emerged, such as lexical complexity prediction (LCP) and complexity evaluation of multi-word expressions (MWE). Large language models (LLMs) recently became popular in the Natural Language Processing community because of their versatility and capability to solve unseen tasks in zero/few-shot settings. Our work investigates LLM usage, specifically open-source models such as Llama 2, Llama 3, and Vicuna v1.5, and closed-source models such as ChatGPT-3.5-turbo and GPT-4o, in the CWI, LCP, and MWE settings. We evaluate zero-shot, few-shot, and fine-tuning settings and show that LLMs struggle in certain conditions or achieve results comparable to existing methods. In addition, we provide some views on meta-learning combined with prompt learning. In the end, we conclude that current LLMs cannot, or can only barely, outperform existing methods, which are usually much smaller.


Customized Subgraph Selection and Encoding for Drug-drug Interaction Prediction

arXiv.org Artificial Intelligence

Subgraph-based methods have proven to be effective and interpretable in predicting drug-drug interactions (DDIs), which are essential for medical practice and drug development. Subgraph selection and encoding are critical stages in these methods, yet customizing these components remains underexplored due to the high cost of manual adjustments. In this study, inspired by the success of neural architecture search (NAS), we propose a method to search for data-specific components within subgraph-based frameworks. Specifically, we introduce extensive subgraph selection and encoding spaces that account for the diverse contexts of drug interactions in DDI prediction. To address the challenge of large search spaces and high sampling costs, we design a relaxation mechanism that uses an approximation strategy to efficiently explore optimal subgraph configurations. This approach allows for robust exploration of the search space. Extensive experiments demonstrate the effectiveness and superiority of the proposed method, with the discovered subgraphs and encoding functions highlighting the model's adaptability.


Reconstructing MODIS Normalized Difference Snow Index Product on Greenland Ice Sheet Using Spatiotemporal Extreme Gradient Boosting Model

arXiv.org Artificial Intelligence

The spatiotemporally continuous data of normalized difference snow index (NDSI) are key to understanding the mechanisms of snow occurrence and development as well as the patterns of snow distribution changes. However, the presence of clouds, particularly prevalent in polar regions such as the Greenland Ice Sheet (GrIS), introduces a significant number of missing pixels in the MODIS NDSI daily data. To address this issue, this study proposes the utilization of a spatiotemporal extreme gradient boosting (STXGBoost) model to generate a comprehensive NDSI dataset. In the proposed model, various input variables are carefully selected, encompassing terrain features, geometry-related parameters, and surface property variables. Moreover, the model incorporates spatiotemporal variation information, enhancing its capacity for reconstructing the NDSI dataset. Verification results demonstrate the efficacy of the STXGBoost model, with a coefficient of determination of 0.962, root mean square error of 0.030, mean absolute error of 0.011, and negligible bias (0.0001). Furthermore, simulation comparisons involving missing data and cross-validation with Landsat NDSI data illustrate the model's capability to accurately reconstruct the spatial distribution of NDSI data. Notably, the proposed model surpasses the performance of traditional machine learning models, showcasing superior NDSI predictive capabilities. This study highlights the potential of leveraging auxiliary data to reconstruct NDSI in GrIS, with implications for broader applications in other regions. The findings offer valuable insights for the reconstruction of NDSI remote sensing data, contributing to the further understanding of spatiotemporal dynamics in snow-covered regions.
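
The four verification metrics quoted above can be computed from paired observed and reconstructed values as in the sketch below. It assumes the standard definitions (coefficient of determination as 1 − SS_res/SS_tot, bias as the mean signed error); the abstract does not state which definitions the study uses, and the sample values are made up for illustration.

```python
import math


def regression_metrics(observed, predicted):
    """Return R^2, RMSE, MAE and bias for paired observed/predicted values."""
    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]
    mean_obs = sum(observed) / n
    ss_res = sum(e * e for e in errors)                  # residual sum of squares
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)  # total sum of squares
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(e) for e in errors) / n,
        "bias": sum(errors) / n,  # mean signed error
    }


# Hypothetical NDSI values, not data from the study.
observed = [0.2, 0.4, 0.6, 0.8]
predicted = [0.25, 0.35, 0.65, 0.75]
metrics = regression_metrics(observed, predicted)
```

A bias near zero alongside a nonzero MAE, as in this toy case, indicates errors that cancel on average rather than an absence of error.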


Trustworthy Federated Learning: Privacy, Security, and Beyond

arXiv.org Artificial Intelligence

While recent years have witnessed advances in big data and Artificial Intelligence (AI), safeguarding data privacy and security remains critically important. As an innovative approach, Federated Learning (FL) addresses these concerns by facilitating collaborative model training across distributed data sources without transferring raw data. However, ensuring robust security and privacy across decentralized networks has drawn significant attention as a challenge in handling distributed data in FL. In this paper, we conduct an extensive survey of the security and privacy issues prevalent in FL, underscoring the vulnerability of communication links and the potential for cyber threats. We delve into various defensive strategies to mitigate these risks, explore the applications of FL across different sectors, and propose research directions. We identify the intricate security challenges that arise within the FL frameworks, aiming to contribute to the development of secure and efficient FL systems.
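
To make "collaborative model training without transferring raw data" concrete, the sketch below uses federated averaging (FedAvg), a common FL aggregation scheme that the abstract does not name specifically. Each client fits a one-parameter model `y = w * x` on its own data and sends only the updated parameter to the server, which averages the parameters weighted by client sample count. All names and data are hypothetical, and the sketch includes none of the defenses the survey discusses.

```python
def local_step(w, data, lr=0.05, epochs=20):
    """Run gradient descent on a client's private (x, y) pairs; return the new w."""
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def fed_avg(client_weights, client_sizes):
    """Server-side aggregation: average of client weights, weighted by data size."""
    total = sum(client_sizes)
    return sum(w * n for w, n in zip(client_weights, client_sizes)) / total


# Two clients whose raw data never leaves this list in a real deployment.
clients = [
    [(1.0, 2.0), (2.0, 4.0)],
    [(1.0, 2.1), (3.0, 5.9), (2.0, 4.0)],
]

global_w = 0.0
for _ in range(10):  # communication rounds
    local_ws = [local_step(global_w, data) for data in clients]
    global_w = fed_avg(local_ws, [len(data) for data in clients])
```

Only the scalar parameters cross the client boundary here, which is exactly the communication link whose vulnerability the survey highlights.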


Optical Flow Representation Alignment Mamba Diffusion Model for Medical Video Generation

arXiv.org Artificial Intelligence

Medical video generation models are expected to have a profound impact on the healthcare industry, including but not limited to medical education and training, surgical planning, and simulation. Current video diffusion models typically build on image diffusion architecture by incorporating temporal operations (such as 3D convolution and temporal attention). Although this approach is effective, its oversimplification limits spatio-temporal performance and consumes substantial computational resources. To counter this, we propose Medical Simulation Video Generator (MedSora), which incorporates three key elements: i) a video diffusion framework that integrates the advantages of attention and Mamba, balancing low computational load with high-quality video generation, ii) an optical flow representation alignment method that implicitly enhances attention to inter-frame pixels, and iii) a video variational autoencoder (VAE) with frequency compensation that addresses the information loss of medical features that occurs when transforming pixel space into latent features and then back to pixel frames. Extensive experiments and applications demonstrate that MedSora exhibits superior visual quality in generating medical videos, outperforming the most advanced baseline methods. Further results and code are available at https://wongzbb.github.io/MedSora/.


Large-Scale Multi-Robot Coverage Path Planning on Grids with Path Deconfliction

arXiv.org Artificial Intelligence

We study Multi-Robot Coverage Path Planning (MCPP) on a 4-neighbor 2D grid G, which aims to compute paths for multiple robots to cover all cells of G. Traditional approaches are limited as they first compute coverage trees on a quadrant coarsened grid H and then employ the Spanning Tree Coverage (STC) paradigm to generate paths on G, making them inapplicable to grids with partially obstructed 2 × 2 blocks. To address this limitation, we reformulate the problem directly on G, revolutionizing grid-based MCPP solving and establishing new NP-hardness results. We introduce Extended-STC (ESTC), a novel paradigm that extends STC to ensure complete coverage with bounded suboptimality, even when H includes partially obstructed blocks.

Coverage Path Planning (CPP) addresses the problem of determining a path that fully covers a designated workspace [1]. This problem is essential for a broad spectrum of robotic applications, from indoor tasks like vacuum cleaning [2] and inspection [3] to outdoor activities such as automated harvesting [4], planetary exploration [5], and environmental monitoring [6]. Multi-Robot Coverage Path Planning (MCPP) is an extension of CPP tailored for multi-robot systems, aiming to coordinate the paths of multiple robots to collectively cover the given workspace, thereby enhancing both task efficiency [7] [...]

[...] These methods then apply the Spanning Tree Coverage (STC) [17] paradigm to generate coverage [...] However, operating exclusively on the coarsened grid H has [...] First, it fails in cases where H is incomplete, that is, when any 2 × 2 blocks contain obstructed grid cells absent from G. Second, even optimal coverage trees on H do not necessarily result in an optimal MCPP solution (as illustrated in Figure 1-(b) and (c)), as evidenced by an asymptotic suboptimality ratio of four for makespan minimization [14], since the paths derived from circumnavigating coverage trees of H constitute only a subset of all possible sets of coverage [...]

We revolutionize solving MCPP on grid graphs, overcoming the above limitations through a two-phase approach that first [...] LS-MCPP but also those generated by existing MCPP methods, to effectively resolve conflicts between robots and accounts for turning costs, further enhancing the practicability of the solutions. Our algorithmic contributions are detailed in [...] real-world robotics applications.

[Figure caption fragment] The robots require a cost of 1 to traverse between adjacent vertices of G. (a) Single-robot coverage path [...]

[Footnote fragment] [...] to discuss the structure and topology of G more precisely, especially in the [...]

The authors are with the School of Computing Science, Simon Fraser University, Burnaby, BC V5A1S6, Canada.
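
The limitation discussed above concerns 2 × 2 blocks of G that are only partly obstructed, so they cannot be represented cleanly as a single cell of the coarsened grid H. The sketch below shows one way to detect such blocks. The grid encoding (`#` for an obstructed cell, `.` for a free one), the function name, and the assumption of even grid dimensions are all hypothetical choices for illustration, not the paper's implementation.

```python
def classify_blocks(grid):
    """Label each 2x2 block of a grid as 'free', 'obstructed' or 'partial'.

    `grid` is a list of equal-length strings with even height and width.
    Keys of the result are (row, col) indices in the coarsened grid.
    """
    rows, cols = len(grid), len(grid[0])
    labels = {}
    for r in range(0, rows, 2):
        for c in range(0, cols, 2):
            blocked = sum(
                grid[r + dr][c + dc] == "#" for dr in (0, 1) for dc in (0, 1)
            )
            if blocked == 0:
                label = "free"
            elif blocked == 4:
                label = "obstructed"
            else:
                label = "partial"  # the case plain STC on H cannot handle
            labels[(r // 2, c // 2)] = label
    return labels


grid = [
    "..#.",
    "....",
    "..##",
    "..##",
]
blocks = classify_blocks(grid)
```

Any block labeled `partial` marks a region where planning on H alone would either drop free cells or treat obstructed ones as traversable.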


A Coverage-Guided Testing Framework for Quantum Neural Networks

arXiv.org Artificial Intelligence

Quantum Neural Networks (QNNs) combine quantum computing and neural networks, leveraging quantum properties such as superposition and entanglement to improve machine learning models. These quantum characteristics enable QNNs to potentially outperform classical neural networks in tasks such as quantum chemistry simulations, optimization problems, and quantum-enhanced machine learning. However, they also introduce significant challenges in verifying the correctness and reliability of QNNs. To address this, we propose QCov, a set of test coverage criteria specifically designed for QNNs to systematically evaluate QNN state exploration during testing, focusing on superposition and entanglement. These criteria help detect quantum-specific defects and anomalies.

Quantum Neural Networks (QNNs) (Cong et al., 2018) represent a significant advancement in computational technology, combining the principles of quantum mechanics with neural network mechanisms. By leveraging quantum properties such as superposition and entanglement, QNNs have the potential to solve complex problems more efficiently than classical neural networks, particularly in areas like image classification (Li et al., 2022b; Shi et al., 2023; Henderson et al., 2019; Alam et al., 2021) and sequential data learning (Bausch, 2020; Yu et al., 2024). Despite this early success, similar to Deep Neural Networks (DNNs) (LeCun et al., 1998a; He et al., 2015; Howard et al., 2017), QNNs have been shown to be vulnerable to adversarial (Lu et al., 2019) and backdoor attacks (Chu et al., 2023a;b), raising concerns about their security and robustness. A recent work, QuanTest (Shi et al., 2024), introduced the first adversarial testing framework for QNNs, using an entanglement-guided optimization algorithm to generate adversarial inputs and capture erroneous behaviors. However, QuanTest focuses primarily on individual inputs, lacking a comprehensive evaluation of overall test adequacy for QNNs. Additionally, due to the complexity of the Hilbert space, which grows exponentially with the number of qubits, it is impractical to manually test QNNs thoroughly. This highlights the urgent need for a comprehensive testing framework to assess the test adequacy of QNNs. To ensure system quality, numerous testing techniques have been developed for deep learning (DL) systems (Zhang et al., 2020; Wang et al., 2024) and traditional quantum software (Wang et al., 2021a;b; Fortunato et al., 2022a; Xia et al., 2024) from various perspectives.


SPES: Spectrogram Perturbation for Explainable Speech-to-Text Generation

arXiv.org Artificial Intelligence

Spurred by the demand for interpretable models, research on eXplainable AI for language technologies has experienced significant growth, with feature attribution methods emerging as a cornerstone of this progress. While prior work in NLP explored such methods for classification tasks and textual applications, explainability intersecting generation and speech is lagging, with existing techniques failing to account for the autoregressive nature of state-of-the-art models and to provide fine-grained, phonetically meaningful explanations. We address this gap by introducing Spectrogram Perturbation for Explainable Speech-to-text Generation (SPES), a feature attribution technique applicable to sequence generation tasks with autoregressive models. SPES provides explanations for each predicted token based on both the input spectrogram and the previously generated tokens. Extensive evaluation on speech recognition and translation demonstrates that SPES generates explanations that are faithful and plausible to humans.
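
The general idea behind perturbation-based feature attribution can be sketched as follows: mask one region of the input spectrogram at a time and record how much the model's score for a given token drops. This is a generic occlusion sketch, not the SPES algorithm itself; it covers only the input spectrogram (not the previously generated tokens), and `toy_score` is a hypothetical stand-in for a real model's token probability.

```python
def occlusion_attribution(spectrogram, score_fn, patch=2):
    """Return a saliency map: the score drop caused by zeroing each patch."""
    base = score_fn(spectrogram)
    rows, cols = len(spectrogram), len(spectrogram[0])
    saliency = [[0.0] * cols for _ in range(rows)]
    for r0 in range(0, rows, patch):
        for c0 in range(0, cols, patch):
            cells = [
                (r, c)
                for r in range(r0, min(r0 + patch, rows))
                for c in range(c0, min(c0 + patch, cols))
            ]
            perturbed = [row[:] for row in spectrogram]  # copy before masking
            for r, c in cells:
                perturbed[r][c] = 0.0
            drop = base - score_fn(perturbed)
            for r, c in cells:
                saliency[r][c] = drop
    return saliency


def toy_score(spec):
    """Hypothetical model score that depends only on the top-left 2x2 region."""
    return sum(spec[r][c] for r in range(2) for c in range(2))


# 4x4 toy "spectrogram" (frequency bins x time frames) of ones.
spec = [[1.0] * 4 for _ in range(4)]
saliency = occlusion_attribution(spec, toy_score)
```

For an autoregressive model, such a map would be computed once per generated token, since each token's score depends on a different decoding step.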


Data Augmentation for End-to-end Code-switching Speech Recognition

arXiv.org Artificial Intelligence

Training a code-switching end-to-end automatic speech recognition (ASR) model normally requires a large amount of data, while code-switching data is often limited. In this paper, three novel approaches are proposed for code-switching data augmentation: audio splicing with the existing code-switching data, and text-to-speech (TTS) synthesis with new code-switching texts generated by either word translation or word insertion. Our experiments on a 200-hour Mandarin-English code-switching dataset show that each of the three proposed approaches individually yields significant improvements on code-switching ASR. Moreover, all the proposed approaches can be combined with the recently popular SpecAugment to obtain an additional gain. The word error rate (WER) is reduced by a relative 24.0% compared to the system without any data augmentation, and by a relative 13.0% compared to the system with only SpecAugment.
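
The reported gains are relative reductions in word error rate, which can be easy to confuse with absolute differences. The sketch below computes WER as word-level edit distance divided by reference length, then the relative reduction between two systems. The example sentence and the WER values 10.0 and 7.6 are hypothetical, chosen only so the arithmetic yields 24%; the abstract does not report absolute WERs.

```python
def wer(reference, hypothesis):
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))  # distances for the empty reference prefix
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(
                prev[j] + 1,             # deletion
                cur[j - 1] + 1,          # insertion
                prev[j - 1] + (r != h),  # substitution or match
            ))
        prev = cur
    return prev[-1] / len(ref)


def relative_reduction(baseline_wer, new_wer):
    """Fraction by which new_wer improves on baseline_wer."""
    return (baseline_wer - new_wer) / baseline_wer


# Hypothetical code-switched utterance (pinyin plus an English word).
example_wer = wer("wo xiang yao coffee", "wo xiang coffee")

# Hypothetical absolute WERs (in percent) giving a 24% relative reduction.
gain = relative_reduction(10.0, 7.6)
```

Under these made-up numbers the absolute improvement is 2.4 points, while the relative improvement is 24%.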