Collaborating Authors

 Li, Xiaolin


DCentNet: Decentralized Multistage Biomedical Signal Classification using Early Exits

arXiv.org Artificial Intelligence

This paper presents DCentNet, a novel decentralized multistage signal classification approach for biomedical data obtained from Internet of Things (IoT) wearable sensors, utilizing early exit points (EEPs) to improve both energy efficiency and processing speed. Traditionally, IoT sensor data is processed in a centralized manner on a single node, either cloud-native or edge-native, which imposes several limitations, such as significant energy consumption on the edge sensor and greater latency. To address these limitations, we propose DCentNet, a decentralized method based on Convolutional Neural Network (CNN) classifiers, where a single CNN model is partitioned into several sub-networks using one or more EEPs. Our method introduces encoder-decoder pairs at EEPs, which compress large feature maps before transferring them to the next sub-network, drastically reducing wireless data transmission and power consumption. When the input can be confidently classified at an EEP, processing can terminate early without traversing the entire network. To minimize sensor energy consumption and overall complexity, the initial sub-networks can be set up in the fog or on the edge. We also explore different EEP locations and demonstrate, using a genetic algorithm approach, that the choice of EEP can be altered to achieve a trade-off between performance and complexity. DCentNet addresses the limitations of centralized processing in IoT wearable sensor data analysis, offering improved efficiency and performance. Experimental results on electrocardiogram (ECG) classification validate the proposed method. With one EEP, the system reduces wireless data transmission by 94.54% and complexity by a corresponding 21%, while classification accuracy and sensitivity remain almost unaffected.
With two EEPs, the system achieves a sensitivity of 98.36% and an accuracy of 97.74%, while reducing wireless data transmission by 91.86% and complexity by 22%. DCentNet is implemented on an ARM Cortex-M4 based microcontroller unit (MCU).
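
The early-exit behavior described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the confidence test (maximum softmax probability against a fixed `threshold`) and the names `softmax`, `classify_with_early_exits`, and `stages` are assumptions made for illustration, and the encoder-decoder compression between sub-networks is omitted.

```python
import math


def softmax(logits):
    """Convert raw class scores to probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_with_early_exits(x, stages, threshold=0.9):
    """Run sub-networks in order and stop at the first confident exit.

    stages: list of callables; each takes the current features and returns
            (next_features, class_logits). This interface is hypothetical.
    Returns (predicted_class, index_of_stage_that_decided).
    """
    if not stages:
        raise ValueError("at least one stage is required")
    features = x
    last = len(stages) - 1
    for index, stage in enumerate(stages):
        features, logits = stage(features)
        probs = softmax(logits)
        confidence = max(probs)
        # Exit early when confident; the final stage always decides.
        if confidence >= threshold or index == last:
            return probs.index(confidence), index
```

Because later stages are only called when an earlier exit is not confident enough, an input that exits at the first stage never triggers the remaining sub-networks or the data transfer to them.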


EquiBoost: An Equivariant Boosting Approach to Molecular Conformation Generation

arXiv.org Artificial Intelligence

Molecular conformation generation plays a key role in computational drug design. Recently developed deep learning methods, particularly diffusion models, have reached competitive performance relative to traditional cheminformatics approaches. However, these methods are often time-consuming or require extra support from traditional methods. We propose EquiBoost, a boosting model that stacks several equivariant graph transformers as weak learners to iteratively refine the 3D conformations of molecules. Without relying on diffusion techniques, EquiBoost balances accuracy and efficiency more effectively than diffusion-based methods. Notably, compared to the previous state-of-the-art diffusion method, EquiBoost improves generation quality and preserves diversity, achieving considerably better precision of Average Minimum RMSD (AMR) on the GEOM datasets. This work rejuvenates boosting and sheds light on its potential to be a robust alternative to diffusion models in certain scenarios.
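
For readers unfamiliar with the AMR metric mentioned above, here is a small sketch of how it is commonly computed in conformer generation benchmarks. It assumes a precomputed matrix `rmsd[i][j]` holding the RMSD between reference conformer `i` and generated conformer `j`; the function names are illustrative, and the exact evaluation protocol used in the paper may differ.

```python
def amr_recall(rmsd):
    """Average, over reference conformers, of the RMSD to the closest
    generated conformer (rows index references)."""
    return sum(min(row) for row in rmsd) / len(rmsd)


def amr_precision(rmsd):
    """Average, over generated conformers, of the RMSD to the closest
    reference conformer (columns index generated conformers)."""
    columns = list(zip(*rmsd))
    return sum(min(col) for col in columns) / len(columns)
```

Lower values are better for both quantities: precision rewards generated conformers that each lie close to some reference, while recall rewards covering every reference.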


EquiFlow: Equivariant Conditional Flow Matching with Optimal Transport for 3D Molecular Conformation Prediction

arXiv.org Artificial Intelligence

Molecular 3D conformations play a key role in determining how molecules interact with other molecules or protein surfaces. Recent deep learning advancements have improved conformation prediction, but slow training speeds and difficulties in utilizing high-degree features limit performance. We propose EquiFlow, an equivariant conditional flow matching model with optimal transport. EquiFlow uniquely applies conditional flow matching in molecular 3D conformation prediction, leveraging simulation-free training to address slow training speeds. It uses a modified Equiformer model to encode Cartesian molecular conformations along with their atomic and bond properties into higher-degree embeddings. Additionally, EquiFlow employs an ODE solver, providing faster inference speeds compared to diffusion models with SDEs. Experiments on the QM9 dataset show that EquiFlow predicts small molecule conformations more accurately than current state-of-the-art models.
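
The two ideas named in this abstract, simulation-free training and ODE-based inference, can be sketched in a toy setting. The sketch below assumes the simplest straight-line conditional path between a noise sample `x0` and a data sample `x1`; it leaves out the optimal transport pairing, the equivariant Equiformer encoder, and any learned network, and all function names are hypothetical.

```python
def interpolate(x0, x1, t):
    """Point on the straight-line path from x0 (t=0) to x1 (t=1)."""
    return [(1 - t) * a + t * b for a, b in zip(x0, x1)]


def target_velocity(x0, x1):
    """Regression target for the velocity model along that path.
    Training needs only this closed form, so no ODE is simulated."""
    return [b - a for a, b in zip(x0, x1)]


def euler_integrate(velocity_fn, x0, steps=10):
    """Inference: integrate dx/dt = velocity_fn(x, t) from t=0 to t=1
    with fixed-step Euler updates."""
    x = list(x0)
    dt = 1.0 / steps
    for k in range(steps):
        v = velocity_fn(x, k * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x
```

With a perfectly learned velocity field for a single pair, the Euler solver carries `x0` to `x1` in a handful of deterministic steps, which is the intuition behind the faster inference claimed relative to SDE-based diffusion sampling.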


BatmanNet: Bi-branch Masked Graph Transformer Autoencoder for Molecular Representation

arXiv.org Artificial Intelligence

Although substantial efforts have been made using graph neural networks (GNNs) for AI-driven drug discovery (AIDD), effective molecular representation learning remains an open challenge, especially when labeled molecules are scarce. Recent studies suggest that large GNN models pre-trained by self-supervised learning on unlabeled datasets enable better transfer performance in downstream molecular property prediction tasks. However, the approaches in these studies require multiple complex self-supervised tasks and large-scale datasets, which are time-consuming, computationally expensive, and difficult to pre-train end-to-end. Here, we design a simple yet effective self-supervised strategy to simultaneously learn local and global information about molecules, and further propose a novel bi-branch masked graph transformer autoencoder (BatmanNet) to learn molecular representations. BatmanNet features two tailored complementary and asymmetric graph autoencoders to reconstruct the missing nodes and edges, respectively, from a masked molecular graph. With this design, BatmanNet can effectively capture the underlying structure and semantic information of molecules, thus improving the performance of molecular representation. BatmanNet achieves state-of-the-art results for multiple drug discovery tasks, including molecular property prediction, drug-drug interaction, and drug-target interaction, on 13 benchmark datasets, demonstrating its great potential and superiority in molecular representation learning.


Federated Unsupervised Representation Learning

arXiv.org Artificial Intelligence

To leverage the enormous amount of unlabeled data on distributed edge devices, we formulate a new problem in federated learning called Federated Unsupervised Representation Learning (FURL) to learn a common representation model without supervision while preserving data privacy. FURL poses two new challenges: (1) data distribution shift (Non-IID distribution) among clients would make local models focus on different categories, leading to inconsistent representation spaces; and (2) without unified information among clients in FURL, the representations across clients would be misaligned. To address these challenges, we propose the Federated Contrastive Averaging with dictionary and alignment (FedCA) algorithm. FedCA is composed of two key modules: (1) a dictionary module that aggregates the representations of samples from each client and shares them with all clients for consistency of the representation space, and (2) an alignment module that aligns the representation of each client on a base model trained on public data. We adopt the contrastive loss for local model training. Through extensive experiments with three evaluation protocols in IID and Non-IID settings, we demonstrate that FedCA outperforms all baselines with significant margins.


Distilled One-Shot Federated Learning

arXiv.org Artificial Intelligence

Current federated learning algorithms take tens of communication rounds transmitting unwieldy model weights under ideal circumstances and hundreds when data is poorly distributed. Inspired by recent work on dataset distillation and distributed one-shot learning, we propose Distilled One-Shot Federated Learning, which reduces the number of communication rounds required to train a performant model to only one. Each client distills their private dataset and sends the synthetic data (e.g. images or sentences) to the server. The distilled data look like noise and become useless after model fitting. We empirically show that, in only one round of communication, our method can achieve 96% test accuracy on federated MNIST with LeNet (centralized 99%), 81% on federated IMDB with a customized CNN (centralized 86%), and 84% on federated TREC-6 with a Bi-LSTM (centralized 89%). Using only a few rounds, DOSFL can match the centralized baseline on all three tasks. By evading the need for model-wise updates (i.e., weights, gradients, loss, etc.), the total communication cost of DOSFL is reduced by over an order of magnitude. We believe that DOSFL represents a new direction orthogonal to previous work, towards weight-less and gradient-less federated learning.


PRI-VAE: Principle-of-Relevant-Information Variational Autoencoders

arXiv.org Machine Learning

Although substantial efforts have been made to learn disentangled representations under the variational autoencoder (VAE) framework, the fundamental properties of the learning dynamics of most VAE models remain unknown and under-investigated. We present an information-theoretic perspective to analyze existing VAE models by inspecting the evolution of some critical information-theoretic quantities across training epochs. Our observations unveil some fundamental properties associated with VAEs. Empirical results also demonstrate the effectiveness of PRI-VAE on four benchmark data sets. A central goal for representation learning models is that the resulting latent representation should be compact yet disentangled. Compactness requires that the representation z not contain any nuisance factors in the input signal x that are irrelevant to the desired response y [1], whereas disentanglement means that z is factorizable and has consistent semantics associated with the different generating factors of the underlying data generation process. Yanjun Li, Jose C. Principe and Dapeng Wu are with the NSF Center for Big Learning, University of Florida, U.S.A. (email: yanjun.li@ufl.edu). Shujian Yu is with the Machine Learning Group, NEC Laboratories Europe, Germany (email: Shujian.Yu@neclab.eu). To whom correspondence should be addressed.


Improving Question Generation with Sentence-level Semantic Matching and Answer Position Inferring

arXiv.org Artificial Intelligence

Taking an answer and its context as input, sequence-to-sequence models have made considerable progress on question generation. However, we observe that these approaches often generate wrong question words or keywords and copy answer-irrelevant words from the input. We believe that a lack of global question semantics and poor exploitation of answer position-awareness are the key root causes. In this paper, we propose a neural question generation model with two concrete modules: sentence-level semantic matching and answer position inferring. Further, we enhance the initial state of the decoder by leveraging an answer-aware gated fusion mechanism. Experimental results demonstrate that our model outperforms the state-of-the-art (SOTA) models on the SQuAD and MARCO datasets. Owing to its generality, our work also improves existing models significantly.


Generalized Batch Normalization: Towards Accelerating Deep Neural Networks

arXiv.org Machine Learning

Utilizing recently introduced concepts from statistics and quantitative risk management, we present a general variant of Batch Normalization (BN) that offers accelerated convergence of Neural Network training compared to conventional BN. In general, we show that mean and standard deviation are not always the most appropriate choice for the centering and scaling procedure within the BN transformation, particularly if ReLU follows the normalization step. We present a Generalized Batch Normalization (GBN) transformation, which can utilize a variety of alternative deviation measures for scaling and statistics for centering, choices which naturally arise from the theory of generalized deviation measures and risk theory in general. When used in conjunction with the ReLU non-linearity, the underlying risk theory suggests natural, arguably optimal choices for the deviation measure and statistic. Utilizing the suggested deviation measure and statistic, we show experimentally that training is accelerated more than with conventional BN, often with an improved error rate as well. Overall, we propose a more flexible BN transformation supported by a complementary theoretical framework that can potentially guide design choices.
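
The core idea of swapping the centering statistic and deviation measure can be sketched as a normalization with pluggable functions. This is an illustration only: the median and mean absolute deviation used in the usage note are arbitrary stand-ins, not the choices the paper derives from risk theory, and the learned scale and shift parameters of BN are omitted. All names are hypothetical.

```python
import math
import statistics


def generalized_normalize(values, center_fn, deviation_fn, eps=1e-5):
    """Center a batch with center_fn, then scale by deviation_fn.

    deviation_fn receives the already-centered values.
    """
    center = center_fn(values)
    shifted = [v - center for v in values]
    scale = deviation_fn(shifted)
    return [s / (scale + eps) for s in shifted]


def root_mean_square(shifted):
    """Equals the standard deviation when values were centered by the mean."""
    return math.sqrt(sum(s * s for s in shifted) / len(shifted))


def mean_abs_deviation(shifted):
    """An alternative deviation measure: average absolute offset."""
    return sum(abs(s) for s in shifted) / len(shifted)
```

Calling `generalized_normalize(batch, statistics.fmean, root_mean_square)` gives a conventional BN-style result, while `generalized_normalize(batch, statistics.median, mean_abs_deviation)` shows how a different statistic and deviation measure slot into the same transformation.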


FoldingZero: Protein Folding from Scratch in Hydrophobic-Polar Model

arXiv.org Artificial Intelligence

De novo protein structure prediction from amino acid sequence is one of the most challenging problems in computational biology. As one of the most extensively explored mathematical models for protein folding, the Hydrophobic-Polar (HP) model enables thorough investigation of protein structure formation and evolution. Although the HP model discretizes the conformational space and simplifies the folding energy function, protein folding in the HP model has been proven to be NP-complete. In this paper, we propose a novel protein folding framework, FoldingZero, which self-folds a de novo protein 2D HP structure from scratch based on deep reinforcement learning. FoldingZero couples a two-head (policy and value heads) deep convolutional neural network (HPNet) with a regularized Upper Confidence Bounds for Trees (R-UCT). It is trained solely by a reinforcement learning algorithm, which improves HPNet and R-UCT through iterative policy optimization. Without any supervision or domain knowledge, FoldingZero not only achieves comparable results, but also learns the latent folding knowledge to stabilize the structure. Without exponential computation, FoldingZero shows promising potential to be adopted for real-world protein property prediction.
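
As background for the simplified energy function mentioned above, the following sketch scores a 2D HP conformation using the standard convention: each pair of hydrophobic (H) residues that sit on neighboring lattice sites without being consecutive in the chain contributes -1. The function name and input format are assumptions for illustration, and the reward actually used by FoldingZero may be defined differently.

```python
def hp_energy(sequence, coords):
    """Energy of a 2D square-lattice HP conformation.

    sequence: string of 'H' and 'P', one character per residue.
    coords:   list of (x, y) integer lattice positions, in chain order.
    """
    if len(sequence) != len(coords):
        raise ValueError("sequence and coords must have equal length")
    if len(set(coords)) != len(coords):
        raise ValueError("conformation is not self-avoiding")
    for (x1, y1), (x2, y2) in zip(coords, coords[1:]):
        if abs(x1 - x2) + abs(y1 - y2) != 1:
            raise ValueError("consecutive residues must be lattice neighbors")

    position = {c: i for i, c in enumerate(coords)}
    contacts = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        # Look only right and up so every neighboring pair is counted once.
        for neighbor in ((x + 1, y), (x, y + 1)):
            j = position.get(neighbor)
            if j is not None and sequence[j] == "H" and abs(i - j) > 1:
                contacts += 1
    return -contacts
```

For example, the four-residue chain `"HPPH"` folded around a unit square brings its two H residues into contact, while the same chain laid out in a straight line has no contacts at all.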