Yang, Meng
Variable-frame CNNLSTM for Breast Nodule Classification using Ultrasound Videos
Cui, Xiangxiang, Li, Zhongyu, Fan, Xiayue, Huang, Peng, Wang, Ying, Yang, Meng, Chang, Shi, Zhu, Jihua
The intersection of medical imaging and artificial intelligence has become an important research direction in intelligent medical treatment, particularly the analysis of medical images with deep learning for clinical diagnosis. Despite these advances, existing keyframe classification methods fail to extract temporal features, while ultrasound video classification based on three-dimensional convolution requires a uniform number of frames across patients, resulting in poor feature extraction efficiency and weak classification performance. This study proposes a novel video classification method based on CNN and LSTM, introducing, for the first time, NLP's scheme for handling long and short sentences into video classification. The method reduces CNN-extracted image features to 1x512-dimensional vectors, which are then sorted and compressed for LSTM training. Specifically, feature vectors are sorted by the number of frames in each patient's video and padded with zeros to form variable-length batches, and the invalid padding values are compressed away before LSTM training to conserve computing resources. Experimental results demonstrate that our variable-frame CNNLSTM method outperforms other approaches across all metrics, with improvements of 3-6% in F1 score and 1.5% in specificity over keyframe methods. The variable-frame CNNLSTM also achieves better accuracy and precision than the equal-frame CNNLSTM. These findings validate the effectiveness of our approach for classifying variable-frame ultrasound videos and suggest potential applications in other medical imaging modalities.
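A minimal sketch of the padding-and-packing step described above, assuming a PyTorch pipeline with hypothetical tensor shapes and variable names (not the paper's released code):

import torch
from torch.nn.utils.rnn import pad_sequence, pack_padded_sequence

# Per-patient CNN features: one (num_frames, 512) tensor per video, frame counts vary.
features = [torch.randn(n, 512) for n in (34, 52, 41)]
lengths = torch.tensor([f.size(0) for f in features])

# Sort videos by frame count (descending) and pad with zeros into one batch.
order = torch.argsort(lengths, descending=True).tolist()
padded = pad_sequence([features[i] for i in order], batch_first=True, padding_value=0.0)

# Compress away the padded positions so the LSTM skips invalid values.
packed = pack_padded_sequence(padded, lengths[order], batch_first=True)
lstm = torch.nn.LSTM(input_size=512, hidden_size=128, batch_first=True)
_, (h_n, _) = lstm(packed)   # h_n: final hidden state per video, usable for classification

Here pack_padded_sequence is what lets the LSTM skip the zero-padded positions, which matches the abstract's point about compressing invalid padding values to save computation.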
New Emerged Security and Privacy of Pre-trained Model: a Survey and Outlook
Yang, Meng, Zhu, Tianqing, Liu, Chi, Zhou, WanLei, Yu, Shui, Yu, Philip S.
Thanks to the explosive growth of data and the development of computational resources, it is now possible to build pre-trained models that achieve outstanding performance on various tasks, such as natural language processing, computer vision, and more. Despite their powerful capabilities, pre-trained models have also drawn attention to the emerging security challenges associated with their real-world applications. Security and privacy issues, such as leaking private information and generating harmful responses, have seriously undermined users' confidence in these powerful models, and concerns are growing as model performance improves dramatically. Researchers are eager to explore the unique security and privacy issues that have emerged, what distinguishes them, and how to defend against them. However, the current literature lacks a clear taxonomy of emerging attacks and defenses for pre-trained models, which hinders a high-level and comprehensive understanding of these questions. To fill this gap, we conduct a systematic survey of the security risks of pre-trained models, proposing a taxonomy of attack and defense methods based on the accessibility of pre-trained models' inputs and weights in various security test scenarios. This taxonomy categorizes attacks and defenses into No-Change, Input-Change, and Model-Change approaches. With this taxonomy, we capture the unique security and privacy issues of pre-trained models, categorizing and summarizing existing security issues based on their characteristics. In addition, we offer a timely and comprehensive review of each category's strengths and limitations. Our survey concludes by highlighting potential new research opportunities in the security and privacy of pre-trained models.
Asymmetric Co-Training with Explainable Cell Graph Ensembling for Histopathological Image Classification
Yang, Ziqi, Li, Zhongyu, Liu, Chen, Luo, Xiangde, Wang, Xingguang, Xu, Dou, Li, Chaoqun, Qin, Xiaoying, Yang, Meng, Jin, Long
Convolutional neural networks (CNNs) excel at histopathological image classification, yet their pixel-level focus hampers explainability. Conversely, emerging graph convolutional networks (GCNs) spotlight cell-level features and their medical implications. However, limited by their shallowness and suboptimal use of high-dimensional pixel data, GCNs underperform in multi-class histopathological image classification. To make full and dynamic use of both pixel-level and cell-level features, we propose an asymmetric co-training framework that combines a deep graph convolutional network and a convolutional neural network for multi-class histopathological image classification. To improve the explainability of the entire framework by embedding the morphological and topological distribution of cells, we build a 14-layer deep graph convolutional network to handle cell graph data. To further exploit pixel-level and cell-level information and enable dynamic interaction between them, we also design a co-training strategy that integrates the two asymmetric branches. Notably, we collect a private clinically acquired dataset, termed LUAD7C, comprising seven subtypes of lung adenocarcinoma, which is rare and particularly challenging. We evaluate our approach on the private LUAD7C and public colorectal cancer datasets, showcasing its superior performance, explainability, and generalizability in multi-class histopathological image classification.
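A highly simplified sketch of co-training two asymmetric branches on the same image, assuming a PyTorch-style setup in which cnn_logits and gcn_logits come from hypothetical pixel-level and cell-graph branches (an illustration of the general co-training idea, not the authors' training code):

import torch
import torch.nn.functional as F

def co_training_loss(cnn_logits, gcn_logits, labels, alpha=0.5):
    # Supervised loss for each branch on the shared labels.
    ce = F.cross_entropy(cnn_logits, labels) + F.cross_entropy(gcn_logits, labels)
    # Consistency term: each branch learns from the other's softened predictions.
    kl = F.kl_div(F.log_softmax(cnn_logits, dim=1), F.softmax(gcn_logits, dim=1).detach(),
                  reduction="batchmean") \
       + F.kl_div(F.log_softmax(gcn_logits, dim=1), F.softmax(cnn_logits, dim=1).detach(),
                  reduction="batchmean")
    return ce + alpha * kl

Each branch is supervised by the labels and, through the KL term, by the other branch's predictions, which is one common way to let two heterogeneous branches exchange pixel-level and cell-level information during training.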
Fourier Transformer: Fast Long Range Modeling by Removing Sequence Redundancy with FFT Operator
He, Ziwei, Yang, Meng, Feng, Minwei, Yin, Jingcheng, Wang, Xinbing, Leng, Jingwen, Lin, Zhouhan
The transformer model is known to be computationally demanding and prohibitively costly for long sequences, as the self-attention module has quadratic time and space complexity with respect to sequence length. Many researchers have focused on designing new forms of self-attention or introducing new parameters to overcome this limitation; however, many of these approaches prevent the model from inheriting weights from large pretrained models. In this work, we address the transformer's inefficiency from another perspective. We propose the Fourier Transformer, a simple yet effective approach that progressively removes redundancies in the hidden sequence using the ready-made Fast Fourier Transform (FFT) operator to perform the Discrete Cosine Transform (DCT). The Fourier Transformer significantly reduces computational costs while retaining the ability to inherit weights from various large pretrained models. Experiments show that our model achieves state-of-the-art performance among transformer-based models on the long-range modeling benchmark LRA, with significant improvements in both speed and memory. For generative sequence-to-sequence tasks, including CNN/DailyMail and ELI5, our model inherits the BART weights and outperforms standard BART and other efficient models. Our code is publicly available at https://github.com/LUMIA-Group/FourierTransformer.
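A rough sketch of the redundancy-removal idea, keeping only the lowest-frequency DCT coefficients of a hidden sequence via an FFT-backed DCT routine (hypothetical shapes and truncation length; the released implementation at the URL above is the reference):

import numpy as np
from scipy.fft import dct

hidden = np.random.randn(1024, 768)      # (sequence length, model dimension)
keep = 256                               # retain only the lowest-frequency components

# DCT along the sequence axis, computed by an FFT-based routine.
coeffs = dct(hidden, type=2, axis=0, norm="ortho")
shortened = coeffs[:keep]                # (256, 768): a compressed hidden sequence

Feeding a shortened sequence to the subsequent layers is what reduces the quadratic attention cost.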
Is Writing Prompts Really Making Art?
McCormack, Jon, Gambardella, Camilo Cruz, Rajcic, Nina, Krol, Stephen James, Llano, Maria Teresa, Yang, Meng
In recent years, generative machine learning systems have advanced significantly. A current wave of generative systems uses text prompts to create complex imagery, video, and even 3D datasets. The creators of these systems claim a revolution in bringing creativity and art to anyone who can type a prompt. In this position paper, we question the basis for these claims, dividing our analysis into three areas: the limitations of linguistic descriptions, the implications of the dataset, and, lastly, matters of materiality and embodiment. We conclude with an analysis of the creative possibilities enabled by prompt-based systems, asking whether they can be considered a new artistic medium.
Integrating Pre-trained Model into Rule-based Dialogue Management
Quan, Jun, Yang, Meng, Gan, Qiang, Xiong, Deyi, Liu, Yiming, Dong, Yuchen, Ouyang, Fangxin, Tian, Jun, Deng, Ruiling, Li, Yongzhi, Yang, Yang, Jiang, Daxin
Rule-based dialogue management remains the most popular solution for industrial task-oriented dialogue systems because of its interpretability. However, it is hard for developers to maintain the dialogue logic as scenarios grow more and more complex. On the other hand, data-driven dialogue systems, usually with end-to-end structures, are popular in academic research and handle complex conversations more easily, but such methods require plenty of training data and their behaviors are less interpretable. In this paper, we propose a method that leverages the strengths of both rule-based and data-driven dialogue managers (DM). We first introduce the DM of the Carina Dialog System (CDS, an advanced industrial dialogue system built by Microsoft). We then propose the "model-trigger" design to make the DM trainable and thus scalable to scenario changes. Furthermore, we integrate pre-trained models to empower the DM with few-shot capability. Experimental results demonstrate the effectiveness and strong few-shot capability of our method.
Large-Margin Softmax Loss for Convolutional Neural Networks
Liu, Weiyang, Wen, Yandong, Yu, Zhiding, Yang, Meng
Cross-entropy loss together with softmax is arguably one of the most commonly used supervision components in convolutional neural networks (CNNs). Despite its simplicity, popularity, and excellent performance, this component does not explicitly encourage discriminative learning of features. In this paper, we propose a generalized large-margin softmax (L-Softmax) loss that explicitly encourages intra-class compactness and inter-class separability between learned features. Moreover, L-Softmax not only allows the desired margin to be adjusted but also helps avoid overfitting. We also show that the L-Softmax loss can be optimized by typical stochastic gradient descent. Extensive experiments on four benchmark datasets demonstrate that features deeply learned with the L-Softmax loss become more discriminative, significantly boosting performance on a variety of visual classification and verification tasks.
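For reference, a sketch of the commonly cited form of the L-Softmax loss (notation is illustrative: \theta_j is the angle between feature x_i and classifier weight W_j, and m is the integer margin):

L_i = -\log \frac{e^{\|W_{y_i}\|\,\|x_i\|\,\psi(\theta_{y_i})}}{e^{\|W_{y_i}\|\,\|x_i\|\,\psi(\theta_{y_i})} + \sum_{j \neq y_i} e^{\|W_j\|\,\|x_i\|\cos\theta_j}}, \qquad \psi(\theta) = (-1)^k \cos(m\theta) - 2k, \quad \theta \in \Big[\tfrac{k\pi}{m}, \tfrac{(k+1)\pi}{m}\Big],

for k = 0, \dots, m-1; setting m = 1 recovers the standard softmax cross-entropy loss, while larger m enforces a larger angular margin between classes.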
Discriminative Semi-Supervised Dictionary Learning with Entropy Regularization for Pattern Classification
Yang, Meng (Shenzhen University), Chen, Lin (Shenzhen University)
Dictionary learning has played an important role in the success of sparse representation, triggering the rapid development of unsupervised and supervised dictionary learning methods. However, in most practical applications, labeled training samples are usually quite limited, while abundant unlabeled training samples are relatively easy to acquire. Semi-supervised dictionary learning, which aims to effectively exploit the discrimination in unlabeled training data, has therefore attracted much attention from researchers. Although various regularizations have been introduced in prevailing semi-supervised dictionary learning methods, how to design an effective unified model of dictionary learning and class estimation for unlabeled data, and how to fully exploit the discrimination in the labeled and unlabeled data, remain open questions. In this paper, we propose a novel discriminative semi-supervised dictionary learning model (DSSDL) by introducing a discriminative representation, a coding of unlabeled data identical to the coding of testing data in the final classification, and an entropy regularization term. This coding strategy for unlabeled data not only avoids the adverse effect of incorrect class estimation, but also allows the learned discrimination to be well exploited in the final classification. The introduced entropy regularization avoids overemphasizing uncertain class estimates for unlabeled samples. Apart from the discrimination enhanced in the learned dictionary by the discriminative representation, an extended dictionary is used mainly to explore the discrimination embedded in the unlabeled data. Extensive experiments on face recognition, digit recognition, and texture classification show the effectiveness of the proposed method.
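For illustration only (generic notation, not the paper's exact model), the entropy term referred to above acts on the estimated class probabilities p_i = (p_{i1}, \dots, p_{iC}) of each unlabeled sample:

H(p_i) = -\sum_{c=1}^{C} p_{ic} \log p_{ic}.

Keeping this entropy from being driven too low prevents the class estimate of an unlabeled sample from overcommitting to a single uncertain class, which is the role the abstract attributes to the regularizer.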
An MM Algorithm for Split Feasibility Problems
Xu, Jason, Chi, Eric C., Yang, Meng, Lange, Kenneth
The classical multi-set split feasibility problem seeks a point in the intersection of finitely many closed convex domain constraints, whose image under a linear mapping also lies in the intersection of finitely many closed convex range constraints. Split feasibility generalizes important inverse problems including convex feasibility, linear complementarity, and regression with constraint sets. When a feasible point does not exist, solution methods that proceed by minimizing a proximity function can be used to obtain optimal approximate solutions to the problem. We present an extension of the proximity function approach that generalizes the linear split feasibility problem to allow for non-linear mappings. Our algorithm is based on the principle of majorization-minimization, is amenable to quasi-Newton acceleration, and comes complete with convergence guarantees under mild assumptions. Furthermore, we show that the Euclidean norm appearing in the proximity function of the non-linear split feasibility problem can be replaced by arbitrary Bregman divergences. We explore several examples illustrating the merits of non-linear formulations over the linear case, with a focus on optimization for intensity-modulated radiation therapy.
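A sketch of the kind of proximity function and majorization the abstract describes, with hypothetical nonnegative weights v_i, w_j and a possibly non-linear mapping h (an illustration of the MM principle rather than the paper's exact formulation):

f(x) = \frac{1}{2}\sum_i v_i \,\operatorname{dist}(x, C_i)^2 + \frac{1}{2}\sum_j w_j \,\operatorname{dist}\big(h(x), Q_j\big)^2,
g(x \mid x_n) = \frac{1}{2}\sum_i v_i \,\big\|x - P_{C_i}(x_n)\big\|^2 + \frac{1}{2}\sum_j w_j \,\big\|h(x) - P_{Q_j}\big(h(x_n)\big)\big\|^2.

Because each projection P_{C_i}(x_n) lies in C_i (and likewise for the range constraints), g majorizes f and touches it at x = x_n, so successively minimizing the surrogate g drives f downhill; replacing the squared Euclidean norms with Bregman divergences gives the generalization mentioned in the abstract.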
Robust Elastic Net Regression
Liu, Weiyang, Lin, Rongmei, Yang, Meng
We propose a robust elastic net (REN) model for high-dimensional sparse regression and give its performance guarantees (both the statistical error bound and the optimization bound). A simple idea of trimming the inner product is applied to the elastic net model. Specifically, we robustify the covariance matrix by trimming the inner product, based on the intuition that the trimmed inner product cannot be significantly affected by a bounded number of arbitrarily corrupted points (outliers). The REN model also yields two interesting special cases: robust Lasso and robust soft thresholding. Comprehensive experimental results show that the proposed model consistently outperforms the original elastic net in robustness and matches the performance guarantees nicely.
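A rough sketch of a trimmed inner product of the kind the abstract alludes to, assuming the trimming drops the largest-magnitude elementwise products (the exact trimming rule in the paper may differ):

import numpy as np

def trimmed_inner_product(x, y, n_trim):
    """Inner product with the n_trim largest-magnitude elementwise products removed,
    so a bounded number of corrupted coordinates cannot dominate the result."""
    prods = x * y
    keep = np.argsort(np.abs(prods))[: len(prods) - n_trim]
    return prods[keep].sum()

# Example: one grossly corrupted coordinate barely moves the trimmed estimate.
rng = np.random.default_rng(0)
x, y = rng.normal(size=100), rng.normal(size=100)
x[0] = 1e6
print(np.dot(x, y), trimmed_inner_product(x, y, n_trim=5))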