Chen, Guangyong
Explore More Guidance: A Task-aware Instruction Network for Sign Language Translation Enhanced with Data Augmentation
Cao, Yong, Li, Wei, Li, Xianzhi, Chen, Min, Chen, Guangyong, Hu, Long, Li, Zhengdao, Hwang, Kai
Sign language recognition and translation first uses a recognition module to generate glosses from sign language videos and then employs a translation module to translate the glosses into spoken sentences. Most existing works focus on the recognition step, while paying less attention to sign language translation. In this work, we propose a task-aware instruction network, namely TIN-SLT, for sign language translation, which introduces an instruction module and a learning-based feature fusion strategy into a Transformer network. In this way, the pre-trained model's language ability can be well explored and utilized to further boost the translation performance. Moreover, by exploring the representation space of sign language glosses and the target spoken language, we propose a multi-level data augmentation scheme to adjust the data distribution of the training set. We conduct extensive experiments on two challenging benchmark datasets, PHOENIX-2014-T and ASLG-PC12, on which our method outperforms the former best solutions by 1.65 and 1.42 BLEU-4 points, respectively. Our code is published at https://github.com/yongcaoplus/TIN-SLT.
RepMode: Learning to Re-parameterize Diverse Experts for Subcellular Structure Prediction
Zhou, Donghao, Gu, Chunbin, Xu, Junde, Liu, Furui, Wang, Qiong, Chen, Guangyong, Heng, Pheng-Ann
In biological research, fluorescence staining is a key technique to reveal the locations and morphology of subcellular structures. However, it is slow, expensive, and harmful to cells. In this paper, we model it as a deep learning task termed subcellular structure prediction (SSP), aiming to predict the 3D fluorescent images of multiple subcellular structures from a 3D transmitted-light image. Unfortunately, due to the limitations of current biotechnology, each image in SSP is only partially labeled. Moreover, subcellular structures naturally vary considerably in size, which causes the multi-scale issue of SSP. To overcome these challenges, we propose Re-parameterizing Mixture-of-Diverse-Experts (RepMode), a network that dynamically organizes its parameters with task-aware priors to handle specified single-label prediction tasks. In RepMode, the Mixture-of-Diverse-Experts (MoDE) block is designed to learn the generalized parameters for all tasks, and gating re-parameterization (GatRep) is performed to generate the specialized parameters for each task, by which RepMode can maintain a compact practical topology exactly like a plain network while achieving a powerful theoretical topology. Comprehensive experiments show that RepMode achieves state-of-the-art overall performance in SSP.
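The GatRep idea above, collapsing a mixture of experts into a single plain kernel via task-specific gates, can be illustrated with a minimal sketch (not the authors' implementation; the expert kernels and gate logits below are toy values):

```python
import numpy as np

def gatrep_combine(expert_kernels, gate_logits):
    """Collapse a stack of expert kernels into one kernel via softmax
    gating, so inference uses a single plain convolution kernel."""
    g = np.exp(gate_logits - gate_logits.max())
    g = g / g.sum()                        # softmax gate: one weight per expert
    # weighted sum over the expert axis -> one specialized kernel
    return np.tensordot(g, expert_kernels, axes=1)

# three 3x3 "expert" kernels; the task-specific gate mixes the first two
experts = np.stack([np.eye(3), np.ones((3, 3)), np.zeros((3, 3))])
kernel = gatrep_combine(experts, np.array([0.0, 0.0, -30.0]))
```

The specialized kernel behaves like an ordinary convolution at inference time, which is the point of re-parameterization: the diverse-expert topology exists only in training.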
DPPMask: Masked Image Modeling with Determinantal Point Processes
Xu, Junde, Lin, Zikai, Zhou, Donghao, Yang, Yaodong, Liao, Xiangyun, Wu, Bian, Chen, Guangyong, Heng, Pheng-Ann
Masked Image Modeling (MIM) has achieved impressive representation-learning performance by reconstructing randomly masked images. Despite this empirical success, most previous works have neglected the important fact that it is unreasonable to force the model to reconstruct something beyond recovery, such as fully masked objects. In this work, we show that the uniformly random masking widely used in previous works unavoidably loses some key objects and changes the original semantic information, resulting in a misalignment problem that eventually hurts representation learning. To address this issue, we augment MIM with a new masking strategy, DPPMask, which substitutes the random process with a Determinantal Point Process (DPP) to reduce the semantic change of the image after masking. Our method is simple yet effective and requires no extra learnable parameters when implemented within various frameworks. In particular, we evaluate our method on two representative MIM frameworks, MAE and iBOT. We show that DPPMask surpasses random sampling under both lower and higher masking ratios, indicating that DPPMask makes the reconstruction task more reasonable. We further test our method on the background challenge and multi-class classification tasks, showing that our method is more robust across various tasks.
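The diversity-preserving selection behind DPPMask can be sketched with a greedy MAP approximation to k-DPP selection over patch features (a toy sketch, not the paper's sampler; the patch features are made up):

```python
import numpy as np

def greedy_dpp_select(features, k):
    """Greedily pick k items maximizing det(L_S) for the similarity
    kernel L = F F^T -- a standard MAP approximation to k-DPP sampling.
    Diverse item sets have larger determinants, so near-duplicates
    are avoided."""
    L = features @ features.T
    selected = []
    for _ in range(k):
        best, best_gain = None, -np.inf
        for i in range(len(features)):
            if i in selected:
                continue
            S = selected + [i]
            gain = np.linalg.det(L[np.ix_(S, S)])
            if gain > best_gain:
                best, best_gain = i, gain
        selected.append(best)
    return selected

# four patch features: two near-duplicates plus two distinct directions
F = np.array([[1.0, 0.0], [0.99, 0.01], [0.0, 1.0], [0.5, 0.5]])
keep = greedy_dpp_select(F, 2)   # the two most mutually diverse patches
```

Keeping a diverse visible set is what reduces the semantic change after masking: redundant patches are the ones masked out.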
DR-Label: Improving GNN Models for Catalysis Systems by Label Deconstruction and Reconstruction
Wang, Bowen, Liang, Chen, Wang, Jiaze, Liu, Furui, Hao, Shaogang, Li, Dong, Hao, Jianye, Chen, Guangyong, Zou, Xiaolong, Heng, Pheng-Ann
Attaining the equilibrium state of a catalyst-adsorbate system is key to fundamentally assessing its effective properties, such as adsorption energy. Machine learning methods with finer supervision strategies have been applied to boost and guide the relaxation process of an atomic system and better predict its properties at the equilibrium state. In this paper, we present DR-Label, a novel graph neural network (GNN) supervision and prediction strategy. The method enhances the supervision signal, reduces the multiplicity of solutions in edge representation, and encourages the model to provide node predictions that are robust to graph structural variations. DR-Label first deconstructs the label, providing finer-grained equilibrium-state information to the model by projecting the node-level supervision signal onto each edge. Conversely, the model reconstructs a more robust equilibrium-state prediction by transforming edge-level predictions back to the node level with a sphere-fitting algorithm. The DR-Label strategy was applied to three radically distinct models, each of which displayed consistent performance enhancements. Based on the DR-Label strategy, we further propose DRFormer, which achieves a new state-of-the-art performance on the Open Catalyst 2020 (OC20) dataset and the Cu-based single-atom-alloyed CO adsorption (SAA) dataset. We expect that our work will highlight crucial steps for the development of more accurate models for equilibrium state property prediction of catalysis systems.
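The deconstruct/reconstruct cycle can be caricatured in a few lines, with ordinary least squares standing in for the paper's sphere-fitting step (the edge directions and node target below are made-up toy values):

```python
import numpy as np

def deconstruct(v, edge_dirs):
    """Project a node-level target vector onto each edge direction,
    yielding per-edge scalar supervision signals."""
    return edge_dirs @ v

def reconstruct(edge_labels, edge_dirs):
    """Recover the node-level vector from its edge projections by
    least squares (a simple stand-in for sphere fitting)."""
    sol, *_ = np.linalg.lstsq(edge_dirs, edge_labels, rcond=None)
    return sol

dirs = np.array([[1.0, 0.0], [0.0, 1.0], [0.7071, 0.7071]])  # unit edge dirs
v = np.array([0.3, -0.2])          # a node's equilibrium displacement target
labels = deconstruct(v, dirs)      # finer-grained edge-level supervision
v_rec = reconstruct(labels, dirs)  # consistent projections recover v exactly
```

Because the edge projections here are exactly consistent, least squares recovers the node vector; the point of the strategy is that averaging over many (noisy) edge predictions makes the node prediction robust to structural variations.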
G-MAP: General Memory-Augmented Pre-trained Language Model for Domain Tasks
Wan, Zhongwei, Yin, Yichun, Zhang, Wei, Shi, Jiaxin, Shang, Lifeng, Chen, Guangyong, Jiang, Xin, Liu, Qun
Recently, domain-specific PLMs have been proposed to boost the task performance of specific domains (e.g., biomedical and computer science) by continuing to pre-train general PLMs with domain-specific corpora. However, this Domain-Adaptive Pre-Training (DAPT; Gururangan et al. (2020)) tends to forget the previous general knowledge acquired by general PLMs, which leads to a catastrophic forgetting phenomenon and sub-optimal performance. To alleviate this problem, we propose a new framework, the General Memory Augmented Pre-trained Language Model (G-MAP), which augments the domain-specific PLM with a memory representation built from the frozen general PLM, without losing any general knowledge. Specifically, we propose a new memory-augmented layer, and based on it, different augmentation strategies are explored to build the memory representation and then adaptively fuse it into the domain-specific PLM. We demonstrate the effectiveness of G-MAP on various domains (biomedical and computer science publications, news, and reviews) and different kinds of tasks (text classification, QA, NER), and the extensive results show that the proposed G-MAP achieves SOTA results on all tasks.
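The core fusion idea, adaptively mixing a frozen general-PLM memory representation into the domain-specific model, can be sketched with a scalar sigmoid gate (a deliberate simplification of the paper's memory-augmented layer; `w_gate` is a hypothetical learned parameter and the toy vectors below are not real hidden states):

```python
import numpy as np

def memory_augmented_fuse(h_domain, h_general, w_gate):
    """Fuse the frozen general-PLM representation into the domain PLM
    via a scalar gate computed from both representations."""
    score = float(np.concatenate([h_domain, h_general]) @ w_gate)
    g = 1.0 / (1.0 + np.exp(-score))          # sigmoid gate in (0, 1)
    # g -> 1 trusts domain knowledge; g -> 0 falls back on general memory
    return g * h_domain + (1.0 - g) * h_general

h_dom = np.array([1.0, 0.0])                  # toy domain hidden state
h_gen = np.array([0.0, 1.0])                  # toy frozen general memory
fused = memory_augmented_fuse(h_dom, h_gen, np.zeros(4))  # zero weights: g = 0.5
```

Because the general PLM stays frozen, its knowledge cannot be overwritten during domain training; only the fusion parameters adapt.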
PMD: A New User Distance for Recommender Systems
Meng, Yitong, Liu, Weiwen, Liao, Benben, Guo, Jun, Chen, Guangyong
Collaborative filtering, a widely-used recommendation technique, predicts a user's preference by aggregating the ratings from similar users, where similarity is typically computed only over items the two users have co-rated. As a result, such similarity measures cannot fully utilize the rating information and are not suitable for real-world sparse data. To solve these issues, we propose a novel user distance measure named Preference Mover's Distance (PMD), which makes full use of all ratings made by each user. PMD can properly measure the distance between a pair of users even if they have no co-rated items. We show that this measure can be cast as an instance of the Earth Mover's Distance, a well-studied transportation problem for which several highly efficient solvers have been developed. Experimental results show that PMD helps achieve superior recommendation accuracy compared to state-of-the-art methods, especially when training data is very sparse.
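The Earth Mover's Distance underlying PMD can be sketched as a small transportation linear program (using SciPy's `linprog`; the two-item rating distributions and ground costs below are toy values, not the paper's setup):

```python
import numpy as np
from scipy.optimize import linprog

def emd(p, q, cost):
    """Earth Mover's Distance between distributions p and q under a
    ground cost matrix, solved as a transportation LP: minimize total
    cost of moving mass from p's items to q's items."""
    n, m = len(p), len(q)
    A_eq = np.zeros((n + m, n * m))
    for i in range(n):
        A_eq[i, i * m:(i + 1) * m] = 1.0   # mass leaving source i sums to p[i]
    for j in range(m):
        A_eq[n + j, j::m] = 1.0            # mass reaching sink j sums to q[j]
    res = linprog(cost.ravel(), A_eq=A_eq, b_eq=np.concatenate([p, q]),
                  bounds=(0, None), method="highs")
    return res.fun

# two users rating entirely disjoint items; the items are close in
# feature space (ground cost 0.2), so the users end up close too
p = np.array([1.0, 0.0])
q = np.array([0.0, 1.0])
cost = np.array([[0.0, 0.2], [0.2, 0.0]])
d = emd(p, q, cost)
```

This is why no co-rated items are needed: the distance flows through the item feature space rather than through shared ratings.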
Wasserstein Collaborative Filtering for Item Cold-start Recommendation
Meng, Yitong, Chen, Guangyong, Liao, Benben, Guo, Jun, Liu, Weiwen
Although numerous instantiations [He et al., 2017; Liang et al., 2018] of CF have been proposed in recent years, matrix factorization (MF) [Mnih and Salakhutdinov, 2007; Koren et al., 2009] remains the most popular one due to its simplicity and effectiveness, and has been used for large-scale recommendations of news [Das et al., 2007], movies [Koren et al., 2009] and products [Linden et al., 2003]. Recent studies extend the MF framework for item cold-start recommendation by incorporating content information of items. The majority of methods for item cold-start recommendation employ a latent-space-sharing model. For example, Saveski et al. [2014] and Barjasteh et al. [2016] propose to use MF as the projection function for both interactions and item contents. LDA [Wang and Blei, 2011], CNN [Kim et al., 2016], DNN [Ebesu and Fang, 2017], SDAE [Wang et al., 2015; Ying et al., 2016] and mDA [Li et al., 2015] have been proposed to learn the latent vectors of items from their textual contents. Van den Oord et al. [2013] and Wang et al. [2014] propose to use CNN to learn the latent vectors of music from their audio signals. The Wasserstein distance, which originates from optimal transport theory [Rubner et al., 1998; Levina and Bickel, 2001], is a distance metric on probability spaces that is able to leverage information on the feature space. It has been successfully applied in many areas, such as computer vision [Arjovsky et al., 2017] and natural language processing.
Figure 2: An illustration of problem definition.
Spectral-based Graph Convolutional Network for Directed Graphs
Ma, Yi, Hao, Jianye, Yang, Yaodong, Li, Han, Jin, Junqi, Chen, Guangyong
Graph convolutional networks (GCNs) have become the most popular approach for graph data nowadays because of their powerful ability to extract features from graphs. GCN approaches are divided into two categories, spectral-based and spatial-based. As the earliest convolutional networks for graph data, spectral-based GCNs have achieved impressive results in many graph-related analytics tasks. However, spectral-based models cannot directly work on directed graphs. In this paper, we propose an improved spectral-based GCN for directed graphs that leverages redefined Laplacians to improve its propagation model. Our approach can work directly on directed graph data in semi-supervised node classification tasks. Experiments on a number of directed graph datasets demonstrate that our approach outperforms the state-of-the-art methods.
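For illustration only, here is a first-order propagation rule with out-degree normalization on a directed adjacency matrix — a simplified stand-in for the paper's redefined Laplacians, which are more involved (the chain graph and identity weights are toy values):

```python
import numpy as np

def directed_gcn_layer(A, X, W):
    """One propagation step H = relu(D_out^{-1} (A + I) X W) on a
    directed graph: self-loops are added, then features flow along
    edge directions, normalized by out-degree."""
    A_hat = A + np.eye(len(A))                 # add self-loops
    D_inv = np.diag(1.0 / A_hat.sum(axis=1))   # inverse out-degrees
    return np.maximum(D_inv @ A_hat @ X @ W, 0.0)

# a 3-node directed chain 0 -> 1 -> 2, one-hot node features
A = np.array([[0., 1., 0.],
              [0., 0., 1.],
              [0., 0., 0.]])
H = directed_gcn_layer(A, np.eye(3), np.eye(3))
```

Note the asymmetry this preserves: node 0 aggregates from node 1 along its outgoing edge, but node 2 (no outgoing edges) keeps only its own features — exactly the directionality that symmetric spectral Laplacians cannot express.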
Alchemy: A Quantum Chemistry Dataset for Benchmarking AI Models
Chen, Guangyong, Chen, Pengfei, Hsieh, Chang-Yu, Lee, Chee-Kong, Liao, Benben, Liao, Renjie, Liu, Weiwen, Qiu, Jiezhong, Sun, Qiming, Tang, Jie, Zemel, Richard, Zhang, Shengyu
We introduce a new molecular dataset, named Alchemy, for developing machine learning models useful in chemistry and material science. As of June 20th, 2019, the dataset comprises 12 quantum mechanical properties of 119,487 organic molecules with up to 14 heavy atoms, sampled from the GDB MedChem database. The Alchemy dataset expands the volume and diversity of existing molecular datasets. Our extensive benchmarks of state-of-the-art graph neural network models on Alchemy clearly demonstrate the usefulness of the new data in validating and developing machine learning models for chemistry and material science. We further launch a contest to attract attention from researchers in the related fields. More details can be found on the contest website \footnote{https://alchemy.tencent.com}. At the time of the benchmarking experiments, we had generated 119,487 molecules in our Alchemy dataset; more molecular samples have been generated since then. Hence, we provide a list of the molecules used in the reported benchmarks.
A Meta Approach to Defend Noisy Labels by the Manifold Regularizer PSDR
Chen, Pengfei, Liao, Benben, Chen, Guangyong, Zhang, Shengyu
Noisy labels are ubiquitous in real-world datasets, which poses a challenge for robustly training deep neural networks (DNNs), since DNNs can easily overfit to noisy labels. Most recent efforts have been devoted to defending against noisy labels by discarding noisy samples from the training set or assigning weights to training samples, where the weight associated with a noisy sample is expected to be small. Thereby, these previous efforts result in a waste of samples, especially those assigned small weights. The input $x$ is always useful regardless of whether its observed label $y$ is clean. To make full use of all samples, we introduce a manifold regularizer, named Paired Softmax Divergence Regularization (PSDR), to penalize the Kullback-Leibler (KL) divergence between the softmax outputs of similar inputs. In particular, similar inputs can be effectively generated by data augmentation. PSDR can be easily implemented on any type of DNN to improve robustness against noisy labels. As empirically demonstrated on benchmark datasets, our PSDR improves state-of-the-art results by a significant margin.
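Per pair of similar inputs, the regularizer reduces to a KL divergence between softmax outputs; a minimal sketch with toy logits (the batching and pairing details of the paper are omitted):

```python
import numpy as np

def psdr_loss(logits_a, logits_b):
    """Penalty on the KL divergence between the softmax outputs of two
    similar inputs (e.g., an image and its augmented view). The label
    is never used, so the penalty is unaffected by label noise."""
    def softmax(z):
        e = np.exp(z - z.max())
        return e / e.sum()
    p, q = softmax(logits_a), softmax(logits_b)
    return float(np.sum(p * np.log(p / q)))   # KL(p || q) >= 0

same = psdr_loss(np.array([1.0, 2.0]), np.array([1.0, 2.0]))  # agreement: 0
diff = psdr_loss(np.array([1.0, 2.0]), np.array([2.0, 1.0]))  # disagreement: > 0
```

Since the penalty depends only on the inputs and not on their observed labels, it smooths the network on the data manifold even for samples whose labels are corrupted — which is how all samples stay useful.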