AITopics | Sun, Xu

Collaborating Authors

Sun, Xu

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Aligning Visual Regions and Textual Concepts for Semantic-Grounded Image Representations

Liu, Fenglin, Liu, Yuanxin, Ren, Xuancheng, He, Xiaodong, Sun, Xu

Neural Information Processing SystemsMar-18-2020, 23:17:31 GMT

In vision-and-language grounding problems, fine-grained representations of the image are considered to be of paramount importance. Most of the current systems incorporate visual features and textual concepts as a sketch of an image. However, plainly inferred representations are usually undesirable in that they are composed of separate components, the relations of which are elusive. In this work, we aim at representing an image with a set of integrated visual regions and corresponding textual concepts, reflecting certain semantics. To this end, we build the Mutual Iterative Attention (MIA) module, which integrates correlated visual features and textual concepts, respectively, by aligning the two modalities.

artificial intelligence, representation, textual concept, (3 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Vision (0.68)
Information Technology > Sensing and Signal Processing > Image Processing (0.45)

Add feedback

An Adaptive and Momental Bound Method for Stochastic Learning

Ding, Jianbang, Ren, Xuancheng, Luo, Ruixuan, Sun, Xu

arXiv.org Machine LearningOct-27-2019

Training deep neural networks requires intricate initialization and careful selection of learning rates. The emergence of stochastic gradient optimization methods that use adaptive learning rates based on squared past gradients, e.g., AdaGrad, AdaDelta, and Adam, eases the job slightly. However, such methods have also been proven problematic in recent studies with their own pitfalls including non-convergence issues and so on. Alternative variants have been proposed for enhancement, such as AMSGrad, AdaShift and AdaBound. In this work, we identify a new problem of adaptive learning rate methods that exhibits at the beginning of learning where Adam produces extremely large learning rates that inhibit the start of learning. We propose the Adaptive and Momental Bound (AdaMod) method to restrict the adaptive learning rates with adaptive and momental upper bounds. The dynamic learning rate bounds are based on the exponential moving averages of the adaptive learning rates themselves, which smooth out unexpected large learning rates and stabilize the training of deep neural networks. Our experiments verify that AdaMod eliminates the extremely large learning rates throughout the training and brings significant improvements especially on complex networks such as DenseNet and Transformer, compared to Adam. Our implementation is available at: https://github.com/lancopku/AdaMod

adamod, computer based training, deep learning, (23 more...)

arXiv.org Machine Learning

1910.12249

Genre: Research Report > New Finding (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Measuring and Relieving the Over-smoothing Problem for Graph Neural Networks from the Topological View

Chen, Deli, Lin, Yankai, Li, Wei, Li, Peng, Zhou, Jie, Sun, Xu

arXiv.org Machine LearningSep-7-2019

Graph Neural Networks (GNNs) have achieved promising performance on a wide range of graph-based tasks. Despite their success, one severe limitation of GNNs is the over-smoothing issue (indistinguishable representations of nodes in different classes). In this work, we present a systematic and quantitative study on the over-smoothing issue of GNNs. First, we introduce two quantitative metrics, MAD and MADGap, to measure the smoothness and over-smoothness of the graph nodes representations, respectively. Then, we verify that smoothing is the nature of GNNs and the critical factor leading to over-smoothness is the low information-to-noise ratio of the message received by the nodes, which is partially determined by the graph topology. Finally, we propose two methods to alleviate the over-smoothing issue from the topological view: (1) MADReg which adds a MADGap-based regularizer to the training objective; (2) AdaGraph which optimizes the graph topology based on the model predictions. Extensive experiments on 7 widely-used graph datasets with 10 typical GNN models show that the two proposed methods are effective for relieving the over-smoothing issue, thus improving the performance of various GNN models.

artificial intelligence, neural network, representation, (16 more...)

arXiv.org Machine Learning

1909.03211

Genre: Research Report > New Finding (0.47)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.85)

Add feedback

A Hierarchical Reinforced Sequence Operation Method for Unsupervised Text Style Transfer

Wu, Chen, Ren, Xuancheng, Luo, Fuli, Sun, Xu

arXiv.org Artificial IntelligenceJun-5-2019

Unsupervised text style transfer aims to alter text styles while preserving the content, without aligned data for supervision. Existing seq2seq methods face three challenges: 1) the transfer is weakly interpretable, 2) generated outputs struggle in content preservation, and 3) the trade-off between content and style is intractable. To address these challenges, we propose a hierarchical reinforced sequence operation method, named Point-Then-Operate (PTO), which consists of a high-level agent that proposes operation positions and a low-level agent that alters the sentence. We provide comprehensive training objectives to control the fluency, style, and content of the outputs and a mask-based inference algorithm that allows for multi-step revision based on the single-step trained agents. Experimental results on two text style transfer datasets show that our method significantly outperforms recent methods and effectively addresses the aforementioned challenges.

deep learning, neural network, style transfer, (20 more...)

arXiv.org Artificial Intelligence

1906.01833

Country:

Europe (1.00)
North America > United States > Louisiana (0.14)
North America > Canada > Quebec (0.14)
(2 more...)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.69)

Add feedback

Memorized Sparse Backpropagation

Zhang, Zhiyuan, Yang, Pengcheng, Ren, Xuancheng, Sun, Xu

arXiv.org Machine LearningJun-1-2019

Neural network learning is typically slow since backpropagation needs to compute full gradients and backpropagate them across multiple layers. Despite its success of existing work in accelerating propagation through sparseness, the relevant theoretical characteristics remain unexplored and we empirically find that they suffer from the loss of information contained in unpropagated gradients. To tackle these problems, in this work, we present a unified sparse backpropagation framework and provide a detailed analysis of its theoretical characteristics. Analysis reveals that when applied to a multilayer perceptron, our framework essentially performs gradient descent using an estimated gradient similar enough to the true gradient, resulting in convergence in probability under certain conditions. Furthermore, a simple yet effective algorithm named memorized sparse backpropagation (MSBP) is proposed to remedy the problem of information loss by storing unpropagated gradients in memory for the next learning. The experiments demonstrate that the proposed MSBP is able to effectively alleviate the information loss in traditional sparse backpropagation while achieving comparable acceleration.

backpropagation, deep learning, neural network, (17 more...)

arXiv.org Machine Learning

1905.10194

Country:

Asia > Middle East > Qatar (0.14)
Oceania > Australia (0.14)
North America > United States (0.14)
(3 more...)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Backpropagation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.86)

Add feedback

Training Simplification and Model Simplification for Deep Learning: A Minimal Effort Back Propagation Method

Sun, Xu, Ren, Xuancheng, Ma, Shuming, Wei, Bingzhen, Li, Wei, Xu, Jingjing, Wang, Houfeng, Zhang, Yi

arXiv.org Machine LearningMar-11-2019

We propose a simple yet effective technique to simplify the training and the resulting model of neural networks. In back propagation, only a small subset of the full gradient is computed to update the model parameters. The gradient vectors are sparsified in such a way that only the top-k elements (in terms of magnitude) are kept. As a result, only k rows or columns (depending on the layout) of the weight matrix are modified, leading to a linear reduction in the computational cost. Based on the sparsified gradients, we further simplify the model by eliminating the rows or columns that are seldom updated, which will reduce the computational cost both in the training and decoding, and potentially accelerate decoding in real-world applications. Surprisingly, experimental results demonstrate that most of time we only need to update fewer than 5% of the weights at each back propagation pass. More interestingly, the accuracy of the resulting models is actually improved rather than degraded, and a detailed analysis is given. The model simplification results show that we could adaptively simplify the model which could often be reduced by around 9x, without any loss on accuracy or even with improved accuracy. The codes, including the extension, are available at https://github.com/lancopku/meSimp

deep learning, meprop, neural network, (19 more...)

arXiv.org Machine Learning

doi: 10.1109/TKDE.2018.2883613

1711.06528

Country: Asia > China (0.14)

Genre: Research Report > New Finding (1.00)

Industry: Education > Educational Setting (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

meProp: Sparsified Back Propagation for Accelerated Deep Learning with Reduced Overfitting

Sun, Xu, Ren, Xuancheng, Ma, Shuming, Wang, Houfeng

arXiv.org Artificial IntelligenceMar-10-2019

We propose a simple yet effective technique for neural network learning. The forward propagation is computed as usual. In back propagation, only a small subset of the full gradient is computed to update the model parameters. The gradient vectors are sparsified in such a way that only the top-$k$ elements (in terms of magnitude) are kept. As a result, only $k$ rows or columns (depending on the layout) of the weight matrix are modified, leading to a linear reduction ($k$ divided by the vector dimension) in the computational cost. Surprisingly, experimental results demonstrate that we can update only 1-4% of the weights at each back propagation pass. This does not result in a larger number of training iterations. More interestingly, the accuracy of the resulting models is actually improved rather than degraded, and a detailed analysis is given. The code is available at https://github.com/lancopku/meProp

deep learning, meprop, neural network, (18 more...)

arXiv.org Artificial Intelligence

1706.06197

Country:

Asia (0.14)
Oceania > Australia (0.14)

Genre: Research Report > New Finding (0.49)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Adaptive Gradient Methods with Dynamic Bound of Learning Rate

Luo, Liangchen, Xiong, Yuanhao, Liu, Yan, Sun, Xu

arXiv.org Machine LearningFeb-26-2019

Adaptive optimization methods such as AdaGrad, RMSprop and Adam have been proposed to achieve a rapid training process with an element-wise scaling term on learning rates. Though prevailing, they are observed to generalize poorly compared with SGD or even fail to converge due to unstable and extreme learning rates. Recent work has put forward some algorithms such as AMSGrad to tackle this issue but they failed to achieve considerable improvement over existing methods. In our paper, we demonstrate that extreme learning rates can lead to poor performance. We provide new variants of Adam and AMSGrad, called AdaBound and AMSBound respectively, which employ dynamic bounds on learning rates to achieve a gradual and smooth transition from adaptive methods to SGD and give a theoretical proof of convergence. We further conduct experiments on various popular tasks and models, which is often insufficient in previous work. Experimental results show that new variants can eliminate the generalization gap between adaptive methods and SGD and maintain higher learning speed early in training at the same time. Moreover, they can bring significant improvement over their prototypes, especially on complex deep networks. The implementation of the algorithm can be found at https://github.com/Luolc/AdaBound .

algorithm, deep learning, neural network, (22 more...)

arXiv.org Machine Learning

1902.09843

Country: North America > United States > California (0.14)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.89)

Add feedback

Learning When to Concentrate or Divert Attention: Self-Adaptive Attention Temperature for Neural Machine Translation

Lin, Junyang, Sun, Xu, Ren, Xuancheng, Li, Muyu, Su, Qi

arXiv.org Artificial IntelligenceAug-26-2018

Most of the Neural Machine Translation (NMT) models are based on the sequence-to-sequence (Seq2Seq) model with an encoder-decoder framework equipped with the attention mechanism. However, the conventional attention mechanism treats the decoding at each time step equally with the same matrix, which is problematic since the softness of the attention for different types of words (e.g. content words and function words) should differ. Therefore, we propose a new model with a mechanism called Self-Adaptive Control of Temperature (SACT) to control the softness of attention by means of an attention temperature. Experimental results on the Chinese-English translation and English-Vietnamese translation demonstrate that our model outperforms the baseline models, and the analysis and the case study show that our model can attend to the most relevant elements in the source-side contexts and generate the translation of high quality.

deep learning, mechanism, neural network, (20 more...)

arXiv.org Artificial Intelligence

1808.07374

Country: Asia > Vietnam (0.14)

Genre: Research Report > New Finding (0.47)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Add feedback

Identifying High-Quality Chinese News Comments Based on Multi-Target Text Matching Model

Chen, Deli, Ma, Shuming, Yang, Pengcheng, Sun, Xu

arXiv.org Artificial IntelligenceAug-21-2018

With the development of information technology, there is an explosive growth in the number of online comment concerning news, blogs and so on. The massive comments are overloaded, and often contain some misleading and unwelcome information. Therefore, it is necessary to identify high-quality comments and filter out low-quality comments. In this work, we introduce a novel task: high-quality comment identification (HQCI), which aims to automatically assess the quality of online comments. First, we construct a news comment corpus, which consists of news, comments, and the corresponding quality label. Second, we analyze the dataset, and find the quality of comments can be measured in three aspects: informativeness, consistency, and novelty. Finally, we propose a novel multi-target text matching model, which can measure three aspects by referring to the news and surrounding comments. Experimental results show that our method can outperform various baselines by a large margin on the news dataset.

deep learning, neural network, representation, (20 more...)

arXiv.org Artificial Intelligence

1808.07191

Country:

Europe (0.93)
Asia (0.68)
North America > United States > Louisiana (0.14)
(2 more...)

Genre: Research Report > New Finding (0.49)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

Add feedback