AITopics | University of Science and Technology of China

How Images Inspire Poems: Generating Classical Chinese Poetry from Images with Memory Networks

Xu, Linli (University of Science and Technology of China) | Jiang, Liang (University of Science and Technology of China) | Qin, Chuan (University of Science and Technology of China) | Wang, Zhe (Ant Financial Services Group) | Du, Dongfang (University of Science and Technology of China)

AAAI Conferences | Feb-8-2018

With the recent advances of neural models and natural language processing, automatic generation of classical Chinese poetry has drawn significant attention due to its artistic and cultural value. Previous works mainly focus on generating poetry given keywords or other text information, while visual inspirations for poetry have been rarely explored. Generating poetry from images is much more challenging than generating poetry from text, since images contain very rich visual information which cannot be described completely using several keywords, and a good poem should convey the image accurately. In this paper, we propose a memory based neural model which exploits images to generate poems. Specifically, an Encoder-Decoder model with a topic memory network is proposed to generate classical Chinese poetry from images. To the best of our knowledge, this is the first work attempting to generate classical Chinese poetry from images with neural networks. A comprehensive experimental investigation with both human evaluation and quantitative analysis demonstrates that the proposed model can generate poems which convey images accurately.

deep learning, keyword, neural network, (21 more...)

AAAI Conferences

Thirty-Second AAAI Conference on Artificial Intelligence

Country: Asia > China (0.14)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Video-Based Sign Language Recognition Without Temporal Segmentation

Huang, Jie (University of Science and Technology of China) | Zhou, Wengang (University of Science and Technology of China) | Zhang, Qilin (HERE Technologies, Chicago, Illinois) | Li, Houqiang (University of Science and Technology of China) | Li, Weiping (University of Science and Technology of China)

AAAI Conferences | Feb-8-2018

Millions of hearing-impaired people around the world routinely use some variant of sign language to communicate, so the automatic translation of sign language is meaningful and important. Currently, there are two sub-problems in Sign Language Recognition (SLR): isolated SLR, which recognizes word by word, and continuous SLR, which translates entire sentences. Existing continuous SLR methods typically utilize isolated SLRs as building blocks, with an extra layer of preprocessing (temporal segmentation) and another layer of post-processing (sentence synthesis). Unfortunately, temporal segmentation itself is non-trivial and inevitably propagates errors into subsequent steps. Worse still, isolated SLR methods typically require strenuous labeling of each word separately in a sentence, severely limiting the amount of attainable training data. To address these challenges, we propose a novel continuous sign recognition framework, the Hierarchical Attention Network with Latent Space (LS-HAN), which eliminates the preprocessing of temporal segmentation. The proposed LS-HAN consists of three components: a two-stream Convolutional Neural Network (CNN) for video feature representation generation, a Latent Space (LS) for semantic gap bridging, and a Hierarchical Attention Network (HAN) for latent space based recognition. Experiments on two large-scale datasets demonstrate the effectiveness of the proposed framework.

deep learning, language learning, recognition, (21 more...)

AAAI Conferences

Thirty-Second AAAI Conference on Artificial Intelligence

Country: North America > United States > Illinois (0.14)

Genre: Research Report > New Finding (0.48)

Industry: Education > Curriculum > Subject-Specific Education (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Sequence-to-Sequence Learning via Shared Latent Representation

Shen, Xu (University of Science and Technology of China) | Tian, Xinmei (University of Science and Technology of China) | Xing, Jun (University of Southern California) | Rui, Yong (Lenovo Research) | Tao, Dacheng (University of Sydney)

AAAI Conferences | Feb-8-2018

Sequence-to-sequence learning is a popular research area in deep learning, with applications such as video captioning and speech recognition. Existing methods model this learning as a mapping process by first encoding the input sequence to a fixed-sized vector, followed by decoding the target sequence from the vector. Although simple and intuitive, such a mapping model is task-specific and cannot be directly used for different tasks. In this paper, we propose a star-like framework for general and flexible sequence-to-sequence learning, where different types of media content (the peripheral nodes) can be encoded to and decoded from a shared latent representation (SLR) (the central node). This is inspired by the fact that the human brain can learn and express an abstract concept in different ways. The media-invariant property of the SLR can be seen as a high-level regularization on the intermediate vector, forcing it to capture not only the latent representation within each individual medium, as auto-encoders do, but also the transitions between media, as mapping models do. Moreover, the SLR model is content-specific, which means it only needs to be trained once for a dataset and can then be used for different tasks. We show how to train an SLR model via dropout and use it for different sequence-to-sequence tasks. Our SLR model is validated on the Youtube2Text and MSR-VTT datasets, achieving superior performance on the video-to-sentence task and the first sentence-to-video results.

deep learning, neural network, video, (18 more...)

AAAI Conferences

Thirty-Second AAAI Conference on Artificial Intelligence

Country:

Oceania > Australia (0.14)
North America > United States > California (0.14)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Dual Transfer Learning for Neural Machine Translation with Marginal Distribution Regularization

Wang, Yijun (University of Science and Technology of China) | Xia, Yingce (University of Science and Technology of China) | Zhao, Li (Microsoft Research Asia) | Bian, Jiang (Microsoft Research Asia) | Qin, Tao (Microsoft Research Asia) | Liu, Guiquan (University of Science and Technology of China) | Liu, Tie-Yan (Microsoft Research Asia)

AAAI Conferences | Feb-8-2018

Neural machine translation (NMT) heavily relies on parallel bilingual data for training. Since large-scale, high-quality parallel corpora are usually costly to collect, it is appealing to exploit monolingual corpora to improve NMT. Inspired by the law of total probability, which connects the probability of a given target-side monolingual sentence to the conditional probability of translating from a source sentence to the target one, we propose to explicitly exploit this connection to learn from and regularize the training of NMT models using monolingual data. The key technical challenge of this approach is that there are exponentially many source sentences for a target monolingual sentence when computing the sum of the conditional probability given each possible source sentence. We address this challenge by leveraging the dual translation model (target-to-source translation) to sample several most likely source-side sentences and avoid enumerating all possible candidate source sentences. That is, we transfer the knowledge contained in the dual model to boost the training of the primal model (source-to-target translation), and we call such an approach dual transfer learning. Experimental results on English-French and German-English tasks demonstrate that dual transfer learning achieves significant improvement over several strong baselines and obtains new state-of-the-art results.
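
The marginal-probability idea in this abstract can be illustrated with a toy calculation. The sketch below is not the authors' implementation: it assumes a tiny, fully enumerable set of source sentences, made-up probabilities, and plain importance sampling as the way to replace the full sum with a few samples drawn from a dual (target-to-source) distribution. All names (`exact_marginal`, `sampled_marginal`, `prior`, `likelihood`, `proposal`) are hypothetical.

```python
import random

def exact_marginal(prior, likelihood):
    """Law of total probability: P(y) = sum over x of P(x) * P(y | x)."""
    return sum(prior[x] * likelihood[x] for x in prior)

def sampled_marginal(prior, likelihood, proposal, k, rng):
    """Importance-sampling estimate of P(y) from k samples x ~ proposal(x | y).

    `proposal` plays the role of the dual (target-to-source) model here;
    this pairing is an assumption made for illustration.
    """
    sources = list(proposal)
    weights = [proposal[x] for x in sources]
    samples = rng.choices(sources, weights=weights, k=k)
    return sum(prior[x] * likelihood[x] / proposal[x] for x in samples) / k

# Made-up numbers: three candidate source sentences for one fixed target y.
prior = {"x1": 0.5, "x2": 0.3, "x3": 0.2}          # P(x)
likelihood = {"x1": 0.10, "x2": 0.40, "x3": 0.05}  # P(y | x)

p_y = exact_marginal(prior, likelihood)

# An ideal dual model would equal the true posterior P(x | y); with that
# proposal every sample contributes exactly P(y) to the estimate.
posterior = {x: prior[x] * likelihood[x] / p_y for x in prior}
estimate = sampled_marginal(prior, likelihood, posterior, k=4, rng=random.Random(0))
```

With a realistic vocabulary the exact sum is intractable, which is why only the sampled estimate would be usable in practice; the closer the sampling distribution is to the true posterior, the lower the variance of the estimate.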

deep learning, neural network, translation, (21 more...)

AAAI Conferences

Thirty-Second AAAI Conference on Artificial Intelligence

Country:

Asia > China (0.14)
Asia > Vietnam (0.14)

Multi-Scale Face Restoration With Sequential Gating Ensemble Network

Lin, Jianxin (University of Science and Technology of China) | Zhou, Tiankuang (University of Science and Technology of China) | Chen, Zhibo (University of Science and Technology of China)

AAAI Conferences | Feb-8-2018

Restoring face images from distortions is important in face recognition applications and is challenged by multiple scale issues, which remain an open research problem. In this paper, we present a Sequential Gating Ensemble Network (SGEN) for multi-scale face restoration. We first apply the principle of ensemble learning to the SGEN architecture design to reinforce the predictive performance of the network. The SGEN aggregates multi-level base-encoders and base-decoders into the network, which enables the network to contain multiple scales of receptive field. Instead of combining these base-en/decoders directly with non-sequential operations, the SGEN takes base-en/decoders from different levels as sequential data. Specifically, the SGEN learns to sequentially extract high-level information from base-encoders in a bottom-up manner and restore low-level information from base-decoders in a top-down manner. In addition, we propose to realize bottom-up and top-down information combination and selection with a Sequential Gating Unit (SGU). The SGU sequentially takes two inputs from different levels and decides the output based on one active input. Experimental results demonstrate that our SGEN is more effective at multi-scale human face restoration, with more image details and less noise, than state-of-the-art image restoration models. With adversarial training, SGEN also produces more visually preferred results than other models in subjective evaluation.

deep learning, neural network, sgen, (20 more...)

AAAI Conferences

Thirty-Second AAAI Conference on Artificial Intelligence

Country: Asia > China (0.14)

Genre: Research Report (0.34)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (0.90)
Information Technology > Artificial Intelligence > Vision > Face Recognition (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Measuring the Popularity of Job Skills in Recruitment Market: A Multi-Criteria Approach

Xu, Tong (University of Science and Technology of China) | Zhu, Hengshu (Baidu Talent Intelligence Center) | Zhu, Chen (Baidu Talent Intelligence Center, Baidu Inc.) | Li, Pan (Baidu Talent Intelligence Center, Baidu Inc.) | Xiong, Hui (University of Science and Technology of China)

AAAI Conferences | Feb-8-2018

To cope with the accelerating pace of technological change, talents are urged to add and refresh their skills to stay in active and gainful employment. This raises a natural question: what are the right skills to learn? Indeed, it is a nontrivial task to measure the popularity of job skills due to the diversified criteria of jobs and the complicated connections within job skills. To that end, in this paper, we propose a data-driven approach for modeling the popularity of job skills based on the analysis of large-scale recruitment data. Specifically, we first build a job skill network by exploring a large corpus of job postings. Then, we develop a novel Skill Popularity based Topic Model (SPTM) for modeling the generation of the skill network. In particular, SPTM can integrate different criteria of jobs (e.g., salary levels, company size) as well as the latent connections within skills, so that we can effectively rank the job skills based on their multi-faceted popularity. Extensive experiments on real-world recruitment data validate the effectiveness of SPTM for measuring the popularity of job skills, and also reveal some interesting rules, such as the popular job skills that lead to high-paying employment.

artificial intelligence, job skills, natural language, (21 more...)

AAAI Conferences

Thirty-Second AAAI Conference on Artificial Intelligence

Country: North America > United States > California > San Francisco County > San Francisco (0.28)

Industry: Information Technology (0.68)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Mining (0.69)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.36)

Joint Training for Neural Machine Translation Models with Monolingual Data

Zhang, Zhirui (University of Science and Technology of China) | Liu, Shujie (Microsoft Research) | Li, Mu (Microsoft Research) | Zhou, Ming (Microsoft Research) | Chen, Enhong (University of Science and Technology of China)

AAAI Conferences | Feb-8-2018

Monolingual data have been demonstrated to be helpful in improving translation quality of both statistical machine translation (SMT) systems and neural machine translation (NMT) systems, especially in resource-poor or domain adaptation tasks where parallel data are not rich enough. In this paper, we propose a novel approach to better leveraging monolingual data for neural machine translation by jointly learning source-to-target and target-to-source NMT models for a language pair with a joint EM optimization method. The training process starts with two initial NMT models pre-trained on parallel data for each direction, and these two models are iteratively updated by incrementally decreasing translation losses on training data. In each iteration step, both NMT models are first used to translate monolingual data from one language to the other, forming pseudo-training data of the other NMT model. Then two new NMT models are learnt from parallel data together with the pseudo-training data. Both NMT models are expected to be improved and better pseudo-training data can be generated in the next step. Experimental results on Chinese-English and English-German translation tasks show that our approach can simultaneously improve translation quality of source-to-target and target-to-source models, significantly outperforming strong baseline systems which are enhanced with monolingual data for model training including back-translation.
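
The iteration described in this abstract can be sketched as a data-flow loop. This is only an outline under strong assumptions, not the authors' method: `train` and `translate` are hypothetical stand-ins supplied by the caller (here a toy lookup table instead of an NMT system), and the EM-style weighting mentioned in the abstract is omitted entirely.

```python
def joint_training(parallel, mono_src, mono_tgt, train, translate, iterations=2):
    """Alternate between generating pseudo pairs and retraining both directions.

    parallel: list of (source, target) sentence pairs.
    mono_src / mono_tgt: monolingual sentences in each language.
    train(pairs) -> model and translate(model, sentence) -> sentence are
    hypothetical callables standing in for real NMT training and decoding.
    """
    reversed_parallel = [(t, s) for s, t in parallel]
    s2t = train(parallel)            # initial source-to-target model
    t2s = train(reversed_parallel)   # initial target-to-source model
    for _ in range(iterations):
        # Each model translates monolingual data; its output becomes the
        # input side of pseudo pairs for the model of the other direction.
        pseudo_for_t2s = [(translate(s2t, s), s) for s in mono_src]
        pseudo_for_s2t = [(translate(t2s, t), t) for t in mono_tgt]
        # Pseudo pairs go first so the toy trainer lets real pairs win ties.
        s2t = train(pseudo_for_s2t + parallel)
        t2s = train(pseudo_for_t2s + reversed_parallel)
    return s2t, t2s

# Toy stand-ins: a "model" is a sentence-level lookup table that copies
# any sentence it has never seen.
def toy_train(pairs):
    return dict(pairs)

def toy_translate(model, sentence):
    return model.get(sentence, sentence)

s2t, t2s = joint_training(
    parallel=[("hallo", "hello")],
    mono_src=["danke"],
    mono_tgt=["thanks"],
    train=toy_train,
    translate=toy_translate,
)
```

The toy models learn nothing useful from the pseudo pairs; the point is only to show which model's output feeds which model's training set in each round.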

deep learning, monolingual data, neural network, (22 more...)

AAAI Conferences

Thirty-Second AAAI Conference on Artificial Intelligence

Country: Asia > China (0.14)

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

On Multiset Selection With Size Constraints

Qian, Chao (University of Science and Technology of China) | Zhang, Yibo (University of Science and Technology of China) | Tang, Ke (Southern University of Science and Technology) | Yao, Xin (Southern University of Science and Technology)

AAAI Conferences | Feb-8-2018

This paper considers the multiset selection problem with size constraints, which arises in many real-world applications such as budget allocation. Previous studies required the objective function f to be submodular, while we relax this assumption by introducing the notion of the submodularity ratios (denoted by α_f and β_f). We propose an anytime randomized iterative approach POMS, which maximizes the given objective f and minimizes the multiset size simultaneously. We prove that POMS using a reasonable time achieves an approximation guarantee of max{1-1/e^(β_f), (α_f/2)(1-1/e^(α_f))}. Particularly, when f is submodular, this bound is at least as good as that of the previous greedy-style algorithms. In addition, we give lower bounds on the submodularity ratio for the objectives of budget allocation. Experimental results on budget allocation as well as a more complex application, namely, generalized influence maximization, exhibit the superior performance of the proposed approach.
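
To get a feel for the approximation guarantee quoted above, the expression can simply be evaluated numerically. The helper below is a minimal sketch: the function name `poms_guarantee` is hypothetical, and the ratio values passed to it are illustrative only, not values reported in the paper.

```python
import math

def poms_guarantee(alpha_f: float, beta_f: float) -> float:
    """Evaluate max{1 - 1/e^beta_f, (alpha_f / 2) * (1 - 1/e^alpha_f)}."""
    via_beta = 1.0 - math.exp(-beta_f)
    via_alpha = (alpha_f / 2.0) * (1.0 - math.exp(-alpha_f))
    return max(via_beta, via_alpha)

# Illustrative ratio pairs (alpha_f, beta_f); not taken from the paper.
for alpha_f, beta_f in [(1.0, 1.0), (1.0, 0.2), (0.5, 0.5)]:
    print(alpha_f, beta_f, round(poms_guarantee(alpha_f, beta_f), 4))
```

Which of the two terms attains the maximum depends on the pair of ratios: a small β_f with a large α_f makes the second term dominate, as the middle example shows.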

artificial intelligence, budget allocation, optimization problem, (17 more...)

AAAI Conferences

Thirty-Second AAAI Conference on Artificial Intelligence

Country: Asia > China > Anhui Province (0.14)

Technology:

Information Technology > Communications (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)

Domain Generalization via Conditional Invariant Representations

Li, Ya (University of Science and Technology of China) | Gong, Mingming (Carnegie Mellon University; University of Pittsburgh) | Tian, Xinmei (University of Science and Technology of China) | Liu, Tongliang (University of Sydney) | Tao, Dacheng (University of Sydney)

AAAI Conferences | Feb-8-2018

Domain generalization aims to apply knowledge gained from multiple labeled source domains to unseen target domains. The main difficulty comes from the dataset bias: training data and test data have different distributions, and the training set contains heterogeneous samples from different distributions. Let X denote the features, and Y be the class labels. Existing domain generalization methods address the dataset bias problem by learning a domain-invariant representation h(X) that has the same marginal distribution P(h(X)) across multiple source domains. The functional relationship encoded in P(Y|X) is usually assumed to be stable across domains such that P(Y|h(X)) is also invariant. However, it is unclear whether this assumption holds in practical problems. In this paper, we consider the general situation where both P(X) and P(Y|X) can change across all domains. We propose to learn a feature representation which has domain-invariant class conditional distributions P(h(X)|Y). With the conditional invariant representation, the invariance of the joint distribution P(h(X),Y) can be guaranteed if the class prior P(Y) does not change across training and test domains. Extensive experiments on both synthetic and real data demonstrate the effectiveness of the proposed method.
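
The claim about the joint distribution follows from the product rule in one step. The short derivation below spells it out; the superscripts s and t, denoting any training domain and the test domain, are notation assumed here and do not appear in the abstract.

```latex
% Assumed notation: P^{s} is a training (source) domain, P^{t} the test (target) domain.
% Premises: P^{s}(h(X) \mid Y) = P^{t}(h(X) \mid Y)   (conditional invariance)
%           P^{s}(Y) = P^{t}(Y)                       (stable class prior)
\begin{align*}
P^{s}(h(X), Y) &= P^{s}(h(X) \mid Y)\, P^{s}(Y) \\
               &= P^{t}(h(X) \mid Y)\, P^{t}(Y) \\
               &= P^{t}(h(X), Y)
\end{align*}
```

If the class prior does shift between domains, the second equality fails, which is why the abstract states the guarantee only under an unchanged P(Y).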

artificial intelligence, domain generalization, evolutionary algorithm, (16 more...)

AAAI Conferences

Thirty-Second AAAI Conference on Artificial Intelligence

Country: Oceania > Australia (0.28)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Hierarchical LSTM for Sign Language Translation

Guo, Dan (Hefei University of Technology) | Zhou, Wengang (University of Science and Technology of China) | Li, Houqiang (University of Science and Technology of China) | Wang, Meng (Hefei University of Technology)

AAAI Conferences | Feb-8-2018

Continuous Sign Language Translation (SLT) is a challenging task due to its specific linguistics under sequential gesture variation without word alignment. Current hybrid HMM and CTC (Connectionist Temporal Classification) based models are proposed to solve frame- or word-level alignment. They may fail in cases with messy word order relative to the visual content in sentences. To solve this issue, this paper proposes a hierarchical-LSTM (HLSTM) encoder-decoder model with visual content and word embedding for SLT. It tackles different granularities by conveying spatio-temporal transitions among frames, clips and viseme units. It first explores spatio-temporal cues of video clips with a 3D CNN and packs appropriate visemes by online key clip mining with adaptive variable length. After pooling on recurrent outputs of the top layer of HLSTM, a temporal attention-aware weighting mechanism is proposed to balance the intrinsic relationship among viseme source positions. Finally, another two LSTM layers are used to separately recurse viseme vectors and translate semantics. After preserving original visual content by 3D CNN and the top layer of HLSTM, it shortens the encoding time step of the bottom two LSTM layers with less computational complexity while attaining more nonlinearity. Our proposed model exhibits promising performance on signer-independent tests with seen sentences and also outperforms the comparison algorithms on unseen sentences.

deep learning, hlstm, language learning, (20 more...)

AAAI Conferences

Thirty-Second AAAI Conference on Artificial Intelligence

Country: Asia > China (0.14)

Industry: Education > Curriculum > Subject-Specific Education (0.63)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
