AITopics | Xiong, Caiming

Xiong, Caiming

Improved Regularization Techniques for End-to-End Speech Recognition

Zhou, Yingbo, Xiong, Caiming, Socher, Richard

arXiv.org Machine Learning, Dec-19-2017

Regularization is important for end-to-end speech models, since the models are highly flexible and easy to overfit. Data augmentation and dropout have been important for improving end-to-end models in other domains. However, they are relatively underexplored for end-to-end speech models. Therefore, we investigate the effectiveness of both methods for end-to-end trainable, deep speech recognition models. We augment audio data through random perturbations of tempo, pitch, volume, temporal alignment, and adding random noise. We further investigate the effect of dropout when applied to the inputs of all layers of the network. We show that the combination of data augmentation and dropout gives a relative performance improvement of over 20% on both the Wall Street Journal (WSJ) and LibriSpeech datasets. Our model performance is also competitive with other end-to-end speech models on both datasets.
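
The perturbations named in this abstract can be illustrated with a minimal NumPy sketch. It is not the authors' pipeline: the function names, parameter defaults, and the choice of Gaussian noise are assumptions made for illustration. Tempo and pitch perturbation are omitted because they need a resampling or audio library.

```python
import numpy as np

def augment_waveform(wave, rng, max_shift=160, gain_range=(0.8, 1.2), noise_std=0.005):
    """Apply random volume, temporal-shift, and additive-noise perturbations.

    All parameter defaults are illustrative assumptions, not values from the paper.
    """
    # Volume: scale the whole signal by one random gain factor (creates a copy).
    gain = rng.uniform(*gain_range)
    out = np.asarray(wave, dtype=float) * gain
    # Temporal alignment: shift by a few samples and zero-fill the vacated region.
    shift = int(rng.integers(-max_shift, max_shift + 1))
    out = np.roll(out, shift)
    if shift > 0:
        out[:shift] = 0.0
    elif shift < 0:
        out[shift:] = 0.0
    # Noise: add zero-mean Gaussian noise (an assumed noise model).
    return out + rng.normal(0.0, noise_std, size=out.shape)

def input_dropout(x, rate, rng):
    """Inverted dropout on a layer's input: zero entries with probability `rate`."""
    if rate <= 0.0:
        return x
    mask = rng.random(x.shape) >= rate
    # Rescale the kept entries so the expected activation is unchanged.
    return x * mask / (1.0 - rate)

rng = np.random.default_rng(0)
wave = np.sin(np.linspace(0.0, 2.0 * np.pi * 5, 1600))
augmented = augment_waveform(wave, rng)
dropped = input_dropout(np.ones((4, 8)), 0.5, rng)
```

Drawing a fresh perturbation each time an utterance is loaded means the model rarely sees exactly the same input twice, which is the usual motivation for this kind of augmentation.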

augmentation, deep learning, speech recognition, (18 more...)

arXiv.org Machine Learning

1712.07108

Genre: Research Report (0.50)

Industry: Media > News (0.35)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.75)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.50)

Improving End-to-End Speech Recognition with Policy Learning

Zhou, Yingbo, Xiong, Caiming, Socher, Richard

arXiv.org Machine Learning, Dec-19-2017

Connectionist temporal classification (CTC) is widely used for maximum likelihood learning in end-to-end speech recognition models. However, there is usually a disparity between the negative maximum likelihood and the performance metric used in speech recognition, e.g., word error rate (WER). This results in a mismatch between the objective function and metric during training. We show that the above problem can be mitigated by jointly training with maximum likelihood and policy gradient. In particular, with policy learning we are able to directly optimize on the (otherwise non-differentiable) performance metric. We show that joint training improves relative performance by 4% to 13% for our end-to-end model as compared to the same model learned through maximum likelihood. The model achieves 5.53% WER on the Wall Street Journal dataset, and 5.42% and 14.70% on the LibriSpeech test-clean and test-other sets, respectively.
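
For readers unfamiliar with the metric, word error rate is conventionally the word-level edit distance between reference and hypothesis, divided by the number of reference words. The sketch below implements that standard definition. The function name is an assumption, and it is not code from the paper. The `min` over discrete edit operations is what makes the metric non-differentiable.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)

# One substitution ("sat" -> "sit") and one deletion ("the") over 6 reference words.
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
```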

arxiv preprint arxiv, deep learning, speech recognition, (15 more...)

arXiv.org Machine Learning

1712.07101

Genre: Research Report (0.40)

Industry: Media > News (0.35)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.30)

A Joint Many-Task Model: Growing a Neural Network for Multiple NLP Tasks

Hashimoto, Kazuma, Xiong, Caiming, Tsuruoka, Yoshimasa, Socher, Richard

arXiv.org Artificial Intelligence, Jul-24-2017

Transfer and multi-task learning have traditionally focused on either a single source-target pair or very few, similar tasks. Ideally, the linguistic levels of morphology, syntax and semantics would benefit each other by being trained in a single model. We introduce a joint many-task model together with a strategy for successively growing its depth to solve increasingly complex tasks. Higher layers include shortcut connections to lower-level task predictions to reflect linguistic hierarchies. We use a simple regularization term to allow for optimizing all model weights to improve one task's loss without exhibiting catastrophic interference of the other tasks. Our single end-to-end model obtains state-of-the-art or competitive results on five different tasks from tagging, parsing, relatedness, and entailment tasks.

deep learning, neural network, proceedings, (19 more...)

arXiv.org Artificial Intelligence

1611.01587

Country: North America > United States (0.14)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)

Dynamic Coattention Networks For Question Answering

Xiong, Caiming, Zhong, Victor, Socher, Richard

arXiv.org Artificial Intelligence, Feb-13-2017

Several deep learning models have been proposed for question answering. However, due to their single-pass nature, they have no way to recover from local maxima corresponding to incorrect answers. To address this problem, we introduce the Dynamic Coattention Network (DCN) for question answering. The DCN first fuses co-dependent representations of the question and the document in order to focus on relevant parts of both. Then a dynamic pointing decoder iterates over potential answer spans. This iterative procedure enables the model to recover from initial local maxima corresponding to incorrect answers. On the Stanford question answering dataset, a single DCN model improves the previous state of the art from 71.0% F1 to 75.9%, while a DCN ensemble obtains 80.4% F1.

conference paper, deep learning, neural network, (19 more...)

arXiv.org Artificial Intelligence

1611.01604

Country:

Europe (0.95)
North America > United States (0.14)

Industry:

Law (0.95)
Government > Regional Government > Europe Government (0.48)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Question Answering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Maximum Margin Dirichlet Process Mixtures for Clustering

Chen, Gang (State University of New York at Buffalo) | Zhang, Haiying (Chinese Academy of Sciences) | Xiong, Caiming (Metamind Inc.)

AAAI Conferences, Apr-19-2016

Dirichlet process mixtures (DPM) can automatically infer the model complexity from data. Hence they have attracted significant attention recently, and are widely used for model selection and clustering. As a generative model, a DPM generally requires a prior base distribution to learn component parameters by maximizing posterior probability. In contrast, discriminative classifiers model the conditional probability directly, and have yielded better results than generative classifiers. In this paper, we propose a maximum margin Dirichlet process mixture for clustering, which is different from the traditional DPM for parameter modeling. Our model takes a discriminative clustering approach, by maximizing a conditional likelihood to estimate parameters. In particular, we take an EM-like algorithm by leveraging a Gibbs sampling algorithm for inference, which in turn can be perfectly embedded in the online maximum margin learning procedure to update model parameters. We test our model and show comparative results over the traditional DPM and other nonparametric clustering approaches.
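
The claim that a Dirichlet process mixture infers model complexity from data can be illustrated with the Chinese restaurant process, a standard way to sample cluster assignments from a Dirichlet process prior. The sketch below shows only that prior, not the maximum margin model proposed in the paper. The function name and the `alpha` and `seed` parameters are assumptions for illustration.

```python
import random

def sample_crp_assignments(n_points, alpha, seed=0):
    """Sample cluster assignments from a Chinese restaurant process.

    `alpha` (> 0) is the concentration parameter: larger values make new
    clusters more likely. The number of clusters is not fixed in advance.
    """
    rng = random.Random(seed)
    assignments = []
    counts = []  # counts[k] = number of points currently in cluster k
    for _ in range(n_points):
        # An existing cluster k is chosen with weight counts[k];
        # a brand-new cluster is chosen with weight alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)
        else:
            counts[k] += 1
        assignments.append(k)
    return assignments, counts

assignments, counts = sample_crp_assignments(n_points=100, alpha=1.0)
```

Because the new-cluster weight stays constant while existing clusters grow, the number of occupied clusters increases slowly with the number of points rather than being set by hand.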

A Unified Framework for Human-Robot Knowledge Transfer

Shukla, Nishant (University of California, Los Angeles) | Xiong, Caiming (University of California, Los Angeles) | Zhu, Song-Chun (University of California, Los Angeles)

AAAI Conferences, Nov-1-2015

Transferring knowledge is a vital skill between humans for efficiently learning a new concept. In a perfect system, a human demonstrator can teach a robot a new task by using natural language and physical gestures. The robot would gradually accumulate and refine its spatial, temporal, and causal understanding of the world. The knowledge can then be transferred back to another human, or further to another robot. The implications of effective human to robot knowledge transfer include the compelling opportunity of a robot acting as the teacher, guiding humans in new tasks. The technical difficulty in achieving a robot implementation of this caliber involves both an expressive knowledge ...

Figure 1: The robot autonomously performs a cloth folding task after learning from a human demonstration.

artificial intelligence, knowledge transfer, spatial reasoning, (14 more...)

AAAI Conferences

2015 AAAI Fall Symposium Series

Country: North America > United States > California (0.14)

Technology:

Information Technology > Artificial Intelligence > Robots > Humanoid Robots (0.43)
Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (0.36)

Jointly Modeling Deep Video and Compositional Text to Bridge Vision and Language in a Unified Framework

Xu, Ran (State University of New York at Buffalo) | Xiong, Caiming (University of California, Los Angeles) | Chen, Wei (State University of New York at Buffalo) | Corso, Jason J (University of Michigan)

AAAI Conferences, Mar-6-2015

Recently, joint video-language modeling has been attracting more and more attention. However, most existing approaches focus on exploring the language model on top of a fixed visual model. In this paper, we propose a unified framework that jointly models video and the corresponding text sentences. The framework consists of three parts: a compositional semantics language model, a deep video model and a joint embedding model. In our language model, we propose a dependency-tree structure model that embeds sentences into a continuous vector space, which preserves visually grounded meanings and word order. In the visual model, we leverage deep neural networks to capture essential semantic information from videos. In the joint embedding model, we minimize the distance between the outputs of the deep video model and the compositional language model in the joint space, and update these two models jointly. Based on these three parts, our system is able to accomplish three tasks: 1) natural language generation, 2) video retrieval and 3) language retrieval. In the experiments, the results show our approach outperforms SVM, CRF and CCA baselines in predicting Subject-Verb-Object triplets and natural sentence generation, and is better than CCA in video retrieval and language retrieval tasks.

deep learning, neural network, video, (20 more...)

AAAI Conferences

Twenty-Ninth AAAI Conference on Artificial Intelligence

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Latent Domains Modeling for Visual Domain Adaptation

Xiong, Caiming (State University of New York at Buffalo) | McCloskey, Scott (Honeywell ACS) | Hsieh, Shao-Hang (State University of New York at Buffalo) | Corso, Jason J. (State University of New York at Buffalo)

AAAI Conferences, Jul-14-2014

To improve robustness to significant mismatches between source domain and target domain, arising from changes such as illumination, pose and image quality, domain adaptation is increasingly popular in computer vision. However, most methods assume that the source data is from a single domain, or that multi-domain datasets provide the domain label for training instances. In practice, most datasets are mixtures of multiple latent domains, and it is difficult to manually provide the domain label of each data point. In this paper, we propose a model that automatically discovers latent domains in visual datasets. We first assume the visual images are sampled from multiple manifolds, each of which represents a different domain, and which are represented by different subspaces. Using the neighborhood structure estimated from images belonging to the same category, we approximate the local linear invariant subspace for each image based on its local structure, eliminating the category-specific elements of the feature. Based on the effectiveness of this representation, we then propose a squared-loss mutual information based clustering model with a category distribution prior in each domain to infer the domain assignment for images. In experiments, we test our approach on two common image datasets; the results show that our method outperforms the existing state-of-the-art methods, and also show the superiority of multiple latent domain discovery.

adaptation, artificial intelligence, machine learning, (18 more...)

AAAI Conferences

Twenty-Eighth AAAI Conference on Artificial Intelligence

Genre:

Research Report > New Finding (0.66)
Research Report > Promising Solution (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Semi-Supervised Nonlinear Distance Metric Learning via Forests of Max-Margin Cluster Hierarchies

Johnson, David M., Xiong, Caiming, Corso, Jason J.

arXiv.org Machine Learning, Feb-22-2014

Metric learning is a key problem for many data mining and machine learning applications, and has long been dominated by Mahalanobis methods. Recent advances in nonlinear metric learning have demonstrated the potential power of non-Mahalanobis distance functions, particularly tree-based functions. We propose a novel nonlinear metric learning method that uses an iterative, hierarchical variant of semi-supervised max-margin clustering to construct a forest of cluster hierarchies, where each individual hierarchy can be interpreted as a weak metric over the data. By introducing randomness during hierarchy training and combining the output of many of the resulting semi-random weak hierarchy metrics, we can obtain a powerful and robust nonlinear metric model. This method has two primary contributions: first, it is semi-supervised, incorporating information from both constrained and unconstrained points. Second, we take a relaxed approach to constraint satisfaction, allowing the method to satisfy different subsets of the constraints at different levels of the hierarchy rather than attempting to simultaneously satisfy all of them. This leads to a more robust learning algorithm. We compare our method to a number of state-of-the-art benchmarks on $k$-nearest neighbor classification, large-scale image retrieval and semi-supervised clustering problems, and find that our algorithm yields results comparable or superior to the state-of-the-art, and is significantly more robust to noise.

artificial intelligence, constraint, constraint-based reasoning, (19 more...)

arXiv.org Machine Learning

1402.5565

Country:

North America > United States (0.14)
North America > Canada > Ontario > Toronto (0.14)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Constraint-Based Reasoning (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.66)

Random Forests for Metric Learning with Implicit Pairwise Position Dependence

Xiong, Caiming, Johnson, David, Xu, Ran, Corso, Jason J.

arXiv.org Machine Learning, Jan-3-2012

Metric learning makes it plausible to learn distances for complex distributions of data from labeled data. However, to date, most metric learning methods are based on a single Mahalanobis metric, which cannot handle heterogeneous data well. Those that learn multiple metrics throughout the space have demonstrated superior accuracy, but at the cost of computational efficiency. Here, we take a new angle to the metric learning problem and learn a single metric that is able to implicitly adapt its distance function throughout the feature space. This metric adaptation is accomplished by using a random forest-based classifier to underpin the distance function and incorporate both absolute pairwise position and standard relative position into the representation. We have implemented and tested our method against state of the art global and multi-metric methods on a variety of data sets. Overall, the proposed method outperforms both types of methods in terms of accuracy (consistently ranked first) and is an order of magnitude faster than state of the art multi-metric methods (16x faster in the worst case).
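
One way to read "both absolute pairwise position and standard relative position" is as a feature vector built from a pair of points: the element-wise absolute difference carries the relative position, and the pair's midpoint carries the absolute position. The sketch below encodes a pair that way. The function name and this exact encoding are assumptions for illustration, and the abstract does not confirm them as the authors' implementation. A classifier trained on such vectors, with similar/dissimilar labels, could then serve as a position-dependent distance.

```python
import numpy as np

def pair_features(x_i, x_j):
    """Encode a pair of points as [relative position, absolute position].

    The encoding is symmetric: swapping x_i and x_j gives the same vector.
    """
    x_i = np.asarray(x_i, dtype=float)
    x_j = np.asarray(x_j, dtype=float)
    relative = np.abs(x_i - x_j)    # how far apart the two points are
    absolute = (x_i + x_j) / 2.0    # where in feature space the pair sits
    return np.concatenate([relative, absolute])

features = pair_features([0.0, 2.0], [4.0, 2.0])
```

A single Mahalanobis metric depends only on the difference between the two points, so including the midpoint is what would let a learned distance vary across the feature space.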

decision tree learning, health & medicine, metric, (17 more...)

arXiv.org Machine Learning

1201.061

Country: Asia > Middle East (0.14)

Genre: Research Report > New Finding (0.46)

Industry: Education (0.35)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.74)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.64)
