AITopics | Xie, Pengtao

Plotting

Xie, Pengtao

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Diversity-Promoting Bayesian Learning of Latent Variable Models

Xie, Pengtao, Zhu, Jun, Xing, Eric P.

arXiv.org Machine LearningNov-23-2017

To address three important issues involved in latent variable models (LVMs), including capturing infrequent patterns, achieving small-sized but expressive models and alleviating overfitting, several studies have been devoted to "diversifying" LVMs, which aim at encouraging the components in LVMs to be diverse. Most existing studies fall into a frequentist-style regularization framework, where the components are learned via point estimation. In this paper, we investigate how to "diversify" LVMs in the paradigm of Bayesian learning. We propose two approaches that have complementary advantages. One is to define a diversity-promoting mutual angular prior which assigns larger density to components with larger mutual angles and use this prior to affect the posterior via Bayes' rule. We develop two efficient approximate posterior inference algorithms based on variational inference and MCMC sampling. The other approach is to impose diversity-promoting regularization directly over the post-data distribution of components. We also extend our approach to "diversify" Bayesian nonparametric models where the number of components is infinite. A sampling algorithm based on slice sampling and Hamiltonian Monte Carlo is developed. We apply these methods to "diversify" Bayesian mixture of experts model and infinite latent feature model. Experiments on various datasets demonstrate the effectiveness and efficiency of our methods.

artificial intelligence, bayesian inference, diversity-promoting bayesian learning, (15 more...)

arXiv.org Machine Learning

1711.0877

Country:

Asia > Middle East > Iraq (0.14)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Add feedback

Medical Diagnosis From Laboratory Tests by Combining Generative and Discriminative Learning

Zhang, Shiyue, Xie, Pengtao, Wang, Dong, Xing, Eric P.

arXiv.org Machine LearningNov-16-2017

A primary goal of computational phenotype research is to conduct medical diagnosis. In hospital, physicians rely on massive clinical data to make diagnosis decisions, among which laboratory tests are one of the most important resources. However, the longitudinal and incomplete nature of laboratory test data casts a significant challenge on its interpretation and usage, which may result in harmful decisions by both human physicians and automatic diagnosis systems. In this work, we take advantage of deep generative models to deal with the complex laboratory tests. Specifically, we propose an end-to-end architecture that involves a deep generative variational recurrent neural networks (VRNN) to learn robust and generalizable features, and a discriminative neural network (NN) model to learn diagnosis decision making, and the two models are trained jointly. Our experiments are conducted on a dataset involving 46,252 patients, and the 50 most frequent tests are used to predict the 50 most common diagnoses. The results show that our model, VRNN+NN, significantly (p<0.001) outperforms other baseline models. Moreover, we demonstrate that the representations learned by the joint training are more informative than those learned by pure generative models. Finally, we find that our model offers a surprisingly good imputation for missing values.

deep learning, generative model, neural network, (19 more...)

arXiv.org Machine Learning

1711.04329

Country:

Asia > China (0.14)
North America > United States (0.14)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.89)

Industry:

Health & Medicine > Diagnostic Medicine > Lab Test (1.00)
Health & Medicine > Health Care Technology > Medical Record (0.93)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Poseidon: An Efficient Communication Architecture for Distributed Deep Learning on GPU Clusters

Zhang, Hao, Zheng, Zeyu, Xu, Shizhen, Dai, Wei, Ho, Qirong, Liang, Xiaodan, Hu, Zhiting, Wei, Jinliang, Xie, Pengtao, Xing, Eric P.

arXiv.org Machine LearningJun-10-2017

Deep learning models can take weeks to train on a single GPU-equipped machine, necessitating scaling out DL training to a GPU-cluster. However, current distributed DL implementations can scale poorly due to substantial parameter synchronization over the network, because the high throughput of GPUs allows more data batches to be processed per unit time than CPUs, leading to more frequent network synchronization. We present Poseidon, an efficient communication architecture for distributed DL on GPUs. Poseidon exploits the layered model structures in DL programs to overlap communication and computation, reducing bursty network communication. Moreover, Poseidon uses a hybrid communication scheme that optimizes the number of bytes required to synchronize each layer, according to layer properties and the number of machines. We show that Poseidon is applicable to different DL frameworks by plugging Poseidon into Caffe and TensorFlow. We show that Poseidon enables Caffe and TensorFlow to achieve 15.5x speed-up on 16 single-GPU machines, even with limited bandwidth (10GbE) and the challenging VGG19-22K network for image classification. Moreover, Poseidon-enabled TensorFlow achieves 31.5x speed-up with 32 single-GPU machines on Inception-V3, a 50% improvement over the open-source TensorFlow (20x speed-up).

deep learning, neural network, poseidon, (20 more...)

arXiv.org Machine Learning

1706.03292

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Strategies and Principles of Distributed Machine Learning on Big Data

Xing, Eric P., Ho, Qirong, Xie, Pengtao, Dai, Wei

arXiv.org Machine LearningDec-31-2015

The rise of Big Data has led to new demands for Machine Learning (ML) systems to learn complex models with millions to billions of parameters, that promise adequate capacity to digest massive datasets and offer powerful predictive analytics thereupon. In order to run ML algorithms at such scales, on a distributed cluster with 10s to 1000s of machines, it is often the case that significant engineering efforts are required --- and one might fairly ask if such engineering truly falls within the domain of ML research or not. Taking the view that Big ML systems can benefit greatly from ML-rooted statistical and algorithmic insights --- and that ML researchers should therefore not shy away from such systems design --- we discuss a series of principles and strategies distilled from our recent efforts on industrial-scale ML solutions. These principles and strategies span a continuum from application, to engineering, and to theoretical research and development of Big ML systems and architectures, with the goal of understanding how to make them efficient, generally-applicable, and supported with convergence and scaling guarantees. They concern four key questions which traditionally receive little attention in ML research: How to distribute an ML program over a cluster? How to bridge ML computation with inter-machine communication? How to perform such communication? What should be communicated between machines? By exposing underlying statistical and algorithmic characteristics unique to ML programs but not typically seen in traditional computer programs, and by dissecting successful cases to reveal how we have harnessed these principles to design and develop both high-performance distributed ML software as well as general-purpose ML frameworks, we present opportunities for ML researchers and practitioners to further shape and grow the area that lies between ML and systems.

deep learning, ml program, neural network, (23 more...)

arXiv.org Machine Learning

1512.09295

Country: North America > United States (0.28)

Genre: Research Report (0.82)

Industry: Information Technology > Services (0.92)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
(2 more...)

Add feedback

Latent Variable Modeling with Diversity-Inducing Mutual Angular Regularization

Xie, Pengtao, Deng, Yuntian, Xing, Eric

arXiv.org Machine LearningDec-22-2015

Latent Variable Models (LVMs) are a large family of machine learning models providing a principled and effective way to extract underlying patterns, structure and knowledge from observed data. Due to the dramatic growth of volume and complexity of data, several new challenges have emerged and cannot be effectively addressed by existing LVMs: (1) How to capture long-tail patterns that carry crucial information when the popularity of patterns is distributed in a power-law fashion? (2) How to reduce model complexity and computational cost without compromising the modeling power of LVMs? (3) How to improve the interpretability and reduce the redundancy of discovered patterns? To addresses the three challenges discussed above, we develop a novel regularization technique for LVMs, which controls the geometry of the latent space during learning to enable the learned latent components of LVMs to be diverse in the sense that they are favored to be mutually different from each other, to accomplish long-tail coverage, low redundancy, and better interpretability. We propose a mutual angular regularizer (MAR) to encourage the components in LVMs to have larger mutual angles. The MAR is non-convex and non-smooth, entailing great challenges for optimization. To cope with this issue, we derive a smooth lower bound of the MAR and optimize the lower bound instead. We show that the monotonicity of the lower bound is closely aligned with the MAR to qualify the lower bound as a desirable surrogate of the MAR. Using neural network (NN) as an instance, we analyze how the MAR affects the generalization performance of NN. On two popular latent variable models --- restricted Boltzmann machine and distance metric learning, we demonstrate that MAR can effectively capture long-tail patterns, reduce model complexity without sacrificing expressivity and improve interpretability.

deep learning, latent variable modeling, neural network, (15 more...)

arXiv.org Machine Learning

1512.07336

Country:

Asia > Middle East > Iraq (0.14)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)
Asia > Japan > Honshū (0.14)

Genre: Research Report (0.81)

Industry: Health & Medicine > Health Care Technology > Medical Record (0.45)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.66)

Add feedback

Distributed Training of Deep Neural Networks with Theoretical Analysis: Under SSP Setting

Kumar, Abhimanu, Xie, Pengtao, Yin, Junming, Xing, Eric P.

arXiv.org Machine LearningDec-11-2015

We propose a distributed approach to train deep neural networks (DNNs), which has guaranteed convergence theoretically and great scalability empirically: close to 6 times faster on instance of ImageNet data set when run with 6 machines. The proposed scheme is close to optimally scalable in terms of number of machines, and guaranteed to converge to the same optima as the undistributed setting. The convergence and scalability of the distributed setting is shown empirically across different datasets (TIMIT and ImageNet) and machine learning tasks (image classification and phoneme extraction). The convergence analysis provides novel insights into this complex learning scheme, including: 1) layerwise convergence, and 2) convergence of the weights in probability.

deep neural network, ssp, theoretical analysis

arXiv.org Machine Learning

1512.02728

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.60)

Add feedback

Petuum: A New Platform for Distributed Machine Learning on Big Data

Xing, Eric P., Ho, Qirong, Dai, Wei, Kim, Jin Kyu, Wei, Jinliang, Lee, Seunghak, Zheng, Xun, Xie, Pengtao, Kumar, Abhimanu, Yu, Yaoliang

arXiv.org Machine LearningMay-14-2015

What is a systematic way to efficiently apply a wide spectrum of advanced ML programs to industrial scale problems, using Big Models (up to 100s of billions of parameters) on Big Data (up to terabytes or petabytes)? Modern parallelization strategies employ fine-grained operations and scheduling beyond the classic bulk-synchronous processing paradigm popularized by MapReduce, or even specialized graph-based execution that relies on graph representations of ML programs. The variety of approaches tends to pull systems and algorithms design in different directions, and it remains difficult to find a universal platform applicable to a wide range of ML programs at scale. We propose a general-purpose framework that systematically addresses data- and model-parallel challenges in large-scale ML, by observing that many ML programs are fundamentally optimization-centric and admit error-tolerant, iterative-convergent algorithmic solutions. This presents unique opportunities for an integrative system design, such as bounded-error network synchronization and dynamic scheduling based on ML program structure. We demonstrate the efficacy of these system designs versus well-known implementations of modern ML algorithms, allowing ML programs to run in much less time and at considerably larger model sizes, even on modestly-sized compute clusters.

algorithm, deep learning, neural network, (24 more...)

arXiv.org Machine Learning

1312.7651

Country: Asia (0.14)

Genre: Research Report (0.50)

Industry: Information Technology > Services (0.46)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(3 more...)

Add feedback

Mining User Interests from Personal Photos

Xie, Pengtao (Carnegie Mellon University) | Pei, Yulong (Carnegie Mellon University) | Xie, Yuan (Carnegie Mellon University) | Xing, Eric (Carnegie Mellon University)

AAAI ConferencesMar-6-2015

Personal photos are enjoying explosive growth with the popularity of photo-taking devices and social media. The vast amount of online photos largely exhibit users' interests, emotion and opinions. Mining user interests from personal photos can boost a number of utilities, such as advertising, interest based community detection and photo recommendation. In this paper, we study the problem of user interests mining from personal photos. We propose a User Image Latent Space Model to jointly model user interests and image contents. User interests are modeled as latent factors and each user is assumed to have a distribution over them. By inferring the latent factors and users' distributions, we can discover what the users are interested in. We model image contents with a four-level hierarchical structure where the layers correspond to themes, semantic regions, visual words and pixels respectively. Users' latent interests are embedded in the theme layer. Given image contents, users' interests can be discovered by doing posterior inference. We use variational inference to approximate the posteriors of latent variables and learn model parameters. Experiments on 180K Flickr photos demonstrate the effectiveness of our model.

artificial intelligence, social media, user interest, (18 more...)

AAAI Conferences

Twenty-Ninth AAAI Conference on Artificial Intelligence

Country: North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)

Industry: Information Technology > Services (0.50)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.48)

Add feedback

Integrating Image Clustering and Codebook Learning

Xie, Pengtao (Carnegie Mellon University) | Xing, Eric P. (Carnegie Mellon University)

AAAI ConferencesMar-6-2015

Image clustering and visual codebook learning are two fundamental problems in computer vision and they are tightly related. On one hand, a good codebook can generate effective feature representations which largely affect clustering performance. On the other hand, class labels obtained from image clustering can serve as supervised information to guide codebook learning. Traditionally, these two processes are conducted separately and their correlation is generally ignored.In this paper, we propose a Double Layer Gaussian Mixture Model (DLGMM) to simultaneously perform image clustering and codebook learning. In DLGMM, two tasks are seamlessly coupled and can mutually promote each other. Cluster labels and codebook are jointly estimated to achieve the overall best performance. To incorporate the spatial coherence between neighboring visual patches, we propose a Spatially Coherent DLGMM which uses a Markov Random Field to encourage neighboring patches to share the same visual word label.We use variational inference to approximate the posterior of latent variables and learn model parameters.Experiments on two datasets demonstrate the effectiveness of two models.

Add feedback

Cauchy Principal Component Analysis

Xie, Pengtao, Xing, Eric

arXiv.org Machine LearningDec-19-2014

Principal Component Analysis (PCA) has wide applications in machine learning, text mining and computer vision. Classical PCA based on a Gaussian noise model is fragile to noise of large magnitude. Laplace noise assumption based PCA methods cannot deal with dense noise effectively. In this paper, we propose Cauchy Principal Component Analysis (Cauchy PCA), a very simple yet effective PCA method which is robust to various types of noise. We utilize Cauchy distribution to model noise and derive Cauchy PCA under the maximum likelihood estimation (MLE) framework with low rank constraint. Our method can robustly estimate the low rank matrix regardless of whether noise is large or small, dense or sparse. We analyze the robustness of Cauchy PCA from a robust statistics view and present an efficient singular value projection optimization method. Experimental results on both simulated data and real applications demonstrate the robustness of Cauchy PCA to various noise patterns.

artificial intelligence, machine learning, pca, (18 more...)

arXiv.org Machine Learning

1412.6506

Country: North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Principal Component Analysis (0.82)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.55)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.55)

Add feedback