AITopics | Xing, Eric


Xing, Eric


Text Infilling

Zhu, Wanrong, Hu, Zhiting, Xing, Eric

arXiv.org Machine Learning, Jan-18-2019

Recent years have seen remarkable progress in text generation in different contexts, such as the most common setting of generating text from scratch and the emerging paradigm of retrieval-and-rewriting. Text infilling, which fills in missing portions of a sentence or paragraph, also has numerous real-life uses, yet remains under-explored. Previous work has focused on restricted settings, either assuming a single word per missing portion or allowing only a single missing portion at the end of the text. This paper studies the general task of text infilling, where the input text can have an arbitrary number of portions to be filled, each of which may require an arbitrary, unknown number of tokens. We study various approaches to the task, including a self-attention model with segment-aware position encoding and bidirectional context modeling. We create extensive supervised data by masking out text with varying strategies. Experiments show that the self-attention model greatly outperforms the others, creating a strong baseline for future research.
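The data-creation step described in the abstract (masking out text to obtain supervised infilling pairs) can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the placeholder token `__m__`, the function name `mask_spans`, and the choice to pass spans explicitly rather than sample them are all assumptions made for the example.

```python
from typing import List, Sequence, Tuple

BLANK = "__m__"  # hypothetical placeholder token, not taken from the paper


def mask_spans(
    tokens: Sequence[str], spans: Sequence[Tuple[int, int]]
) -> Tuple[List[str], List[List[str]]]:
    """Replace each half-open span [start, end) with a single BLANK token.

    Returns the template and the removed segments (the infilling targets).
    Spans must be sorted, non-empty, and non-overlapping.
    """
    template: List[str] = []
    answers: List[List[str]] = []
    cursor = 0
    for start, end in spans:
        if not (cursor <= start < end <= len(tokens)):
            raise ValueError(f"invalid or overlapping span: {(start, end)}")
        template.extend(tokens[cursor:start])  # keep the visible context
        template.append(BLANK)                 # one blank per missing portion
        answers.append(list(tokens[start:end]))
        cursor = end
    template.extend(tokens[cursor:])
    return template, answers


tokens = "the quick brown fox jumps over the lazy dog".split()
template, answers = mask_spans(tokens, [(1, 3), (5, 8)])
print(" ".join(template))  # the __m__ fox jumps __m__ dog
print(answers)             # [['quick', 'brown'], ['over', 'the', 'lazy']]
```

Each blank hides a different number of tokens, which matches the general setting in the abstract: the model sees only the template and must decide how many tokens every blank requires.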

artificial intelligence, neural network, template, (16 more...)

arXiv.org Machine Learning

1901.00158

Country: North America > Canada (0.29)

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)


Connecting the Dots Between MLE and RL for Sequence Generation

Tan, Bowen, Hu, Zhiting, Yang, Zichao, Salakhutdinov, Ruslan, Xing, Eric

arXiv.org Artificial Intelligence, Nov-23-2018

Sequence generation models such as recurrent networks can be trained with a diverse set of learning algorithms. For example, maximum likelihood learning is simple and efficient, yet suffers from the exposure bias problem. Reinforcement learning methods such as policy gradient address the problem but can have prohibitively poor exploration efficiency. A variety of other algorithms, such as RAML, SPG, and data noising, have also been developed from different perspectives. This paper establishes a formal connection between these algorithms. We present a generalized entropy regularized policy optimization formulation, and show that the apparently divergent algorithms can all be reformulated as special instances of the framework, with the only difference being the configurations of the reward function and a couple of hyperparameters. The unified interpretation offers a systematic view of the varying properties of exploration and learning efficiency. In addition, based on the framework, we present a new algorithm that dynamically interpolates among the existing algorithms for improved learning. Experiments on machine translation and text summarization demonstrate the superiority of the proposed algorithm.

algorithm, deep learning, neural network, (19 more...)

arXiv.org Artificial Intelligence

1811.0974

Country:

Asia (0.28)
North America > United States > Maryland (0.14)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.35)


On the Complexity of Exploration in Goal-Driven Navigation

Al-Shedivat, Maruan, Lee, Lisa, Salakhutdinov, Ruslan, Xing, Eric

arXiv.org Artificial Intelligence, Nov-16-2018

Building agents that can explore their environments intelligently is a challenging open problem. In this paper, we take a step towards understanding how a hierarchical design of the agent's policy can affect its exploration capabilities. First, we design EscapeRoom environments, where the agent must figure out how to navigate to the exit by accomplishing a number of intermediate tasks (subgoals), such as finding keys or opening doors. Our environments are procedurally generated and vary in complexity, which can be controlled by the number of subgoals and the relationships between them. Next, we propose to measure the complexity of each environment by constructing dependency graphs between the goals and analytically computing hitting times of a random walk in the graph. We empirically evaluate Proximal Policy Optimization (PPO) with sparse and shaped rewards, a variation of policy sketches, and a hierarchical version of PPO (called HiPPO) akin to h-DQN. We show that the analytically estimated hitting time in goal dependency graphs is an informative metric of environment complexity. We conjecture that the result should hold for environments other than navigation. Finally, we show that solving environments beyond a certain level of complexity requires hierarchical approaches.
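The hitting-time idea in the abstract can be illustrated with a small sketch. It assumes a uniform random walk over an adjacency-list graph and solves the standard linear system h(target) = 0, h(u) = 1 + mean of h(v) over the neighbours v of u. How the paper actually builds and weights its goal dependency graphs is not stated in the abstract, so the graph, the function name `hitting_times`, and the uniform transition rule are assumptions.

```python
from typing import Dict, List


def hitting_times(adj: Dict[int, List[int]], target: int) -> Dict[int, float]:
    """Expected number of steps for a uniform random walk to first reach `target`.

    Solves h[target] = 0 and h[u] = 1 + mean(h[v] for v in adj[u]) with
    Gauss-Jordan elimination. Assumes every node can reach the target.
    """
    nodes = [u for u in adj if u != target]
    index = {u: i for i, u in enumerate(nodes)}
    n = len(nodes)
    # Augmented matrix for (I - Q) h = 1, where Q holds the transition
    # probabilities among the non-target nodes.
    a = [[0.0] * (n + 1) for _ in range(n)]
    for u in nodes:
        i = index[u]
        a[i][i] += 1.0
        a[i][n] = 1.0
        for v in adj[u]:
            if v != target:
                a[i][index[v]] -= 1.0 / len(adj[u])
    for col in range(n):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("target is not reachable from every node")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(n):
            if r != col:
                factor = a[r][col] / a[col][col]
                for c in range(col, n + 1):
                    a[r][c] -= factor * a[col][c]
    times = {u: a[index[u]][n] / a[index[u]][index[u]] for u in nodes}
    times[target] = 0.0
    return times


# Toy path graph 0 - 1 - 2 with the exit at node 2 (made up for illustration).
path = {0: [1], 1: [0, 2], 2: [1]}
times = hitting_times(path, target=2)
print(times)  # {0: 4.0, 1: 3.0, 2: 0.0}
```

On this toy path the walk needs 4 expected steps from node 0, in line with the known n^2 hitting time for a path of length n; a larger value for a start node would indicate an environment that is harder to solve by undirected exploration.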

artificial intelligence, complexity, reinforcement learning, (16 more...)

arXiv.org Artificial Intelligence

1811.06889

Country: North America > Canada (0.14)

Genre: Research Report (0.82)

Industry: Leisure & Entertainment (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.48)


Multimodal Machine Learning for Automated ICD Coding

Xu, Keyang, Lam, Mike, Pang, Jingzhi, Gao, Xin, Band, Charlotte, Mathur, Piyush, Papay, Frank, Khanna, Ashish K., Cywinski, Jacek B., Maheshwari, Kamal, Xie, Pengtao, Xing, Eric

arXiv.org Machine Learning, Oct-31-2018

This study presents a multimodal machine learning model to predict ICD-10 diagnostic codes. We developed separate machine learning models that can handle data from different modalities, including unstructured text, semi-structured text and structured tabular data. We further employed an ensemble method to integrate all modality-specific models to generate ICD-10 codes. Key evidence was also extracted to make our prediction more convincing and explainable. We used the Medical Information Mart for Intensive Care III (MIMIC-III) dataset to validate our approach. For ICD code prediction, our best-performing model (micro-F1 = 0.7633, micro-AUC = 0.9541) significantly outperforms other baseline models, including TF-IDF (micro-F1 = 0.6721, micro-AUC = 0.7879) and a Text-CNN model (micro-F1 = 0.6569, micro-AUC = 0.9235). For interpretability, our approach achieves a Jaccard Similarity Coefficient (JSC) of 0.1806 on text data and 0.3105 on tabular data, where well-trained physicians achieve 0.2780 and 0.5002 respectively.
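For readers unfamiliar with the two metrics reported above, the sketch below shows how micro-F1 and the Jaccard Similarity Coefficient are commonly computed on sets. It is a generic illustration: the toy code sets and evidence sets are made up, and the exact evaluation protocol used in the study (for example, which sets are compared for the JSC) is not given in the abstract.

```python
from typing import List, Set


def micro_f1(gold: List[Set[str]], pred: List[Set[str]]) -> float:
    """Micro-averaged F1: pool true/false positives and false negatives over all records."""
    tp = sum(len(g & p) for g, p in zip(gold, pred))
    fp = sum(len(p - g) for g, p in zip(gold, pred))
    fn = sum(len(g - p) for g, p in zip(gold, pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity |A intersect B| / |A union B|; two empty sets count as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Toy data, invented for illustration only.
gold_codes = [{"I10", "E11.9"}, {"N18.3"}]
pred_codes = [{"I10"}, {"N18.3", "I50.9"}]
print(round(micro_f1(gold_codes, pred_codes), 4))  # 0.6667

model_evidence = {"chest", "pain", "fever"}
physician_evidence = {"pain", "fever", "cough"}
print(jaccard(model_evidence, physician_evidence))  # 0.5
```

Micro-averaging pools counts across all records before computing F1, so frequent codes dominate the score; this is worth keeping in mind when comparing the figures quoted in the abstract.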

nephrology, prediction, vascular disease, (31 more...)

arXiv.org Machine Learning

1810.13348

Country:

North America > United States > Pennsylvania (0.14)
North America > United States > Ohio (0.14)

Genre: Research Report (0.82)

Industry:

Health & Medicine > Therapeutic Area > Psychiatry/Psychology (1.00)
Health & Medicine > Therapeutic Area > Nephrology (1.00)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
(7 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.90)


Missing Value Imputation Based on Deep Generative Models

Zhang, Hongbao, Xie, Pengtao, Xing, Eric

arXiv.org Machine Learning, Aug-5-2018

Missing values are widespread in real-world datasets, which hinders advanced data analytics. Properly filling these missing values is crucial but challenging, especially when the missing rate is high. Many approaches have been proposed for missing value imputation (MVI), but they are mostly heuristics-based, lack a principled foundation and do not perform satisfactorily in practice. In this paper, we propose a probabilistic framework based on deep generative models for MVI. Under this framework, imputing the missing entries amounts to seeking a fixed-point solution between two conditional distributions defined on the missing entries and latent variables respectively. These distributions are parameterized by deep neural networks (DNNs), which possess high approximation power and can capture the nonlinear relationships between missing entries and the observed values. The weight parameters of the DNNs are learned by maximizing an approximation of the log-likelihood of the observed values. We conducted an extensive evaluation on 13 datasets and compared with 11 baseline methods, where our methods largely outperform the baselines.

dataset, deep learning, neural network, (21 more...)

arXiv.org Machine Learning

1808.01684

Genre: Research Report (0.64)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.60)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)


Geometric Generalization Based Zero-Shot Learning Dataset Infinite World: Simple Yet Powerful

Chidambaram, Rajesh, Kampffmeyer, Michael, Neiswanger, Willie, Liang, Xiaodan, Lachmann, Thomas, Xing, Eric

arXiv.org Machine Learning, Jul-11-2018

Raven's Progressive Matrices are among the most widely used tests for evaluating a human test taker's fluid intelligence. Analogously, this paper introduces geometric generalization based zero-shot learning tests to measure the rapid learning ability and the internal consistency of deep generative models. Our empirical analysis of state-of-the-art generative models discerns their ability to generalize concepts across classes. In the process, we introduce Infinite World, an evaluable, scalable, multi-modal, light-weight dataset, and the Zero-Shot Intelligence Metric (ZSI). The proposed tests condense human-level spatial and numerical reasoning tasks to their simplistic geometric forms. The dataset is scalable to a theoretical limit of infinity in the numerical features of the generated geometric figures, in image size and in quantity. We systematically analyze the internal consistency of state-of-the-art models, identify their bottlenecks and propose a pro-active optimization method for few-shot and zero-shot learning.

dataset, deep learning, neural network, (15 more...)

arXiv.org Machine Learning

1807.03711

Country: Europe (1.00)

Genre: Research Report (1.00)

Industry:

Education (1.00)
Health & Medicine > Therapeutic Area (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)


Deep Generative Models with Learnable Knowledge Constraints

Hu, Zhiting, Yang, Zichao, Salakhutdinov, Ruslan, Liang, Xiaodan, Qin, Lianhui, Dong, Haoye, Xing, Eric

arXiv.org Machine Learning, Jun-25-2018

The broad set of deep generative models (DGMs) has achieved remarkable advances. However, it is often difficult to incorporate rich structured domain knowledge with the end-to-end DGMs. Posterior regularization (PR) offers a principled framework to impose structured constraints on probabilistic models, but has limited applicability to the diverse DGMs that can lack a Bayesian formulation or even explicit density evaluation. PR also requires constraints to be fully specified a priori, which is impractical or suboptimal for complex knowledge with learnable uncertain parts. In this paper, we establish a mathematical correspondence between PR and reinforcement learning (RL), and, based on the connection, expand PR to learn constraints as the extrinsic reward in RL. The resulting algorithm is model-agnostic, so it applies to any DGM, and is flexible enough to adapt arbitrary constraints jointly with the model. Experiments on human image generation and templated sentence generation show that models with knowledge constraints learned by our algorithm greatly improve over base generative models.

constraint, deep learning, neural network, (19 more...)

arXiv.org Machine Learning

1806.09764

Country: North America > United States (0.28)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.62)


Gated Path Planning Networks

Lee, Lisa, Parisotto, Emilio, Chaplot, Devendra Singh, Xing, Eric, Salakhutdinov, Ruslan

arXiv.org Artificial Intelligence, Jun-17-2018

Value Iteration Networks (VINs) are effective differentiable path planning modules that can be used by agents to perform navigation while still maintaining end-to-end differentiability of the entire architecture. Despite their effectiveness, they suffer from several disadvantages including training instability, random seed sensitivity, and other optimization problems. In this work, we reframe VINs as recurrent-convolutional networks, which shows that VINs couple recurrent convolutions with an unconventional max-pooling activation. From this perspective, we argue that standard gated recurrent update equations could potentially alleviate the optimization issues plaguing VINs. The resulting architecture, which we call the Gated Path Planning Network, is shown to empirically outperform VINs on a variety of metrics such as learning speed, hyperparameter sensitivity, iteration count, and even generalization. Furthermore, we show that this performance gap is consistent across different maze transition types and maze sizes, and we even show success on a challenging 3D environment, where the planner is only provided with first-person RGB images.

computer game, deep learning, vin, (21 more...)

arXiv.org Artificial Intelligence

1806.06408

Genre: Research Report > New Finding (0.46)

Industry: Leisure & Entertainment > Games > Computer Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (0.92)


Deep learning based supervised semantic segmentation of Electron Cryo-Subtomograms

Liu, Chang, Zeng, Xiangrui, Lin, Ruogu, Liang, Xiaodan, Freyberg, Zachary, Xing, Eric, Xu, Min

arXiv.org Machine Learning, Feb-12-2018

Cellular Electron Cryo-Tomography (CECT) is a powerful imaging technique for the 3D visualization of cellular structure and organization at submolecular resolution. It enables analyzing the native structures of macromolecular complexes and their spatial organization inside single cells. However, due to the high degree of structural complexity and practical imaging limitations, systematic macromolecular structural recovery inside CECT images remains challenging. In particular, the recovery of a macromolecule is likely to be biased by its neighboring structures due to high molecular crowding. To reduce the bias, we introduce a novel 3D convolutional neural network, inspired by the Fully Convolutional Network and the Encoder-Decoder Architecture, for the supervised segmentation of macromolecules of interest in subtomograms. Tests of our models on realistically simulated CECT data demonstrate that our new approach has significantly improved segmentation performance compared to our baseline approach. We also demonstrate that the proposed model can generalize to segment new structures that do not exist in the training data.

deep learning, neural network, segmentation, (21 more...)

arXiv.org Machine Learning

1802.04087

Country: North America > United States (0.29)

Genre: Research Report (0.50)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


Neural Architecture Search with Bayesian Optimisation and Optimal Transport

Kandasamy, Kirthevasan, Neiswanger, Willie, Schneider, Jeff, Poczos, Barnabas, Xing, Eric

arXiv.org Machine Learning, Feb-11-2018

Bayesian Optimisation (BO) refers to a class of methods for global optimisation of a function $f$ which is only accessible via point evaluations. It is typically used in settings where $f$ is expensive to evaluate. A common use case for BO in machine learning is model selection, where it is not possible to analytically model the generalisation performance of a statistical model, and we resort to noisy and expensive training and validation procedures to choose the best model. Conventional BO methods have focused on Euclidean and categorical domains, which, in the context of model selection, only permit tuning scalar hyper-parameters of machine learning algorithms. However, with the surge of interest in deep learning, there is an increasing demand to tune neural network architectures. In this work, we develop NASBOT, a Gaussian process based BO framework for neural architecture search. To accomplish this, we develop a distance metric in the space of neural network architectures which can be computed efficiently via an optimal transport program. This distance might be of independent interest to the deep learning community as it may find applications outside of BO. We demonstrate that NASBOT outperforms other alternatives for architecture search in several cross validation based model selection tasks on multi-layer perceptrons and convolutional neural networks.

deep learning, neural network, tanh, (15 more...)

arXiv.org Machine Learning

1802.07191

Country: North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)

Genre:

Research Report (0.64)
Workflow (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.68)
