Carbonell, Jaime
Characterizing and Avoiding Negative Transfer
Wang, Zirui, Dai, Zihang, Póczos, Barnabás, Carbonell, Jaime
When labeled data is scarce for a specific target task, transfer learning often offers an effective solution by utilizing data from a related source task. However, when knowledge is transferred from a less related source, it may actually hurt target performance, a phenomenon known as negative transfer. Despite its pervasiveness, negative transfer is usually described in an informal manner, lacking rigorous definition, careful analysis, or systematic treatment. This paper proposes a formal definition of negative transfer and analyzes three important aspects thereof. Stemming from this analysis, a novel technique is proposed to circumvent negative transfer by filtering out unrelated source data. Based on adversarial networks, the technique is highly generic and can be applied to a wide range of transfer learning algorithms. The proposed approach is evaluated on six state-of-the-art deep transfer methods via experiments on four benchmark datasets with varying levels of difficulty. Empirically, the proposed method consistently improves the performance of all baseline methods and largely avoids negative transfer, even when the source data is degenerate.
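A minimal sketch of the filtering idea described above, with a plain logistic domain discriminator standing in for the adversarial network used in the paper; the names filter_source and keep_ratio are illustrative, not from the paper. Source examples the discriminator scores as least target-like are dropped before transfer.

```python
# Sketch only: a logistic domain discriminator stands in for the paper's adversarial network.
import numpy as np
from sklearn.linear_model import LogisticRegression

def filter_source(X_src, X_tgt, keep_ratio=0.7):
    """Keep the source examples the discriminator scores as most target-like (illustrative)."""
    X = np.vstack([X_src, X_tgt])
    y = np.concatenate([np.zeros(len(X_src)), np.ones(len(X_tgt))])   # 1 = target domain
    disc = LogisticRegression(max_iter=1000).fit(X, y)
    target_likeness = disc.predict_proba(X_src)[:, 1]                 # P(target | x) for each source row
    cutoff = np.quantile(target_likeness, 1.0 - keep_ratio)
    return X_src[target_likeness >= cutoff]

# Toy data: half of the source set is drawn far from the target distribution.
rng = np.random.default_rng(0)
X_tgt = rng.normal(0.0, 1.0, size=(200, 5))
X_src = np.vstack([rng.normal(0.0, 1.0, size=(150, 5)),
                   rng.normal(4.0, 1.0, size=(150, 5))])              # the "unrelated" chunk
print(filter_source(X_src, X_tgt).shape)                              # mostly the related half survives
```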
Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context
Dai, Zihang, Yang, Zhilin, Yang, Yiming, Carbonell, Jaime, Le, Quoc V., Salakhutdinov, Ruslan
Transformer networks have the potential to learn longer-term dependencies, but are limited by a fixed-length context in the setting of language modeling. As a solution, we propose a novel neural architecture, Transformer-XL, that enables the Transformer to learn dependencies beyond a fixed length without disrupting temporal coherence. Concretely, it consists of a segment-level recurrence mechanism and a novel positional encoding scheme. Our method not only enables capturing longer-term dependencies, but also resolves the problem of context fragmentation. As a result, Transformer-XL learns dependencies that are about 80% longer than RNNs and 450% longer than vanilla Transformers, achieves better performance on both short and long sequences, and is up to 1,800 times faster than vanilla Transformers during evaluation. Language modeling is among the important problems that require modeling long-term dependencies, with successful applications such as unsupervised pretraining (Dai & Le, 2015; Peters et al., 2018; Radford et al., 2018; Devlin et al., 2018). However, it has been a challenge to equip neural networks with the capability to model long-term dependencies in sequential data. Recurrent neural networks (RNNs), in particular Long Short-Term Memory (LSTM) networks (Hochreiter & Schmidhuber, 1997), have been a standard solution to language modeling and have obtained strong results on multiple benchmarks. Despite their wide adoption, RNNs are difficult to optimize due to gradient vanishing and explosion (Hochreiter et al., 2001), and the introduction of gating in LSTMs and the gradient clipping technique (Graves, 2013; Pascanu et al., 2012) might not be sufficient to fully address this issue. Empirically, previous work has found that LSTM language models use 200 context words on average (Khandelwal et al., 2018), indicating room for further improvement. On the other hand, the direct connections between long-distance word pairs baked into attention mechanisms might ease optimization and enable the learning of long-term dependencies (Bahdanau et al., 2014; Vaswani et al., 2017).
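A minimal sketch of the segment-level recurrence idea, assuming a single attention layer; the relative positional encoding scheme and the rest of the architecture are omitted, and this is not the authors' implementation. Hidden states from the previous segment are cached without gradient and reused as extra attention context for the current segment.

```python
# Sketch only: cache the previous segment's hidden states and attend over [memory; current segment].
import torch
import torch.nn as nn

class RecurrentSelfAttention(nn.Module):
    def __init__(self, d_model=64, n_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, x, memory=None):
        # x: (batch, seg_len, d_model); memory: cached states from the previous segment (or None)
        context = x if memory is None else torch.cat([memory, x], dim=1)
        out, _ = self.attn(query=x, key=context, value=context)
        new_memory = x.detach()          # stop-gradient cache, the essence of segment-level recurrence
        return out, new_memory

layer = RecurrentSelfAttention()
memory = None
for _ in range(3):                       # three consecutive segments of the same sequence
    segment = torch.randn(2, 16, 64)
    out, memory = layer(segment, memory)
print(out.shape)                         # torch.Size([2, 16, 64])
```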
Towards more Reliable Transfer Learning
Wang, Zirui, Carbonell, Jaime
Multi-source transfer learning has been proven effective when within-target labeled data is scarce. Previous work focuses primarily on exploiting domain similarities and assumes that source domains are richly or at least comparably labeled. Since this strong assumption does not hold in practice, this paper relaxes it and addresses the challenges posed by sources with diverse labeling volume and diverse reliability. The first challenge is to combine domain similarity with source reliability; we address it with a new transfer learning method that utilizes both source-target similarities and inter-source relationships. The second challenge involves pool-based active learning where the oracle is available only in the source domains; we address it with an integrated active transfer learning framework that incorporates distribution matching and uncertainty sampling. Extensive experiments on synthetic and two real-world datasets clearly demonstrate the superiority of our proposed methods over several baselines, including state-of-the-art transfer learning methods.
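A minimal sketch of the uncertainty-sampling ingredient of the active learning setting described above (the distribution-matching component and the multi-source weighting are omitted); the helper most_uncertain and the toy data are illustrative.

```python
# Sketch only: pool-based uncertainty sampling by smallest top-two probability margin.
import numpy as np
from sklearn.linear_model import LogisticRegression

def most_uncertain(model, X_pool, k=5):
    """Indices of the k pool points with the smallest margin between the top two class probabilities."""
    proba = model.predict_proba(X_pool)
    top2 = np.sort(proba, axis=1)[:, -2:]
    margin = top2[:, 1] - top2[:, 0]
    return np.argsort(margin)[:k]

rng = np.random.default_rng(1)
X_lab = rng.normal(size=(40, 3))
y_lab = (X_lab[:, 0] > 0).astype(int)            # toy labels
X_pool = rng.normal(size=(500, 3))
model = LogisticRegression().fit(X_lab, y_lab)
print(most_uncertain(model, X_pool))             # candidate queries for a source-domain oracle
```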
Block-Normalized Gradient Method: An Empirical Study for Training Deep Neural Network
Yu, Adams Wei, Huang, Lei, Lin, Qihang, Salakhutdinov, Ruslan, Carbonell, Jaime
In this paper, we propose a generic and simple strategy for utilizing stochastic gradient information in optimization. The technique consists of two consecutive steps in each iteration: 1) computing and normalizing each block (layer) of the mini-batch stochastic gradient; 2) selecting an appropriate step size to update the decision variable (parameter) in the direction of the negative block-normalized gradient. We conduct extensive empirical studies on various non-convex neural network optimization problems, including multi-layer perceptrons, convolutional neural networks, and recurrent neural networks. The results indicate that block-normalized gradients can help accelerate the training of neural networks. In particular, we observe that normalized gradient methods with a constant step size and occasional decay, such as SGD with momentum, perform better on deep convolutional neural networks, while those with adaptive step sizes, such as Adam, perform better on recurrent neural networks. We also observe that this line of methods can lead to solutions with better generalization properties, which is confirmed by performance improvements over strong baselines.
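A minimal sketch of the two steps described above, assuming plain SGD without momentum or a step-size schedule: each parameter tensor (block) has its gradient rescaled to unit norm before the update.

```python
# Sketch only: per-block (per-layer) gradient normalization followed by a fixed-step update.
import torch
import torch.nn as nn

def block_normalized_sgd_step(model, lr=0.1, eps=1e-8):
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is not None:
                p -= lr * p.grad / (p.grad.norm() + eps)   # normalize each block's gradient

model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))
x, y = torch.randn(64, 10), torch.randn(64, 1)
for _ in range(5):
    model.zero_grad()
    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()
    block_normalized_sgd_step(model)
    print(float(loss))                                     # loss decreases over the toy iterations
```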
Active Learning from Peers
Murugesan, Keerthiram, Carbonell, Jaime
This paper addresses the challenge of learning from peers in an online multitask setting. Instead of always requesting a label from a human oracle, the proposed method first determines whether the learner for each task can acquire that label with sufficient confidence from its peers, either as a task-similarity weighted sum or from the single most similar task. If so, it saves the oracle query for later use in more difficult cases; if not, it queries the human oracle. The paper develops a new algorithm that exhibits this behavior and proves a theoretical mistake bound for the method compared to the best linear predictor in hindsight. Experiments over three multitask learning benchmark datasets show clearly superior performance over baselines such as assuming task independence, learning only from the oracle, and not learning from peer tasks.
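A minimal sketch of the query rule described above; the peer predictors, task-similarity weights, and confidence threshold are illustrative stand-ins rather than the paper's learned quantities.

```python
# Sketch only: ask peers first, fall back to the human oracle when peer confidence is low.
import numpy as np

def peer_or_oracle(x, peer_weights, task_similarity, threshold=0.7):
    """Return ('peer', label) if the weighted peer prediction is confident, else ('oracle', None)."""
    scores = np.array([w @ x for w in peer_weights])        # each peer's linear prediction
    peer_score = task_similarity @ scores / task_similarity.sum()
    confidence = abs(1.0 / (1.0 + np.exp(-peer_score)) - 0.5) * 2.0   # in [0, 1)
    if confidence >= threshold:
        return "peer", int(np.sign(peer_score))             # label obtained without an oracle query
    return "oracle", None

rng = np.random.default_rng(2)
peers = [rng.normal(size=4) for _ in range(3)]              # illustrative peer predictors
similarity = np.array([0.6, 0.3, 0.1])                      # illustrative task-similarity weights
print(peer_or_oracle(rng.normal(size=4), peers, similarity, threshold=0.5))
```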
Likelihood Almost Free Inference Networks
Zheng, Guoqing, Yang, Yiming, Carbonell, Jaime
Variational inference for latent variable models is prevalent in various machine learning problems, typically solved by maximizing the Evidence Lower Bound (ELBO) of the true data likelihood with respect to a variational distribution. However, freely enriching the family of variational distributions is challenging, since the ELBO requires evaluating the variational likelihood of the latent variables. In this paper, we propose a novel framework to enrich the variational family based on an alternative lower bound, by introducing auxiliary random variables to the variational distribution only. While offering a much richer family of complex variational distributions, the resulting inference network is likelihood almost free in the sense that only the latent variables require evaluations from simple likelihoods, and samples from all the auxiliary variables are sufficient for maximum likelihood inference. We show that the proposed approach is essentially optimizing a probabilistic mixture of ELBOs, thus enriching modeling capacity and enhancing robustness. It outperforms state-of-the-art methods in our experiments on several density estimation tasks.
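One way to read the "probabilistic mixture of ELBOs" claim, sketched under the assumption that the auxiliary variable $a$ is only ever sampled: each conditional ELBO lower-bounds $\log p(x)$, so their mixture under $q(a \mid x)$ does too,

$$\log p(x) \;\ge\; \mathbb{E}_{q(a\mid x)}\Big[\,\mathbb{E}_{q(z\mid a,x)}\big[\log p(x,z) - \log q(z\mid a,x)\big]\Big],$$

where only $q(z \mid a, x)$, the variational density of the latent variable, appears inside a logarithm; the auxiliary variables need only be sampled. This is a generic reading consistent with the abstract, not necessarily the paper's exact bound.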
Self-Paced Multitask Learning with Shared Knowledge
Murugesan, Keerthiram, Carbonell, Jaime
This paper introduces self-paced task selection to multitask learning, where instances from more closely related tasks are selected in a progression of easier-to-harder tasks, to emulate an effective human education strategy, but applied to multitask machine learning. We develop the mathematical foundation for the approach based on iterative selection of the most appropriate task, learning the task parameters, and updating the shared knowledge, optimizing a new bi-convex loss function. This proposed method applies quite generally, including to multitask feature learning, multitask learning with alternating structure optimization, etc. Results show that in each of the above formulations self-paced (easier-to-harder) task selection outperforms the baseline version of these methods in all the experiments.
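A minimal sketch of the easier-to-harder selection loop, assuming a generic self-paced threshold rather than the paper's bi-convex objective; the shared vector, the Ridge task models, and the pace schedule are illustrative.

```python
# Sketch only: tasks join the shared update once their loss falls under a growing pace threshold.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(3)
true_w = rng.normal(size=6)
tasks = []
for noise in (0.1, 0.5, 2.0):                         # easy -> hard tasks
    X = rng.normal(size=(100, 6))
    y = X @ true_w + rng.normal(scale=noise, size=100)
    tasks.append((X, y))

shared = np.zeros(6)                                  # stand-in for the shared knowledge
pace = 0.5
for round_ in range(4):
    active, coefs, losses = [], [], []
    for i, (X, y) in enumerate(tasks):
        model = Ridge(alpha=1.0).fit(X, y - X @ shared)   # task-specific correction on top of shared
        w_i = shared + model.coef_
        loss = float(np.mean((X @ w_i - y) ** 2))
        losses.append(round(loss, 2))
        if loss <= pace:                              # task is "easy enough" this round
            active.append(i)
            coefs.append(w_i)
    if coefs:
        shared = np.mean(coefs, axis=0)               # shared knowledge updated from easier tasks only
    pace *= 2.0                                       # admit harder tasks over time
    print(round_, active, losses)
```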
Multi-Task Multiple Kernel Relationship Learning
Murugesan, Keerthiram, Carbonell, Jaime
This paper presents a novel multitask multiple kernel learning framework that efficiently learns the kernel weights leveraging the relationship across multiple tasks. The idea is to automatically infer this task relationship in the \textit{RKHS} space corresponding to the given base kernels. The problem is formulated as a regularization-based approach called \textit{Multi-Task Multiple Kernel Relationship Learning} (\textit{MK-MTRL}), which models the task relationship matrix from the weights learned from latent feature spaces of task-specific base kernels. Unlike in previous work, the proposed formulation allows one to incorporate prior knowledge for simultaneously learning several related tasks. We propose an alternating minimization algorithm to learn the model parameters, kernel weights and task relationship matrix. In order to tackle large-scale problems, we further propose a two-stage \textit{MK-MTRL} online learning algorithm and show that it significantly reduces the computational time, and also achieves performance comparable to that of the joint learning framework. Experimental results on benchmark datasets show that the proposed formulations outperform several state-of-the-art multitask learning methods.
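A minimal sketch of one ingredient, assuming the commonly used closed-form task-relationship update $\Omega \propto (W^\top W)^{1/2}$ from regularization-based task-relationship learning; the multiple-kernel weight updates and the two-stage online variant are not reproduced here.

```python
# Sketch only: a closed-form task-relationship estimate from the stacked task weight vectors.
import numpy as np

def task_relationship(W):
    """W: (n_features, n_tasks). Return a trace-normalized (W^T W)^{1/2} relationship matrix."""
    M = W.T @ W
    vals, vecs = np.linalg.eigh(M)                        # symmetric PSD eigendecomposition
    sqrt_M = (vecs * np.sqrt(np.clip(vals, 0.0, None))) @ vecs.T
    return sqrt_M / np.trace(sqrt_M)

W = np.random.default_rng(4).normal(size=(10, 4))         # 10 features, 4 tasks (toy)
print(np.round(task_relationship(W), 3))
```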
Co-Clustering for Multitask Learning
Murugesan, Keerthiram, Carbonell, Jaime, Yang, Yiming
This paper presents a new multitask learning framework that learns a shared representation among the tasks, incorporating both task and feature clusters. The jointly-induced clusters yield a shared latent subspace where task relationships are learned more effectively and more generally than in state-of-the-art multitask learning methods. The proposed general framework enables the derivation of more specific or restricted state-of-the-art multitask methods. The paper also proposes a highly-scalable multitask learning algorithm, based on the new framework, using conjugate gradient descent and generalized \textit{Sylvester equations}. Experimental results on synthetic and benchmark datasets show that the proposed method systematically outperforms several state-of-the-art multitask learning methods.
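A minimal sketch of the Sylvester-equation step that such closed-form updates reduce to, using random stand-in matrices rather than the paper's actual operators; scipy.linalg.solve_sylvester handles equations of the form $AX + XB = Q$ directly.

```python
# Sketch only: solving a Sylvester equation A X + X B = Q with stand-in matrices.
import numpy as np
from scipy.linalg import solve_sylvester

rng = np.random.default_rng(5)
A = rng.normal(size=(6, 6)); A = A @ A.T + 6 * np.eye(6)   # well-conditioned symmetric A
B = rng.normal(size=(4, 4)); B = B @ B.T + 4 * np.eye(4)   # well-conditioned symmetric B
Q = rng.normal(size=(6, 4))
X = solve_sylvester(A, B, Q)
print(np.allclose(A @ X + X @ B, Q))                       # True: X satisfies the equation
```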
Vision-Language Fusion for Object Recognition
Shiang, Sz-Rung, Rosenthal, Stephanie, Gershman, Anatole, Carbonell, Jaime, Oh, Jean
While recent advances in computer vision have caused object recognition rates to spike, there is still much room for improvement. In this paper, we develop an algorithm to improve object recognition by integrating human-generated contextual information with vision algorithms. Specifically, we examine how interactive systems such as robots can utilize two types of contextual information: verbal descriptions of an environment and human-labeled datasets. We propose a re-ranking schema, MultiRank, for object recognition that can efficiently combine such information with the computer vision results. In our experiments, we achieve up to 9.4% and 16.6% accuracy improvements using the oracle and the detected bounding boxes, respectively, over the vision-only recognizers. We conclude that our algorithm has the ability to make a significant impact on object recognition in robotics and beyond.
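A minimal sketch of score-level re-ranking, assuming a simple convex combination of detector confidences and context scores; the MultiRank graph formulation itself is not reproduced, and all names and numbers below are illustrative.

```python
# Sketch only: fuse vision and context scores per label, then re-rank the labels.
def rerank(vision_scores, context_scores, alpha=0.6):
    """Combine per-label vision and context scores and sort labels by the fused score."""
    labels = set(vision_scores) | set(context_scores)
    fused = {label: alpha * vision_scores.get(label, 0.0)
                    + (1 - alpha) * context_scores.get(label, 0.0)
             for label in labels}
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)

vision = {"mug": 0.42, "bowl": 0.47, "cap": 0.11}           # raw detector confidences (toy)
context = {"mug": 0.80, "bowl": 0.15}                       # e.g. from "the mug is on the desk"
print(rerank(vision, context))                              # "mug" is promoted above "bowl"
```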