AITopics | Caruana, Rich

Collaborating Authors

Caruana, Rich

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Efficient Forward Architecture Search

Hu, Hanzhang, Langford, John, Caruana, Rich, Mukherjee, Saurajit, Horvitz, Eric, Dey, Debadeepta

arXiv.org Machine Learning, May-30-2019

We propose a neural architecture search (NAS) algorithm, Petridish, that iteratively adds shortcut connections to existing network layers. The added shortcut connections effectively perform gradient boosting on the augmented layers. The proposed algorithm is motivated by the feature selection algorithm forward stage-wise linear regression, since we view NAS as a generalization of feature selection for regression, where NAS selects shortcuts among layers instead of selecting features. To reduce the number of trials of possible connection combinations, we jointly train all possible connections at each stage of growth while leveraging feature selection techniques to choose a subset of them. We experimentally show this process to be an efficient forward architecture search algorithm that can find competitive models using few GPU days in both the search space of repeatable network modules (cell-search) and the space of general networks (macro-search). Petridish is particularly well suited to warm-starting from existing models, which is crucial in lifelong-learning scenarios.
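Because Petridish is motivated by forward stage-wise linear regression, a minimal sketch of that classical feature-selection algorithm may make the analogy concrete. It is not Petridish itself: here each column of `X` plays the role that a candidate shortcut connection plays in the search, and the function name and step parameters are illustrative assumptions. Inputs are assumed centered, as is usual for this method.

```python
def forward_stagewise(X, y, step=0.01, n_steps=1000):
    """Incremental forward stage-wise linear regression (illustrative sketch).

    X: list of n rows with p features each, assumed centered.
    y: list of n targets, assumed centered.
    Returns the coefficient list after at most n_steps tiny updates.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)
    for _ in range(n_steps):
        # Correlation (inner product) of each feature with the current residual.
        corr = [sum(X[i][j] * residual[i] for i in range(n)) for j in range(p)]
        j = max(range(p), key=lambda k: abs(corr[k]))
        if corr[j] == 0:
            break  # residual is orthogonal to every feature: nothing left to add
        delta = step if corr[j] > 0 else -step
        beta[j] += delta
        # Only the chosen feature's small contribution is removed from the residual.
        for i in range(n):
            residual[i] -= delta * X[i][j]
    return beta
```

Each iteration nudges only the coefficient most correlated with the current residual, so features enter the model gradually; this forward, incremental behavior is what the abstract borrows for growing networks.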

deep learning, neural network, weak learner

arXiv.org Machine Learning

1905.1336

Genre: Research Report (0.64)

Industry: Education (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)


Interpretability is Harder in the Multiclass Setting: Axiomatic Interpretability for Multiclass Additive Models

Zhang, Xuezhou, Tan, Sarah, Koch, Paul, Lou, Yin, Chajewska, Urszula, Caruana, Rich

arXiv.org Machine Learning, Oct-22-2018

Generalized additive models (GAMs) are favored in many regression and binary classification problems because they are able to fit complex, nonlinear functions while still remaining interpretable. In the first part of this paper, we generalize a state-of-the-art GAM learning algorithm based on boosted trees to the multiclass setting, and show that this multiclass algorithm outperforms existing GAM fitting algorithms and sometimes matches the performance of full complex models. In the second part, we turn our attention to the interpretability of GAMs in the multiclass setting. Surprisingly, the natural interpretability of GAMs breaks down when there are more than two classes. Drawing inspiration from binary GAMs, we identify two axioms that any additive model must satisfy to not be visually misleading. We then develop a post-processing technique (API) that provably transforms pretrained additive models to satisfy the interpretability axioms without sacrificing accuracy. The technique works not just on models trained with our algorithm, but on any multiclass additive model. We demonstrate API on a 12-class infant-mortality dataset.
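One way to see why multiclass plots can mislead, sketched under the assumption of a softmax-linked additive model with one shape function per class: adding the same function to every class's shape function leaves every predicted probability unchanged, so the individual curves are not uniquely determined by the model's predictions. The helper names below are hypothetical, and centering across classes is shown only to illustrate this degree of freedom; it is not claimed to be the paper's API procedure, which is defined by its two axioms.

```python
import math


def softmax(logits):
    """Class probabilities from per-class additive scores."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def center_across_classes(shape_values):
    """Remove, at one feature value x, the mean of the per-class shape values.

    shape_values[k] is class k's shape-function value f_k(x) at that x.
    """
    mean = sum(shape_values) / len(shape_values)
    return [v - mean for v in shape_values]


f = [1.0, 2.0, 4.0]           # per-class shape values at some x
g = [v + 10.0 for v in f]     # same values shifted by a common amount
```

Here `f` and `g` would plot very differently, yet `softmax(f)` and `softmax(g)` agree, and their centered versions coincide.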

artificial intelligence, health & medicine, interpretability

arXiv.org Machine Learning

1810.09092

Genre: Research Report (0.83)

Industry:

Health & Medicine > Public Health (1.00)
Health & Medicine > Therapeutic Area > Pediatrics/Neonatology (0.66)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.47)


Sparse Partially Linear Additive Models

Lou, Yin, Bien, Jacob, Caruana, Rich, Gehrke, Johannes

arXiv.org Machine Learning, Mar-27-2018

The generalized partially linear additive model (GPLAM) is a flexible and interpretable approach to building predictive models. It combines features in an additive manner, allowing each to have either a linear or nonlinear effect on the response. However, the choice of which features to treat as linear or nonlinear is typically assumed to be known. Thus, to make a GPLAM a viable approach in situations in which little is known a priori about the features, one must overcome two primary model selection challenges: deciding which features to include in the model and determining which of these features to treat nonlinearly. We introduce the sparse partially linear additive model (SPLAM), which combines model fitting and both of these model selection challenges into a single convex optimization problem. SPLAM provides a bridge between the lasso and sparse additive models. Through a statistical oracle inequality and thorough simulation, we demonstrate that SPLAM can outperform other methods across a broad spectrum of statistical regimes, including the high-dimensional (p ≫ N) setting. We develop efficient algorithms that are applied to real data sets with half a million samples and over 45,000 features with excellent predictive performance.
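The abstract does not give the SPLAM penalty, so the following is a hedged illustration rather than the paper's formulation. Assume each feature has one linear coefficient followed by a block of nonlinear basis coefficients, and is penalized by a norm on the whole block plus a second norm on the nonlinear part alone (a nested group penalty; `lam_all` and `lam_nonlin` are hypothetical names). Its proximal operator, the per-feature step a proximal-gradient solver would apply, can zero the whole feature, zero only its nonlinear part, or keep both, which is how one convex problem can make both selection decisions.

```python
import math


def group_soft_threshold(v, lam):
    """Prox of lam * ||v||_2: shrink the whole vector, possibly to exactly zero."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm <= lam:
        return [0.0] * len(v)
    scale = 1.0 - lam / norm
    return [scale * x for x in v]


def nested_prox(coefs, lam_all, lam_nonlin):
    """Prox of lam_all*||coefs||_2 + lam_nonlin*||coefs[1:]||_2 for one feature.

    coefs[0] is the linear coefficient; coefs[1:] are nonlinear basis coefficients.
    For nested groups, the inner group's prox is applied first, then the outer one.
    """
    inner = group_soft_threshold(coefs[1:], lam_nonlin)
    return group_soft_threshold([coefs[0]] + inner, lam_all)
```

For example, `nested_prox([3.0, 0.1, 0.1], lam_all=0.5, lam_nonlin=1.0)` keeps the feature but makes it purely linear, while a small enough coefficient block is dropped entirely.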

artificial intelligence, machine learning, splam

arXiv.org Machine Learning

1407.4729

Genre: Research Report > New Finding (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)


Auditing Black-Box Models Using Transparent Model Distillation With Side Information

Tan, Sarah, Caruana, Rich, Hooker, Giles, Lou, Yin

arXiv.org Machine Learning, Feb-24-2018

Black-box risk scoring models permeate our lives, yet are typically proprietary or opaque. We propose a transparent model distillation approach to audit such models. Model distillation was first introduced to transfer knowledge from a large, complex teacher model to a faster, simpler student model without significant loss in prediction accuracy. To this we add a third criterion: transparency. To gain insight into black-box models, we treat them as teachers, training transparent student models to mimic the risk scores assigned by the teacher. Moreover, we use side information in the form of the actual outcomes the teacher scoring model was intended to predict in the first place. By training a second transparent model on the outcomes, we can compare the two models to each other. When comparing models trained on risk scores to models trained on outcomes, we show that it is necessary to calibrate the risk-scoring model's predictions to remove distortion that may have been added to the black-box risk-scoring model during or after its training process. We also show how to compute confidence intervals for the particular class of transparent student models we use (tree-based additive models with pairwise interactions, or GA2Ms) to support comparison of the two transparent models. We demonstrate the methods on four public datasets: COMPAS, Lending Club, Stop-and-Frisk, and Chicago Police.
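The abstract says the black-box scores must be calibrated but not how. One standard option, sketched here as an assumption rather than the paper's method, is isotonic regression fit with the pool-adjacent-violators algorithm: it learns a nondecreasing map from black-box score to observed outcome rate, putting the teacher's scores on the same scale as a model trained on outcomes. The function names are illustrative.

```python
from itertools import groupby


def isotonic_calibrate(scores, outcomes):
    """Pool-adjacent-violators fit of a nondecreasing map from score to outcome rate.

    Returns (largest_score_in_block, calibrated_value) pairs sorted by score.
    """
    blocks = []  # each block: [sum of outcomes, count, largest score in block]
    # Pool tied scores first so each distinct score gets a single value.
    for s, group in groupby(sorted(zip(scores, outcomes)), key=lambda p: p[0]):
        ys = [y for _, y in group]
        blocks.append([float(sum(ys)), len(ys), s])
        # Merge backwards while the previous block's mean exceeds the newest one's.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            total, count, s_max = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
            blocks[-1][2] = s_max
    return [(s_max, total / count) for total, count, s_max in blocks]


def apply_calibration(table, score):
    """Map a score to the value of the first block whose upper bound covers it."""
    for s_max, value in table:
        if score <= s_max:
            return value
    return table[-1][1]  # scores above the fitted range get the top block's value
```

With scores `[0.1, 0.2, 0.3, 0.4]` and outcomes `[0, 1, 0, 1]`, the out-of-order middle pair is pooled into one block with rate 0.5, so the fitted map never decreases as the score rises.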

air transportation, law enforcement, risk score

arXiv.org Machine Learning

1710.06169

Country:

North America > United States > Illinois > Cook County > Chicago (0.26)
North America > United States > Florida (0.14)

Genre:

Research Report > Experimental Study (0.68)
Research Report > New Finding (0.47)

Industry:

Transportation > Air (1.00)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
Education (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)


Transparent Model Distillation

Tan, Sarah, Caruana, Rich, Hooker, Giles, Gordo, Albert

arXiv.org Machine Learning, Jan-25-2018

Model distillation was originally designed to distill knowledge from a large, complex teacher model to a faster, simpler student model without significant loss in prediction accuracy. We investigate model distillation for another goal, transparency, asking whether fully-connected neural networks can be distilled into models that are transparent or interpretable in some sense. Our teacher models are multilayer perceptrons, and we try two types of student models: (1) tree-based generalized additive models (GA2Ms), a type of additive model built from boosted short trees, and (2) gradient boosted trees (GBTs). More transparent student models are forthcoming. Our results are not yet conclusive. GA2Ms show some promise for distilling binary classification teachers, but not yet regression teachers. GBTs are not "directly" interpretable but may be promising for regression teachers. GA2M models may provide a computationally viable alternative to additive decomposition methods for global function approximation.
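GA2Ms are built from boosted short trees and can include pairwise interactions. As a simpler, hedged stand-in, the sketch below distills a teacher into a main-effects-only additive model by backfitting binned shape functions, fit to the teacher's outputs instead of the true labels. Pre-binned inputs and the function names are assumptions for illustration.

```python
def fit_additive_student(X_bins, teacher_out, n_bins, n_rounds=20):
    """Fit per-feature lookup tables so intercept + sum_j table_j[bin] mimics the teacher.

    X_bins[i][j] is the precomputed bin index of feature j for sample i.
    teacher_out[i] is the teacher's score for sample i (the distillation target).
    """
    n, p = len(X_bins), len(X_bins[0])
    intercept = sum(teacher_out) / n
    tables = [[0.0] * n_bins for _ in range(p)]
    for _ in range(n_rounds):
        for j in range(p):
            # Partial residual: teacher output minus every other term's contribution.
            sums = [0.0] * n_bins
            counts = [0] * n_bins
            for i in range(n):
                others = intercept + sum(tables[k][X_bins[i][k]] for k in range(p) if k != j)
                sums[X_bins[i][j]] += teacher_out[i] - others
                counts[X_bins[i][j]] += 1
            # Feature j's new shape function: mean partial residual per bin.
            tables[j] = [sums[b] / counts[b] if counts[b] else 0.0 for b in range(n_bins)]
    return intercept, tables


def predict(intercept, tables, bins):
    """Student prediction for one sample given its per-feature bin indices."""
    return intercept + sum(tables[j][b] for j, b in enumerate(bins))
```

Each feature's lookup table can be plotted directly as its shape function, which is what makes such a student transparent.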

health & medicine, neural network, student model

arXiv.org Machine Learning

1801.0864

Genre: Research Report (0.70)

Industry:

Education (0.83)
Health & Medicine (0.69)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.56)


Proceedings of NIPS 2017 Symposium on Interpretable Machine Learning

Wilson, Andrew Gordon, Yosinski, Jason, Simard, Patrice, Caruana, Rich, Herlands, William

arXiv.org Machine Learning, Dec-11-2017

This is the Proceedings of the NIPS 2017 Symposium on Interpretable Machine Learning, held in Long Beach, California, USA on December 7, 2017.

interpretable machine learning, proceedings, symposium

arXiv.org Machine Learning

1711.09889

Country: North America > United States > California > Los Angeles County > Long Beach (0.24)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.60)


Do Deep Convolutional Nets Really Need to be Deep and Convolutional?

Urban, Gregor, Geras, Krzysztof J., Kahou, Samira Ebrahimi, Aslan, Ozlem, Wang, Shengjie, Caruana, Rich, Mohamed, Abdelrahman, Philipose, Matthai, Richardson, Matt

arXiv.org Machine Learning, Mar-3-2017

Yes, they do. This paper provides the first empirical demonstration that deep convolutional models really need to be both deep and convolutional, even when trained with methods such as distillation that allow small or shallow models of high accuracy to be trained. Although previous research showed that shallow feed-forward nets sometimes can learn the complex functions previously learned by deep nets while using the same number of parameters as the deep models they mimic, in this paper we demonstrate that the same methods cannot be used to train accurate models on CIFAR-10 unless the student models contain multiple layers of convolution. Although the student models do not have to be as deep as the teacher model they mimic, the students need multiple convolutional layers to learn functions with accuracy comparable to that of the deep convolutional teacher.

deep learning, neural network, student model

arXiv.org Machine Learning

1603.05691

Country: North America > Canada > Alberta (0.14)

Genre: Research Report > New Finding (0.48)

Industry: Education (0.61)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


Identifying Unknown Unknowns in the Open World: Representations and Policies for Guided Exploration

Lakkaraju, Himabindu (Stanford University) | Kamar, Ece (Microsoft Research) | Caruana, Rich (Microsoft Research) | Horvitz, Eric (Microsoft Research)

AAAI Conferences, Feb-14-2017

Predictive models deployed in the real world may assign incorrect labels to instances with high confidence. Such errors, known as unknown unknowns, are rooted in model incompleteness, and typically arise because of the mismatch between training data and the cases encountered at test time. As the models are blind to such errors, input from an oracle is needed to identify these failures. In this paper, we formulate and address the problem of informed discovery of unknown unknowns of any given predictive model where unknown unknowns occur due to systematic biases in the training data. We propose a model-agnostic methodology which uses feedback from an oracle to both identify unknown unknowns and to intelligently guide the discovery. We employ a two-phase approach which first organizes the data into multiple partitions based on the feature similarity of instances and the confidence scores assigned by the predictive model, and then utilizes an explore-exploit strategy for discovering unknown unknowns across these partitions. We demonstrate the efficacy of our framework by varying the underlying causes of unknown unknowns across various applications. To the best of our knowledge, this paper presents the first algorithmic approach to the problem of discovering unknown unknowns of predictive models.
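The abstract does not specify the explore-exploit policy. A minimal sketch, swapping in the standard UCB1 bandit rule as a stand-in, treats each partition as an arm and each oracle query as a pull that earns a reward of 1 when the queried instance turns out to be an unknown unknown. The partitions and the `oracle` callback are hypothetical inputs.

```python
import math
import random


def ucb_discover(partitions, oracle, budget, seed=0):
    """Spend `budget` oracle queries across partitions using the UCB1 rule.

    partitions: list of lists of instances (e.g. clusters of confident predictions).
    oracle(x): hypothetical callback, True if x is an unknown unknown
               (a confident prediction the oracle says is wrong).
    Returns the discovered unknown unknowns in query order.
    """
    rng = random.Random(seed)
    pools = [list(p) for p in partitions]  # instances not yet queried
    pulls = [0] * len(pools)
    hits = [0] * len(pools)
    found = []
    for t in range(1, budget + 1):  # t counts queries so far, including this one
        live = [k for k in range(len(pools)) if pools[k]]
        if not live:
            break
        untried = [k for k in live if pulls[k] == 0]
        if untried:
            k = untried[0]  # query every partition once before comparing them
        else:
            # Empirical hit rate (exploit) plus a confidence bonus (explore).
            k = max(live, key=lambda a: hits[a] / pulls[a]
                    + math.sqrt(2 * math.log(t) / pulls[a]))
        x = pools[k].pop(rng.randrange(len(pools[k])))
        pulls[k] += 1
        if oracle(x):
            hits[k] += 1
            found.append(x)
    return found


found = ucb_discover([list(range(50)), list(range(100, 150))],
                     lambda x: x >= 100, budget=30)
```

In this example the rule quickly concentrates queries on the partition where the oracle keeps finding errors, while still revisiting the other partition occasionally.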

artificial intelligence, modeling & simulation, unknown unknowns

AAAI Conferences

Thirty-First AAAI Conference on Artificial Intelligence

Country: North America > United States > Wisconsin (0.14)

Industry: Health & Medicine (0.30)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Data Science > Data Mining > Big Data (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (0.46)


Do Deep Nets Really Need to be Deep?

Ba, Jimmy, Caruana, Rich

Neural Information Processing Systems, Dec-31-2014

Currently, deep neural networks are the state of the art on problems such as speech recognition and computer vision. In this paper we empirically demonstrate that shallow feed-forward nets can learn the complex functions previously learned by deep nets and achieve accuracies previously only achievable with deep models. Moreover, in some cases the shallow nets can learn these deep functions using the same number of parameters as the original deep models. On the TIMIT phoneme recognition and CIFAR-10 image recognition tasks, shallow nets can be trained that perform similarly to complex, well-engineered, deeper convolutional models.
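The abstract reports results rather than the training recipe. The mimic training usually associated with this paper regresses the student's pre-softmax outputs (logits) onto the teacher's logits with a squared-error loss. The sketch below shows that loss and its gradient with respect to the student logits, averaged over all logit entries; this scaling is an assumption and does not change the minimizer.

```python
def mimic_loss_and_grad(student_logits, teacher_logits):
    """Mean squared error between student and teacher logits, plus its gradient.

    Each argument is a list of per-example logit vectors (pre-softmax scores).
    Returns (loss, grad), where grad has the same shape as student_logits.
    """
    n = sum(len(v) for v in student_logits)  # total number of logit entries
    loss = 0.0
    grad = []
    for s_vec, t_vec in zip(student_logits, teacher_logits):
        # d/ds of (s - t)^2 / n is 2 (s - t) / n.
        grad.append([2.0 * (s - t) / n for s, t in zip(s_vec, t_vec)])
        loss += sum((s - t) ** 2 for s, t in zip(s_vec, t_vec))
    return loss / n, grad
```

Matching logits rather than probabilities keeps information that the softmax squashes: teacher logits of [10, 0] and [20, 0] both yield a top-class probability above 0.9999, yet differ substantially as regression targets.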

accuracy, deep learning, neural network

Neural Information Processing Systems

Country:

North America > Canada > Ontario > Toronto (0.14)
Europe > France (0.14)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)


Active Learning with Model Selection

Ali, Alnur (Carnegie Mellon University) | Caruana, Rich (Microsoft Research) | Kapoor, Ashish (Microsoft Research)

AAAI Conferences, Jul-14-2014

Most active learning methods avoid model selection by training models of one type (SVMs, boosted trees, etc.) using one pre-defined set of model hyperparameters. We propose an algorithm that actively samples data to simultaneously train a set of candidate models (different model types and/or different hyperparameters) and also select the best model from this set. The algorithm actively samples points for training that are most likely to improve the accuracy of the more promising candidate models, and also samples points for model selection; all samples count against the same labeling budget. This exposes a natural trade-off between the focused active sampling that is most effective for training models, and the unbiased sampling that is better for model selection. We empirically demonstrate on six test problems that this algorithm is nearly as effective as an active learning oracle that knows the optimal model in advance.
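The trade-off the abstract describes can be sketched with a fixed split, which is an assumption: the paper's algorithm allocates samples adaptively. Each label drawn from the shared budget goes either to focused training (the most uncertain remaining point) or to an unbiased validation set (a uniformly random point). `uncertainty` and `p_unbiased` are hypothetical names.

```python
import random


def allocate_queries(pool, uncertainty, budget, p_unbiased=0.3, seed=0):
    """Split one labeling budget between focused training and unbiased validation.

    pool: list of unlabeled instance ids.
    uncertainty(x): hypothetical score; higher means more useful for training.
    p_unbiased: assumed fixed probability that a query is a uniform validation draw.
    Returns (train_ids, validation_ids); together they use at most `budget` labels.
    """
    rng = random.Random(seed)
    remaining = list(pool)
    train, validation = [], []
    for _ in range(min(budget, len(remaining))):
        if rng.random() < p_unbiased:
            # Uniform draw keeps the validation set unbiased for comparing models.
            validation.append(remaining.pop(rng.randrange(len(remaining))))
        else:
            # Focused draw: the most uncertain remaining point, best for training.
            best = max(range(len(remaining)), key=lambda i: uncertainty(remaining[i]))
            train.append(remaining.pop(best))
    return train, validation
```

Raising `p_unbiased` buys a more trustworthy comparison among candidate models at the cost of fewer labels targeted at training them.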

active learning, artificial intelligence, machine learning

AAAI Conferences

Twenty-Eighth AAAI Conference on Artificial Intelligence

Country:

North America > United States > New York (0.14)
North America > United States > Virginia (0.14)

Genre: Research Report (0.70)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
