Carbonell, Jaime G.


The Nonlinearity Coefficient - Predicting Overfitting in Deep Neural Networks

arXiv.org Machine Learning

For a long time, designing neural architectures that exhibit high performance was considered a dark art that required expert hand-tuning. One of the few well-known guidelines for architecture design is the avoidance of exploding gradients, though even this guideline has remained relatively vague and circumstantial. We introduce the nonlinearity coefficient (NLC), a measurement of the complexity of the function computed by a neural network that is based on the magnitude of the gradient. Via an extensive empirical study, we show that the NLC is a powerful predictor of test error and that attaining a right-sized NLC is essential for optimal performance. The NLC exhibits a range of intriguing and important properties. It is closely tied to the amount of information gained from computing a single network gradient. It is tied to the error incurred when replacing the nonlinearity operations in the network with linear operations. It is not susceptible to the confounders of multiplicative scaling, additive bias and layer width. It is stable from layer to layer. Hence, we argue that the NLC is the first robust predictor of overfitting in deep networks.


Smoothing Proximal Gradient Method for General Structured Sparse Learning

arXiv.org Machine Learning

We study the problem of learning high dimensional regression models regularized by a structured-sparsity-inducing penalty that encodes prior structural information on either input or output sides. We consider two widely adopted types of such penalties as our motivating examples: 1) overlapping group lasso penalty, based on the l1/l2 mixed-norm penalty, and 2) graph-guided fusion penalty. For both types of penalties, due to their non-separability, developing an efficient optimization method has remained a challenging problem. In this paper, we propose a general optimization approach, called smoothing proximal gradient method, which can solve the structured sparse regression problems with a smooth convex loss and a wide spectrum of structured-sparsity-inducing penalties. Our approach is based on a general smoothing technique of Nesterov. It achieves a convergence rate faster than the standard first-order method, subgradient method, and is much more scalable than the most widely used interior-point method. Numerical results are reported to demonstrate the efficiency and scalability of the proposed method.


Learning Spatial-Temporal Varying Graphs with Applications to Climate Data Analysis

AAAI Conferences

An important challenge in understanding climate change is to uncover the dependency relationships between various climate observations and forcing factors. Graphical lasso, a recently proposed L 1 penalty based structure learning algorithm, has been proven successful for learning underlying dependency structures for the data drawn from a multivariate Gaussian distribution. However, climatological data often turn out to be non-Gaussian, e.g. cloud cover, precipitation, etc. In this paper, we examine nonparametric learning methods to address this challenge. In particular, we develop a methodology to learn dynamic graph structures from spatial-temporal data so that the graph structures at adjacent time or locations are similar. Experimental results demonstrate that our method not only recovers the underlying graph well but also captures the smooth variation properties on both synthetic data and climate data. An important challenge in understanding climate change is to uncover the dependency relationships between various climate observations and forcing factors. Graphical lasso, a recently proposed An important challenge in understanding climate change is to uncover the dependency relationships between various climate observations and forcing factors. Graphical lasso, a recently proposed L 1 penalty based structure learning algorithm, has been proven successful for learning underlying dependency structures for the data drawn from a multivariate Gaussian distribution. However, climatological data often turn out to be non-Gaussian, e.g. cloud cover, precipitation, etc. In this paper, we examine nonparametric learning methods to address this challenge. In particular, we develop a methodology to learn dynamic graph structures from spatial-temporal data so that the graph structures at adjacent time or locations are similar. Experimental results demonstrate that our method not only recovers the underlying graph well but also captures the smooth variation properties on both synthetic data and climate data.


Graph-Structured Multi-task Regression and an Efficient Optimization Method for General Fused Lasso

arXiv.org Machine Learning

We consider the problem of learning a structured multi-task regression, where the output consists of multiple responses that are related by a graph and the correlated response variables are dependent on the common inputs in a sparse but synergistic manner. Previous methods such as l1/l2-regularized multi-task regression assume that all of the output variables are equally related to the inputs, although in many real-world problems, outputs are related in a complex manner. In this paper, we propose graph-guided fused lasso (GFlasso) for structured multi-task regression that exploits the graph structure over the output variables. We introduce a novel penalty function based on fusion penalty to encourage highly correlated outputs to share a common set of relevant inputs. In addition, we propose a simple yet efficient proximal-gradient method for optimizing GFlasso that can also be applied to any optimization problems with a convex smooth loss and the general class of fusion penalty defined on arbitrary graph structures. By exploiting the structure of the non-smooth ''fusion penalty'', our method achieves a faster convergence rate than the standard first-order method, sub-gradient method, and is significantly more scalable than the widely adopted second-order cone-programming and quadratic-programming formulations. In addition, we provide an analysis of the consistency property of the GFlasso model. Experimental results not only demonstrate the superiority of GFlasso over the standard lasso but also show the efficiency and scalability of our proximal-gradient method.


Machine Learning: A Historical and Methodological Analysis

AI Magazine

Machine learning has always been an integral part of artificial intelligence, and its methodology has evolved in concert with the major concerns of the field. In response to the difficulties of encoding ever-increasing volumes of knowledge in modern AI systems, many researchers have recently turned their attention to machine learning as a means to overcome the knowledge acquisition bottleneck. This article presents a taxonomic analysis of machine learning organized primarily by learning strategies and secondarily by knowledge representation and application areas. A historical survey outlining the development of various approaches to machine learning is presented from early neural networks to present knowledge-intensive techniques.


Artificial Intelligence Techniques and Methodology

AI Magazine

Two closely related aspects of artificial intelligence that have received comparatively little attention in the recent literature are research methodology, and the analysis of computational techniques that span multiple application areas. We believe both issues to be increasingly significant as Artificial Intelligence matures into a science and spins off major application efforts. Similarly, awareness of research methodology issues can help plan future research buy learning from past successes and failures. We view the study of research methodology to be similar to the analysis of operational AI techniques, but at a meta-level; that is, research methodology analyzes the techniques and methods used by the researchers themselves, rather than their programs, to resolve issues of selecting interesting and tractable problems to investigate, and of deciding how to proceed with their investigations.


Artificial Intelligence Research at Carnegie-Mellon University

AI Magazine

AI research at CMU is closely integrated with other activities in the Computer Science Department, and to a major degree with ongoing research in the Psychology Department. Although there are over 50 faculty, staff and graduate students involved in various aspects of AI research, there is no administratively (or physically) separate AI laboratory. To underscore the interdisciplinary nature of our AI research, a significant fraction of the projects listed below are joint ventures between computer science and psychology.