Government
Obtaining Calibrated Probabilities from Boosting
Niculescu-Mizil, Alexandru, Caruana, Richard A.
Boosted decision trees typically yield good accuracy, precision, and ROC area. However, because the outputs from boosting are not well calibrated posterior probabilities, boosting yields poor squared error and cross-entropy. We empirically demonstrate why AdaBoost predicts distorted probabilities and examine three calibration methods for correcting this distortion: Platt Scaling, Isotonic Regression, and Logistic Correction. We also experiment with boosting using log-loss instead of the usual exponential loss. Experiments show that Logistic Correction and boosting with log-loss work well when boosting weak models such as decision stumps, but yield poor performance when boosting more complex models such as full decision trees. Platt Scaling and Isotonic Regression, however, significantly improve the probabilities predicted by
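As a rough illustration of one of the calibration methods named above, the following is a minimal sketch of Isotonic Regression fitted with the pool-adjacent-violators procedure. It is not the authors' implementation; the function names `pav_calibrate` and `predict`, the step-function representation, and the toy scores and labels are all assumptions made for this example.

```python
def pav_calibrate(scores, labels):
    """Fit a non-decreasing step function from scores to probabilities.

    Returns a list of (upper_score, probability) pairs, one per step.
    Hypothetical helper for illustration only.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    blocks = []  # each block: [sum of labels, count, largest score in block]
    for i in order:
        blocks.append([float(labels[i]), 1, scores[i]])
        # Pool adjacent blocks while the earlier mean exceeds the later mean.
        while (len(blocks) > 1 and
               blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, count, upper = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
            blocks[-1][2] = upper
    return [(upper, total / count) for total, count, upper in blocks]


def predict(model, score):
    """Look up the calibrated probability for a raw score."""
    for upper, prob in model:
        if score <= upper:
            return prob
    return model[-1][1]  # scores above the last step reuse its probability


# Toy data (made up): raw classifier scores and binary labels.
scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
labels = [0, 1, 0, 1, 1, 1]
model = pav_calibrate(scores, labels)
print(predict(model, 0.25))
```

In practice the step function would be fitted on a held-out calibration set rather than on the data used to train the boosted model.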
Aggregating Content and Network Information to Curate Twitter User Lists
Greene, Derek, Sheridan, Gavin, Smyth, Barry, Cunningham, Pádraig
Twitter introduced user lists in late 2009, allowing users to be grouped according to meaningful topics or themes. Lists have since been adopted by media outlets as a means of organising content around news stories. Thus the curation of these lists is important - they should contain the key information gatekeepers and present a balanced perspective on a story. Here we address this list curation process from a recommender systems perspective. We propose a variety of criteria for generating user list recommendations, based on content analysis, network analysis, and the "crowdsourcing" of existing user lists. We demonstrate that these types of criteria are often only successful for datasets with certain characteristics. To resolve this issue, we propose the aggregation of these different "views" of a news story on Twitter to produce more accurate user recommendations to support the curation process.
Emerging Applications for Intelligent Diabetes Management
Marling, Cindy (Ohio University) | Wiley, Matthew (University of California, Riverside) | Bunescu, Razvan (Ohio University) | Shubrook, Jay (Ohio University) | Schwartz, Frank (Ohio University)
Diabetes management is a difficult task for patients, who must monitor and control their blood glucose levels in order to avoid serious diabetic complications. It is also a difficult task for physicians, who must manually interpret large volumes of blood glucose data to tailor therapy to the needs of each patient. This paper describes three emerging applications that employ AI to ease this task: (1) case-based decision support for diabetes management; (2) machine learning classification of blood glucose plots; and (3) support vector regression for blood glucose prediction. The first application provides decision support by detecting blood glucose control problems and recommending therapeutic adjustments to correct them. The second provides an automated screen for excessive glycemic variability. The third aims to build a hypoglycemia predictor that could alert patients to dangerously low blood glucose levels in time to take preventive action. All are products of the 4 Diabetes Support System™ project, which uses AI to promote the health and wellbeing of people with type 1 diabetes. These emerging applications could potentially benefit 20 million patients who are at risk for devastating complications, thereby improving quality of life and reducing health care costs.
Learning by Demonstration for a Collaborative Planning Environment
Myers, Karen (SRI International) | Kolojejchic, Jake (General Dynamics C4 Systems | Viz) | Angiolillo, Carl (General Dynamics C4 Systems | Viz) | Cummings, Tim (General Dynamics C4 Systems | Viz) | Garvey, Tom (SRI International) | Gaston, Matt (Carnegie Mellon University) | Gervasio, Melinda (SRI International) | Haines, Will (SRI International) | Jones, Chris (SRI International) | Keifer, Kellie (SRI International) | Knittel, Janette (General Dynamics C4 Systems | Viz) | Morley, David (SRI International) | Ommert, William (General Dynamics C4 Systems | Viz) | Potter, Scott (General Dynamics C4 Systems | Viz)
Learning by demonstration technology has long held the promise to empower non-programmers to customize and extend software. We describe the deployment of a learning by demonstration capability to support user creation of automated procedures in a collaborative planning environment that is used widely by the U.S. Army. This technology, which has been in operational use since the summer of 2010, has helped to reduce user workloads by automating repetitive and time-consuming tasks. The technology has also provided the unexpected benefit of enabling standardization of products and processes.
Single parameter galaxy classification: The Principal Curve through the multi-dimensional space of galaxy properties
Taghizadeh-Popp, M., Heinis, S., Szalay, A. S.
We propose to describe the variety of galaxies from SDSS by using only one affine parameter. To this end, we build the Principal Curve (P-curve) passing through the spine of the data point cloud, considering the eigenspace derived from Principal Component Analysis of morphological, physical and photometric galaxy properties. Thus, galaxies can be labeled, ranked and classified by a single arc length value of the curve, measured at the unique closest projection of the data points on the P-curve. We find that the P-curve has a "W" letter shape with 3 turning points, defining 4 branches that represent distinct galaxy populations. This behavior is controlled mainly by 2 properties, namely u-r and SFR. We further present the variations of several galaxy properties as a function of arc length. Luminosity functions vary from steep Schechter fits at low arc length, to double power law, and end in log-normal fits at high arc length. Galaxy clustering shows increasing autocorrelation power at large scales as arc length increases. PCA allowed us to find peculiar galaxy populations located apart from the main cloud of data points, such as small red galaxies dominated by a disk, of relatively high stellar mass-to-light ratio and surface mass density. The P-curve not only allows dimensionality reduction, but also provides supporting evidence for relevant physical models and scenarios in extragalactic astronomy: 1) Evidence for the hierarchical merging scenario in the formation of a selected group of red massive galaxies. These galaxies present a log-normal r-band luminosity function, which might arise from multiplicative processes involved in this scenario. 2) Connection between the onset of AGN activity and star formation quenching, which appears in green galaxies when transitioning from blue to red populations. (Full abstract in downloadable version)
Propagation of Delays in the National Airspace System
Laskey, Kathryn Blackmond, Xu, Ning, Chen, Chun-Hung
The National Airspace System (NAS) is a large and complex system with thousands of interrelated components: administration, control centers, airports, airlines, aircraft, passengers, etc. The complexity of the NAS creates many difficulties in management and control. One of the most pressing problems is flight delay. Delay imposes high costs on airlines, draws complaints from passengers, and creates difficulties for airport operations. As demand on the system increases, the delay problem becomes more and more prominent. For this reason, it is essential for the Federal Aviation Administration to understand the causes of delay and to find ways to reduce it. Major contributing factors to delay are congestion at the origin airport, weather, increasing demand, and air traffic management (ATM) decisions such as the Ground Delay Programs (GDP). Delay is an inherently stochastic phenomenon. Even if all known causal factors could be accounted for, macro-level NAS delays could not be predicted with certainty from micro-level aircraft information. This paper presents a stochastic model that uses Bayesian Networks (BNs) to model the relationships among different components of aircraft delay and the causal factors that affect delays. A case study on delays of departure flights from Chicago O'Hare International Airport (ORD) to Hartsfield-Jackson Atlanta International Airport (ATL) reveals how local and system-level environmental and human-caused factors combine to affect components of delay, and how these components contribute to the final arrival delay at the destination airport.
Flexible Modeling of Latent Task Structures in Multitask Learning
Passos, Alexandre, Rai, Piyush, Wainer, Jacques, Daume, Hal III
Multitask learning algorithms are typically designed assuming some fixed, a priori known latent structure shared by all the tasks. However, it is usually unclear what type of latent task structure is the most appropriate for a given multitask learning problem. Ideally, the "right" latent task structure should be learned in a data-driven manner. We present a flexible, nonparametric Bayesian model that posits a mixture of factor analyzers structure on the tasks. The nonparametric aspect makes the model expressive enough to subsume many existing models of latent task structures (e.g., mean-regularized tasks, clustered tasks, low-rank or linear/non-linear subspace assumptions on tasks, etc.). Moreover, it can also learn more general task structures, addressing the shortcomings of such models. We present a variational inference algorithm for our model. Experimental results on synthetic and real-world datasets, on both regression and classification problems, demonstrate the effectiveness of the proposed method.
Inferring Latent Structure From Mixed Real and Categorical Relational Data
Salazar, Esther, Cain, Matthew, Darling, Elise, Mitroff, Stephen, Carin, Lawrence
We consider analysis of relational data (a matrix), in which the rows correspond to subjects (e.g., people) and the columns correspond to attributes. The elements of the matrix may be a mix of real and categorical values. Each subject and attribute is characterized by a latent binary feature vector, and an inferred matrix maps each row-column pair of binary feature vectors to an observed matrix element. The latent binary features of the rows are modeled via a multivariate Gaussian distribution with a low-rank covariance matrix, and the Gaussian random variables are mapped to latent binary features via a probit link. The same type of construction is applied jointly to the columns. The model infers latent, low-dimensional binary features associated with each row and each column, as well as the correlation structure between all rows and between all columns.
A Simple Algorithm for Semi-supervised Learning with Improved Generalization Error Bound
Ji, Ming, Yang, Tianbao, Lin, Binbin, Jin, Rong, Han, Jiawei
In this work, we develop a simple algorithm for semi-supervised regression. The key idea is to use the top eigenfunctions of an integral operator derived from both labeled and unlabeled examples as the basis functions, and to learn the prediction function by a simple linear regression. We show that, under appropriate assumptions about the integral operator, this approach achieves a regression error bound better than existing bounds for supervised learning. We also verify the effectiveness of the proposed algorithm by an empirical study.
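The key idea above can be sketched roughly as follows, under several assumptions that do not come from the abstract: a Gaussian kernel stands in for the integral operator, its top eigenvectors on the pooled labeled and unlabeled points serve as the eigenfunction values, power iteration with deflation computes them, and predictions are made only for the pooled points. All function names and the toy data are hypothetical.

```python
import math


def top_eigenvectors(matrix, k, iters=500):
    """Top-k eigenvectors of a symmetric PSD matrix (power iteration + deflation)."""
    n = len(matrix)
    work = [row[:] for row in matrix]
    vectors = []
    for _ in range(k):
        v = [float(i + 1) for i in range(n)]  # fixed start keeps runs deterministic
        for _ in range(iters):
            w = [sum(work[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(work[i][j] * v[j] for j in range(n)) for i in range(n))
        vectors.append(v)
        for i in range(n):  # remove the component just found
            for j in range(n):
                work[i][j] -= lam * v[i] * v[j]
    return vectors


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [a[i][:] + [b[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for j in range(c, n + 1):
                m[r][j] -= f * m[c][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_predict(xs, labeled_idx, ys, k=2, width=1.0):
    """Regress labels on the top-k kernel eigenvectors; predict for every point in xs."""
    n = len(xs)
    kernel = [[math.exp(-((a - b) ** 2) / (2.0 * width ** 2)) / n for b in xs]
              for a in xs]
    basis = top_eigenvectors(kernel, k)
    feats = [[basis[m][i] for m in range(k)] for i in range(n)]
    # Least squares on the labeled points only, via the normal equations.
    gram = [[sum(feats[i][p] * feats[i][q] for i in labeled_idx) for q in range(k)]
            for p in range(k)]
    rhs = [sum(feats[i][p] * y for i, y in zip(labeled_idx, ys)) for p in range(k)]
    coef = solve(gram, rhs)
    return [sum(c * f for c, f in zip(coef, row)) for row in feats]


# Toy data (made up): two 1-D clusters with a single labeled point in each.
xs = [0.0, 0.2, 0.4, 5.0, 5.2, 5.4, 5.6]
preds = fit_predict(xs, labeled_idx=[0, 3], ys=[1.0, -1.0])
print([round(p, 2) for p in preds])
```

The unlabeled points influence the result only through the eigenvectors, which is where the semi-supervised benefit would come from in this sketch. The power iteration assumes the kernel matrix has at least k clearly nonzero eigenvalues.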
Communications Inspired Linear Discriminant Analysis
Chen, Minhua, Carson, William, Rodrigues, Miguel, Calderbank, Robert, Carin, Lawrence
We study the problem of supervised linear dimensionality reduction, taking an information-theoretic viewpoint. The linear projection matrix is designed by maximizing the mutual information between the projected signal and the class label (based on a Shannon entropy measure). By harnessing a recent theoretical result on the gradient of mutual information, the above optimization problem can be solved directly using gradient descent, without requiring simplification of the objective function. Theoretical analysis and empirical comparison are made between the proposed method and two closely related methods (Linear Discriminant Analysis and Information Discriminant Analysis), and comparisons are also made with a method in which Renyi entropy is used to define the mutual information (in this case the gradient may be computed simply, under a special parameter setting). Relative to these alternative approaches, the proposed method achieves promising results on real datasets.