 Performance Analysis


An Investigation of Sensitivity on Bagging Predictors: An Empirical Approach

AAAI Conferences

As growing numbers of real world applications involve imbalanced class distribution or unequal costs for misclassification errors in different classes, learning from imbalanced class distribution is considered to be one of the most challenging issues in data mining research. This study empirically investigates the sensitivity of bagging predictors with respect to 12 algorithms and 9 levels of class distribution on 14 imbalanced data-sets by using statistical and graphical methods to address the important issue of understanding the effect of varying levels of class distribution on bagging predictors. The experimental results demonstrate that bagging NB and MLP are insensitive to various levels of imbalanced class distribution.


Performance and Preferences: Interactive Refinement of Machine Learning Procedures

AAAI Conferences

Problem-solving procedures have been typically aimed at achieving well-defined goals or satisfying straightforward preferences. However, learners and solvers may often generate rich multiattribute results with procedures guided by sets of controls that define different dimensions of quality. We explore methods that enable people to explore and express preferences about the operation of classification models in supervised multiclass learning. We leverage a leave-one-out confusion matrix that provides users with views and real-time controls of a model space. The approach allows people to consider in an interactive manner the global implications of local changes in decision boundaries. We focus on kernel classifiers and show the effectiveness of the methodology on a variety of tasks.
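
A leave-one-out confusion matrix of the kind this abstract mentions can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' system: a 1-nearest-neighbour rule stands in for the kernel classifiers, and the names `nearest_neighbor_label` and `loo_confusion` are hypothetical.

```python
def nearest_neighbor_label(train, query):
    """Return the label of the training point closest to `query`.

    `train` is a list of (point, label) pairs. 1-NN is only a stand-in
    for the kernel classifiers discussed in the abstract (assumption).
    """
    closest = min(
        train,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], query)),
    )
    return closest[1]


def loo_confusion(points, labels):
    """Build a leave-one-out confusion matrix.

    Each example is held out in turn, predicted from all the others, and
    counted in matrix[true_class][predicted_class].
    """
    classes = sorted(set(labels))
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    data = list(zip(points, labels))
    for i, (x, y) in enumerate(data):
        rest = data[:i] + data[i + 1:]  # everything except the held-out example
        predicted = nearest_neighbor_label(rest, x)
        matrix[index[y]][index[predicted]] += 1
    return classes, matrix


# Two well-separated toy clusters (made-up data for illustration).
points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
labels = ["a", "a", "a", "b", "b", "b"]
classes, matrix = loo_confusion(points, labels)
```

Each row of `matrix` sums to the number of examples in that class, so off-diagonal entries show directly which classes are confused with which.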


Investigating the Effectiveness of Laplacian-Based Kernels in Hub Reduction

AAAI Conferences

A “hub” is an object closely surrounded by, or very similar to, many other objects in the dataset. Recent studies by Radovanović et al. indicate that in high dimensional spaces, hubs almost always emerge, and objects close to the data centroid tend to become hubs. In this paper, we show that the family of kernels based on the graph Laplacian makes all objects in the dataset equally similar to the centroid, and thus they are expected to produce fewer hubs when used as a similarity measure. We investigate this hypothesis using both synthetic and real-world data. It turns out that these kernels suppress hubs in some cases but not always, and the results seem to be affected by the size of the data, a factor not discussed previously. However, for the datasets in which hubs are indeed reduced by the Laplacian-based kernels, these kernels work well in ranking and classification tasks. This result suggests that the number of hubs, which can be readily computed in an unsupervised fashion, can be a yardstick of whether Laplacian-based kernels work effectively for a given dataset.
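
One common way to quantify hubs without labels is to count how often each object appears in the k-nearest-neighbour lists of the other objects, then look at the skewness of those counts. The sketch below assumes that measure and plain Euclidean distance; the abstract does not say which measure or similarity the paper uses, and the names `k_occurrence` and `skewness` are hypothetical.

```python
def k_occurrence(points, k):
    """Count, for each point, how many other points list it among
    their k nearest neighbours (squared Euclidean distance)."""
    n = len(points)
    counts = [0] * n
    for i, p in enumerate(points):
        others = [j for j in range(n) if j != i]
        others.sort(
            key=lambda j: sum((a - b) ** 2 for a, b in zip(p, points[j]))
        )
        for j in others[:k]:
            counts[j] += 1
    return counts


def skewness(values):
    """Population skewness; large positive values mean a few points
    collect most of the neighbour slots, i.e. hubs are present."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    if variance == 0:
        return 0.0
    third_moment = sum((v - mean) ** 3 for v in values) / n
    return third_moment / variance ** 1.5


# Toy data: a central point surrounded by four others (made-up example).
points = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]
counts = k_occurrence(points, k=1)
hubness = skewness(counts)
```

In this toy layout the central point is the nearest neighbour of all four outer points, so it collects most of the counts and the skewness is positive.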


Predicting Satisfiability at the Phase Transition

AAAI Conferences

Uniform random 3-SAT at the solubility phase transition is one of the most widely studied and empirically hardest distributions of SAT instances. For 20 years, this distribution has been used extensively for evaluating and comparing algorithms. In this work, we demonstrate that simple rules can predict the solubility of these instances with surprisingly high accuracy. Specifically, we show how classification accuracies of about 70% can be obtained based on cheaply (polynomial-time) computable features on a wide range of instance sizes. We argue in two ways that classification accuracy does not decrease with instance size: first, we show that our models' predictive accuracy remains roughly constant across a wide range of problem sizes; second, we show that a classifier trained on small instances is sufficient to achieve very accurate predictions across the entire range of instance sizes currently solvable by complete methods. Finally, we demonstrate that a simple decision tree based on only two features, and again trained only on the smallest instances, achieves predictive accuracies close to those of our most complex model. We conjecture that this two-feature model outperforms random guessing asymptotically; due to the model's extreme simplicity, we believe that this conjecture is a worthwhile direction for future theoretical work.


Coupling Spatiotemporal Disease Modeling with Diagnosis

AAAI Conferences

Modelling the density of an infectious disease in space and time is a task generally carried out separately from the diagnosis of that disease in individuals. These two inference problems are complementary, however: diagnosis of disease can be done more accurately if prior information from a spatial risk model is employed, and in turn a disease density model can benefit from the incorporation of rich symptomatic information rather than simple counts of presumed cases of infection. We propose a unifying framework for both of these tasks, and illustrate it with the case of malaria. To do this we first introduce a state space model of malaria spread, and secondly a computer vision based system for detecting plasmodium in microscopical blood smear images, which can be run on location-aware mobile devices. We demonstrate the tractability of combining both elements and the improvement in accuracy this brings about.


Fine-Grained Photovoltaic Output Prediction Using a Bayesian Ensemble

AAAI Conferences

Local and distributed power generation is increasingly reliant on renewable power sources, e.g., solar (photovoltaic or PV) and wind energy. The integration of such sources into the power grid is challenging, however, due to their variable and intermittent energy output. To effectively use them on a large scale, it is essential to be able to predict power generation at a fine-grained level. We describe a novel Bayesian ensemble methodology involving three diverse predictors. Each predictor estimates mixing coefficients for integrating PV generation output profiles but captures fundamentally different characteristics. Two of them employ classical parameterized (naive Bayes) and non-parametric (nearest neighbor) methods to model the relationship between weather forecasts and PV output. The third predictor captures the sequentiality implicit in PV generation and uses motifs mined from historical data to estimate the most likely mixture weights using a stream prediction methodology. We demonstrate the success and superiority of our methods on real PV data from two locations that exhibit diverse weather conditions. Predictions from our model can be harnessed to optimize scheduling of delay tolerant workloads, e.g., in a data center.


Functional Interactions Between Memory and Recognition Judgments

AAAI Conferences

One issue facing agents that accumulate large bodies of knowledge is determining whether they have knowledge that is relevant to their current goals. Performing comprehensive searches of long-term memory in every situation can be computationally expensive and disruptive to task reasoning. In this paper, we demonstrate that the recognition judgment, a heuristic for whether memory structures have been previously perceived, can serve as a low-cost indicator of the existence of potentially relevant knowledge. We present an approach for computing both context-dependent and context-independent recognition judgments using processes and data shared with declarative memories. We then describe an initial, efficient implementation in the Soar cognitive architecture and evaluate our system in a word sense disambiguation task, showing that it reduces the number of memory searches without degrading agent performance.


Ontological Smoothing for Relation Extraction with Minimal Supervision

AAAI Conferences

Relation extraction, the process of converting natural language text into structured knowledge, is increasingly important. Most successful techniques use supervised machine learning to generate extractors from sentences that have been manually labeled with the relations' arguments. Unfortunately, these methods require numerous training examples, which are expensive and time-consuming to produce. This paper presents ontological smoothing, a semi-supervised technique that learns extractors for a set of minimally-labeled relations. Ontological smoothing has three phases. First, it generates a mapping between the target relations and a background knowledge-base. Second, it uses distant supervision to heuristically generate new training examples for the target relations. Finally, it learns an extractor from a combination of the original and newly-generated examples. Experiments on 65 relations across three target domains show that ontological smoothing can dramatically improve precision and recall, even rivaling fully supervised performance in many cases.


Unachievable Region in Precision-Recall Space and Its Effect on Empirical Evaluation

arXiv.org Artificial Intelligence

Precision-recall (PR) curves and the areas under them are widely used to summarize machine learning results, especially for data sets exhibiting class skew. They are often used analogously to ROC curves and the area under ROC curves. It is known that PR curves vary as class skew changes. What was not recognized before this paper is that there is a region of PR space that is completely unachievable, and the size of this region depends only on the skew. This paper precisely characterizes the size of that region and discusses its implications for empirical evaluation methodology in machine learning.
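
The dependence on skew can be illustrated directly from the definitions of precision and recall. The sketch below is a derivation under stated assumptions, not a formula quoted from the abstract: with a fraction `skew` of positive examples, the worst case at a given recall is that every negative is also predicted positive, which bounds precision from below. Integrating that bound over recall gives the area that no classifier can reach. The function names are hypothetical.

```python
import math


def min_precision(recall, skew):
    """Lowest precision attainable at `recall` when a fraction `skew`
    of the examples is positive (all negatives counted as false positives)."""
    if not 0.0 < skew < 1.0:
        raise ValueError("skew must lie strictly between 0 and 1")
    if not 0.0 <= recall <= 1.0:
        raise ValueError("recall must lie in [0, 1]")
    return skew * recall / (skew * recall + (1.0 - skew))


def unachievable_area(skew):
    """Area under the minimum-precision curve, i.e. the integral of
    min_precision over recall from 0 to 1, in closed form."""
    if not 0.0 < skew < 1.0:
        raise ValueError("skew must lie strictly between 0 and 1")
    return 1.0 + (1.0 - skew) * math.log(1.0 - skew) / skew


# Compare a balanced data set with a heavily skewed one.
balanced = unachievable_area(0.5)
skewed = unachievable_area(0.1)
```

Under these assumptions the region is larger for balanced data than for skewed data, so areas under PR curves from data sets with different skew are not directly comparable without accounting for this floor.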


Nested Expectation Propagation for Gaussian Process Classification with a Multinomial Probit Likelihood

arXiv.org Machine Learning

We consider probabilistic multinomial probit classification using Gaussian process (GP) priors. The challenges with the multiclass GP classification are the integration over the non-Gaussian posterior distribution, and the increase of the number of unknown latent variables as the number of target classes grows. Expectation propagation (EP) has proven to be a very accurate method for approximate inference but the existing EP approaches for the multinomial probit GP classification rely on numerical quadratures or independence assumptions between the latent values from different classes to facilitate the computations. In this paper, we propose a novel nested EP approach which does not require numerical quadratures, and approximates accurately all between-class posterior dependencies of the latent values, but still scales linearly in the number of classes. The predictive accuracy of the nested EP approach is compared to Laplace, variational Bayes, and Markov chain Monte Carlo (MCMC) approximations with various benchmark data sets. In the experiments nested EP was the most consistent method with respect to MCMC sampling, but the differences between the compared methods were small if only the classification accuracy is concerned.