AITopics | Statistical Learning

Collaborating Authors

Statistical Learning

News Overviews Instructional Materials AI-Alerts Classics

Towards Property-Based Classification of Clustering Paradigms

Ackerman, Margareta, Ben-David, Shai, Loker, David

Neural Information Processing SystemsDec-31-2010

Clustering is a basic data mining task with a wide variety of applications. Not surprisingly, there exist many clustering algorithms. However, clustering is an ill defined problem - given a data set, it is not clear what a “correct” clustering for that set is. Indeed, different algorithms may yield dramatically different outputs for the same input sets. Faced with a concrete clustering task, a user needs to choose an appropriate clustering algorithm. Currently, such decisions are often made in a very ad hoc, if not completely random, manner. Given the crucial effect of the choice of a clustering algorithm on the resulting clustering, this state of affairs is truly regrettable. In this paper we address the major research challenge of developing tools for helping users make more informed decisions when they come to pick a clustering tool for their data. This is, of course, a very ambitious endeavor, and in this paper, we make some first steps towards this goal. We propose to address this problem by distilling abstract properties of the input-output behavior of different clustering paradigms. In this paper, we demonstrate how abstract, intuitive properties of clustering functions can be used to taxonomize a set of popular clustering algorithmic paradigms. On top of addressing deterministic clustering algorithms, we also propose similar properties for randomized algorithms and use them to highlight functional differences between different common implementations of k-means clustering. We also study relationships between the properties, independent of any particular algorithm. In particular, we strengthen Kleinbergs famous impossibility result, while providing a simpler proof.

algorithm, artificial intelligence, machine learning, (17 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback

Generalised Wishart Processes

Wilson, Andrew Gordon, Ghahramani, Zoubin

arXiv.org Machine LearningDec-31-2010

We introduce a stochastic process with Wishart marginals: the generalised Wishart process (GWP). It is a collection of positive semi-definite random matrices indexed by any arbitrary dependent variable. We use it to model dynamic (e.g. time varying) covariance matrices. Unlike existing models, it can capture a diverse class of covariance structures, it can easily handle missing data, the dependent variable can readily include covariates other than time, and it scales well with dimension; there is no need for free parameters, and optional parameters are easy to interpret. We describe how to construct the GWP, introduce general procedures for inference and predictions, and show that it outperforms its main competitor, multivariate GARCH, even on financial data that especially suits GARCH. We also show how to predict the mean of a multivariate process while accounting for dynamic correlations.

artificial intelligence, bayesian inference, machine learning, (19 more...)

arXiv.org Machine Learning

1101.024

Country: Europe > United Kingdom (0.28)

Genre:

Research Report (0.50)
Personal > Honors (0.46)

Industry: Banking & Finance > Trading (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)

Add feedback

Robust PCA via Outlier Pursuit

Xu, Huan, Caramanis, Constantine, Sanghavi, Sujay

arXiv.org Machine LearningDec-31-2010

Singular Value Decomposition (and Principal Component Analysis) is one of the most widely used techniques for dimensionality reduction: successful and efficiently computable, it is nevertheless plagued by a well-known, well-documented sensitivity to outliers. Recent work has considered the setting where each point has a few arbitrarily corrupted components. Yet, in applications of SVD or PCA such as robust collaborative filtering or bioinformatics, malicious agents, defective genes, or simply corrupted or contaminated experiments may effectively yield entire points that are completely corrupted. We present an efficient convex optimization-based algorithm we call Outlier Pursuit, that under some mild assumptions on the uncorrupted points (satisfied, e.g., by the standard generative assumption in PCA problems) recovers the exact optimal low-dimensional subspace, and identifies the corrupted points. Such identification of corrupted points that do not conform to the low-dimensional approximation, is of paramount interest in bioinformatics and financial applications, and beyond. Our techniques involve matrix decomposition using nuclear norm minimization, however, our results, setup, and approach, necessarily differ considerably from the existing line of work in matrix completion and matrix decomposition, since we develop an approach to recover the correct column space of the uncorrupted matrix, rather than the exact matrix itself. In any problem where one seeks to recover a structure rather than the exact initial matrices, techniques developed thus far relying on certificates of optimality, will fail. We present an important extension of these methods, that allows the treatment of such problems.

artificial intelligence, machine learning, optimization problem, (17 more...)

arXiv.org Machine Learning

1010.4237

Country: North America > United States > Texas (0.28)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.54)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.47)

Add feedback

Software Effort Estimation with Ridge Regression and Evolutionary Attribute Selection

Papatheocharous, Efi, Papadopoulos, Harris, Andreou, Andreas S.

arXiv.org Artificial IntelligenceDec-28-2010

Software cost estimation is one of the prerequisite managerial activities carried out at the software development initiation stages and also repeated throughout the whole software life-cycle so that amendments to the total cost are made. In software cost estimation typically, a selection of project attributes is employed to produce effort estimations of the expected human resources to deliver a software product. However, choosing the appropriate project cost drivers in each case requires a lot of experience and knowledge on behalf of the project manager which can only be obtained through years of software engineering practice. A number of studies indicate that popular methods applied in the literature for software cost estimation, such as linear regression, are not robust enough and do not yield accurate predictions. Recently the dual variables Ridge Regression (RR) technique has been used for effort estimation yielding promising results. In this work we show that results may be further improved if an AI method is used to automatically select appropriate project cost drivers (inputs) for the technique. We propose a hybrid approach combining RR with a Genetic Algorithm, the latter evolving the subset of attributes for approximating effort more accurately. The proposed hybrid cost model has been applied on a widely known high-dimensional dataset of software project samples and the results obtained show that accuracy may be increased if redundant attributes are eliminated.

artificial intelligence, cost driver, machine learning, (12 more...)

arXiv.org Artificial Intelligence

1012.5754

Country:

North America > United States (0.69)
Europe > Middle East > Cyprus (0.29)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.72)

Add feedback

Intrusion Detection using Continuous Time Bayesian Networks

Xu, J., Shelton, C. R.

Journal of Artificial Intelligence ResearchDec-23-2010

Intrusion detection systems (IDSs) fall into two high-level categories: network-based systems (NIDS) that monitor network behaviors, and host-based systems (HIDS) that monitor system calls. In this work, we present a general technique for both systems. We use anomaly detection, which identifies patterns not conforming to a historic norm. In both types of systems, the rates of change vary dramatically over time (due to burstiness) and over components (due to service difference). To efficiently model such systems, we use continuous time Bayesian networks (CTBNs) and avoid specifying a fixed update interval common to discrete-time models. We build generative models from the normal training data, and abnormal behaviors are flagged based on their likelihood under this norm. For NIDS, we construct a hierarchical CTBN model for the network packet traces and use Rao-Blackwellized particle filtering to learn the parameters. We illustrate the power of our method through experiments on detecting real worms and identifying hosts on two publicly available network traces, the MAWI dataset and the LBNL dataset. For HIDS, we develop a novel learning method to deal with the finite resolution of system log file time stamps, without losing the benefits of our continuous time model. We demonstrate the method by detecting intrusions in the DARPA 1998 BSM dataset.

algorithm, ctbn, traffic, (15 more...)

Journal of Artificial Intelligence Research

doi: 10.1613/jair.3050

AI Access Foundation

10681

Journal of Artificial Intelligence Research

Country:

North America > United States > California > Riverside County > Riverside (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Japan (0.04)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Regional Government > North America Government > United States Government (0.88)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(2 more...)

Add feedback

Regularized Least-Mean-Square Algorithms

Chen, Yilun, Gu, Yuantao, Hero, Alfred O.

arXiv.org Machine LearningDec-23-2010

We consider adaptive system identification problems with convex constraints and propose a family of regularized Least-Mean-Square (LMS) algorithms. We show that with a properly selected regularization parameter the regularized LMS provably dominates its conventional counterpart in terms of mean square deviations. We establish simple and closed-form expressions for choosing this regularization parameter. For identifying an unknown sparse system we propose sparse and group-sparse LMS algorithms, which are special examples of the regularized LMS family. Simulation results demonstrate the advantages of the proposed filters in both convergence rate and steady-state error under sparsity assumptions on the true coefficient vector.

artificial intelligence, input signal, machine learning, (16 more...)

arXiv.org Machine Learning

1012.5066

Country: North America > United States > Michigan (0.28)

Genre: Research Report > New Finding (0.66)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Add feedback

Split Bregman Method for Sparse Inverse Covariance Estimation with Matrix Iteration Acceleration

Ye, Gui-Bo, Cai, Jian-Feng, Xie, Xiaohui

arXiv.org Machine LearningDec-23-2010

Abstract: We consider the problem of estimating the inverse covariance matrix by maximizing the likelihood function with a penalty added to encourage the sparsity of the resulting matrix. We propose a new approach based on the split Bregman method to solve the regularized maximum likelihood estimation problem. We show that our method is significantly faster than the widely used graphical lasso method, which is based on blockwise coordinate descent, on both artificial and real-world data.

artificial intelligence, bayesian inference, machine learning, (18 more...)

arXiv.org Machine Learning

1012.0975

Country: North America > United States > California > Los Angeles County > Los Angeles (0.28)

Genre: Research Report (0.50)

Industry: Health & Medicine (0.69)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.55)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.55)

Add feedback

Estimating Networks With Jumps

Kolar, Mladen, Xing, Eric P.

arXiv.org Machine LearningDec-20-2010

We study the problem of estimating a temporally varying coefficient and varying structure (VCVS) graphical model underlying nonstationary time series data, such as social states of interacting individuals or microarray expression profiles of gene networks, as opposed to i.i.d. data from an invariant model widely considered in current literature of structural estimation. In particular, we consider the scenario in which the model evolves in a piece-wise constant fashion. We propose a procedure that minimizes the so-called TESLA loss (i.e., temporally smoothed L1 regularized regression), which allows jointly estimating the partition boundaries of the VCVS model and the coefficient of the sparse precision matrix on each block of the partition. A highly scalable proximal gradient method is proposed to solve the resultant convex optimization problem; and the conditions for sparsistent estimation and the convergence rate of both the partition boundaries and the network structure are established for the first time for such estimators.

artificial intelligence, machine learning, optimization problem, (18 more...)

arXiv.org Machine Learning

1012.3795

Genre: Research Report (0.50)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.48)

Add feedback

Ultra-high Dimensional Multiple Output Learning With Simultaneous Orthogonal Matching Pursuit: A Sure Screening Approach

Kolar, Mladen, Xing, Eric P.

arXiv.org Machine LearningDec-17-2010

We propose a novel application of the Simultaneous Orthogonal Matching Pursuit (S-OMP) procedure for sparsistant variable selection in ultra-high dimensional multi-task regression problems. Screening of variables, as introduced in \cite{fan08sis}, is an efficient and highly scalable way to remove many irrelevant variables from the set of all variables, while retaining all the relevant variables. S-OMP can be applied to problems with hundreds of thousands of variables and once the number of variables is reduced to a manageable size, a more computationally demanding procedure can be used to identify the relevant variables for each of the regression outputs. To our knowledge, this is the first attempt to utilize relatedness of multiple outputs to perform fast screening of relevant variables. As our main theoretical contribution, we prove that, asymptotically, S-OMP is guaranteed to reduce an ultra-high number of variables to below the sample size without losing true relevant variables. We also provide formal evidence that a modified Bayesian information criterion (BIC) can be used to efficiently determine the number of iterations in S-OMP. We further provide empirical evidence on the benefit of variable selection using multiple regression outputs jointly, as opposed to performing variable selection for each output separately. The finite sample performance of S-OMP is demonstrated on extensive simulation studies, and on a genetic association mapping problem. $Keywords$ Adaptive Lasso; Greedy forward regression; Orthogonal matching pursuit; Multi-output regression; Multi-task learning; Simultaneous orthogonal matching pursuit; Sure screening; Variable selection

artificial intelligence, machine learning, omp 100, (14 more...)

arXiv.org Machine Learning

1012.388

Genre: Research Report (0.82)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)

Add feedback

Descriptive-complexity based distance for fuzzy sets

Kovacs, Laszlo, Ratsaby, Joel

arXiv.org Artificial IntelligenceDec-15-2010

The notion of distance between two objects is very general. Distance metrics and distances have now become an essential tool in many areas of mathematics and its applications including geometry, probability, statistics, coding/graph theory, data analysis, pattern recognition. For a comprehensive source on this subject see [4]. The notion of a fuzzy set was introduced by [8]. It is a class of objects with continuous values of membership and hence extends the classical definition of a set (to distinguish it from a fuzzy set we refer to it as a crisp set).

artificial intelligence, machine learning, membership function, (19 more...)

arXiv.org Artificial Intelligence

1012.341

Country: Europe > Hungary > Borsod-Abaúj-Zemplén County > Miskolc (0.14)

Genre: Research Report (0.40)

Industry: Government (0.69)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.98)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)

Add feedback