AITopics | Statistical Learning

Statistical Learning

A Distributed Frank-Wolfe Algorithm for Communication-Efficient Sparse Learning

Bellet, Aurélien, Liang, Yingyu, Garakani, Alireza Bagheri, Balcan, Maria-Florina, Sha, Fei

arXiv.org Machine Learning, Jan-12-2015

Learning sparse combinations is a frequent theme in machine learning. In this paper, we study its associated optimization problem in the distributed setting where the elements to be combined are not centrally located but spread over a network. We address the key challenges of balancing communication costs and optimization errors. To this end, we propose a distributed Frank-Wolfe (dFW) algorithm. We obtain theoretical guarantees on the optimization error $\epsilon$ and communication cost that do not depend on the total number of combining elements. We further show that the communication cost of dFW is optimal by deriving a lower bound on the communication cost required to construct an $\epsilon$-approximate solution. We validate our theoretical analysis with empirical studies on synthetic and real-world data, which demonstrate that dFW outperforms both baselines and competing methods. We also study the performance of dFW when the conditions of our analysis are relaxed, and show that dFW is fairly robust.
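
As a point of reference for the abstract above, the classical single-machine Frank-Wolfe iteration can be sketched as follows. This is not the distributed dFW algorithm of the paper: the least-squares objective, the l1-ball constraint, the 2/(k+2) step size, and the names `frank_wolfe_l1`, `A`, `b`, and `beta` are all assumptions chosen for illustration. The sketch only shows why Frank-Wolfe suits sparse learning: each iteration moves toward a single vertex of the l1 ball, so at most one new coordinate becomes nonzero per step.

```python
def frank_wolfe_l1(A, b, beta, n_iters=200):
    """Minimize 0.5 * ||A x - b||^2 subject to ||x||_1 <= beta.

    A is a list of rows, b a list of targets. Illustrative sketch only.
    """
    n_rows, n_cols = len(A), len(A[0])
    x = [0.0] * n_cols  # feasible starting point
    for k in range(n_iters):
        # Residual r = A x - b and gradient g = A^T r.
        residual = [
            sum(A[i][j] * x[j] for j in range(n_cols)) - b[i]
            for i in range(n_rows)
        ]
        grad = [
            sum(A[i][j] * residual[i] for i in range(n_rows))
            for j in range(n_cols)
        ]
        # Linear minimization over the l1 ball: the minimizer is a signed,
        # scaled basis vector at the coordinate with the largest |gradient|.
        j_star = max(range(n_cols), key=lambda j: abs(grad[j]))
        vertex_value = -beta if grad[j_star] > 0 else beta
        # Standard diminishing step size; convex combination keeps x feasible.
        gamma = 2.0 / (k + 2.0)
        x = [(1.0 - gamma) * xj for xj in x]
        x[j_star] += gamma * vertex_value
    return x


if __name__ == "__main__":
    A = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    b = [1.0, 0.0, 0.0]
    print(frank_wolfe_l1(A, b, beta=1.0, n_iters=50))
```

In a distributed variant, the coordinate search would be the step that needs communication, since the candidate elements live on different nodes; how dFW organizes that exchange is described in the paper itself, not here.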

algorithm, artificial intelligence, machine learning, (17 more...)

arXiv.org Machine Learning

1404.2644

Country: North America > United States (0.68)

Genre: Research Report > New Finding (0.93)

Industry: Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Communications > Networks (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)

SPRITE: A Response Model For Multiple Choice Testing

Ning, Ryan, Waters, Andrew E., Studer, Christoph, Baraniuk, Richard G.

arXiv.org Machine Learning, Jan-12-2015

Item response theory (IRT) models for categorical response data are widely used in the analysis of educational data, computerized adaptive testing, and psychological surveys. However, most IRT models rely on both the assumption that categories are strictly ordered and the assumption that this ordering is known a priori. These assumptions are impractical in many real-world scenarios, such as multiple-choice exams where the levels of incorrectness for the distractor categories are often unknown. While a number of results exist on IRT models for unordered categorical data, they tend to have restrictive modeling assumptions that lead to poor data fitting performance in practice. Furthermore, existing unordered categorical models have parameters that are difficult to interpret. In this work, we propose a novel methodology for unordered categorical IRT that we call SPRITE (short for stochastic polytomous response item model) that: (i) analyzes both ordered and unordered categories, (ii) offers interpretable outputs, and (iii) provides improved data fitting compared to existing models. We compare SPRITE to existing item response models and demonstrate its efficacy on both synthetic and real-world educational datasets.

artificial intelligence, category, machine learning, (17 more...)

arXiv.org Machine Learning

1501.02844

Country: South America > Brazil (0.46)

Genre: Research Report (0.64)

Industry: Education (0.85)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.95)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Modeling & Simulation (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.68)

Entropic one-class classifiers

Livi, Lorenzo, Sadeghian, Alireza, Pedrycz, Witold

arXiv.org Machine Learning, Jan-11-2015

The one-class classification problem is a well-known research endeavor in pattern recognition. The problem is also known under different names, such as outlier and novelty/anomaly detection. The core of the problem consists in modeling and recognizing patterns belonging only to a so-called target class. All other patterns are termed non-target, and therefore they should be recognized as such. In this paper, we propose a novel one-class classification system that is based on an interplay of different techniques. Primarily, we follow a dissimilarity representation based approach; we embed the input data into the dissimilarity space by means of an appropriate parametric dissimilarity measure. This step allows us to process virtually any type of data. The dissimilarity vectors are then represented through weighted Euclidean graphs, which we use to (i) determine the entropy of the data distribution in the dissimilarity space, and at the same time (ii) derive effective decision regions that are modeled as clusters of vertices. Since the dissimilarity measure for the input data is parametric, we optimize its parameters by means of a global optimization scheme, which considers both mesoscopic and structural characteristics of the data represented through the graphs. The proposed one-class classifier is designed to provide both hard (Boolean) and soft decisions about the recognition of test patterns, allowing an accurate description of the classification process. We evaluate the performance of the system on different benchmarking datasets, containing either feature-based or structured patterns. Experimental results demonstrate the effectiveness of the proposed technique.

data mining, machine learning, pattern recognition, (21 more...)

arXiv.org Machine Learning

doi: 10.1109/TNNLS.2015.2418332

1407.7556

Country: North America > Canada > Alberta (0.28)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine > Therapeutic Area (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (1.00)
(2 more...)

Learning the Conditional Independence Structure of Stationary Time Series: A Multitask Learning Approach

Jung, Alexander

arXiv.org Machine Learning, Jan-11-2015

We consider a stationary discrete-time vector process or time series. Such a process could model, e.g., the time evolution of air pollutant concentrations [1], [2] or medical diagnostic data obtained in electrocorticography (ECoG) [3]. One specific way of representing the dependence structure of a vector process is via a graphical model [4], where the nodes of the graph represent the individual scalar process components, and the edges represent statistical relations between the individual process components. More precisely, the (undirected) edges of a conditional independence graph (CIG) associated with a process represent conditional independence statements about the process components [4], [1]. In particular, two nodes in the CIG are connected by an edge if and only if the two corresponding process components are conditionally dependent, given the remaining process components. Note that the CIG so defined for time series extends the basic notion of a CIG for random vectors by considering dependencies between entire time series instead of dependencies between scalar random variables [5], [6]. In this work, we investigate the problem of graphical model selection (GMS), i.e., that of inferring the CIG of a time series, given a finite-length observation. A. Jung is with the Institute of Telecommunications, Vienna University of Technology, 1040 Vienna, Austria (email: ajung@nt.tuwien.ac.at).

algorithm 1, artificial intelligence, machine learning, (17 more...)

arXiv.org Machine Learning

doi: 10.1109/TSP.2015.2460219

1404.1361

Country:

Europe > Austria > Vienna (0.74)
North America > United States (0.67)

Genre: Research Report (0.64)

Industry: Health & Medicine > Diagnostic Medicine (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.81)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.68)

Techniques for clustering interaction data as a collection of graphs

Lee, Nam H., Priebe, Carey, Park, Youngser, Wang, I-Jeng, Rosen, Michael

arXiv.org Machine Learning, Jan-10-2015

A natural approach to analyzing interaction data of the form "what-connects-to-what-when" is to create a time series (or rather a sequence) of graphs through temporal discretization (bandwidth selection) and spatial discretization (vertex contraction). Such discretization, together with non-negative factorization techniques, can be useful for obtaining a clustering of graphs. Motivating applications of clustering graphs (as opposed to vertex clustering) can be found in neuroscience and in social network analysis, and graph clustering can also be used to enhance community detection (i.e., vertex clustering) by way of conditioning on the cluster labels. In this paper, we formulate the problem of clustering graphs as a model selection problem. Our approach involves information criteria, non-negative matrix factorization and singular value thresholding, and we illustrate our techniques using real and simulated data.

artificial intelligence, graph, machine learning, (16 more...)

arXiv.org Machine Learning

1406.6319

Country: North America > United States (0.28)

Genre: Research Report (1.00)

Industry:

Health & Medicine (1.00)
Information Technology > Services (0.34)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

Survey schemes for stochastic gradient descent with applications to M-estimation

Clémençon, Stéphan, Bertail, Patrice, Chautru, Emilie, Papa, Guillaume

arXiv.org Machine Learning, Jan-9-2015

In certain situations, which will undoubtedly become more and more common in the Big Data era, the available datasets are so massive that computing statistics over the full sample is hardly feasible, if feasible at all. A natural approach in this context consists in using survey schemes and substituting the "full data" statistics with their counterparts based on the resulting random samples of manageable size. The main purpose of this paper is to investigate the impact of survey sampling with unequal inclusion probabilities on stochastic gradient descent-based M-estimation methods in large-scale statistical and machine-learning problems. Precisely, we prove that, in the presence of some a priori information, one may significantly increase asymptotic accuracy by choosing appropriate first-order inclusion probabilities, without affecting complexity. These striking results are described here by limit theorems and are also illustrated by numerical experiments.
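
To make the idea of sampling with unequal inclusion probabilities concrete, here is a minimal sketch of a stochastic gradient step that reweights each sampled observation by the inverse of its inclusion probability (a Horvitz-Thompson-style estimate). The Poisson sampling scheme, the squared loss for estimating a mean, the 1/t step size, and all function names are assumptions made for illustration; the paper's own schemes and conditions may differ. The helper `expected_ht_gradient` enumerates every inclusion pattern to show that the reweighted gradient is unbiased for the full-data gradient.

```python
import itertools
import random


def ht_gradient(theta, data, probs, included):
    """Inverse-probability-weighted estimate of the mean gradient
    of the loss 0.5 * (theta - z)^2 over the full dataset."""
    n = len(data)
    return sum((theta - data[i]) / probs[i] for i in included) / n


def poisson_sample(probs, rng):
    """Include unit i independently with probability probs[i]."""
    return [i for i, p in enumerate(probs) if rng.random() < p]


def ht_sgd(data, probs, n_steps=2000, seed=0):
    """SGD on the mean-estimation loss using weighted survey samples."""
    rng = random.Random(seed)
    theta = 0.0
    for t in range(1, n_steps + 1):
        included = poisson_sample(probs, rng)
        theta -= (1.0 / t) * ht_gradient(theta, data, probs, included)
    return theta


def expected_ht_gradient(theta, data, probs):
    """Exact expectation of ht_gradient under Poisson sampling,
    computed by enumerating all inclusion patterns (small n only)."""
    total = 0.0
    for pattern in itertools.product([0, 1], repeat=len(data)):
        weight = 1.0
        for bit, p in zip(pattern, probs):
            weight *= p if bit else (1.0 - p)
        included = [i for i, bit in enumerate(pattern) if bit]
        total += weight * ht_gradient(theta, data, probs, included)
    return total


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 10.0]
    probs = [0.2, 0.5, 0.5, 0.9]  # hypothetical inclusion probabilities
    full_gradient = sum(0.5 - z for z in data) / len(data)
    print(full_gradient, expected_ht_gradient(0.5, data, probs))
    print(ht_sgd(data, probs))
```

Unbiasedness holds for any strictly positive inclusion probabilities; which choice of probabilities improves asymptotic accuracy is the question the paper addresses.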

algorithm, artificial intelligence, machine learning, (15 more...)

arXiv.org Machine Learning

1501.02218

Country: Europe (0.46)

Genre: Research Report (0.65)

Industry: Education (0.87)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.67)

Exploring Sparsity in Multi-class Linear Discriminant Analysis

Xia, Dong

arXiv.org Machine Learning, Jan-9-2015

Recent studies in the literature have paid much attention to sparsity in linear classification tasks. One motivation for imposing a sparsity assumption on the linear discriminant direction is to rule out noninformative features, which make hardly any contribution to the classification problem. Most of that work focused on binary classification, such as Fan et al. (2012), Cai and Liu (2011) and Mai et al. (2012). In the presence of multi-class data, previous research recommended applying sparse linear discriminant analysis (LDA) to each pair of classes individually, such as Cai and Liu (2011) and Fan et al. (2012). However, further sparsity should be explored. In this paper, an estimator of grouped LASSO type is proposed to take advantage of sparsity for multi-class data. It enjoys appealing non-asymptotic properties which allow insignificant correlations among features. This estimator exhibits superior capability on both simulated and real data.
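
For readers unfamiliar with grouped LASSO penalties, the following sketch shows the generic block soft-thresholding operator that such penalties induce: each group of coefficients is shrunk toward zero by its Euclidean norm, and groups with small norm are zeroed out together. This is a textbook building block, not the estimator proposed in the paper; treating each feature's coefficients across classes as one group (one row of `rows`) and the names `group_soft_threshold` and `lam` are assumptions for illustration.

```python
import math


def group_soft_threshold(rows, lam):
    """Apply block soft-thresholding to each row (group) of coefficients.

    A row with Euclidean norm <= lam is set entirely to zero; otherwise
    the row is scaled by (1 - lam / norm).
    """
    result = []
    for row in rows:
        norm = math.sqrt(sum(v * v for v in row))
        scale = max(0.0, 1.0 - lam / norm) if norm > 0 else 0.0
        result.append([scale * v for v in row])
    return result


if __name__ == "__main__":
    # Hypothetical coefficients: one row per feature, one column per class.
    coefficients = [[3.0, 4.0], [0.1, 0.0]]
    print(group_soft_threshold(coefficients, lam=1.0))
```

Because a whole row is removed at once, a feature is either kept for all classes or discarded for all of them, which is the kind of shared sparsity a grouped penalty encourages.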

artificial intelligence, estimator, machine learning, (14 more...)

arXiv.org Machine Learning

1412.7983

Genre: Research Report (0.70)

Industry: Health & Medicine (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Discriminant Analysis (0.81)

Co-clustering for directed graphs: the Stochastic co-Blockmodel and spectral algorithm Di-Sim

Rohe, Karl, Qin, Tai, Yu, Bin

arXiv.org Machine Learning, Jan-8-2015

Directed graphs have asymmetric connections, yet the current graph clustering methodologies cannot identify the potentially global structure of these asymmetries. We give a spectral algorithm called di-sim that builds on a dual measure of similarity that corresponds to how a node (i) sends and (ii) receives edges. Using di-sim, we analyze the global asymmetries in the networks of Enron emails, political blogs, and the C. elegans neural connectome. In each example, a small subset of nodes have persistent asymmetries; these nodes send edges with one cluster, but receive edges with another cluster. Previous approaches would have assigned these asymmetric nodes to only one cluster, failing to identify their sending/receiving asymmetries. Regularization and "projection" are two steps of di-sim that are essential for spectral clustering algorithms to work in practice. The theoretical results show that these steps make the algorithm weakly consistent under the degree corrected Stochastic co-Blockmodel, a model that generalizes the Stochastic Blockmodel to allow for both (i) degree heterogeneity and (ii) the global asymmetries that we intend to detect. The theoretical results make no assumptions on the smallest degree nodes. Instead, the theorem requires that the average degree grows sufficiently fast and that the weak consistency only applies to the subset of the nodes with sufficiently large leverage scores. The results also apply to bipartite graphs.

data mining, machine learning, node, (20 more...)

arXiv.org Machine Learning

1204.2296

Country: North America > United States > California (0.28)

Genre: Research Report (0.83)

Industry:

Government > Regional Government > North America Government > United States Government (0.67)
Energy > Power Industry (0.48)
Health & Medicine > Pharmaceuticals & Biotechnology (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications > Social Media (1.00)
Information Technology > Communications > Networks (1.00)
(2 more...)

An Introduction to Matrix Concentration Inequalities

Tropp, Joel A.

arXiv.org Machine Learning, Jan-7-2015

In recent years, random matrices have come to play a major role in computational mathematics, but most of the classical areas of random matrix theory remain the province of experts. Over the last decade, with the advent of matrix concentration inequalities, research has advanced to the point where we can conquer many (formerly) challenging problems with a page or two of arithmetic. My aim is to describe the most successful methods from this area along with some interesting examples that these techniques can illuminate. I hope that the results in these pages will inspire future work on applications of random matrices as well as refinements of the matrix concentration inequalities discussed herein. I have chosen to present a coherent body of results based on a generalization of the Laplace transform method for establishing scalar concentration inequalities. In the last two years, Lester Mackey and I, together with our coauthors, have developed an alternative approach to matrix concentration using exchangeable pairs and Markov chain couplings. With some regret, I have chosen to omit this theory because the ideas seem less accessible to a broad audience of researchers. The interested reader will find pointers to these articles in the annotated bibliography. The work described in these notes reflects the influence of many researchers.

artificial intelligence, machine learning, survey article, (20 more...)

arXiv.org Machine Learning

1501.01571

Country:

North America > United States (1.00)
Europe (0.67)

Genre:

Overview (1.00)
Summary/Review (0.87)
Research Report > New Finding (0.45)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

An Effective Semi-supervised Divisive Clustering Algorithm

Qiu, Teng, Li, Yongjie

arXiv.org Machine Learning, Jan-6-2015

Diverse experimental data, ranging from microarray gene expression data in biology to spectrum data in astronomy, need to be clustered to reveal meaningful correlations in the data. Massive collections of documents or images on the internet also need to be organized effectively so as to improve the efficiency of search engines. A clustering method such as K-means (1) is popular for its simplicity, yet it is sensitive to noise and initialization and is thus limited by a lack of reliability. Hierarchical clustering (HC) (2) is simple and intuitive and thus widely used, especially in biology (3), but it requires heavy computation (4) and its result varies with the choice of similarity measure between clusters. Moreover, the cluster number for the above methods needs to be prespecified (e.g., K-means) or determined by a threshold (e.g., HC). Some other well-known algorithms either involve complex optimization and postprocessing (5), or have a limited range of applications, being restricted by the distribution (6) or the attributes of the data (7, 8). Although affinity propagation (AP) (9) performs much better than K-means and determines the cluster number automatically, it is not good at detecting nonspherical clusters (10). Recently, two effective clustering algorithms (10, 11) were proposed, which can together form a pool of clustering methods based on the in-tree structure (11). But they involve a free parameter.

artificial intelligence, machine learning, node, (15 more...)

arXiv.org Machine Learning

1412.7625

Country: North America > United States > California (0.14)

Genre: Research Report (0.40)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.54)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
