AITopics | Statistical Learning

Inference, Sampling, and Learning in Copula Cumulative Distribution Networks

arXiv.org Machine Learning, Oct-16-2013

The cumulative distribution network (CDN) is a recently developed class of probabilistic graphical models (PGMs) permitting a copula factorization, in which the CDF, rather than the density, is factored. Despite much recent interest within the machine learning community in copula representations, there has been scarce research into the CDN and its amalgamation with copula theory, and no evaluation of its performance. Algorithms for inference, sampling, and learning in these models are underdeveloped compared to those of other PGMs, hindering more widespread use. One advantage of the CDN is that it allows the factors to be parameterized as copulae, combining the benefits of graphical models with those of copula theory. In brief, the use of a copula parameterization enables greater modelling flexibility by separating representation of the marginals from the dependence structure, permitting more efficient and robust learning. Another advantage is that the CDN permits the representation of implicit latent variables, whose parameterization and connectivity are not required to be specified. Unfortunately, the fact that the model can encode only latent relationships between variables severely limits its utility. In this thesis, we present inference, learning, and sampling for CDNs, and advance the state of the art. First, we explain the basics of copula theory and the representation of copula CDNs. Then, we discuss inference in the models, and develop the first sampling algorithm. We explain standard learning methods, propose an algorithm for learning from data missing completely at random (MCAR), and develop a novel algorithm for learning models of arbitrary treewidth and size. Properties of the models and algorithms are investigated through Monte Carlo simulations. We conclude with further discussion of the advantages and limitations of CDNs, and suggest future work.
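The separation of marginals from dependence structure mentioned in the abstract is the content of Sklar's theorem, a standard result of copula theory. The statement below is given for orientation only; the symbols $F$, $F_i$, $C$, and $c$ are notation assumed here and are not taken from the thesis.

```latex
% Sklar's theorem: a joint CDF F with marginal CDFs F_1, ..., F_n
% can be written through a copula C on [0,1]^n.
F(x_1, \dots, x_n) = C\bigl(F_1(x_1), \dots, F_n(x_n)\bigr)

% When densities exist, the joint density splits into a dependence
% term (the copula density c) and the marginal densities f_i.
f(x_1, \dots, x_n) = c\bigl(F_1(x_1), \dots, F_n(x_n)\bigr) \prod_{i=1}^{n} f_i(x_i)
```

The marginals $F_i$ can therefore be modelled separately from the copula $C$, which carries only the dependence.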

algorithm, artificial intelligence, machine learning, (19 more...)

arXiv.org Machine Learning

1310.4456

Country: Europe (0.27)

Genre: Research Report (0.81)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
(2 more...)

Supervised Heterogeneous Multiview Learning for Joint Association Study and Disease Diagnosis

Zhe, Shandian, Xu, Zenglin, Qi, Yuan

arXiv.org Machine Learning, Oct-16-2013

Given genetic variations and various phenotypical traits, such as Magnetic Resonance Imaging (MRI) features, we consider two important and related tasks in biomedical research: i) to select genetic and phenotypical markers for disease diagnosis and ii) to identify associations between genetic and phenotypical data. These two tasks are tightly coupled because underlying associations between genetic variations and phenotypical features contain the biological basis for a disease. While a variety of sparse models have been applied for disease diagnosis, and canonical correlation analysis and its extensions have been widely used in association studies (e.g., eQTL analysis), these two tasks have been treated separately. To unify these two tasks, we present a new sparse Bayesian approach for joint association study and disease diagnosis. In this approach, common latent features are extracted from different data sources based on sparse projection matrices and used to predict multiple disease severity levels based on Gaussian process ordinal regression; in return, the disease status is used to guide the discovery of relationships between the data sources. The sparse projection matrices not only reveal interactions between data sources but also select groups of biomarkers related to the disease. To learn the model from data, we develop an efficient variational expectation maximization algorithm. Simulation results demonstrate that our approach achieves higher accuracy in both predicting ordinal labels and discovering associations between data sources than alternative methods. We apply our approach to an imaging genetics dataset for the study of Alzheimer's Disease (AD). Our method identifies biologically meaningful relationships between genetic variations, MRI features, and AD status, and achieves significantly higher accuracy for predicting ordinal AD stages than the competing methods.

artificial intelligence, machine learning, modeling & simulation, (17 more...)

arXiv.org Machine Learning

1304.7284

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Neurology > Alzheimer's Disease (0.55)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.48)

A New Monte Carlo Based Algorithm for the Gaussian Process Classification Problem

Atiya, Amir F., Fayed, Hatem A., Abdel-Gawad, Ahmed H.

arXiv.org Machine Learning, Oct-16-2013

The Gaussian process is a very promising novel technology that has been applied to both the regression problem and the classification problem. While for the regression problem it yields simple exact solutions, this is not the case for the classification problem, because we encounter intractable integrals. In this paper we develop a new derivation that transforms the problem into that of evaluating the ratio of multivariate Gaussian orthant integrals. Moreover, we develop a new Monte Carlo procedure that evaluates these integrals. It is based on some aspects of bootstrap sampling and acceptance-rejection. The proposed approach has beneficial properties compared to the existing Markov Chain Monte Carlo approach, such as simplicity, reliability, and speed.
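To make the notion of a Gaussian orthant integral concrete, here is a minimal sketch that estimates a single orthant probability P(X > 0) by plain Monte Carlo sampling. This is not the paper's bootstrap and acceptance-rejection procedure; the function name `orthant_probability_mc`, the sample size, and the bivariate test case are assumptions made for illustration. The estimate is compared with the known closed form for two dimensions, 1/4 + arcsin(rho) / (2 pi).

```python
import numpy as np

def orthant_probability_mc(cov, num_samples=200_000, seed=0):
    """Naive Monte Carlo estimate of P(X_1 > 0, ..., X_d > 0) for X ~ N(0, cov)."""
    rng = np.random.default_rng(seed)
    dim = cov.shape[0]
    samples = rng.multivariate_normal(np.zeros(dim), cov, size=num_samples)
    # Fraction of samples whose coordinates are all positive.
    return float(np.mean(np.all(samples > 0.0, axis=1)))

# Illustrative bivariate case with correlation rho (an assumed example).
rho = 0.5
cov = np.array([[1.0, rho], [rho, 1.0]])

estimate = orthant_probability_mc(cov)
exact = 0.25 + np.arcsin(rho) / (2.0 * np.pi)  # closed form, valid for d = 2 only

print("Monte Carlo estimate:", estimate)
print("closed form         :", exact)
```

Naive sampling of this kind becomes inefficient when the orthant probability is small, which is one motivation for more specialized estimators.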

artificial intelligence, bayesian inference, machine learning, (16 more...)

arXiv.org Machine Learning

1302.722

Country:

Europe > United Kingdom > England (0.28)
North America > United States > California (0.28)

Genre:

Research Report (0.50)
Overview (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.89)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Multilabel Consensus Classification

Xie, Sihong, Kong, Xiangnan, Gao, Jing, Fan, Wei, Yu, Philip S.

arXiv.org Machine Learning, Oct-15-2013

In the era of big data, a large amount of noisy and incomplete data can be collected from multiple sources for prediction tasks. Combining multiple models or data sources helps to counteract the effects of low data quality and the bias of any single model or data source, and thus can improve the robustness and the performance of predictive models. Because of privacy, storage and bandwidth considerations, in certain circumstances one has to combine the predictions from multiple models or data sources to obtain the final predictions without accessing the raw data. Consensus-based prediction combination algorithms are effective for such situations. However, current research on prediction combination focuses on the single-label setting, where an instance can have one and only one label. Data nowadays are usually multilabeled, so that more than one label has to be predicted at the same time. Direct application of existing prediction combination methods to multilabel settings can lead to degraded performance. In this paper, we address the challenges of combining predictions from multiple multilabel classifiers and propose two novel algorithms, MLCM-r (MultiLabel Consensus Maximization for ranking) and MLCM-a (MLCM for microAUC). These algorithms can capture label correlations that are common in multilabel classification, and optimize the corresponding performance metrics. Experimental results on popular multilabel classification tasks verify the theoretical analysis and effectiveness of the proposed methods.

data mining, machine learning, prediction, (19 more...)

arXiv.org Machine Learning

1310.4252

Genre: Research Report (0.64)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(2 more...)

Flow-Based Algorithms for Local Graph Clustering

Orecchia, Lorenzo, Zhu, Zeyuan Allen

arXiv.org Machine Learning, Oct-13-2013

Given a subset A of vertices of an undirected graph G, the cut-improvement problem asks us to find a subset S that is similar to A but has smaller conductance. A very elegant algorithm for this problem has been given by Andersen and Lang [AL08] and requires a small number of single-commodity maximum flow computations over the whole graph G. In this paper, we introduce LocalImprove, the first cut-improvement algorithm that is local, i.e., one that runs in time dependent on the size of the input set A rather than on the size of the entire graph. Moreover, LocalImprove achieves this local behaviour while essentially matching the theoretical guarantee of the global algorithm of Andersen and Lang. The main application of LocalImprove is to the design of better local-graph-partitioning algorithms. All previously known local algorithms for graph partitioning are random-walk based and can only guarantee an output conductance of O(\sqrt{OPT}) when the target set has conductance OPT \in [0,1]. Very recently, Zhu, Lattanzi and Mirrokni [ZLM13] improved this to O(OPT / \sqrt{CONN}), where the internal connectivity parameter CONN \in [0,1] is defined as the reciprocal of the mixing time of the random walk over the graph induced by the target set. In this work, we show how to use LocalImprove to obtain a constant approximation O(OPT) as long as CONN/OPT = \Omega(1). This yields the first flow-based algorithm. Moreover, its performance strictly outperforms that of the algorithms based on random walks and, surprisingly, matches that of the best known global algorithm, which is SDP-based, in this parameter regime [MMV12]. Finally, our results show that spectral methods are not the only viable approach to the construction of local graph partitioning algorithms and open the door to the study of algorithms with even better approximation and locality guarantees.
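The quantity being improved in this abstract is the conductance of a vertex set. As a small illustration, the sketch below computes it under the common definition cut(S, V \ S) / min(vol(S), vol(V \ S)), where vol is the sum of degrees. The adjacency-dictionary representation, the function name `conductance`, and the toy graph `two_triangles` are assumptions for this example; the sketch does not implement LocalImprove or any flow computation.

```python
def conductance(adj, subset):
    """Conductance of `subset` in an undirected graph given as {node: [neighbours]}."""
    subset = set(subset)
    # Edges with exactly one endpoint inside the subset.
    cut = sum(1 for u in subset for v in adj[u] if v not in subset)
    # Volume = sum of degrees.
    vol_inside = sum(len(adj[u]) for u in subset)
    vol_outside = sum(len(adj[u]) for u in adj) - vol_inside
    denom = min(vol_inside, vol_outside)
    if denom == 0:
        raise ValueError("conductance is undefined when one side has zero volume")
    return cut / denom

# Toy graph (assumed example): two triangles joined by the single edge 2-3.
two_triangles = {
    0: [1, 2],
    1: [0, 2],
    2: [0, 1, 3],
    3: [2, 4, 5],
    4: [3, 5],
    5: [3, 4],
}

print(conductance(two_triangles, {0, 1}))     # a poor seed set
print(conductance(two_triangles, {0, 1, 2}))  # a nearby set with smaller conductance
```

In this toy graph, growing the seed set {0, 1} to the full triangle {0, 1, 2} lowers the conductance, which is the kind of improvement a cut-improvement algorithm looks for.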

artificial intelligence, data mining, machine learning, (19 more...)

arXiv.org Machine Learning

doi: 10.1137/1.9781611973402.94

1307.2855

Country: North America > United States > New York (0.28)

Genre: Research Report > New Finding (0.54)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.50)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.45)

A Novel Frank-Wolfe Algorithm. Analysis and Applications to Large-Scale SVM Training

Allende, Hector, Frandi, Emanuele, Nanculef, Ricardo, Sartori, Claudio

arXiv.org Artificial Intelligence, Oct-13-2013

Recently, there has been renewed interest in the machine learning community in variants of a sparse greedy approximation procedure for concave optimization known as the Frank-Wolfe (FW) method. In particular, this procedure has been successfully applied to train large-scale instances of non-linear Support Vector Machines (SVMs). Specializing FW to SVM training has yielded not only efficient algorithms but also important theoretical results, including convergence analysis of training algorithms and new characterizations of model sparsity. In this paper, we present and analyze a novel variant of the FW method based on a new way to perform away steps, a classic strategy used to accelerate the convergence of the basic FW procedure. Our formulation and analysis are focused on a general concave maximization problem on the simplex. However, the specialization of our algorithm to quadratic forms is strongly related to some classic methods in computational geometry, namely the Gilbert and MDM algorithms. On the theoretical side, we demonstrate that the method matches the guarantees in terms of convergence rate and number of iterations obtained by using classic away steps. In particular, the method enjoys a linear rate of convergence, a result that has been recently proved for MDM on quadratic forms. On the practical side, we provide experiments on several classification datasets, and evaluate the results using statistical tests. Experiments show that our method is faster than the FW method with classic away steps, and works well even in the cases in which classic away steps slow down the algorithm. Furthermore, these improvements are obtained without sacrificing the predictive accuracy of the obtained SVM model.
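For readers unfamiliar with the setting, the following is a minimal sketch of the basic Frank-Wolfe iteration for concave maximization on the simplex. It is the plain method without away steps, not the variant proposed in the paper. The function name `frank_wolfe_simplex`, the 2/(k+2) step-size rule, and the toy quadratic objective are assumptions chosen for illustration.

```python
import numpy as np

def frank_wolfe_simplex(grad, n, num_iters=2000):
    """Basic Frank-Wolfe for maximizing a concave function over the unit simplex.

    `grad` maps a point alpha in the simplex to the gradient of the objective.
    """
    alpha = np.zeros(n)
    alpha[0] = 1.0  # start at a vertex of the simplex
    for k in range(num_iters):
        g = grad(alpha)
        # Linear maximization over the simplex is attained at a vertex e_i,
        # where i indexes the largest gradient component.
        i = int(np.argmax(g))
        step = 2.0 / (k + 2.0)  # standard diminishing step size
        # Move toward e_i: alpha <- (1 - step) * alpha + step * e_i.
        alpha *= (1.0 - step)
        alpha[i] += step
    return alpha

# Toy concave objective (assumed example): f(alpha) = -0.5 * ||alpha - p||^2,
# maximized at alpha = p because p lies in the simplex.
p = np.array([0.2, 0.3, 0.5])
alpha = frank_wolfe_simplex(lambda a: -(a - p), n=3)

print("iterate:", np.round(alpha, 3))
print("target :", p)
```

Each iterate is a convex combination of at most k + 1 vertices, which is the source of the sparsity the abstract refers to. The basic method converges sublinearly in general, which is the behaviour away steps are meant to improve.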

algorithm, artificial intelligence, machine learning, (16 more...)

arXiv.org Artificial Intelligence

1304.1014

Country:

Europe (0.92)
North America > United States (0.68)

Genre: Research Report > New Finding (0.92)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (1.00)

Two discussions of the paper "Bayesian measures of model complexity and fit" by D. Spiegelhalter et al., Read before The Royal Statistical Society at a meeting organized by the Research Section on Wednesday, March 13th, 2002

Moreno, E., Vazquez-Polo, F. -J., Robert, C. P.

arXiv.org Machine Learning, Oct-11-2013

These are the written discussions of the paper "Bayesian measures of model complexity and fit" by D. Spiegelhalter et al. (2002), following the discussions given at the Annual Meeting of the Royal Statistical Society in Newcastle-upon-Tyne on September 3rd, 2013.

artificial intelligence, machine learning, procedure, (14 more...)

arXiv.org Machine Learning

1310.2905

Country:

Europe > United Kingdom > England > Tyne and Wear > Newcastle (0.24)
Europe > Spain > Canary Islands > Gran Canaria (0.14)

Genre: Research Report (0.65)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.53)

Communication-Efficient Algorithms for Statistical Optimization

Zhang, Yuchen, Duchi, John C., Wainwright, Martin

arXiv.org Machine Learning, Oct-11-2013

We analyze two communication-efficient algorithms for distributed statistical optimization on large-scale data sets. The first algorithm is a standard averaging method that distributes the $N$ data samples evenly to $m$ machines, performs separate minimization on each subset, and then averages the estimates. We provide a sharp analysis of this average mixture algorithm, showing that under a reasonable set of conditions, the combined parameter achieves mean-squared error that decays as $O(N^{-1} + (N/m)^{-2})$. Whenever $m \le \sqrt{N}$, this guarantee matches the best possible rate achievable by a centralized algorithm having access to all $N$ samples. The second algorithm is a novel method, based on an appropriate form of bootstrap subsampling. Requiring only a single round of communication, it has mean-squared error that decays as $O(N^{-1} + (N/m)^{-3})$, and so is more robust to the amount of parallelization. In addition, we show that a stochastic gradient-based method attains mean-squared error decaying as $O(N^{-1} + (N/m)^{-3/2})$, easing computation at the expense of penalties in the rate of convergence. We also provide experimental evaluation of our methods, investigating their performance both on simulated data and on a large-scale regression problem from the internet search domain. In particular, we show that our methods can be used to efficiently solve an advertisement prediction problem from the Chinese SoSo Search Engine, which involves logistic regression with $N \approx 2.4 \times 10^8$ samples and $d \approx 740,000$ covariates.
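The first algorithm described above (split the data evenly, minimize on each subset, average the estimates) can be sketched in a few lines. The example below uses ordinary least squares as the local minimization and simulates the machines in a single process; the loss, the synthetic data, and the names `local_least_squares` and `average_mixture` are assumptions for illustration, not the paper's experimental setup.

```python
import numpy as np

def local_least_squares(X, y):
    """Minimize the squared loss on one machine's subset (ordinary least squares)."""
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return theta

def average_mixture(X, y, m):
    """Split the N samples evenly across m simulated machines, solve locally, average."""
    X_parts = np.array_split(X, m)
    y_parts = np.array_split(y, m)
    local_estimates = [local_least_squares(Xi, yi) for Xi, yi in zip(X_parts, y_parts)]
    return np.mean(local_estimates, axis=0)

# Synthetic linear-regression data (assumed example), with m <= sqrt(N).
rng = np.random.default_rng(0)
N, d, m = 10_000, 5, 10
theta_true = np.arange(1.0, d + 1.0)
X = rng.normal(size=(N, d))
y = X @ theta_true + rng.normal(scale=0.5, size=N)

theta_avg = average_mixture(X, y, m)
theta_central = local_least_squares(X, y)

print("averaged   :", np.round(theta_avg, 3))
print("centralized:", np.round(theta_central, 3))
```

Only the m local estimates need to be communicated, once, which is what makes the scheme communication-efficient.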

artificial intelligence, inequality, machine learning, (17 more...)

arXiv.org Machine Learning

1209.4129

Country: North America > United States > California (0.27)

Genre:

Research Report > New Finding (0.48)
Research Report > Experimental Study (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.70)

Flexible High-dimensional Classification Machines and Their Asymptotic Properties

Qiao, Xingye, Zhang, Lingsong

arXiv.org Machine Learning, Oct-10-2013

Classification is an important topic in statistics and machine learning with great potential in many real applications. In this paper, we investigate two popular large margin classification methods, Support Vector Machine (SVM) and Distance Weighted Discrimination (DWD), under two contexts: high-dimensional, low-sample-size data and imbalanced data. A unified family of classification machines, the FLexible Assortment MachinE (FLAME), is proposed, within which DWD and SVM are special cases. The FLAME family helps to identify the similarities and differences between SVM and DWD. It is well known that many classifiers overfit the data in the high-dimensional setting, and that others are sensitive to imbalanced data, that is, the class with a larger sample size overly influences the classifier and pushes the decision boundary towards the minority class. SVM is resistant to the imbalanced data issue, but it overfits high-dimensional data sets by showing the undesired data-piling phenomenon. The DWD method was proposed to improve SVM in the high-dimensional setting, but its decision boundary is sensitive to the imbalanced ratio of sample sizes. Our FLAME family helps to understand an intrinsic connection between SVM and DWD, and improves both methods by providing a better trade-off between sensitivity to imbalanced data and overfitting of high-dimensional data. Several asymptotic properties of the FLAME classifiers are studied. Simulations and real data applications are investigated to illustrate the usefulness of the FLAME classifiers.

artificial intelligence, classifier, machine learning, (18 more...)

arXiv.org Machine Learning

1310.3004

Country:

North America > United States > New York > Broome County > Binghamton (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > Indiana > Tippecanoe County > West Lafayette (0.04)
(2 more...)

Genre: Research Report (1.00)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.87)

Student Modeling: Supporting Personalized Instruction, from Problem Solving to Exploratory Open Ended Activities

Conati, Cristina (University of British Columbia) | Kardan, Samad (University of British Columbia)

AI Magazine, Oct-10-2013

Learner assessment is nontrivial even in its most basic incarnation, namely evaluating a learner's understanding of a set of domain-dependent skills from ad hoc test items (for example, Desmarais [2011]). The assessment challenges increase with the complexity of the learner's traits to be captured, because how a student behaves during an instructional activity generally provides partial and ambiguous information on the student's underlying states, and the gap between what can be observed and what a learner actually thinks and feels increases as these states go from cognitive to metacognitive and affective. In ITSs, the research field concerned with addressing these challenges is known as student modeling, and a student model is the ITS component in charge of assessing student traits and states relevant to tailor the tutorial interaction to specific student needs. Student modeling research has made substantial progress in providing reliable learner assessment during problem solving or question-answering [...] At the cognitive level, knowledge assessment, that is, evaluating the student's knowledge of relevant concepts and skills at specific points of the interaction [...] the problem solution from the tutor (for instance by repeatedly asking for help) without trying to solve the problem on their own (Baker et al. 2008), [...] et al. [2010]), given extensive evidence in education research showing that affective factors play an important role in learning. Educational technology, however, continues to produce novel environments often consisting of activities not as structured and well [...] can foster understanding at different stages of the learning process or for learners with different preferences and abilities.

artificial intelligence, data mining, machine learning, (17 more...)

AI Magazine

Country:

North America > Canada (0.46)
Europe > Netherlands (0.28)

Genre:

Instructional Material > Course Syllabus & Notes (0.47)
Research Report > New Finding (0.46)
Research Report > Experimental Study (0.46)

Industry: Education > Educational Technology > Educational Software > Computer Based Training (1.00)

Technology:

Information Technology > Human Computer Interaction > Interfaces (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)
(3 more...)
