AITopics | Statistical Learning

Collaborating Authors

Statistical Learning

News Overviews Instructional Materials AI-Alerts Classics

Context-Independent Claim Detection for Argument Mining

Lippi, Marco (University of Bologna) | Torroni, Paolo (University of Bologna)

AAAI ConferencesJul-15-2015

Argumentation mining aims to automatically identify structured argument data from unstructured natural language text. This challenging, multi-faceted task is recently gaining a growing attention, especially due to its many potential applications. One particularly important aspect of argumentation mining is claim identification. Most of the current approaches are engineered to address specific domains. However, argumentative sentences are often characterized by common rhetorical structures, independently of the domain. We thus propose a method that exploits structured parsing information to detect claims without resorting to contextual information, and yet achieve a performance comparable to that of state-of-the-art methods that heavily rely on the context.

argument, classifier, information, (14 more...)

AAAI Conferences

Twenty-Fourth International Joint Conference on Artificial Intelligence

Country:

Oceania > Palau (0.05)
Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
North America > United States > Texas > Tarrant County > Fort Worth (0.04)
(7 more...)

Genre: Research Report (0.48)

Industry: Law (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Explanation & Argumentation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.93)

Add feedback

Characterization of Scoring Rules with Distances: Application to the Clustering of Rankings

Viappiani, Paolo (Sorbonne Universities and CNRS)

AAAI ConferencesJul-15-2015

Positional scoring rules are often used for rank aggregation. In this work we study how scoring rules can be formulated as the minimization of some distance measures between rankings, and we also consider a new family of aggregation methods, called biased scoring rules. This work extends a previous known observation connecting Borda count with the minimization of the sum of the Spearman distances (calculated with respect to a set of input rankings). In particular we consider generalizations of the Spearman distance that can give different weights to items and positions; we also handle the case of incomplete rank data. This has applications in the clustering of rank data, where two main steps need to be performed: aggregating rankings of the same cluster into a representative ranking (the cluster's centroid) and assigning each ranking to its closest centroid. Using the proper combination of scoring rules (for aggregation) and distances (for assignment), it is possible to perform clustering in a computationally efficient way and as well account for specific desired behaviors (give more weight to top positions, bias the centroids in favor of particular items).

aggregation, distance measure, ranking, (16 more...)

AAAI Conferences

Twenty-Fourth International Joint Conference on Artificial Intelligence

Country: Europe > France > Île-de-France > Paris > Paris (0.04)

Industry: Leisure & Entertainment > Games (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

Add feedback

A Generalized Kernel Approach to Structured Output Learning

Kadri, Hachem, Ghavamzadeh, Mohammad, Preux, Philippe

arXiv.org Machine LearningJul-15-2015

We study the problem of structured output learning from a regression perspective. We first provide a general formulation of the kernel dependency estimation (KDE) approach to this problem using operator-valued kernels. Our formulation overcomes the two main limitations of the original KDE approach, namely the decoupling between outputs in the image space and the inability to use a joint feature space. We then propose a covariance-based operator-valued kernel that allows us to take into account the structure of the kernel feature space. This kernel operates on the output space and only encodes the interactions between the outputs without any reference to the input space. To address this issue, we introduce a variant of our KDE method based on the conditional covariance operator that in addition to the correlation between the outputs takes into account the effects of the input variables. Finally, we evaluate the performance of our KDE approach on three structured output problems, and compare it to the state-of-the-art kernelbased structured output regression methods.

artificial intelligence, inductive learning, machine learning, (16 more...)

arXiv.org Machine Learning

1205.2171

Country:

Europe (0.46)
North America > United States (0.46)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.34)

Add feedback

Parallel MMF: a Multiresolution Approach to Matrix Computation

Kondor, Risi, Teneva, Nedelina, Mudrakarta, Pramod K.

arXiv.org Machine LearningJul-15-2015

Multiresolution Matrix Factorization (MMF) was recently introduced as a method for finding multiscale structure and defining wavelets on graphs/matrices. In this paper we derive pMMF, a parallel algorithm for computing the MMF factorization. Empirically, the running time of pMMF scales linearly in the dimension for sparse matrices. We argue that this makes pMMF a valuable new computational primitive in its own right, and present experiments on using pMMF for two distinct purposes: compressing matrices and preconditioning large sparse linear systems.

artificial intelligence, machine learning, matrix, (15 more...)

arXiv.org Machine Learning

1507.04396

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

Add feedback

Certifying and removing disparate impact

Feldman, Michael, Friedler, Sorelle, Moeller, John, Scheidegger, Carlos, Venkatasubramanian, Suresh

arXiv.org Machine LearningJul-15-2015

What does it mean for an algorithm to be biased? In U.S. law, unintentional bias is encoded via disparate impact, which occurs when a selection process has widely different outcomes for different groups, even as it appears to be neutral. This legal determination hinges on a definition of a protected class (ethnicity, gender, religious practice) and an explicit description of the process. When the process is implemented using computers, determining disparate impact (and hence bias) is harder. It might not be possible to disclose the process. In addition, even if the process is open, it might be hard to elucidate in a legal setting how the algorithm makes its decisions. Instead of requiring access to the algorithm, we propose making inferences based on the data the algorithm uses. We make four contributions to this problem. First, we link the legal notion of disparate impact to a measure of classification accuracy that while known, has received relatively little attention. Second, we propose a test for disparate impact based on analyzing the information leakage of the protected class from the other data attributes. Third, we describe methods by which data might be made unbiased. Finally, we present empirical evidence supporting the effectiveness of our test for disparate impact and our approach for both masking bias and preserving relevant information in the data. Interestingly, our approach resembles some actual selection practices that have recently received legal scrutiny.

data mining, disparate impact, machine learning, (19 more...)

arXiv.org Machine Learning

1412.3756

Country: North America > United States (1.00)

Genre: Research Report > New Finding (0.68)

Industry:

Government > Regional Government > North America Government > United States Government (1.00)
Law > Government & the Courts (0.68)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Add feedback

The Role of Principal Angles in Subspace Classification

Huang, Jiaji, Qiu, Qiang, Calderbank, Robert

arXiv.org Machine LearningJul-15-2015

Abstract--Subspace models play an important role in a wide range of signal processing tasks, and this paper explores how the pairwise geometry of subspaces influences the probability of misclassification. When the mismatch between the signal and the model is vanishingly small, the probability of misclassification is determined by the product of the sines of the principal angles between subspaces. When the mismatch is more significant, the probability of misclassification is determined by the sum of the squares of the sines of the principal angles. Reliability of classification is derived in terms of the distribution of signal energy across principal vectors. Larger principal angles lead to smaller classification error, motivating a linear transform that optimizes principal angles. The transform presented here (TRAIT) preserves some specific characteristic of each individual class, and this approach is shown to be complementary to a previously developed transform (LRT) that enlarges inter-class distance while suppressing intra-class dispersion. Theoretical results are supported by demonstration of superior classification accuracy on synthetic and measured data even in the presence of significant model mismatch. IGNALS that are nominally high dimensional often exhibit a low dimensional geometric structure. For example, fixed-pose images of human faces are recorded using more than 1000 pixels, but can be represented by a 9-dimensional harmonic subspace [1]. Motion trajectories of a rigid body might be recorded by hundreds of sensors, but must intrinsically be represented by a 4-dimensional subspace [2]. There are many more examples where a low-dimensional subspace model captures intrinsic geometric structure, ranging from user ratings in a recommendation system [3] to signals emitted by multiple sources impinging at an antenna array [4]. The subspace geometry has assisted tasks of interest to both signal processing [5], [6] and machine learning communities [7], [8]. It can be used to approximate a nonlinear manifold by fitting mixture components to local patches of the manifold [5], [9], hence providing a high fidelity representation of a wide variety of signal geometries.

artificial intelligence, machine learning, principal angle, (17 more...)

arXiv.org Machine Learning

doi: 10.1109/TSP.2015.2500889

1507.0423

Country: North America > United States (0.28)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Vision > Face Recognition (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Add feedback

ALEVS: Active Learning by Statistical Leverage Sampling

Orhan, Cem, Taştan, Öznur

arXiv.org Machine LearningJul-15-2015

Active learning aims to obtain a classifier of high accuracy by using fewer label requests in comparison to passive learning by selecting effective queries. Many active learning methods have been developed in the past two decades, which sample queries based on informativeness or representativeness of unlabeled data points. In this work, we explore a novel querying criterion based on statistical leverage scores. The statistical leverage scores of a row in a matrix are the squared row-norms of the matrix containing its (top) left singular vectors and is a measure of influence of the row on the matrix. Leverage scores have been used for detecting high influential points in regression diagnostics (Chatterjee & Hadi, 1986) and have been recently shown to be useful for data analysis (Drineas et al., 2008) and randomized low-rank matrix approximation algorithms (Gittens & Mahoney, 2013). We explore how sampling data instances with high statistical leverage scores perform in active learning. Our empirical comparison on several binary classification datasets indicate that querying high leverage points is an effective strategy.

artificial intelligence, leverage score, machine learning, (16 more...)

arXiv.org Machine Learning

1507.04155

Country:

Asia > Middle East > Republic of Türkiye (0.15)
North America > United States > Wisconsin (0.14)

Genre: Research Report > New Finding (0.47)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.96)

Add feedback

An SVM-like Approach for Expectile Regression

Farooq, Muhammad, Steinwart, Ingo

arXiv.org Machine LearningJul-14-2015

In standard nonparametric regression analysis, most of the methods developed so far are based on the least square loss function for estimating conditional expectations. In many applications, however, it is required to study conditional distributions beyond means. A nice tool for this purpose was offered by [20] in the form of quantile regression, which allows both the location and the spread of the response variable to be studied by using asymmetric least absolute deviation loss function (ALAD). We refer the reader to [19, 37, 9, 33] and references therein, for details description and different estimation methods for quantile regression.

artificial intelligence, duality gap, machine learning, (16 more...)

arXiv.org Machine Learning

1507.03887

Country:

Europe > Germany (0.71)
North America > United States (0.46)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.34)

Add feedback

Rich Component Analysis

Ge, Rong, Zou, James

arXiv.org Machine LearningJul-14-2015

In many settings, we have multiple data sets (also called views) that capture different and overlapping aspects of the same phenomenon. We are often interested in finding patterns that are unique to one or to a subset of the views. For example, we might have one set of molecular observations and one set of physiological observations on the same group of individuals, and we want to quantify molecular patterns that are uncorrelated with physiology. Despite being a common problem, this is highly challenging when the correlations come from complex distributions. In this paper, we develop the general framework of Rich Component Analysis (RCA) to model settings where the observations from different views are driven by different sets of latent components, and each component can be a complex, high-dimensional distribution. We introduce algorithms based on cumulant extraction that provably learn each of the components without having to model the other components. We show how to integrate RCA with stochastic gradient descent into a meta-algorithm for learning general models, and demonstrate substantial improvement in accuracy on several synthetic and real datasets in both supervised and unsupervised tasks. Our method makes it possible to learn latent variable models when we don't have samples from the true model but only samples after complex perturbations.

artificial intelligence, cumulant, machine learning, (18 more...)

arXiv.org Machine Learning

1507.03867

Genre: Research Report (1.00)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.55)

Add feedback

Practical Inexact Proximal Quasi-Newton Method with Global Complexity Analysis

Scheinberg, Katya, Tang, Xiaocheng

arXiv.org Machine LearningJul-14-2015

Recently several methods were proposed for sparse optimization which make careful use of second-order information [10, 28, 16, 3] to improve local convergence rates. These methods construct a composite quadratic approximation using Hessian information, optimize this approximation using a first-order method, such as coordinate descent and employ a line search to ensure sufficient descent. Here we propose a general framework, which includes slightly modified versions of existing algorithms and also a new algorithm, which uses limited memory BFGS Hessian approximations, and provide a novel global convergence rate analysis, which covers methods that solve subproblems via coordinate descent.

artificial intelligence, iteration, machine learning, (14 more...)

arXiv.org Machine Learning

1311.6547

Country: North America > United States (0.46)

Genre: Research Report (1.00)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.51)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Add feedback