AITopics | Statistical Learning

Collaborating Authors

Statistical Learning

News Overviews Instructional Materials AI-Alerts Classics

Narrowing the Modeling Gap: A Cluster-Ranking Approach to Coreference Resolution

Journal of Artificial Intelligence ResearchFeb-25-2011

Traditional learning-based coreference resolvers operate by training the mention-pair model for determining whether two mentions are coreferent or not. Though conceptually simple and easy to understand, the mention-pair model is linguistically rather unappealing and lags far behind the heuristic-based coreference models proposed in the pre-statistical NLP era in terms of sophistication. Two independent lines of recent research have attempted to improve the mention-pair model, one by acquiring the mention-ranking model to rank preceding mentions for a given anaphor, and the other by training the entity-mention model to determine whether a preceding cluster is coreferent with a given mention. We propose a cluster-ranking approach to coreference resolution, which combines the strengths of the mention-ranking model and the entity-mention model, and is therefore theoretically more appealing than both of these models. In addition, we seek to improve cluster rankers via two extensions: (1) lexicalization and (2) incorporating knowledge of anaphoricity by jointly modeling anaphoricity determination and coreference resolution. Experimental results on the ACE data sets demonstrate the superior performance of cluster rankers to competing approaches as well as the effectiveness of our two extensions.

coreference model, coreference resolution, mention-pair model, (16 more...)

Journal of Artificial Intelligence Research

doi: 10.1613/jair.3120

AI Access Foundation

10694

Journal of Artificial Intelligence Research

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(3 more...)

Genre: Research Report > New Finding (0.92)

Industry:

Education (0.47)
Government > Regional Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)
(4 more...)

Add feedback

Active Clustering: Robust and Efficient Hierarchical Clustering using Adaptively Selected Similarities

Eriksson, Brian, Dasarathy, Gautam, Singh, Aarti, Nowak, Robert

arXiv.org Machine LearningFeb-18-2011

Hierarchical clustering based on pairwise similarities is a common tool used in a broad range of scientific applications. However, in many problems it may be expensive to obtain or compute similarities between the items to be clustered. This paper investigates the hierarchical clustering of N items based on a small subset of pairwise similarities, significantly less than the complete set of N(N-1)/2 similarities. First, we show that if the intracluster similarities exceed intercluster similarities, then it is possible to correctly determine the hierarchical clustering from as few as 3N log N similarities. We demonstrate this order of magnitude savings in the number of pairwise similarities necessitates sequentially selecting which similarities to obtain in an adaptive fashion, rather than picking them at random. We then propose an active clustering method that is robust to a limited fraction of anomalous similarities, and show how even in the presence of these noisy similarity values we can resolve the hierarchical clustering using only O(N log^2 N) pairwise similarities.

artificial intelligence, machine learning, similarity, (18 more...)

arXiv.org Machine Learning

1102.3887

Country: North America > United States (0.68)

Genre: Research Report (1.00)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.94)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback

Differentially Private Empirical Risk Minimization

Chaudhuri, Kamalika, Monteleoni, Claire, Sarwate, Anand D.

arXiv.org Artificial IntelligenceFeb-16-2011

Privacy-preserving machine learning algorithms are crucial for the increasingly common setting in which personal data, such as medical or financial records, are analyzed. We provide general techniques to produce privacy-preserving approximations of classifiers learned via (regularized) empirical risk minimization (ERM). These algorithms are private under the $\epsilon$-differential privacy definition due to Dwork et al. (2006). First we apply the output perturbation ideas of Dwork et al. (2006), to ERM classification. Then we propose a new method, objective perturbation, for privacy-preserving machine learning algorithm design. This method entails perturbing the objective function before optimizing over classifiers. If the loss and regularizer satisfy certain convexity and differentiability criteria, we prove theoretical results showing that our algorithms preserve privacy, and provide generalization bounds for linear and nonlinear kernels. We further present a privacy-preserving technique for tuning the parameters in general machine learning algorithms, thereby providing end-to-end privacy guarantees for the training process. We apply these results to produce privacy-preserving analogues of regularized logistic regression and support vector machines. We obtain encouraging results from evaluating their performance on real demographic and benchmark data sets. Our results show that both theoretically and empirically, objective perturbation is superior to the previous state-of-the-art, output perturbation, in managing the inherent tradeoff between privacy and learning performance.

algorithm, artificial intelligence, machine learning, (18 more...)

arXiv.org Artificial Intelligence

0912.0071

Country: North America > United States > California (0.46)

Genre: Research Report > New Finding (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.56)

Add feedback

Predictors of short-term decay of cell phone contacts in a large scale communication network

Raeder, Troy, Lizardo, Omar, Hachen, David, Chawla, Nitesh V.

arXiv.org Machine LearningFeb-11-2011

Under what conditions is an edge present in a social network at time t likely to decay or persist by some future time t + Delta(t)? Previous research addressing this issue suggests that the network range of the people involved in the edge, the extent to which the edge is embedded in a surrounding structure, and the age of the edge all play a role in edge decay. This paper uses weighted data from a large-scale social network built from cell-phone calls in an 8-week period to determine the importance of edge weight for the decay/persistence process. In particular, we study the relative predictive power of directed weight, embeddedness, newness, and range (measured as outdegree) with respect to edge decay and assess the effectiveness with which a simple decision tree and logistic regression classifier can accurately predict whether an edge that was active in one time period continues to be so in a future time period. We find that directed edge weight, weighted reciprocity and time-dependent measures of edge longevity are highly predictive of whether we classify an edge as persistent or decayed, relative to the other types of factors at the dyad and neighborhood level.

artificial intelligence, decay, machine learning, (19 more...)

arXiv.org Machine Learning

1102.1753

Country: North America > United States (1.00)

Genre: Research Report > New Finding (1.00)

Industry:

Telecommunications (0.94)
Government (0.93)
Information Technology (0.91)

Technology:

Information Technology > Communications > Networks (1.00)
Information Technology > Communications > Mobile (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)

Add feedback

From Machine Learning to Machine Reasoning

Bottou, Leon

arXiv.org Artificial IntelligenceFeb-11-2011

A plausible definition of "reasoning" could be "algebraically manipulating previously acquired knowledge in order to answer a new question". This definition covers first-order logical inference or probabilistic inference. It also includes much simpler manipulations commonly used to build large learning systems. For instance, we can build an optical character recognition system by first training a character segmenter, an isolated character recognizer, and a language model, using appropriate labeled training sets. Adequately concatenating these modules and fine tuning the resulting system can be viewed as an algebraic operation in a space of models. The resulting model answers a new question, that is, converting the image of a text page into a computer readable text. This observation suggests a conceptual continuity between algebraically rich inference systems, such as logical or probabilistic inference, and simple manipulations, such as the mere concatenation of trainable learning systems. Therefore, instead of trying to bridge the gap between machine learning systems and sophisticated "all-purpose" inference mechanisms, we can instead algebraically enrich the set of manipulations applicable to training systems, and build reasoning capabilities from the ground up.

artificial intelligence, machine learning, module, (18 more...)

arXiv.org Artificial Intelligence

1102.1808

Country: North America (0.28)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback

Learning with Support Vector Machines

Campbell, Colin, Ying, Yiming

Morgan & Claypool PublishersFeb-10-2011

Support Vectors Machines have become a well established tool within machine learning. They work well in practice and have now been used across a wide range of applications from recognizing hand-written digits, to face identification, text categorisation, bioinformatics, and database marketing. In this book we give an introductory overview of this subject. We start with a simple Support Vector Machine for performing binary classification before considering multi-class classification and learning in the presence of noise. We show that this framework can be extended to many other scenarios such as prediction with real-valued outputs, novelty detection and the handling of complex output structures such as parse trees.

artificial intelligence, machine learning, support vector machine, (11 more...)

Morgan & Claypool Publishers

Country: Asia > China (0.21)

Genre: Overview (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (1.00)

Add feedback

Bayesian Nonparametric Covariance Regression

Fox, Emily, Dunson, David

arXiv.org Machine LearningFeb-8-2011

Although there is a rich literature on methods for allowing the variance in a univariate regression model to vary with predictors, time and other factors, relatively little has been done in the multivariate case. Our focus is on developing a class of nonparametric covariance regression models, which allow an unknown p x p covariance matrix to change flexibly with predictors. The proposed modeling framework induces a prior on a collection of covariance matrices indexed by predictors through priors for predictor-dependent loadings matrices in a factor model. In particular, the predictor-dependent loadings are characterized as a sparse combination of a collection of unknown dictionary functions (e.g, Gaussian process random functions). The induced covariance is then a regularized quadratic function of these dictionary elements. Our proposed framework leads to a highly-flexible, but computationally tractable formulation with simple conjugate posterior updates that can readily handle missing data. Theoretical properties are discussed and the methods are illustrated through simulations studies and an application to the Google Flu Trends data.

artificial intelligence, bayesian inference, machine learning, (17 more...)

arXiv.org Machine Learning

1101.2017

Country: North America > United States (1.00)

Genre: Research Report (0.64)

Industry:

Health & Medicine > Therapeutic Area > Pulmonary/Respiratory Diseases (1.00)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.67)

Add feedback

A convex model for non-negative matrix factorization and dimensionality reduction on physical space

Esser, Ernie, Möller, Michael, Osher, Stanley, Sapiro, Guillermo, Xin, Jack

arXiv.org Machine LearningFeb-4-2011

A collaborative convex framework for factoring a data matrix $X$ into a non-negative product $AS$, with a sparse coefficient matrix $S$, is proposed. We restrict the columns of the dictionary matrix $A$ to coincide with certain columns of the data matrix $X$, thereby guaranteeing a physically meaningful dictionary and dimensionality reduction. We use $l_{1,\infty}$ regularization to select the dictionary from the data and show this leads to an exact convex relaxation of $l_0$ in the case of distinct noise free data. We also show how to relax the restriction-to-$X$ constraint by initializing an alternating minimization approach with the solution of the convex model, obtaining a dictionary close to but not necessarily in $X$. We focus on applications of the proposed framework to hyperspectral endmember and abundances identification and also show an application to blind source separation of NMR data.

artificial intelligence, endmember, machine learning, (16 more...)

arXiv.org Machine Learning

doi: 10.1109/TIP.2012.2190081

1102.0844

Country: North America > United States > California > Los Angeles County > Los Angeles (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Dimensionality Reduction (0.61)

Add feedback

Distributed Autonomous Online Learning: Regrets and Intrinsic Privacy-Preserving Properties

Yan, Feng, Sundaram, Shreyas, Vishwanathan, S. V. N., Qi, Yuan

arXiv.org Artificial IntelligenceFeb-4-2011

Online learning has become increasingly popular on handling massive data. The sequential nature of online learning, however, requires a centralized learner to store data and update parameters. In this paper, we consider online learning with {\em distributed} data sources. The autonomous learners update local parameters based on local data sources and periodically exchange information with a small subset of neighbors in a communication network. We derive the regret bound for strongly convex functions that generalizes the work by Ram et al. (2010) for convex functions. Most importantly, we show that our algorithm has \emph{intrinsic} privacy-preserving properties, and we prove the sufficient and necessary conditions for privacy preservation in the network. These conditions imply that for networks with greater-than-one connectivity, a malicious learner cannot reconstruct the subgradients (and sensitive raw data) of other learners, which makes our algorithm appealing in privacy sensitive applications.

data mining, machine learning, node, (18 more...)

arXiv.org Artificial Intelligence

1006.4039

Genre: Research Report (0.50)

Industry:

Education > Educational Setting > Online (1.00)
Information Technology > Security & Privacy (0.94)

Technology:

Information Technology > Enterprise Applications > Human Resources > Learning Management (1.00)
Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(3 more...)

Add feedback

Training linear ranking SVMs in linearithmic time using red-black trees

Airola, Antti, Pahikkala, Tapio, Salakoski, Tapio

arXiv.org Machine LearningJan-31-2011

Learning to rank has been a task of significant interest during the recent years. The ranking problem has been largely motivated by applications in areas such as web search and recommender systems. Due to the large amounts of data available in these domains, it is necessary for the used algorithms to scale well, preferably close to linear time methods are needed. For a detailed introduction to the topic of learning to rank, we refer to (Liu, 2009; Fürnkranz and Hüllermeier, 2011). In this work we assume the so-called scoring setting, where each data instance is associated with a utility score reflecting its goodness with respect to the ranking criterion. A successful approach for learning ranking functions has been to consider pairwise preferences (Fürnkranz and Hüllermeier, 2005). In this setting, the aim is to minimize the number of pairwise mis-orderings in the ranking produced when ordering a set of examples according to predicted utility scores. A number of machine learning algorithms optimizing relaxations of this criterion have been proposed, such as the RankBoost (Freund et al., 2003), RankNet (Burges et al., 2005), RankRLS (Pahikkala et al., 2007, 2009), and the subject of this study, the ranking support vector machine (RankSVM) algorithm (Herbrich et al., 1999; Joachims, 2002). The original solution proposed for RankSVM optimization was to train a support vector machine (SVM) classifier on pairs of data examples.

algorithm, artificial intelligence, machine learning, (18 more...)

arXiv.org Machine Learning

doi: 10.1016/j.patrec.2011.03.014

1005.0928

Country: North America > United States (0.70)

Genre: Research Report (0.84)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (1.00)

Add feedback