AITopics | Mathematical & Statistical Methods

Mathematical & Statistical Methods

Constant Step Size Least-Mean-Square: Bias-Variance Trade-offs and Optimal Sampling Distributions

arXiv.org Machine Learning Nov-29-2014

We consider the least-squares regression problem and provide a detailed asymptotic analysis of the performance of averaged constant-step-size stochastic gradient descent (a.k.a. least-mean-squares). In the strongly-convex case, we provide an asymptotic expansion up to explicit exponentially decaying terms. Our analysis leads to new insights into stochastic approximation algorithms: (a) it gives a tighter bound on the allowed step-size; (b) the generalization error may be divided into a variance term which decays as $O(1/n)$, independently of the step-size $\gamma$, and a bias term that decays as $O(1/(\gamma^2 n^2))$; (c) when allowing non-uniform sampling, the choice of a good sampling density depends on whether the variance or bias terms dominate. In particular, when the variance term dominates, optimal sampling densities do not lead to much gain, while when the bias term dominates, we can choose larger step-sizes that lead to significant improvements.
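The algorithm analyzed in this abstract can be sketched in a few lines. The following is a minimal illustration, not the authors' code: it runs the least-mean-squares recursion with a constant step size and returns the running average of the iterates. The function names, the synthetic Gaussian data, the choice to average from the first updated iterate, and all parameter values are assumptions made for the example.

```python
import random

def averaged_lms(samples, gamma, dim):
    """Constant-step-size LMS (SGD on squared loss) with averaged iterates."""
    theta = [0.0] * dim       # current iterate, started at zero (assumption)
    theta_bar = [0.0] * dim   # running average of the iterates
    for n, (x, y) in enumerate(samples, start=1):
        # Prediction error on the current sample.
        residual = sum(t * xi for t, xi in zip(theta, x)) - y
        # Stochastic gradient step with constant step size gamma.
        theta = [t - gamma * residual * xi for t, xi in zip(theta, x)]
        # Incremental update of the average of theta_1, ..., theta_n.
        theta_bar = [tb + (t - tb) / n for tb, t in zip(theta_bar, theta)]
    return theta_bar

def make_samples(n, theta_star, noise_std, seed=0):
    """Synthetic linear-regression data (hypothetical test problem)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in theta_star]
        y = sum(t * xi for t, xi in zip(theta_star, x)) + rng.gauss(0.0, noise_std)
        data.append((x, y))
    return data

theta_star = [1.0, -2.0]
samples = make_samples(20000, theta_star, noise_std=0.1)
estimate = averaged_lms(samples, gamma=0.05, dim=2)
print(estimate)
```

In this sketch the averaged iterate, rather than the last iterate, is the returned estimate; the step size is held fixed instead of being decayed.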

artificial intelligence, machine learning, variance term, (17 more...)

arXiv.org Machine Learning

1412.0156

Country:

Europe > France > Île-de-France > Paris > Paris (0.04)
Asia > Middle East > Iraq (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.57)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.36)

Community Detection in Sparse Random Networks

Arias-Castro, Ery, Verzelen, Nicolas

arXiv.org Machine Learning Sep-25-2014

We consider the problem of detecting a tight community in a sparse random network. This is formalized as testing for the existence of a dense random subgraph in a random graph. Under the null hypothesis, the graph is a realization of an Erdős-Rényi graph on $N$ vertices and with connection probability $p_0$; under the alternative, there is an unknown subgraph on $n$ vertices where the connection probability is $p_1 > p_0$. In Arias-Castro and Verzelen (2012), we focused on the asymptotically dense regime where $p_0$ is large enough that $n p_0 > (n/N)^{o(1)}$. We consider here the asymptotically sparse regime where $p_0$ is small enough that $n p_0 < (n/N)^{c_0}$ for some $c_0 > 0$. As before, we derive information theoretic lower bounds, and also establish the performance of various tests. Compared to our previous work, the arguments for the lower bounds are based on the same technology, but are substantially more technical in the details; also, the methods we study are different: besides a variant of the scan statistic, we study other statistics such as the size of the largest connected component, the number of triangles, the eigengap of the adjacency matrix, etc. Our detection bounds are sharp, except in the Poisson regime where we were not able to fully characterize the constant arising in the bound.
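Two of the simpler statistics named in this abstract, the number of triangles and the size of the largest connected component, can be computed as sketched below. This is only an illustration under assumed names and parameters: `sample_graph` is a hypothetical helper that draws a graph under the null (no community) or under the alternative (a planted subset with connection probability `p1`), and no calibrated test threshold is implemented.

```python
import random
from itertools import combinations

def sample_graph(N, p0, community=(), p1=None, seed=0):
    """Random graph on N vertices; pairs inside `community` connect with p1."""
    rng = random.Random(seed)
    members = set(community)
    adj = {v: set() for v in range(N)}
    for u, v in combinations(range(N), 2):
        p = p1 if (u in members and v in members) else p0
        if rng.random() < p:
            adj[u].add(v)
            adj[v].add(u)
    return adj

def count_triangles(adj):
    """Count each triangle once by enforcing u < v < w."""
    total = 0
    for u in adj:
        for v in adj[u]:
            if v > u:
                total += sum(1 for w in adj[u] & adj[v] if w > v)
    return total

def largest_component_size(adj):
    """Size of the largest connected component, via depth-first search."""
    seen, best = set(), 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            u = stack.pop()
            size += 1
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        best = max(best, size)
    return best

# Same seed for both draws, so the alternative graph contains every null edge.
null_graph = sample_graph(200, 0.01)
alt_graph = sample_graph(200, 0.01, community=range(20), p1=0.5)
print(count_triangles(null_graph), count_triangles(alt_graph))
print(largest_component_size(null_graph), largest_component_size(alt_graph))
```

A test based on either statistic would reject the null when the statistic exceeds a threshold; choosing that threshold is the subject of the paper and is not attempted here.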

hypothesis, probability, scan test, (15 more...)

arXiv.org Machine Learning

1308.2955

Country:

North America > United States > New York (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(2 more...)

Genre: Research Report (0.50)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.48)

Power Iterated Color Refinement

Kersting, Kristian (TU Dortmund University and Fraunhofer IAIS) | Mladenov, Martin (TU Dortmund University) | Garnett, Roman (University of Bonn) | Grohe, Martin (RWTH Aachen)

AAAI Conferences Jul-14-2014

Color refinement is a basic algorithmic routine for graph isomorphism testing and has recently been used for computing graph kernels as well as for lifting belief propagation and linear programming. So far, color refinement has been treated as a combinatorial problem. Instead, we treat it as a nonlinear continuous optimization problem and prove that it implements a conditional gradient optimizer that can be turned into graph clustering approaches using hashing and truncated power iterations. This shows that color refinement is easy to understand in terms of random walks, easy to implement (matrix-matrix/vector multiplications) and readily parallelizable. We support our theoretical results with experiments on real-world graphs with millions of edges.
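For orientation, the classical combinatorial form of color refinement (the routine the abstract takes as its starting point, not the continuous formulation the paper proposes) can be sketched as follows. The function name and the adjacency-list input format are assumptions made for the example.

```python
def color_refinement(adj):
    """Classical color refinement on a graph given as {vertex: neighbors}.

    Repeatedly recolors each vertex by its own color together with the
    multiset of its neighbors' colors, until the partition stops splitting.
    """
    colors = {v: 0 for v in adj}   # start with a single color class
    num_classes = 1
    while True:
        # Signature = (own color, sorted multiset of neighbor colors).
        signatures = {
            v: (colors[v], tuple(sorted(colors[u] for u in adj[v])))
            for v in adj
        }
        # Map each distinct signature to a small integer color.
        palette = {sig: i for i, sig in enumerate(sorted(set(signatures.values())))}
        new_colors = {v: palette[signatures[v]] for v in adj}
        if len(palette) == num_classes:
            return new_colors      # no class was split: stable coloring
        colors, num_classes = new_colors, len(palette)

path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
colors = color_refinement(path)
print(colors)  # {0: 0, 1: 1, 2: 1, 3: 0}
```

On the four-vertex path, the two endpoints end up in one class and the two inner vertices in another, which matches the symmetry of the graph.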

artificial intelligence, machine learning, optimization problem, (19 more...)

AAAI Conferences

Twenty-Eighth AAAI Conference on Artificial Intelligence

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.88)

Chinese Zero Pronoun Resolution: An Unsupervised Approach Combining Ranking and Integer Linear Programming

Chen, Chen (University of Texas at Dallas) | Ng, Vincent (University of Texas at Dallas)

AAAI Conferences Jul-14-2014

State-of-the-art approaches to Chinese zero pronoun resolution are supervised, requiring training documents with manually resolved zero pronouns. To eliminate the reliance on annotated data, we propose an unsupervised approach to this task. Underlying our approach is the novel idea of employing a model trained on manually resolved overt pronouns to resolve zero pronouns. Experimental results on the OntoNotes 5.0 corpus are encouraging: our unsupervised model surpasses its supervised counterparts in performance.

artificial intelligence, optimization problem, unsupervised approach combining ranking, (4 more...)

AAAI Conferences

Twenty-Eighth AAAI Conference on Artificial Intelligence

Genre: Research Report > Promising Solution (0.53)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.40)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.40)

Linear Programs for Measuring Inconsistency in Probabilistic Logics

Potyka, Nico (Fern Universität Hagen)

AAAI Conferences Jul-1-2014

Inconsistency measures help to analyze contradictory knowledge bases and to resolve inconsistencies. In recent years several measures with desirable properties have been proposed, but often these measures correspond to combinatorial or non-convex optimization problems that are hard to solve in practice. In this paper, I study a new family of inconsistency measures for probabilistic knowledge bases. All members satisfy many desirable properties and can be computed by means of convex optimization techniques. For two members, I present linear programs whose computation is barely harder than a probabilistic satisfiability test.

linear program, measuring inconsistency, probabilistic logic

AAAI Conferences

Fourteenth International Conference on the Principles of Knowledge Representation and Reasoning

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.60)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.53)

Generalized Canonical Correlation Analysis for Classification

Shen, Cencheng, Sun, Ming, Tang, Minh, Priebe, Carey E.

arXiv.org Machine Learning Jun-26-2014

It is common to find collections/measurements of related objects, such as the same article in different languages, similar talks given by different presenters, similar weather patterns in different years, etc. It remains to determine how much the available big data helps us in statistical analysis; simply throwing every collected dataset into the mix may not yield an optimal output. Thus it is natural and important to understand theoretically when and how additional datasets improve the performance of various statistical analysis tasks such as regression, clustering, classification, etc. This is our motivation to explore the following classification problem.

auxiliary feature, gcca projection, projection, (15 more...)

arXiv.org Machine Learning

doi: 10.1016/j.jmva.2014.05.011

1304.7981

Country:

North America > United States > Maryland > Baltimore (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.82)

Industry: Government > Military (0.46)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.67)

A variational approach to stable principal component pursuit

Aravkin, Aleksandr, Becker, Stephen, Cevher, Volkan, Olsen, Peder

arXiv.org Machine Learning Jun-4-2014

We introduce a new convex formulation for stable principal component pursuit (SPCP) to decompose noisy signals into low-rank and sparse representations. For numerical solutions of our SPCP formulation, we first develop a convex variational framework and then accelerate it with quasi-Newton methods. We show, via synthetic and real data experiments, that our approach offers advantages over the classical SPCP formulations in scalability and practical parameter selection.

artificial intelligence, machine learning, spcp sum, (18 more...)

arXiv.org Machine Learning

1406.1089

Country:

North America > United States > New York (0.04)
North America > United States > New Jersey > Mercer County > Princeton (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
(3 more...)

Genre: Research Report (0.65)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.34)

Stochastic Gradient Hamiltonian Monte Carlo

Chen, Tianqi, Fox, Emily B., Guestrin, Carlos

arXiv.org Machine Learning May-12-2014

Hamiltonian Monte Carlo (HMC) sampling methods provide a mechanism for defining distant proposals with high acceptance probabilities in a Metropolis-Hastings framework, enabling more efficient exploration of the state space than standard random-walk proposals. The popularity of such methods has grown significantly in recent years. However, a limitation of HMC methods is the required gradient computation for simulation of the Hamiltonian dynamical system; such computation is infeasible in problems involving a large sample size or streaming data. Instead, we must rely on a noisy gradient estimate computed from a subset of the data. In this paper, we explore the properties of such a stochastic gradient HMC approach. Surprisingly, the natural implementation of the stochastic approximation can be arbitrarily bad. To address this problem we introduce a variant that uses second-order Langevin dynamics with a friction term that counteracts the effects of the noisy gradient, maintaining the desired target distribution as the invariant distribution. Results on simulated data validate our theory. We also provide an application of our methods to a classification task using neural networks and to online Bayesian matrix factorization.
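The friction idea described in this abstract can be illustrated with a one-dimensional sketch. It assumes a commonly cited discretization of second-order Langevin dynamics with unit mass: the position moves along the momentum, and the momentum receives a noisy gradient step, a friction term, and injected Gaussian noise of variance `2 * friction * eps`. The estimate of the gradient-noise covariance is taken to be zero here, and the target (a standard normal), the noise level of the gradient, and every name and parameter value are assumptions for the example rather than the authors' settings.

```python
import math
import random

def sghmc_samples(noisy_grad, theta0, eps, friction, n_steps, seed=0):
    """Discretized Langevin dynamics with friction (unit mass, 1-D sketch)."""
    rng = random.Random(seed)
    theta, r = theta0, 0.0
    # Injected noise balances the friction term (gradient-noise estimate = 0).
    noise_std = math.sqrt(2.0 * friction * eps)
    samples = []
    for _ in range(n_steps):
        theta += eps * r                      # position update
        r += (-eps * noisy_grad(theta, rng)   # noisy gradient of the potential
              - eps * friction * r            # friction counteracts the noise
              + noise_std * rng.gauss(0.0, 1.0))
        samples.append(theta)
    return samples

def noisy_grad(theta, rng):
    """Gradient of U(theta) = theta**2 / 2 plus artificial Gaussian noise."""
    return theta + rng.gauss(0.0, 1.0)

samples = sghmc_samples(noisy_grad, theta0=0.0, eps=0.05, friction=1.0,
                        n_steps=50000)
kept = samples[1000:]  # discard a short burn-in
mean = sum(kept) / len(kept)
var = sum((s - mean) ** 2 for s in kept) / len(kept)
print(mean, var)
```

With a small step size the sample mean and variance should land near those of the standard normal target; the residual inflation of the variance comes from the gradient noise, which this sketch does not estimate or subtract.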

artificial intelligence, bayesian inference, machine learning, (18 more...)

arXiv.org Machine Learning

1402.4102

Country:

North America > United States > Washington > King County > Seattle (0.14)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.75)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)

Graph Kernels via Functional Embedding

Shrivastava, Anshumali, Li, Ping

arXiv.org Machine Learning Apr-21-2014

We propose a representation of a graph as a functional object derived from the power iteration of the underlying adjacency matrix. The proposed functional representation is a graph invariant, i.e., the functional remains unchanged under any reordering of the vertices. This property eliminates the difficulty of handling exponentially many isomorphic forms. A Bhattacharyya kernel constructed between these functionals significantly outperforms the state-of-the-art graph kernels on 3 out of the 4 standard benchmark graph classification datasets, demonstrating the superiority of our approach. The proposed methodology is simple and runs in time linear in the number of edges, which makes our kernel more efficient and scalable compared to many widely adopted graph kernels with running time cubic in the number of vertices.

artificial intelligence, kernel, machine learning, (17 more...)

arXiv.org Machine Learning

1404.5214

Country:

North America > United States > New York > Tompkins County > Ithaca (0.04)
North America > United States > New Jersey > Middlesex County > Piscataway (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.40)

Industry: Health & Medicine (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.68)

A New Space for Comparing Graphs

Shrivastava, Anshumali, Li, Ping

arXiv.org Machine Learning Apr-17-2014

Finding new mathematical representations for graphs, which allow direct comparison between different graph structures, is an open-ended research direction. Having such a representation is the first prerequisite for a variety of machine learning algorithms like classification, clustering, etc., over graph datasets. In this paper, we propose a symmetric positive semidefinite matrix with the $(i,j)$-th entry equal to the covariance between normalized vectors $A^i e$ and $A^j e$ ($e$ being the vector of all ones) as a representation for a graph with adjacency matrix $A$. We show that the proposed matrix representation encodes the spectrum of the underlying adjacency matrix and it also contains information about the counts of small sub-structures present in the graph such as triangles and small paths. In addition, we show that this matrix is a "graph invariant". All these properties make the proposed matrix a suitable object for representing graphs. The representation, being a covariance matrix in a fixed dimensional metric space, gives a mathematical embedding for graphs. This naturally leads to a measure of similarity on graph objects. We define similarity between two given graphs as a Bhattacharyya similarity measure between their corresponding covariance matrix representations. As shown in our experimental study on the task of social network classification, such a similarity measure outperforms other widely used state-of-the-art methodologies. Our proposed method is also computationally efficient. The computation of both the matrix representation and the similarity value can be performed in operations linear in the number of edges. This makes our method scalable in practice. We believe our theoretical and empirical results provide evidence for studying truncated power iterations of the adjacency matrix to characterize social networks.
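The matrix described in this abstract can be sketched directly from its definition. The example below is an illustration under stated assumptions, not the authors' implementation: the vectors $A^i e$ are scaled to unit Euclidean norm (the abstract does not say which normalization is used), the covariance divides by the number of vertices, a dense adjacency matrix is used for brevity, and the graph is assumed to have at least one edge. The function name and the choice of `k` are hypothetical.

```python
import math

def power_covariance(adj_matrix, k):
    """k x k matrix of covariances between normalized A^i e, i = 1..k."""
    n = len(adj_matrix)
    v = [1.0] * n                 # e, the all-ones vector
    vectors = []
    for _ in range(k):
        # One power-iteration step: v <- A v, then rescale to unit norm.
        v = [sum(row[j] * v[j] for j in range(n)) for row in adj_matrix]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        vectors.append(v)
    means = [sum(vec) / n for vec in vectors]
    # Covariance across vertices between every pair of normalized vectors.
    return [
        [
            sum((a - means[i]) * (b - means[j])
                for a, b in zip(vectors[i], vectors[j])) / n
            for j in range(k)
        ]
        for i in range(k)
    ]

# Path graph on four vertices and a relabeled copy of it.
A = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
perm = [2, 0, 3, 1]
B = [[A[perm[i]][perm[j]] for j in range(4)] for i in range(4)]

M_A = power_covariance(A, 3)
M_B = power_covariance(B, 3)
print(M_A)
```

Because the covariance is taken across vertices, relabeling the vertices only reorders the terms of each sum, so `M_A` and `M_B` agree up to floating-point rounding. This is the graph-invariance property the abstract refers to.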

artificial intelligence, machine learning, social media, (19 more...)

arXiv.org Machine Learning

1404.4644

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.05)
North America > United States > New York > Tompkins County > Ithaca (0.04)
North America > United States > New Jersey > Middlesex County > Piscataway (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Information Technology > Services (0.56)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning > Representation Of Examples (0.34)
