AITopics | Mathematical & Statistical Methods

Collaborating Authors

Mathematical & Statistical Methods

News Overviews Instructional Materials AI-Alerts Classics

Edge-exchangeable graphs and sparsity

arXiv.org Machine LearningMar-22-2016

A known failing of many popular random graph models is that the Aldous-Hoover Theorem guarantees these graphs are dense with probability one; that is, the number of edges grows quadratically with the number of nodes. This behavior is considered unrealistic in observed graphs. We define a notion of edge exchangeability for random graphs in contrast to the established notion of infinite exchangeability for random graphs --- which has traditionally relied on exchangeability of nodes (rather than edges) in a graph. We show that, unlike node exchangeability, edge exchangeability encompasses models that are known to provide a projective sequence of random graphs that circumvent the Aldous-Hoover Theorem and exhibit sparsity, i.e., sub-quadratic growth of the number of edges with the number of nodes. We show how edge-exchangeability of graphs relates naturally to existing notions of exchangeability from clustering (a.k.a. partitions) and other familiar combinatorial structures.

artificial intelligence, exchangeability, graph, (16 more...)

arXiv.org Machine Learning

1603.06898

Country: North America > United States (0.46)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.97)

Add feedback

Analysis of Crowdsourced Sampling Strategies for HodgeRank with Sparse Random Graphs

Osting, Braxton, Xiong, Jiechao, Xu, Qianqian, Yao, Yuan

arXiv.org Machine LearningMar-21-2016

Crowdsourcing enables researchers to conduct social experiments on a heterogenous set of participants and at a lower economic cost than conventional laboratory studies. For example, researchers can harness internet users to conduct user studies on their personal computers. Among various approaches to conduct subjective tests, pairwise comparisons are expected to yield more reliable results. However, in crowdsourced studies, the individuals performing the ratings are diverse compared to more controlled settings, which is difficult to control for using traditional experimental designs; researchers have recently proposed several randomized methods to conduct user studies [1, 2, 3], which accommodate incomplete and imbalanced data. HodgeRank, as an application of combinatorial Hodge theory to the preference or rank aggregation problem from pairwise comparison data, possibly being incomplete and imbalanced, was first introduced by [4], and inspired a series of studies in statistical ranking [5, 6, 7, 8]. Hodge theory has also found applications in game theory [9] and computer vision [10, 11], in addition to traditional applications in fluid mechanics [12] etc. HodgeRank formulates the ranking problem in terms of the discrete Hodge decomposition of the pairwise data and shows that it can be decomposed into three orthogonal components: a gradient flow representing a global rating (optimal in the L

artificial intelligence, machine learning, social media, (17 more...)

arXiv.org Machine Learning

doi: 10.1016/j.acha.2016.03.007

1503.00164

Country:

Asia > China > Beijing > Beijing (0.04)
North America > United States > Utah > Salt Lake County > Salt Lake City (0.04)
Europe > Hungary > Hajdú-Bihar County > Debrecen (0.04)
Europe > France > Pays de la Loire > Loire-Atlantique > Nantes (0.04)

Genre: Research Report > Experimental Study (0.34)

Industry: Leisure & Entertainment > Games (0.34)

Technology:

Information Technology > Communications > Social Media > Crowdsourcing (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Conditional Risk Minimization for Stochastic Processes

Zimin, Alexander, Lampert, Christoph H.

arXiv.org Machine LearningMar-13-2016

We study the task of learning from non-i.i.d. data. In particular, we aim at learning predictors that minimize the conditional risk for a stochastic process, i.e. the expected loss of the predictor on the next point conditioned on the set of training samples observed so far. For non-i.i.d. data, the training set contains information about the upcoming samples, so learning with respect to the conditional distribution can be expected to yield better predictors than one obtains from the classical setting of minimizing the marginal risk. Our main contribution is a practical estimator for the conditional risk based on the theory of non-parametric time-series prediction, and a finite sample concentration bound that establishes uniform convergence of the estimator to the true conditional risk under certain regularity assumptions on the process.

artificial intelligence, conditional risk, machine learning, (19 more...)

arXiv.org Machine Learning

1510.02706

Country:

Europe > Austria (0.04)
North America > United States > New York (0.04)

Genre:

Research Report (0.64)
Workflow (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.85)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.68)

Add feedback

Stochastic dual averaging methods using variance reduction techniques for regularized empirical risk minimization problems

Murata, Tomoya, Suzuki, Taiji

arXiv.org Machine LearningMar-8-2016

We consider a composite convex minimization problem associated with regularized empirical risk minimization, which often arises in machine learning. We propose two new stochastic gradient methods that are based on stochastic dual averaging method with variance reduction. Our methods generate a sparser solution than the existing methods because we do not need to take the average of the history of the solutions. This is favorable in terms of both interpretability and generalization. Moreover, our methods have theoretical support for both a strongly and a non-strongly convex regularizer and achieve the best known convergence rates among existing nonaccelerated stochastic gradient methods.

algorithm, convergence rate, convex regularizer, (8 more...)

arXiv.org Machine Learning

1603.02412

Country: Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.58)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.57)

Add feedback

Stochastic Process Bandits: Upper Confidence Bounds Algorithms via Generic Chaining

Contal, Emile, Vayatis, Nicolas

arXiv.org Machine LearningFeb-16-2016

The paper considers the problem of global optimization in the setup of stochastic process bandits. We introduce an UCB algorithm which builds a cascade of discretization trees based on generic chaining in order to render possible his operability over a continuous domain. The theoretical framework applies to functions under weak probabilistic smoothness assumptions and also extends significantly the spectrum of application of UCB strategies. Moreover generic regret bounds are derived which are then specialized to Gaussian processes indexed on infinite-dimensional spaces as well as to quadratic forms of Gaussian processes. Lower bounds are also proved in the case of Gaussian processes to assess the optimality of the proposed algorithm.

artificial intelligence, data mining, machine learning, (20 more...)

arXiv.org Machine Learning

1602.04976

Country: Europe (0.46)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.62)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.48)
Information Technology > Data Science > Data Mining > Big Data (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Add feedback

A Variational Analysis of Stochastic Gradient Algorithms

Mandt, Stephan, Hoffman, Matthew D., Blei, David M.

arXiv.org Machine LearningFeb-8-2016

Stochastic Gradient Descent (SGD) is an important algorithm in machine learning. With constant learning rates, it is a stochastic process that, after an initial phase of convergence, generates samples from a stationary distribution. We show that SGD with constant rates can be effectively used as an approximate posterior inference algorithm for probabilistic modeling. Specifically, we show how to adjust the tuning parameters of SGD such as to match the resulting stationary distribution to the posterior. This analysis rests on interpreting SGD as a continuous-time stochastic process and then minimizing the Kullback-Leibler divergence between its stationary distribution and the target posterior. (This is in the spirit of variational inference.) In more detail, we model SGD as a multivariate Ornstein-Uhlenbeck process and then use properties of this process to derive the optimal parameters. This theoretical framework also connects SGD to modern scalable inference algorithms; we analyze the recently proposed stochastic gradient Fisher scoring under this perspective. We demonstrate that SGD with properly chosen constant rates gives a new way to optimize hyperparameters in probabilistic models.

artificial intelligence, machine learning, posterior, (19 more...)

arXiv.org Machine Learning

1602.02666

Country:

Asia > Middle East > Jordan (0.05)
North America > United States > New York (0.04)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Add feedback

Achieving Exact Cluster Recovery Threshold via Semidefinite Programming

Hajek, Bruce, Wu, Yihong, Xu, Jiaming

arXiv.org Machine LearningJan-5-2016

The community detection problem refers to finding the underlying communities within a network using only the knowledge of the network topology [16, 31]. This paper considers the following probabilistic model for generating a network with underlying community structures: Suppose that out of a total of n vertices, rK of them are partitioned into r clusters of size K, and the remaining n rK vertices do not belong to any clusters (called outlier vertices); a random graph G is generated based on the cluster structure, where each pair of vertices is connected independently with probability p if they are in the same cluster or q otherwise. In particular, an outlier vertex is connected to any other vertex with probability q. This random graph ensemble is known as the planted cluster model [10] with parameters n, r, K N and p, q [0, 1] such that n rK. In particular, we call p and q the in-cluster and cross-cluster edge density, respectively. The planted cluster model encompasses several classical planted random graph models including planted clique [5], planted coloring [4], planted dense subgraph [6], planted partition [11], and the stochastic block model [23], which have been widely used for studying the community detection and graph partitioning problem (see, e.g., [26, 12, 30, 9] and the references therein). This paper was accepted to IEEE Transactions on Information Theory on January 3, 2016. The material in this paper was presented in part at the IEEE International Symposium on Information Theory, Hong Kong, June, 2015 [20].

artificial intelligence, data mining, probability, (16 more...)

arXiv.org Machine Learning

1412.6156

Country:

Asia > China > Hong Kong (0.24)
North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.14)
North America > United States > New York > New York County > New York City (0.04)
(4 more...)

Genre: Research Report (0.64)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.76)

Add feedback

COEVOLVE: A Joint Point Process Model for Information Diffusion and Network Co-evolution

Farajtabar, Mehrdad, Wang, Yichen, Rodriguez, Manuel Gomez, Li, Shuang, Zha, Hongyuan, Song, Le

Neural Information Processing SystemsDec-31-2015

Information diffusion in online social networks is affected by the underlying network topology, but it also has the power to change it. Online users are constantly creating new links when exposed to new information sources, and in turn these links are alternating the way information spreads. However, these two highly intertwined stochastic processes, information diffusion and network evolution, have been predominantly studied separately, ignoring their co-evolutionary dynamics.We propose a temporal point process model, COEVOLVE, for such joint dynamics, allowing the intensity of one process to be modulated by that of the other. This model allows us to efficiently simulate interleaved diffusion and network events, and generate traces obeying common diffusion and network patterns observed in real-world networks. Furthermore, we also develop a convex optimization framework to learn the parameters of the model from historical diffusion and network evolution traces. We experimented with both synthetic data and data gathered from Twitter, and show that our model provides a good fit to the data as well as more accurate predictions than alternatives.

information diffusion, intensity, link creation, (14 more...)

Neural Information Processing Systems

Country: South America > Paraguay > Asunción > Asunción (0.04)

Industry: Information Technology > Services (0.50)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.46)
(2 more...)

Add feedback

Active Sampler: Light-weight Accelerator for Complex Data Analytics at Scale

Gao, Jinyang, Jagadish, H. V., Ooi, Beng Chin

arXiv.org Machine LearningDec-12-2015

Recent years have witnessed amazing outcomes from "Big Models" trained by "Big Data". Most popular algorithms for model training are iterative. Due to the surging volumes of data, we can usually afford to process only a fraction of the training data in each iteration. Typically, the data are either uniformly sampled or sequentially accessed. In this paper, we study how the data access pattern can affect model training. We propose an Active Sampler algorithm, where training data with more "learning value" to the model are sampled more frequently. The goal is to focus training effort on valuable instances near the classification boundaries, rather than evident cases, noisy data or outliers. We show the correctness and optimality of Active Sampler in theory, and then develop a light-weight vectorized implementation. Active Sampler is orthogonal to most approaches optimizing the efficiency of large-scale data analytics, and can be applied to most analytics models trained by stochastic gradient descent (SGD) algorithm. Extensive experimental evaluations demonstrate that Active Sampler can speed up the training procedure of SVM, feature selection and deep learning, for comparable training quality by 1.6-2.2x.

artificial intelligence, machine learning, optimization problem, (17 more...)

arXiv.org Machine Learning

1512.0388

Country:

North America > United States > Wisconsin > Dane County > Madison (0.04)
North America > United States > New York (0.04)
North America > United States > Michigan (0.04)
(2 more...)

Genre: Research Report > New Finding (0.46)

Industry: Education (0.46)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)
(2 more...)

Add feedback

Sharp Finite-Time Iterated-Logarithm Martingale Concentration

Balsubramani, Akshay

arXiv.org Machine LearningDec-1-2015

Martingales are indispensable in studying the temporal dynamics of stochastic processes arising in a multitude of fields [10, 14]. Particularly when such processes have complex long-range dependences, it is often of interest to concentrate martingales uniformly over time. On the theoretical side, a fundamental limit to such concentration is expressed by the law of the iterated logarithm (LIL). However, this only concerns asymptotic behavior. In many applications, it is more natural to instead consider concentration that holds uniformly over all finite times. This manuscript presents such bounds for the large classes of martingales which are addressed by Hoeffding [11] and Bernstein [8] inequalities. These new results are optimal within small constants, and can be viewed as finite-time generalizations of the upper half of the LIL.

concentration, martingale, theorem 2, (16 more...)

arXiv.org Machine Learning

1405.2639

Country:

North America > United States > California > San Diego County > San Diego (0.04)
North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
North America > United States > New York (0.04)
(3 more...)

Genre: Research Report > New Finding (0.34)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.34)

Add feedback