AITopics

2006.1035

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
Europe > France > Île-de-France > Paris > Paris (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Tabaghi, Puoya, Dokmanić, Ivan

Geometry of Comparisons

arXiv.org Machine LearningJun-17-2020

Many data analysis problems can be cast as distance geometry problems in \emph{space forms}---Euclidean, elliptic, or hyperbolic spaces. We ask: what can be said about the dimension of the underlying space form if we are only given a subset of comparisons between pairwise distances, without computing an actual embedding? To study this question, we define the \textit{ordinal capacity} of a metric space. Ordinal capacity measures how well a space can accommodate a given set of ordinal measurements. We prove that the ordinal capacity of a space form is related to its dimension and curvature sign, and provide a lower bound on the embedding dimension of non-metric graphs in terms of the \textit{ordinal spread} of their sub-cliques. Computer experiments on random graphs, Bitcoin trust network, and olfactory data illustrate the theory.

artificial intelligence, machine learning, ordinal, (19 more...)

2006.09858

Country:

North America > United States > Illinois (0.04)
North America > United States > New York (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Switzerland > Basel-City > Basel (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Communications (0.88)
Information Technology > Artificial Intelligence > Machine Learning (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.34)

Additive Poisson Process: Learning Intensity of Higher-Order Interaction in Stochastic Processes

Luo, Simon, Zhou, Feng, Azizi, Lamiae, Sugiyama, Mahito

We present the Additive Poisson Process (APP), a novel framework that can model the higher-order interaction effects of the intensity functions in stochastic processes using lower dimensional projections. Our model combines the techniques in information geometry to model higher-order interactions on a statistical manifold and in generalized additive models to use lower-dimensional projections to overcome the effects from the curse of dimensionality. Our approach solves a convex optimization problem by minimizing the KL divergence from a sample distribution in lower dimensional projections to the distribution modeled by an intensity function in the stochastic process. Our empirical results show that our model is able to use samples observed in the lower dimensional space to estimate the higher-order intensity function with extremely sparse observations.

artificial intelligence, intensity function, machine learning, (13 more...)

2006.08982

Country:

North America > United States > New York (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia (0.04)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.82)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.68)

A Survey of Constrained Gaussian Process Regression: Approaches and Implementation Challenges

Swiler, Laura, Gulian, Mamikon, Frankel, Ari, Safta, Cosmin, Jakeman, John

Gaussian process regression is a popular Bayesian framework for surrogate modeling of expensive data sources. As part of a broader effort in scientific machine learning, many recent works have incorporated physical constraints or other a priori information within Gaussian process regression to supplement limited data and regularize the behavior of the model. We provide an overview and survey of several classes of Gaussian process constraints, including positivity or bound constraints, monotonicity and convexity constraints, differential equation constraints provided by linear PDEs, and boundary condition constraints. We compare the strategies behind each approach as well as the differences in implementation, concluding with a discussion of the computational challenges introduced by constraints.

artificial intelligence, constraint, machine learning, (20 more...)

2006.09319

Country:

North America > United States > New Mexico > Los Alamos County > Los Alamos (0.04)
South America > Chile (0.04)
North America > United States > Texas (0.04)
(11 more...)

Genre:

Research Report (1.00)
Overview (1.00)

Industry:

Government > Regional Government > North America Government > United States Government (1.00)
Energy (1.00)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (1.00)
(3 more...)

Takahashi, Atsumori, Nomura, Shunichi

Efficient Path Algorithms for Clustered Lasso and OSCAR

In high dimensional regression, feature clustering by their effects on outcomes is often as important as feature selection. For that purpose, clustered Lasso and octagonal shrinkage and clustering algorithm for regression (OSCAR) are used to make feature groups automatically by pairwise $L_1$ norm and pairwise $L_\infty$ norm, respectively. This paper proposes efficient path algorithms for clustered Lasso and OSCAR to construct solution paths with respect to their regularization parameters. Despite too many terms in exhaustive pairwise regularization, their computational costs are reduced by using symmetry of those terms. Simple equivalent conditions to check subgradient equations in each feature group are derived by some graph theories. The proposed algorithms are shown to be more efficient than existing algorithms in numerical experiments.

algorithm, artificial intelligence, machine learning, (15 more...)

2006.08965

Country:

South America > Brazil (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.66)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.48)

Tan, Yingcong, Terekhov, Daria, Delong, Andrew

Learning Linear Programs from Optimal Decisions

We propose a flexible gradient-based framework for learning linear programs from optimal decisions. Linear programs are often specified by hand, using prior knowledge of relevant costs and constraints. In some applications, linear programs must instead be learned from observations of optimal decisions. Learning from optimal decisions is a particularly challenging bi-level problem, and much of the related inverse optimization literature is dedicated to special cases. We tackle the general problem, learning all parameters jointly while allowing flexible parametrizations of costs, constraints, and loss functions. We also address challenges specific to learning linear programs, such as empty feasible regions and non-unique optimal decisions. Experiments show that our method successfully learns synthetic linear programs and minimum-cost multi-commodity flow instances for which previous methods are not directly applicable. We also provide a fast batch-mode PyTorch implementation of the homogeneous interior point algorithm, which supports gradients by implicit differentiation or backpropagation.

artificial intelligence, constraint, machine learning, (17 more...)

2006.08923

Country: North America > Canada > Quebec > Montreal (0.14)

Genre: Research Report > New Finding (0.46)

Industry: Energy (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

Lu, Yucheng, Li, Zheng, De Sa, Christopher

Towards Optimal Convergence Rate in Decentralized Stochastic Training

arXiv.org Machine LearningJun-14-2020

Parallel training with decentralized communication is a promising method of scaling up machine learning systems. In this paper, we provide a tight lower bound on the iteration complexity for such methods in a stochastic non-convex setting. This lower bound reveals a theoretical gap in known convergence rates of many existing algorithms. To show this bound is tight and achievable, we propose DeFacto, a class of algorithms that converge at the optimal rate without additional theoretical assumptions. We discuss the trade-offs among different algorithms regarding complexity, memory efficiency, throughput, etc. Empirically, we compare DeFacto and other decentralized algorithms via training Resnet20 on CIFAR10 and Resnet110 on CIFAR100. We show DeFacto can accelerate training with respect to wall-clock time but progresses slowly in the first few epochs.

algorithm, artificial intelligence, machine learning, (13 more...)

2006.08085

Country:

North America > United States > California (0.04)
Europe > Russia (0.04)
Asia > Russia (0.04)

Genre: Research Report (0.84)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Li, Zhize, Richtárik, Peter

A Unified Analysis of Stochastic Gradient Methods for Nonconvex Federated Optimization

arXiv.org Machine LearningJun-12-2020

In this paper, we study the performance of a large family of SGD variants in the smooth nonconvex regime. To this end, we propose a generic and flexible assumption capable of accurate modeling of the second moment of the stochastic gradient. Our assumption is satisfied by a large number of specific variants of SGD in the literature, including SGD with arbitrary sampling, SGD with compressed gradients, and a wide variety of variance-reduced SGD methods such as SVRG and SAGA. We provide a single convergence analysis for all methods that satisfy the proposed unified assumption, thereby offering a unified understanding of SGD variants in the nonconvex regime instead of relying on dedicated analyses of each variant. Moreover, our unified analysis is accurate enough to recover or improve upon the best-known convergence results of several classical methods, and also gives new convergence results for many new methods which arise as special cases. In the more general distributed/federated nonconvex optimization setup, we propose two new general algorithmic frameworks differing in whether direct gradient compression (DC) or compression of gradient differences (DIANA) is used. We show that all methods captured by these two frameworks also satisfy our unified assumption. Thus, our unified convergence analysis also captures a large variety of distributed methods utilizing compressed communication. Finally, we also provide a unified analysis for obtaining faster linear convergence rates in this nonconvex regime under the PL condition.

artificial intelligence, machine learning, unified assumption 1, (17 more...)

2006.07013

Country:

North America > United States > Virginia (0.04)
Europe > Russia (0.04)
Asia > Russia (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.63)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.71)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.71)

Majka, Mateusz B., Sabate-Vidales, Marc, Szpruch, Łukasz

Multi-index Antithetic Stochastic Gradient Algorithm

arXiv.org Machine LearningJun-10-2020

Stochastic Gradient Algorithms (SGAs) are ubiquitous in computational statistics, machine learning and optimisation. Recent years have brought an influx of interest in SGAs and the non-asymptotic analysis of their bias is by now well-developed. However, in order to fully understand the efficiency of Monte Carlo algorithms utilizing stochastic gradients, one also needs to carry out the analysis of their variance, which turns out to be problem-specific. For this reason, there is no systematic theory that would specify the optimal choice of the random approximation of the gradient in SGAs for a given data regime. Furthermore, while there have been numerous attempts to reduce the variance of SGAs, these typically exploit a particular structure of the sampled distribution. In this paper we use the Multi-index Monte Carlo apparatus combined with the antithetic approach to construct the Multi-index Antithetic Stochastic Gradient Algorithm (MASGA), which can be used to sample from any probability distribution. This, to our knowledge, is the first SGA that, for all data regimes and without relying on any specific structure of the target measure, achieves performance on par with Monte Carlo estimators that have access to unbiased samples from the distribution of interest. In other words, MASGA is an optimal estimator from the error-computational cost perspective within the class of Monte Carlo estimators.

artificial intelligence, estimator, machine learning, (17 more...)

2006.06102

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)
(2 more...)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)

Mignacco, Francesca, Krzakala, Florent, Urbani, Pierfrancesco, Zdeborová, Lenka

Dynamical mean-field theory for stochastic gradient descent in Gaussian mixture classification

arXiv.org Machine LearningJun-10-2020

We analyze in a closed form the learning dynamics of stochastic gradient descent (SGD) for a single layer neural network classifying a high-dimensional Gaussian mixture where each cluster is assigned one of two labels. This problem provides a prototype of a non-convex loss landscape with interpolating regimes and a large generalization gap. We define a particular stochastic process for which SGD can be extended to a continuous-time limit that we call stochastic gradient flow. In the full-batch limit we recover the standard gradient flow. We apply dynamical mean-field theory from statistical physics to track the dynamics of the algorithm in the high-dimensional limit via a self-consistent stochastic process. We explore the performance of the algorithm as a function of control parameters shedding light on how it navigates the loss landscape.

artificial intelligence, gradient descent, machine learning, (17 more...)

2006.06098

Country:

North America > United States (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > France > Île-de-France > Paris > Paris (0.04)
(2 more...)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)