AITopics | Mathematical & Statistical Methods

Collaborating Authors

Mathematical & Statistical Methods

News Overviews Instructional Materials AI-Alerts Classics

A Theoretical Analysis of the Learning Dynamics under Class Imbalance

Francazi, Emanuele, Baity-Jesi, Marco, Lucchi, Aurelien

arXiv.org Artificial IntelligenceJun-1-2023

Data imbalance is a common problem in machine learning that can have a critical effect on the performance of a model. Various solutions exist but their impact on the convergence of the learning dynamics is not understood. Here, we elucidate the significant negative impact of data imbalance on learning, showing that the learning curves for minority and majority classes follow sub-optimal trajectories when training with a gradient-based optimizer. This slowdown is related to the imbalance ratio and can be traced back to a competition between the optimization of different classes. Our main contribution is the analysis of the convergence of full-batch (GD) and stochastic gradient descent (SGD), and of variants that renormalize the contribution of each per-class gradient. We find that GD is not guaranteed to decrease the loss for each class but that this problem can be addressed by performing a per-class normalization of the gradient. With SGD, class imbalance has an additional effect on the direction of the gradients: the minority class suffers from a higher directional noise, which reduces the effectiveness of the per-class gradient normalization. Our findings not only allow us to understand the potential and limitations of strategies involving the per-class gradients, but also the reason for the effectiveness of previously used solutions for class imbalance such as oversampling.

gradient, imbalance, theoretical analysis, (15 more...)

arXiv.org Artificial Intelligence

2207.00391

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
North America > Canada > Ontario > Toronto (0.04)
(2 more...)

Genre: Research Report > New Finding (0.87)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.66)

Add feedback

Faster Robust Tensor Power Method for Arbitrary Order

Deng, Yichuan, Song, Zhao, Yin, Junze

arXiv.org Artificial IntelligenceJun-1-2023

With the development of large-scale-data-driven applications, such as neural networks, social network analysis, and multi-media processing, tensors have become a powerful paradigm to handle the data. According to [SWZ16], in recommendation systems, it's often beneficial to utilize more than two attributes to generate more accurate recommendations. For instance, in the case of Groupon, one could examine three attributes such as time, users, and activities, which may include but are not limited to the factors like time of day, season, weekday, weekend, etc., as a basis for making predictions. More information on this can be found in [KB09]. Tensor decomposition is a mathematical tool that can break down the higher order tensor into a combination of lower order tensors. To deal with the high-dimensional data, decomposition becomes a natural method to handle the tensors, where the operation reads the original tensor as inputs and outputs the decomposition of it in some succinct form.

artificial intelligence, machine learning, step follow, (17 more...)

arXiv.org Artificial Intelligence

2306.00406

Country:

Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
Africa > Senegal > Kolda Region > Kolda (0.04)
Asia > China (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)

Add feedback

A polynomial-time iterative algorithm for random graph matching with non-vanishing correlation

Ding, Jian, Li, Zhangsong

arXiv.org Machine LearningMay-31-2023

We propose an efficient algorithm for matching two correlated Erd\H{o}s--R\'enyi graphs with $n$ vertices whose edges are correlated through a latent vertex correspondence. When the edge density $q= n^{- \alpha+o(1)}$ for a constant $\alpha \in [0,1)$, we show that our algorithm has polynomial running time and succeeds to recover the latent matching as long as the edge correlation is non-vanishing. This is closely related to our previous work on a polynomial-time algorithm that matches two Gaussian Wigner matrices with non-vanishing correlation, and provides the first polynomial-time random graph matching algorithm (regardless of the regime of $q$) when the edge correlation is below the square root of the Otter's constant (which is $\approx 0.338$).

artificial intelligence, lemma 3, machine learning, (18 more...)

arXiv.org Machine Learning

2306.00266

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.49)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.84)

Add feedback

Bandwidth Optimal Pipeline Schedule for Collective Communication

Zhao, Liangyu, Krishnamurthy, Arvind

arXiv.org Artificial IntelligenceMay-31-2023

We present a strongly polynomial-time algorithm to generate bandwidth optimal allgather/reduce-scatter on any network topology, with or without switches. Our algorithm constructs pipeline schedules achieving provably the best possible bandwidth performance on a given topology. To provide a universal solution, we model the network topology as a directed graph with heterogeneous link capacities and switches directly as vertices in the graph representation. The algorithm is strongly polynomial-time with respect to the topology size. This work heavily relies on previous graph theory work on edge-disjoint spanning trees and edge splitting. While we focus on allgather, the methods in this paper can be easily extended to generate schedules for reduce, broadcast, reduce-scatter, and allreduce.

artificial intelligence, bandwidth runtime, topology, (16 more...)

arXiv.org Artificial Intelligence

2305.18461

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.34)

Add feedback

Introduction to Online Nonstochastic Control

Hazan, Elad, Singh, Karan

arXiv.org Artificial IntelligenceMay-29-2023

This text presents an introduction to an emerging paradigm in control of dynamical systems and differentiable reinforcement learning called online nonstochastic control. The new approach applies techniques from online convex optimization and convex relaxations to obtain new methods with provable guarantees for classical settings in optimal and robust control. The primary distinction between online nonstochastic control and other frameworks is the objective. In optimal control, robust control, and other control methodologies that assume stochastic noise, the goal is to perform comparably to an offline optimal strategy. In online nonstochastic control, both the cost functions as well as the perturbations from the assumed dynamical model are chosen by an adversary. Thus the optimal policy is not defined a priori. Rather, the target is to attain low regret against the best policy in hindsight from a benchmark class of policies. This objective suggests the use of the decision making framework of online convex optimization as an algorithmic methodology. The resulting methods are based on iterative mathematical optimization algorithms, and are accompanied by finite-time regret and computational complexity guarantees.

artificial intelligence, comment suggestion, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2211.09619

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > New York (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(2 more...)

Genre:

Instructional Material (0.92)
Summary/Review (0.92)
Workflow (0.67)
Overview (0.65)

Industry:

Health & Medicine > Therapeutic Area (0.92)
Transportation (0.67)
Education > Educational Setting > Online (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.92)
Information Technology > Artificial Intelligence > Robots (0.92)

Add feedback

Federated Empirical Risk Minimization via Second-Order Method

Bian, Song, Song, Zhao, Yin, Junze

arXiv.org Artificial IntelligenceMay-27-2023

Many convex optimization problems with important applications in machine learning are formulated as empirical risk minimization (ERM). There are several examples: linear and logistic regression, LASSO, kernel regression, quantile regression, $p$-norm regression, support vector machines (SVM), and mean-field variational inference. To improve data privacy, federated learning is proposed in machine learning as a framework for training deep learning models on the network edge without sharing data between participating nodes. In this work, we present an interior point method (IPM) to solve a general ERM problem under the federated learning setting. We show that the communication complexity of each iteration of our IPM is $\tilde{O}(d^{3/2})$, where $d$ is the dimension (i.e., number of features) of the dataset.

algorithm, artificial intelligence, machine learning, (14 more...)

arXiv.org Artificial Intelligence

2305.17482

Country:

North America > United States > Virginia (0.04)
North America > United States > Texas > Travis County > Austin (0.04)
North America > United States > Massachusetts (0.04)
(2 more...)

Genre:

Research Report > New Finding (0.34)
Research Report > Experimental Study (0.34)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.48)

Add feedback

Sharpened Lazy Incremental Quasi-Newton Method

Lahoti, Aakash, Senapati, Spandan, Rajawat, Ketan, Koppel, Alec

arXiv.org Artificial IntelligenceMay-26-2023

We consider the finite sum minimization of $n$ strongly convex and smooth functions with Lipschitz continuous Hessians in $d$ dimensions. In many applications where such problems arise, including maximum likelihood estimation, empirical risk minimization, and unsupervised learning, the number of observations $n$ is large, and it becomes necessary to use incremental or stochastic algorithms whose per-iteration complexity is independent of $n$. Of these, the incremental/stochastic variants of the Newton method exhibit superlinear convergence, but incur a per-iteration complexity of $O(d^3)$, which may be prohibitive in large-scale settings. On the other hand, the incremental Quasi-Newton method incurs a per-iteration complexity of $O(d^2)$ but its superlinear convergence rate has only been characterized asymptotically. This work puts forth the Sharpened Lazy Incremental Quasi-Newton (SLIQN) method that achieves the best of both worlds: an explicit superlinear convergence rate with a per-iteration complexity of $O(d^2)$. Building upon the recently proposed Sharpened Quasi-Newton method, the proposed incremental variant incorporates a hybrid update strategy incorporating both classic and greedy BFGS updates. The proposed lazy update rule distributes the computational complexity between the iterations, so as to enable a per-iteration complexity of $O(d^2)$. Numerical tests demonstrate the superiority of SLIQN over all other incremental and stochastic Quasi-Newton variants.

artificial intelligence, bayesian inference, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2305.17283

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
Asia > China > Heilongjiang Province > Daqing (0.04)

Genre: Research Report (0.82)

Industry: Health & Medicine > Therapeutic Area (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.54)

Add feedback

On Representing Mixed-Integer Linear Programs by Graph Neural Networks

Chen, Ziang, Liu, Jialin, Wang, Xinshang, Lu, Jianfeng, Yin, Wotao

arXiv.org Artificial IntelligenceMay-25-2023

While Mixed-integer linear programming (MILP) is NP-hard in general, practical MILP has received roughly 100--fold speedup in the past twenty years. Still, many classes of MILPs quickly become unsolvable as their sizes increase, motivating researchers to seek new acceleration techniques for MILPs. With deep learning, they have obtained strong empirical results, and many results were obtained by applying graph neural networks (GNNs) to making decisions in various stages of MILP solution processes. This work discovers a fundamental limitation: there exist feasible and infeasible MILPs that all GNNs will, however, treat equally, indicating GNN's lacking power to express general MILPs. Then, we show that, by restricting the MILPs to unfoldable ones or by adding random features, there exist GNNs that can reliably predict MILP feasibility, optimal objective values, and optimal solutions up to prescribed precision. We conducted small-scale numerical experiments to validate our theoretical findings.

artificial intelligence, machine learning, milp, (16 more...)

arXiv.org Artificial Intelligence

2210.10759

Country:

North America > United States > Washington > King County > Bellevue (0.04)
North America > United States > North Carolina > Durham County > Durham (0.04)
Europe > France > Bourgogne-Franche-Comté > Doubs > Besançon (0.04)

Genre:

Research Report (0.50)
Workflow (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

Add feedback

On Representing Linear Programs by Graph Neural Networks

Chen, Ziang, Liu, Jialin, Wang, Xinshang, Lu, Jianfeng, Yin, Wotao

arXiv.org Artificial IntelligenceMay-25-2023

Learning to optimize is a rapidly growing area that aims to solve optimization problems or improve existing optimization algorithms using machine learning (ML). In particular, the graph neural network (GNN) is considered a suitable ML model for optimization problems whose variables and constraints are permutation--invariant, for example, the linear program (LP). While the literature has reported encouraging numerical results, this paper establishes the theoretical foundation of applying GNNs to solving LPs. Given any size limit of LPs, we construct a GNN that maps different LPs to different outputs. We show that properly built GNNs can reliably predict feasibility, boundedness, and an optimal solution for each LP in a broad class. Our proofs are based upon the recently--discovered connections between the Weisfeiler--Lehman isomorphism test and the GNN. To validate our results, we train a simple GNN and present its accuracy in mapping LPs to their feasibilities and solutions.

artificial intelligence, gnn, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2209.12288

Country:

North America > United States > Washington > King County > Bellevue (0.04)
North America > United States > North Carolina > Durham County > Durham (0.04)
North America > United States > Massachusetts > Middlesex County > Belmont (0.04)

Genre:

Research Report (0.70)
Overview (0.46)

Industry: Energy > Power Industry (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Data-driven Science and Machine Learning Methods in Laser-Plasma Physics

Döpp, Andreas, Eberle, Christoph, Howard, Sunny, Irshad, Faran, Lin, Jinpu, Streeter, Matthew

arXiv.org Artificial IntelligenceMay-24-2023

Laser-plasma physics has developed rapidly over the past few decades as high-power lasers have become both increasingly powerful and more widely available. Early experimental and numerical research in this field was restricted to single-shot experiments with limited parameter exploration. However, recent technological improvements make it possible to gather an increasing amount of data, both in experiments and simulations. This has sparked interest in using advanced techniques from mathematics, statistics and computer science to deal with, and benefit from, big data. At the same time, sophisticated modeling techniques also provide new ways for researchers to effectively deal with situations in which still only sparse amounts of data are available. This paper aims to present an overview of relevant machine learning methods with focus on applicability to laser-plasma physics, including its important sub-fields of laser-plasma acceleration and inertial confinement fusion.

artificial intelligence, evolutionary algorithm, machine learning, (22 more...)

arXiv.org Artificial Intelligence

2212.00026

Country:

North America > United States (1.00)
Europe (1.00)

Genre:

Overview (1.00)
Instructional Material > Course Syllabus & Notes (0.92)

Industry:

Energy > Oil & Gas > Upstream (0.92)
Health & Medicine > Diagnostic Medicine (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(6 more...)

Add feedback