AITopics

2309.01363

Country:

Asia > South Korea > Seoul > Seoul (0.05)
Asia > South Korea > Daejeon > Daejeon (0.04)

Genre:

Research Report > New Finding (0.48)
Overview > Innovation (0.48)
Research Report > Promising Solution (0.34)

Industry: Banking & Finance > Trading (0.66)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.34)

arXiv.org Machine LearningSep-4-2023

Asymmetric matrix sensing by gradient descent with small random initialization

Wind, Johan S.

We study matrix sensing, which is the problem of reconstructing a low-rank matrix from a few linear measurements. It can be formulated as an overparameterized regression problem, which can be solved by factorized gradient descent when starting from a small random initialization. Linear neural networks, and in particular matrix sensing by factorized gradient descent, serve as prototypical models of non-convex problems in modern machine learning, where complex phenomena can be disentangled and studied in detail. Much research has been devoted to studying special cases of asymmetric matrix sensing, such as asymmetric matrix factorization and symmetric positive semi-definite matrix sensing. Our key contribution is introducing a continuous differential equation that we call the $\textit{perturbed gradient flow}$. We prove that the perturbed gradient flow converges quickly to the true target matrix whenever the perturbation is sufficiently bounded. The dynamics of gradient descent for matrix sensing can be reduced to this formulation, yielding a novel proof of asymmetric matrix sensing with factorized gradient descent. Compared to directly analyzing the dynamics of gradient descent, the continuous formulation allows bounding key quantities by considering their derivatives, often simplifying the proofs. We believe the general proof technique may prove useful in other settings as well.

artificial intelligence, inequality, machine learning, (15 more...)

2309.01796

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Norway > Eastern Norway > Oslo (0.04)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)

Xiao, Nachuan, Hu, Xiaoyin, Toh, Kim-Chuan

Convergence Guarantees for Stochastic Subgradient Methods in Nonsmooth Nonconvex Optimization

arXiv.org Machine LearningSep-4-2023

In this paper, we investigate the convergence properties of the stochastic gradient descent (SGD) method and its variants, especially in training neural networks built from nonsmooth activation functions. We develop a novel framework that assigns different timescales to stepsizes for updating the momentum terms and variables, respectively. Under mild conditions, we prove the global convergence of our proposed framework in both single-timescale and two-timescale cases. We show that our proposed framework encompasses a wide range of well-known SGD-type methods, including heavy-ball SGD, SignSGD, Lion, normalized SGD and clipped SGD. Furthermore, when the objective function adopts a finite-sum formulation, we prove the convergence properties for these SGD-type methods based on our proposed framework. In particular, we prove that these SGD-type methods find the Clarke stationary points of the objective function with randomly chosen stepsizes and initial points under mild assumptions. Preliminary numerical experiments demonstrate the high efficiency of our analyzed SGD-type methods.

artificial intelligence, machine learning, sequence, (19 more...)

2307.10053

Country:

Asia > Singapore > Central Region > Singapore (0.04)
Asia > China > Zhejiang Province > Hangzhou (0.04)
North America > United States > Georgia > Fulton County > Atlanta (0.04)
(2 more...)

Genre: Research Report (0.64)

Industry: Education (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.54)

Shamaee, M. Soheil, Hafshejani, S. Fathi

Modified Step Size for Enhanced Stochastic Gradient Descent: Convergence and Experiments

arXiv.org Artificial IntelligenceSep-3-2023

Stochastic gradient descent (SGD) has a rich historical background, originating from the influential work by Robbins and Monro [11]. In the realm of modern machine learning, SGD has emerged as a fundamental optimization algorithm for training deep neural networks (DNNs), which have achieved remarkable performance across diverse domains such as image classification [6, 7], object detection [10], and machine translation [14]. The selection of an appropriate step size, often referred to as the learning rate, plays a pivotal role in the convergence behavior of SGD. If the step size value is too large, it can prevent SGD iterations from reaching the optimal point, leading to instability and divergence. On the other hand, excessively small step size values can result in slow convergence and hinder the algorithm's ability to escape suboptimal local minima [9].

dataset, enhanced stochastic gradient descent, step size, (11 more...)

2309.01248

Country: Asia > Middle East > Iran > Fars Province > Shiraz (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

arXiv.org Machine LearningSep-2-2023

Individual Privacy Accounting for Differentially Private Stochastic Gradient Descent

Yu, Da, Kamath, Gautam, Kulkarni, Janardhan, Liu, Tie-Yan, Yin, Jian, Zhang, Huishuai

Differentially private stochastic gradient descent (DP-SGD) is the workhorse algorithm for recent advances in private deep learning. It provides a single privacy guarantee to all datapoints in the dataset. We propose output-specific $(\varepsilon,\delta)$-DP to characterize privacy guarantees for individual examples when releasing models trained by DP-SGD. We also design an efficient algorithm to investigate individual privacy across a number of datasets. We find that most examples enjoy stronger privacy guarantees than the worst-case bound. We further discover that the training loss and the privacy parameter of an example are well-correlated. This implies groups that are underserved in terms of model utility simultaneously experience weaker privacy guarantees. For example, on CIFAR-10, the average $\varepsilon$ of the class with the lowest test accuracy is 44.2\% higher than that of the class with the highest accuracy.

gradient norm, privacy, privacy parameter, (12 more...)

2206.02617

Country: North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)

arXiv.org Artificial IntelligenceSep-1-2023

Test-Time Adaptation for Point Cloud Upsampling Using Meta-Learning

Hatem, Ahmed, Qian, Yiming, Wang, Yang

Affordable 3D scanners often produce sparse and non-uniform point clouds that negatively impact downstream applications in robotic systems. While existing point cloud upsampling architectures have demonstrated promising results on standard benchmarks, they tend to experience significant performance drops when the test data have different distributions from the training data. To address this issue, this paper proposes a test-time adaption approach to enhance model generality of point cloud upsampling. The proposed approach leverages meta-learning to explicitly learn network parameters for test-time adaption. Our method does not require any prior information about the test data. During meta-training, the model parameters are learned from a collection of instance-level tasks, each of which consists of a sparse-dense pair of point clouds from the training data. During meta-testing, the trained model is fine-tuned with a few gradient updates to produce a unique set of network parameters for each test instance. The updated model is then used for the final prediction. Our framework is generic and can be applied in a plug-and-play manner with existing backbone networks in point cloud upsampling. Extensive experiments demonstrate that our approach improves the performance of state-of-the-art models.

adaptation, cloud, point cloud, (16 more...)

2308.16484

Country: North America > Canada > Manitoba (0.04)

Genre: Research Report > Promising Solution (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.36)

Dominguez, Alejandro Rodriguez, Shahzad, Muhammad, Hong, Xia

Structured Radial Basis Function Network: Modelling Diversity for Multiple Hypotheses Prediction

arXiv.org Machine LearningSep-1-2023

Multi-modal regression is important in forecasting nonstationary processes or with a complex mixture of distributions. It can be tackled with multiple hypotheses frameworks but with the difficulty of combining them efficiently in a learning model. A Structured Radial Basis Function Network is presented as an ensemble of multiple hypotheses predictors for regression problems. The predictors are regression models of any type that can form centroidal Voronoi tessellations which are a function of their losses during training. It is proved that this structured model can efficiently interpolate this tessellation and approximate the multiple hypotheses target distribution and is equivalent to interpolating the meta-loss of the predictors, the loss being a zero set of the interpolation error. This model has a fixed-point iteration algorithm between the predictors and the centers of the basis functions. Diversity in learning can be controlled parametrically by truncating the tessellation formation with the losses of individual predictors. A closed-form solution with least-squares is presented, which to the authors knowledge, is the fastest solution in the literature for multiple hypotheses and structured predictions. Superior generalization performance and computational efficiency is achieved using only two-layer neural networks as predictors controlling diversity as a key component of success. A gradient-descent approach is introduced which is loss-agnostic regarding the predictors. The expected value for the loss of the structured model with Gaussian basis functions is computed, finding that correlation between predictors is not an appropriate tool for diversification. The experiments show outperformance with respect to the top competitors in the literature.

artificial intelligence, machine learning, predictor, (17 more...)

2309.00781

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Asia > Middle East > Jordan (0.04)
North America > United States > New York > New York County > New York City (0.04)
(4 more...)

Genre: Research Report (0.82)

Industry: Banking & Finance (0.45)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.35)

Bastianello, Nicola, Rikos, Apostolos I., Johansson, Karl H.

Online Distributed Learning with Quantized Finite-Time Coordination

arXiv.org Artificial IntelligenceAug-31-2023

In this paper we consider online distributed learning problems. Online distributed learning refers to the process of training learning models on distributed data sources. In our setting a set of agents need to cooperatively train a learning model from streaming data. Differently from federated learning, the proposed approach does not rely on a central server but only on peer-to-peer communications among the agents. This approach is often used in scenarios where data cannot be moved to a centralized location due to privacy, security, or cost reasons. In order to overcome the absence of a central server, we propose a distributed algorithm that relies on a quantized, finite-time coordination protocol to aggregate the locally trained models. Furthermore, our algorithm allows for the use of stochastic gradients during local training. Stochastic gradients are computed using a randomly sampled subset of the local training data, which makes the proposed algorithm more efficient and scalable than traditional gradient descent. In our paper, we analyze the performance of the proposed algorithm in terms of the mean distance from the online solution. Finally, we present numerical results for a logistic regression task.

learning, online, quantized finite-time coordination

2307.0662

Genre:

Research Report (0.89)
Instructional Material > Online (0.80)

Industry: Education (0.73)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.73)

Leonte, Dan, Veraart, Almut E. D.

Likelihood-based inference and forecasting for trawl processes: a stochastic optimization approach

arXiv.org Machine LearningAug-30-2023

We consider trawl processes, which are stationary and infinitely divisible stochastic processes and can describe a wide range of statistical properties, such as heavy tails and long memory. In this paper, we develop the first likelihood-based methodology for the inference of real-valued trawl processes and introduce novel deterministic and probabilistic forecasting methods. Being non-Markovian, with a highly intractable likelihood function, trawl processes require the use of composite likelihood functions to parsimoniously capture their statistical properties. We formulate the composite likelihood estimation as a stochastic optimization problem for which it is feasible to implement iterative gradient descent methods. We derive novel gradient estimators with variances that are reduced by several orders of magnitude. We analyze both the theoretical properties and practical implementation details of these estimators and release a Python library which can be used to fit a large class of trawl processes. In a simulation study, we demonstrate that our estimators outperform the generalized method of moments estimators in terms of both parameter estimation error and out-of-sample forecasting error. Finally, we formalize a stochastic chain rule for our gradient estimators. We apply the new theory to trawl processes and provide a unified likelihood-based methodology for the inference of both real-valued and integer-valued trawl processes.

artificial intelligence, estimator, machine learning, (18 more...)

2308.16092

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
Asia > Middle East > Jordan (0.04)
North America > United States > New York (0.04)
(2 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.34)

Nguyen, Mike, Mücke, Nicole

Random feature approximation for general spectral methods

arXiv.org Machine LearningAug-29-2023

Random feature approximation is arguably one of the most popular techniques to speed up kernel methods in large scale algorithms and provides a theoretical approach to the analysis of deep neural networks. We analyze generalization properties for a large class of spectral regularization methods combined with random features, containing kernel methods with implicit regularization such as gradient descent or explicit methods like Tikhonov regularization. For our estimators we obtain optimal learning rates over regularity classes (even for classes that are not included in the reproducing kernel Hilbert space), which are defined through appropriate source conditions. This improves or completes previous results obtained in related settings for specific kernel algorithms.

artificial intelligence, machine learning, proposition, (16 more...)

2308.15434

Country: North America > United States > New York (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.86)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.34)