Goto

Collaborating Authors

 Gradient Descent


ImageNet Challenging Classification with the Raspberry Pi: An Incremental Local Stochastic Gradient Descent Algorithm

arXiv.org Artificial Intelligence

With rising powerful, low-cost embedded devices, the edge computing has become an increasingly popular choice. In this paper, we propose a new incremental local stochastic gradient descent (SGD) tailored on the Raspberry Pi to deal with large ImageNet ILSVRC 2010 dataset having 1,261,405 images with 1,000 classes. The local SGD splits the data block into $k$ partitions using $k$means algorithm and then it learns in the parallel way SGD models in each data partition to classify the data locally. The incremental local SGD sequentially loads small data blocks of the training dataset to learn local SGD models. The numerical test results on Imagenet dataset show that our incremental local SGD algorithm with the Raspberry Pi 4 is faster and more accurate than the state-of-the-art linear SVM run on a PC Intel(R) Core i7-4790 CPU, 3.6 GHz, 4 cores.


Machine Learning Checklist: Cost Function and Gradient Descent

#artificialintelligence

Now let's dive in to cost function It calculates the measure between the original value and your predicted value generated from your model. Cost functions may differ according to your selection, your manager's selection, or the project necessity. Imagine that you got a Data set about house prices. You have built the model and made a prediction afterward. The actual price of the house is: $155.000


On the generalization of learning algorithms that do not converge

arXiv.org Artificial Intelligence

Generalization analyses of deep learning typically assume that the training converges to a fixed point. But, recent results indicate that in practice, the weights of deep neural networks optimized with stochastic gradient descent often oscillate indefinitely. To reduce this discrepancy between theory and practice, this paper focuses on the generalization of neural networks whose training dynamics do not necessarily converge to fixed points. Our main contribution is to propose a notion of statistical algorithmic stability (SAS) that extends classical algorithmic stability to non-convergent algorithms and to study its connection to generalization. This ergodic-theoretic approach leads to new insights when compared to the traditional optimization and learning theory perspectives. We prove that the stability of the time-asymptotic behavior of a learning algorithm relates to its generalization and empirically demonstrate how loss dynamics can provide clues to generalization performance. Our findings provide evidence that networks that "train stably generalize better" even when the training continues indefinitely and the weights do not converge.


Papers Simplified: »Anticorrelated Noise Injection for Improved Generalization«

#artificialintelligence

In this article, I will not explain to you all of the (exciting!) Instead, I will provide you with some implementations and pictures that should make it possible to understand the gist of the paper. I also gave my best to create an implementation of the optimizers mentioned in the paper, but use the code with care because I'm also not an expert in this regard. In order to understand what Anti-PGD (Anti-Perturbed Gradient Descent) is about, let us shortly recap how GD and the derived algorithms such as SGD and PGD work. Let us assume that we want to minimize a function f with a gradient denoted as f(θ).


Gradient-Based Meta-Learning Using Uncertainty to Weigh Loss for Few-Shot Learning

arXiv.org Artificial Intelligence

Model-Agnostic Meta-Learning (MAML) is one of the most successful meta-learning techniques for few-shot learning. It uses gradient descent to learn commonalities between various tasks, enabling the model to learn the meta-initialization of its own parameters to quickly adapt to new tasks using a small amount of labeled training data. A key challenge to few-shot learning is task uncertainty. Although a strong prior can be obtained from meta-learning with a large number of tasks, a precision model of the new task cannot be guaranteed because the volume of the training dataset is normally too small. In this study, first, in the process of choosing initialization parameters, the new method is proposed for task-specific learner adaptively learn to select initialization parameters that minimize the loss of new tasks. Then, we propose two improved methods for the meta-loss part: Method 1 generates weights by comparing meta-loss differences to improve the accuracy when there are few classes, and Method 2 introduces the homoscedastic uncertainty of each task to weigh multiple losses based on the original gradient descent, as a way to enhance the generalization ability to novel classes while ensuring accuracy improvement. Compared with previous gradient-based meta-learning methods, our model achieves better performance in regression tasks and few-shot classification and improves the robustness of the model to the learning rate and query sets in the meta-test set.


An initial alignment between neural network and target is needed for gradient descent to learn

arXiv.org Artificial Intelligence

This paper introduces the notion of ``Initial Alignment'' (INAL) between a neural network at initialization and a target function. It is proved that if a network and a Boolean target function do not have a noticeable INAL, then noisy gradient descent on a fully connected network with normalized i.i.d. initialization will not learn in polynomial time. Thus a certain amount of knowledge about the target (measured by the INAL) is needed in the architecture design. This also provides an answer to an open problem posed in [AS20]. The results are based on deriving lower-bounds for descent algorithms on symmetric neural networks without explicit knowledge of the target function beyond its INAL.


Online Learning for Non-monotone Submodular Maximization: From Full Information to Bandit Feedback

arXiv.org Artificial Intelligence

In this paper, we revisit the online non-monotone continuous DR-submodular maximization problem over a down-closed convex set, which finds wide real-world applications in the domain of machine learning, economics, and operations research. At first, we present the Meta-MFW algorithm achieving a $1/e$-regret of $O(\sqrt{T})$ at the cost of $T^{3/2}$ stochastic gradient evaluations per round. As far as we know, Meta-MFW is the first algorithm to obtain $1/e$-regret of $O(\sqrt{T})$ for the online non-monotone continuous DR-submodular maximization problem over a down-closed convex set. Furthermore, in sharp contrast with ODC algorithm \citep{thang2021online}, Meta-MFW relies on the simple online linear oracle without discretization, lifting, or rounding operations. Considering the practical restrictions, we then propose the Mono-MFW algorithm, which reduces the per-function stochastic gradient evaluations from $T^{3/2}$ to 1 and achieves a $1/e$-regret bound of $O(T^{4/5})$. Next, we extend Mono-MFW to the bandit setting and propose the Bandit-MFW algorithm which attains a $1/e$-regret bound of $O(T^{8/9})$. To the best of our knowledge, Mono-MFW and Bandit-MFW are the first sublinear-regret algorithms to explore the one-shot and bandit setting for online non-monotone continuous DR-submodular maximization problem over a down-closed convex set, respectively. Finally, we conduct numerical experiments on both synthetic and real-world datasets to verify the effectiveness of our methods.


Fast & Furious: Modelling Malware Detection as Evolving Data Streams

arXiv.org Artificial Intelligence

Malware is a major threat to computer systems and imposes many challenges to cyber security. Targeted threats, such as ransomware, cause millions of dollars in losses every year. The constant increase of malware infections has been motivating popular antiviruses (AVs) to develop dedicated detection strategies, which include meticulously crafted machine learning (ML) pipelines. However, malware developers unceasingly change their samples' features to bypass detection. This constant evolution of malware samples causes changes to the data distribution (i.e., concept drifts) that directly affect ML model detection rates, something not considered in the majority of the literature work. In this work, we evaluate the impact of concept drift on malware classifiers for two Android datasets: DREBIN (about 130K apps) and a subset of AndroZoo (about 285K apps). We used these datasets to train an Adaptive Random Forest (ARF) classifier, as well as a Stochastic Gradient Descent (SGD) classifier. We also ordered all datasets samples using their VirusTotal submission timestamp and then extracted features from their textual attributes using two algorithms (Word2Vec and TF-IDF). Then, we conducted experiments comparing both feature extractors, classifiers, as well as four drift detectors (DDM, EDDM, ADWIN, and KSWIN) to determine the best approach for real environments. Finally, we compare some possible approaches to mitigate concept drift and propose a novel data stream pipeline that updates both the classifier and the feature extractor. To do so, we conducted a longitudinal evaluation by (i) classifying malware samples collected over nine years (2009-2018), (ii) reviewing concept drift detection algorithms to attest its pervasiveness, (iii) comparing distinct ML approaches to mitigate the issue, and (iv) proposing an ML data stream pipeline that outperformed literature approaches.


Federated Quantum Natural Gradient Descent for Quantum Federated Learning

arXiv.org Artificial Intelligence

The heart of Quantum Federated Learning (QFL) is associated with a distributed learning architecture across several local quantum devices and a more efficient training algorithm for the QFL is expected to minimize the communication overhead among different quantum participants. In this work, we put forth an efficient learning algorithm, namely federated quantum natural gradient descent (FQNGD), applied in a QFL framework which consists of the variational quantum circuit (VQC)-based quantum neural networks (QNN). The FQNGD algorithm admits much fewer training iterations for the QFL model to get converged and it can significantly reduce the total communication cost among local quantum devices. Compared with other federated learning algorithms, our experiments on a handwritten digit classification dataset corroborate the effectiveness of the FQNGD algorithm for the QFL in terms of a faster convergence rate on the training dataset and higher accuracy on the test one.


Adaptive Zeroth-Order Optimisation of Nonconvex Composite Objectives

arXiv.org Artificial Intelligence

In this paper, we propose and analyse algorithms for zeroth-order optimisation of non-convex composite objectives, focusing on reducing the complexity dependence on dimensionality. This is achieved by exploiting the low dimensional structure of the decision set using the stochastic mirror descent method with an entropy alike function, which performs gradient descent in the space equipped with the maximum norm. To improve the gradient estimation, we replace the classic Gaussian smoothing method with a sampling method based on the Rademacher distribution and show that the mini-batch method copes with the non-Euclidean geometry. To avoid tuning hyperparameters, we analyse the adaptive stepsizes for the general stochastic mirror descent and show that the adaptive version of the proposed algorithm converges without requiring prior knowledge about the problem.