AITopics

2112.00845

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.53)

Xu, Yuetong, Mirzasoleiman, Baharan

Ordering for Non-Replacement SGD

arXiv.org Artificial IntelligenceJun-27-2023

One approach for reducing run time and improving efficiency of machine learning is to reduce the convergence rate of the optimization algorithm used. Shuffling is an algorithm technique that is widely used in machine learning, but it only started to gain attention theoretically in recent years. With different convergence rates developed for random shuffling and incremental gradient descent, we seek to find an ordering that can improve the convergence rates for the non-replacement form of the algorithm. Based on existing bounds of the distance between the optimal and current iterate, we derive an upper bound that is dependent on the gradients at the beginning of the epoch. Through analysis of the bound, we are able to develop optimal orderings for constant and decreasing step sizes for strongly convex and convex functions. We further test and verify our results through experiments on synthesis and real data sets. In addition, we are able to combine the ordering with mini-batch and further apply it to more complex neural networks, which show promising results.

algorithm, artificial intelligence, machine learning, (18 more...)

2306.15848

Country:

North America > United States > New York (0.04)
Europe > Russia (0.04)
Asia > Russia (0.04)
(2 more...)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.36)

Zaballa, Vincent D., Hui, Elliot E.

Stochastic Gradient Bayesian Optimal Experimental Designs for Simulation-based Inference

arXiv.org Artificial IntelligenceJun-27-2023

Simulation-based inference (SBI) methods tackle complex scientific models with challenging inverse problems. However, SBI models often face a significant hurdle due to their non-differentiable nature, which hampers the use of gradient-based optimization techniques. Bayesian Optimal Experimental Design (BOED) is a powerful approach that aims to make the most efficient use of experimental resources for improved inferences. While stochastic gradient BOED methods have shown promising results in high-dimensional design problems, they have mostly neglected the integration of BOED with SBI due to the difficult non-differentiable property of many SBI simulators. In this work, we establish a crucial connection between ratio-based SBI inference algorithms and stochastic gradient-based variational inference by leveraging mutual information bounds. This connection allows us to extend BOED to SBI applications, enabling the simultaneous optimization of experimental designs and amortized inference functions. We demonstrate our approach on a simple linear model and offer implementation details for practitioners.

artificial intelligence, gradient bayesian optimal experimental design, machine learning, (12 more...)

2306.15731

Country:

North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
North America > United States > California > Orange County > Irvine (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.83)

Gueuwou, Shester, Gyening, Rose-Mary Owusuaa Mensah

Stance Prediction and Analysis of Twitter data : A case study of Ghana 2020 Presidential Elections

arXiv.org Artificial IntelligenceJun-27-2023

On December 7, 2020, Ghanaians participated in the polls to determine their president for the next four years. To gain insights from this presidential election, we conducted stance analysis (which is not always equivalent to sentiment analysis) to understand how Twitter, a popular social media platform, reflected the opinions of its users regarding the two main presidential candidates. We collected a total of 99,356 tweets using the Twitter API (Tweepy) and manually annotated 3,090 tweets into three classes: Against, Neutral, and Support. We then performed preprocessing on the tweets. The resulting dataset was evaluated using two lexicon-based approaches, VADER and TextBlob, as well as five supervised machine learning-based approaches: Support Vector Machine (SVM), Logistic Regression (LR), Multinomial Na\"ive Bayes (MNB), Stochastic Gradient Descent (SGD), and Random Forest (RF), based on metrics such as accuracy, precision, recall, and F1-score. The best performance was achieved by Logistic Regression with an accuracy of 71.13%. We utilized Logistic Regression to classify all the extracted tweets and subsequently conducted an analysis and discussion of the results. For access to our data and code, please visit: https://github.com/ShesterG/Stance-Detection-Ghana-2020-Elections.git

artificial intelligence, machine learning, social media, (5 more...)

2306.14203

Country: Africa > Ghana (0.60)

Genre: Research Report (1.00)

Industry: Government > Voting & Elections (1.00)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.53)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.53)

Finite-Sample Analysis of Learning High-Dimensional Single ReLU Neuron

Wu, Jingfeng, Zou, Difan, Chen, Zixiang, Braverman, Vladimir, Gu, Quanquan, Kakade, Sham M.

This paper considers the problem of learning a single ReLU neuron with squared loss (a.k.a., ReLU regression) in the overparameterized regime, where the input dimension can exceed the number of samples. We analyze a Perceptron-type algorithm called GLM-tron (Kakade et al., 2011) and provide its dimension-free risk upper bounds for high-dimensional ReLU regression in both well-specified and misspecified settings. Our risk bounds recover several existing results as special cases. Moreover, in the well-specified setting, we provide an instance-wise matching risk lower bound for GLM-tron. Our upper and lower risk bounds provide a sharp characterization of the high-dimensional ReLU regression problems that can be learned via GLM-tron. On the other hand, we provide some negative results for stochastic gradient descent (SGD) for ReLU regression with symmetric Bernoulli data: if the model is well-specified, the excess risk of SGD is provably no better than that of GLM-tron ignoring constant factors, for each problem instance; and in the noiseless case, GLM-tron can achieve a small risk while SGD unavoidably suffers from a constant risk in expectation. These results together suggest that GLM-tron might be preferable to SGD for high-dimensional ReLU regression.

finite-sample analysis, glm-tron, regression, (12 more...)

2303.02255

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
Asia > China > Hong Kong (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.34)

Atici, Salih, Pan, Hongyi, Cetin, Ahmet Enis

Input Normalized Stochastic Gradient Descent Training of Deep Neural Networks

In this paper, we propose a novel optimization algorithm for training machine learning models called Input Normalized Stochastic Gradient Descent (INSGD), inspired by the Normalized Least Mean Squares (NLMS) algorithm used in adaptive filtering. When training complex models on large datasets, the choice of optimizer parameters, particularly the learning rate, is crucial to avoid divergence. Our algorithm updates the network weights using stochastic gradient descent with $\ell_1$ and $\ell_2$-based normalizations applied to the learning rate, similar to NLMS. However, unlike existing normalization methods, we exclude the error term from the normalization process and instead normalize the update term using the input vector to the neuron. Our experiments demonstrate that our optimization algorithm achieves higher accuracy levels compared to different initialization settings. We evaluate the efficiency of our training algorithm on benchmark datasets using ResNet-18, WResNet-20, ResNet-50, and a toy neural network. Our INSGD algorithm improves the accuracy of ResNet-18 on CIFAR-10 from 92.42\% to 92.71\%, WResNet-20 on CIFAR-100 from 76.20\% to 77.39\%, and ResNet-50 on ImageNet-1K from 75.52\% to 75.67\%.

algorithm, dataset, optimization algorithm, (14 more...)

2212.09921

Country:

North America > United States > Illinois > Cook County > Chicago (0.04)
Europe > France (0.04)

Genre: Research Report > New Finding (0.47)

Industry: Health & Medicine (0.57)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Kumar, Pranav, Mandal, Taniya, Mondal, Swapnamay

Black holes and the loss landscape in machine learning

arXiv.org Machine LearningJun-26-2023

Understanding the loss landscape is an important problem in machine learning. One key feature of the loss function, common to many neural network architectures, is the presence of exponentially many low lying local minima. Physical systems with similar energy landscapes may provide useful insights. In this work, we point out that black holes naturally give rise to such landscapes, owing to the existence of black hole entropy. For definiteness, we consider 1/8 BPS black holes in $\mathcal{N} = 8$ string theory. These provide an infinite family of potential landscapes arising in the microscopic descriptions of corresponding black holes. The counting of minima amounts to black hole microstate counting. Moreover, the exact numbers of the minima for these landscapes are a priori known from dualities in string theory. Some of the minima are connected by paths of low loss values, resembling mode connectivity. We estimate the number of runs needed to find all the solutions. Initial explorations suggest that Stochastic Gradient Descent can find a significant fraction of the minima.

artificial intelligence, deep learning, machine learning, (18 more...)

arXiv.org Machine Learning

doi: 10.1007/JHEP10(2023)107

2306.14817

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
Africa > Middle East > Tunisia > Ben Arous Governorate > Ben Arous (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
(7 more...)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Tailor, Dharmesh, Khan, Mohammad Emtiyaz, Nalisnick, Eric

Exploiting Inferential Structure in Neural Processes

Neural Processes (NPs) are appealing due to their ability to perform fast adaptation based on a context set. This set is encoded by a latent variable, which is often assumed to follow a simple distribution. However, in real-word settings, the context set may be drawn from richer distributions having multiple modes, heavy tails, etc. In this work, we provide a framework that allows NPs' latent variable to be given a rich prior defined by a graphical model. These distributional assumptions directly translate into an appropriate aggregation strategy for the context set. Moreover, we describe a message-passing procedure that still allows for end-to-end optimization with stochastic gradients. We demonstrate the generality of our framework by using mixture and Student-t assumptions that yield improvements in function modelling and test-time robustness.

artificial intelligence, machine learning, neural process, (13 more...)

2306.15169

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
Asia > Middle East > Jordan (0.04)
Europe > Netherlands > North Holland > Amsterdam (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.34)

Lin, Sheng-Hsuan, Kuijpers, Olivier, Peterhansl, Sebastian, Pollmann, Frank

Distributive Pre-Training of Generative Modeling Using Matrix-Product States

Tensor networks have recently found applications in machine learning for both supervised learning and unsupervised learning. The most common approaches for training these models are gradient descent methods. In this work, we consider an alternative training scheme utilizing basic tensor network operations, e.g., summation and compression. The training algorithm is based on compressing the superposition state constructed from all the training data in product state representation. The algorithm could be parallelized easily and only iterates through the dataset once. Hence, it serves as a pre-training algorithm. We benchmark the algorithm on the MNIST dataset and show reasonable results for generating new images and classification tasks. Furthermore, we provide an interpretation of the algorithm as a compressed quantum kernel density estimation for the probability amplitude of input data.

artificial intelligence, feature map, machine learning, (16 more...)

2306.14787

Country:

Europe > Germany > North Rhine-Westphalia > Upper Bavaria > Munich (0.05)
Oceania > Australia > New South Wales > Sydney (0.04)
Asia > Middle East > Republic of Türkiye > Karaman Province > Karaman (0.04)
Asia > Japan (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.34)

Badreddine, Samy, Serafini, Luciano, Spranger, Michael

logLTN: Differentiable Fuzzy Logic in the Logarithm Space

The AI community is increasingly focused on merging logic with deep learning to create Neuro-Symbolic (NeSy) paradigms and assist neural approaches with symbolic knowledge. A significant trend in the literature involves integrating axioms and facts in loss functions by grounding logical symbols with neural networks and operators with fuzzy semantics. Logic Tensor Networks (LTN) is one of the main representatives in this category, known for its simplicity, efficiency, and versatility. However, it has been previously shown that not all fuzzy operators perform equally when applied in a differentiable setting. Researchers have proposed several configurations of operators, trading off between effectiveness, numerical stability, and generalization to different formulas. This paper presents a configuration of fuzzy operators for grounding formulas end-to-end in the logarithm space. Our goal is to develop a configuration that is more effective than previous proposals, able to handle any formula, and numerically stable. To achieve this, we propose semantics that are best suited for the logarithm space and introduce novel simplifications and improvements that are crucial for optimization via gradient-descent. We use LTN as the framework for our experiments, but the conclusions of our work apply to any similar NeSy framework. Our findings, both formal and empirical, show that the proposed configuration outperforms the state-of-the-art and that each of our modifications is essential in achieving these results.

artificial intelligence, machine learning, natural language, (19 more...)

2306.14546

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
Oceania > Australia > Victoria > Melbourne (0.04)
North America > United States > Texas > Travis County > Austin (0.04)
(13 more...)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.41)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.34)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)