AITopics

2103.14723

Country:

Asia > Middle East > Israel (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Russia (0.04)
Asia > Russia (0.04)

Genre: Research Report (0.63)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.34)

#artificialintelligenceApr-9-2021, 08:03:59 GMT

8 Outstanding Papers At ICLR 2021

International Conference on Learning Representations (ICLR) recently announced the ICLR 2021 Outstanding Paper Awards winners. It recognised eight papers out of the 860 submitted this year. The papers were evaluated for both technical quality and the potential to create a practical impact. The committee was chaired by Ivan Titov (U. This paper deals with parameterising hypercomplex multiplications using arbitrarily learnable parameters compared with the fully-connected layer counterpart.

iclr 2021, link predictor, outstanding paper, (13 more...)

#artificialintelligence

Country:

Europe > Netherlands > North Holland > Amsterdam (0.05)
Asia > Middle East > Israel (0.05)

Genre: Personal > Honors (0.56)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.36)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.33)

arXiv.org Artificial IntelligenceApr-8-2021

Relieving the Plateau: Active Semi-Supervised Learning for a Better Landscape

Kong, Seo Taek, Jeon, Soomin, Lee, Jaewon, Lee, Hongseok, Jung, Kyu-Hwan

Deep learning (DL) relies on massive amounts of labeled data, and improving its labeled sample-efficiency remains one of the most important problems since its advent. Semi-supervised learning (SSL) leverages unlabeled data that are more accessible than their labeled counterparts. Active learning (AL) selects unlabeled instances to be annotated by a human-in-the-loop in hopes of better performance with less labeled data. Given the accessible pool of unlabeled data in pool-based AL, it seems natural to use SSL when training and AL to update the labeled set; however, algorithms designed for their combination remain limited. In this work, we first prove that convergence of gradient descent on sufficiently wide ReLU networks can be expressed in terms of their Gram matrix' eigen-spectrum. Equipped with a few theoretical insights, we propose convergence rate control (CRC), an AL algorithm that selects unlabeled data to improve the problem conditioning upon inclusion to the labeled set, by formulating an acquisition step in terms of improving training dynamics. Extensive experiments show that SSL algorithms coupled with CRC can achieve high performance using very few labeled data.

algorithm, learning, ssl algorithm, (14 more...)

2104.03525

Country: North America > United States > California > Los Angeles County > Long Beach (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.48)

Majidi, Negin, Amid, Ehsan, Talebi, Hossein, Warmuth, Manfred K.

Exponentiated Gradient Reweighting for Robust Training Under Label Noise and Beyond

arXiv.org Machine LearningApr-3-2021

Many learning tasks in machine learning can be viewed as taking a gradient step towards minimizing the average loss of a batch of examples in each training iteration. When noise is prevalent in the data, this uniform treatment of examples can lead to overfitting to noisy examples with larger loss values and result in poor generalization. Inspired by the expert setting in on-line learning, we present a flexible approach to learning from noisy examples. Specifically, we treat each training example as an expert and maintain a distribution over all examples. We alternate between updating the parameters of the model using gradient descent and updating the example weights using the exponentiated gradient update. Unlike other related methods, our approach handles a general class of loss functions and can be applied to a wide range of noise types and applications. We show the efficacy of our approach for multiple learning settings, namely noisy principal component analysis and a variety of noisy classification problems.

exponentiated gradient reweighting, noise, reweighting, (16 more...)

2104.01493

Country:

North America > United States > California > Santa Cruz County > Santa Cruz (0.14)
North America > United States > California > Santa Clara County > Mountain View (0.04)
Europe > Russia (0.04)
Asia > Russia (0.04)

Genre: Research Report > New Finding (0.66)

Industry: Education > Educational Setting (0.55)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.55)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

arXiv.org Artificial IntelligenceApr-2-2021

Neurons learn slower than they think

Kulikovskikh, Ilona

Recent studies revealed complex convergence dynamics in gradient-based methods, which has been little understood so far. Changing the step size to balance between high convergence rate and small generalization error may not be sufficient: maximizing the test accuracy usually requires a larger learning rate than minimizing the training loss. To explore the dynamic bounds of convergence rate, this study introduces \textit{differential capability} into an optimization process, which measures whether the test accuracy increases as fast as a model approaches the decision boundary in a classification problem. The convergence analysis showed that: 1) a higher convergence rate leads to slower capability growth; 2) a lower convergence rate results in faster capability growth and decay; 3) regulating a convergence rate in either direction reduces differential capability.

convergence rate, differential capability, gradient descent, (13 more...)

2104.02578

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > United States > New York (0.04)
Europe > Russia (0.04)
Asia > Russia (0.04)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.52)

Jentzen, Arnulf, Riekert, Adrian

A proof of convergence for stochastic gradient descent in the training of artificial neural networks with ReLU activation for constant target functions

arXiv.org Artificial IntelligenceApr-1-2021

In this article we study the stochastic gradient descent (SGD) optimization method in the training of fully-connected feedforward artificial neural networks with ReLU activation. The main result of this work proves that the risk of the SGD process converges to zero if the target function under consideration is constant. In the established convergence result the considered artificial neural networks consist of one input layer, one hidden layer, and one output layer (with $d \in \mathbb{N}$ neurons on the input layer, $H \in \mathbb{N}$ neurons on the hidden layer, and one neuron on the output layer). The learning rates of the SGD process are assumed to be sufficiently small and the input data used in the SGD process to train the artificial neural networks is assumed to be independent and identically distributed.

artificial intelligence, lemma 2, machine learning, (18 more...)

doi: 10.1007/s00033-022-01716-w

2104.00277

Country:

Europe > Germany (0.04)
North America > United States > New York > New York County > New York City (0.04)
Asia > Middle East > Jordan (0.04)
(7 more...)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Chung, Heejung W., Datta, Avoy, Waites, Chris

GABO: Graph Augmentations with Bi-level Optimization

arXiv.org Artificial IntelligenceApr-1-2021

Data augmentation refers to a wide range of techniques for improving model generalization by augmenting training examples. Oftentimes such methods require domain knowledge about the dataset at hand, spawning a plethora of recent literature surrounding automated techniques for data augmentation. In this work we apply one such method, bilevel optimization, to tackle the problem of graph classification on the ogbg-molhiv dataset. Our best performing augmentation achieved a test ROCAUC score of 77.77 % with a GIN+virtual classifier, which makes it the most effective augmenter for this classifier on the leaderboard. This framework combines a GIN layer augmentation generator with a bias transformation and outperforms the same classifier augmented using the state-of-the-art FLAG augmentation.

augmentation, bilevel optimization, optimization, (12 more...)

2104.00722

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.05)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Genre: Research Report (0.83)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.46)
Health & Medicine > Therapeutic Area > Immunology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.46)

Vargo, Alexander, Zhang, Fan, Yurochkin, Mikhail, Sun, Yuekai

Individually Fair Gradient Boosting

arXiv.org Machine LearningMar-30-2021

We consider the task of enforcing individual fairness in gradient boosting. Gradient boosting is a popular method for machine learning from tabular data, which arise often in applications where algorithmic fairness is a concern. At a high level, our approach is a functional gradient descent on a (distributionally) robust loss function that encodes our intuition of algorithmic fairness for the ML task at hand. Unlike prior approaches to individual fairness that only work with smooth ML models, our approach also works with non-smooth models such as decision trees. We show that our algorithm converges globally and generalizes. We also demonstrate the efficacy of our algorithm on three ML problems susceptible to algorithmic bias.

conference paper, fairness, individual fairness, (16 more...)

2103.16785

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > Michigan (0.04)
Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
Africa > Ethiopia > Addis Ababa > Addis Ababa (0.04)

Genre: Research Report > New Finding (0.46)

Industry:

Banking & Finance > Credit (0.68)
Law (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.34)

arXiv.org Machine LearningMar-29-2021

The Complexity of Nonconvex-Strongly-Concave Minimax Optimization

Zhang, Siqi, Yang, Junchi, Guzmán, Cristóbal, Kiyavash, Negar, He, Niao

This paper studies the complexity for finding approximate stationary points of nonconvex-strongly-concave (NC-SC) smooth minimax problems, in both general and averaged smooth finite-sum settings. We establish nontrivial lower complexity bounds of $\Omega(\sqrt{\kappa}\Delta L\epsilon^{-2})$ and $\Omega(n+\sqrt{n\kappa}\Delta L\epsilon^{-2})$ for the two settings, respectively, where $\kappa$ is the condition number, $L$ is the smoothness constant, and $\Delta$ is the initial gap. Our result reveals substantial gaps between these limits and best-known upper bounds in the literature. To close these gaps, we introduce a generic acceleration scheme that deploys existing gradient-based methods to solve a sequence of crafted strongly-convex-strongly-concave subproblems. In the general setting, the complexity of our proposed algorithm nearly matches the lower bound; in particular, it removes an additional poly-logarithmic dependence on accuracy present in previous works. In the averaged smooth finite-sum setting, our proposed algorithm improves over previous algorithms by providing a nearly-tight dependence on the condition number.

algorithm, complexity, optimization, (13 more...)

2103.15888

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
Asia > Middle East > Jordan (0.04)
South America > Chile (0.04)
(5 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.34)

Nanni, Loris, Maguolo, Gianluca, Lumini, Alessandra

Exploiting Adam-like Optimization Algorithms to Improve the Performance of Convolutional Neural Networks

arXiv.org Artificial IntelligenceMar-26-2021

Stochastic gradient descent (SGD) is the main approach for training deep networks: it moves towards the optimum of the cost function by iteratively updating the parameters of a model in the direction of the gradient of the loss evaluated on a minibatch. Several variants of SGD have been proposed to make adaptive step sizes for each parameter (adaptive gradient) and take into account the previous updates (momentum). Among several alternative of SGD the most popular are AdaGrad, AdaDelta, RMSProp and Adam which scale coordinates of the gradient by square roots of some form of averaging of the squared coordinates in the past gradients and automatically adjust the learning rate on a parameter basis. In this work, we compare Adam based variants based on the difference between the present and the past gradients, the step size is adjusted for each parameter. We run several tests benchmarking proposed methods using medical image data. The experiments are performed using ResNet50 architecture neural network. Moreover, we have tested ensemble of networks and the fusion with ResNet50 trained with stochastic gradient descent. To combine the set of ResNet50 the simple sum rule has been applied. Proposed ensemble obtains very high performance, it obtains accuracy comparable or better than actual state of the art. To improve reproducibility and research efficiency the MATLAB source code used for this research is available at GitHub: https://github.com/LorisNanni.

ensemble, gradient, sgd, (15 more...)

2103.14689

Country: Europe > Italy > Emilia-Romagna > Metropolitan City of Bologna > Bologna (0.04)

Genre: Research Report (0.82)

Industry: Health & Medicine > Therapeutic Area > Oncology (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.96)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.65)