AITopics

1905.06466

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > New York > New York County > New York City (0.04)
Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.34)

Chu, Lei, Li, Husheng, Qiu, Robert Caiming

LEMO: Learn to Equalize for MIMO-OFDM Systems with Low-Resolution ADCs

This paper develops a new deep neural network optimized equalization framework for massive multiple input multiple output orthogonal frequency division multiplexing (MIMO-OFDM) systems that employ low-resolution analog-to-digital converters (ADCs) at the base station (BS). The use of low-resolution ADCs can greatly reduce hardware complexity and circuit power consumption; however, it leaves the BS almost blind to the channel state information, which makes the equalization problem difficult to solve. In this paper, we consider a supervised learning architecture, where the goal is to learn a representative function that can predict the targets (constellation points) from the inputs (outputs of the low-resolution ADCs) based on the labeled training data (pilot signals). Specifically, our main contributions are two-fold: 1) we design a new activation function, whose outputs are close to the constellation points when the parameters are finally optimized, to help us fully exploit the stochastic gradient descent method for the discrete optimization problem; 2) an unsupervised loss is designed and then added to the optimization objective, aiming to enhance the representation ability (so-called generalization). The experimental results reveal that the proposed equalizer is robust to different channel taps (i.e., Gaussian and Poisson), significantly outperforms the linearized MMSE equalizer, and shows potential for pilot saving.

artificial intelligence, equalizer, machine learning

1905.06329

Country:

North America > United States > Tennessee > Knox County > Knoxville (0.04)
Asia > China > Shanghai > Shanghai (0.04)
North America > United States > New York > New York County > New York City (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.72)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.54)

Doerr, Andreas, Volpp, Michael, Toussaint, Marc, Trimpe, Sebastian, Daniel, Christian

Trajectory-Based Off-Policy Deep Reinforcement Learning

Policy gradient methods are powerful reinforcement learning algorithms and have been demonstrated to solve many complex tasks. However, these methods are also data-inefficient, afflicted with high variance gradient estimates, and frequently get stuck in local optima. This work addresses these weaknesses by combining recent improvements in the reuse of off-policy data and exploration in parameter space with deterministic behavioral policies. The resulting objective is amenable to standard neural network optimization strategies like stochastic gradient descent or stochastic gradient Hamiltonian Monte Carlo. Incorporation of previous rollouts via importance sampling greatly improves data-efficiency, whilst stochastic optimization schemes facilitate the escape from local optima. We evaluate the proposed approach on a series of continuous control benchmark tasks. The results show that the proposed algorithm is able to successfully and reliably learn solutions using fewer system interactions than standard policy gradient methods.
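The reuse of previous rollouts via importance sampling mentioned in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' estimator: it assumes a one-dimensional Gaussian search distribution over policy parameters and a self-normalised weighting, and the names `gaussian_pdf` and `off_policy_return_estimate` are hypothetical.

```python
import math


def gaussian_pdf(theta, mean, std):
    """Density of a 1-D Gaussian, used here as the parameter-space search distribution."""
    z = (theta - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def off_policy_return_estimate(thetas, returns, behavior, target):
    """Self-normalised importance-sampling estimate of the expected return.

    thetas   -- policy parameters of earlier rollouts, drawn from `behavior`
    returns  -- the return observed for each of those rollouts
    behavior -- (mean, std) of the distribution that generated `thetas` (assumed form)
    target   -- (mean, std) of the distribution we want to evaluate (assumed form)
    """
    weights = [
        gaussian_pdf(t, *target) / gaussian_pdf(t, *behavior) for t in thetas
    ]
    total = sum(weights)
    return sum(w * r for w, r in zip(weights, returns)) / total


if __name__ == "__main__":
    old_thetas = [0.0, 1.0, 2.0]
    old_returns = [1.0, 2.0, 6.0]
    # Re-weight old rollouts to evaluate a search distribution shifted towards theta = 2.
    print(off_policy_return_estimate(old_thetas, old_returns, (1.0, 1.0), (2.0, 1.0)))
```

When the target and behavior distributions coincide, every weight is 1 and the estimate reduces to the plain average of the stored returns, so no new system interaction is needed to evaluate nearby candidates.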

estimator, exploration, surrogate model

1905.0571

Country:

Europe > Germany > Baden-Württemberg > Stuttgart Region > Stuttgart (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)
Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.88)

Raghunathan, Arvind U., Cherian, Anoop, Jha, Devesh K.

Game Theoretic Optimization via Gradient-based Nikaido-Isoda Function

Computing the Nash equilibrium (NE) of multi-player games has witnessed renewed interest due to recent advances in generative adversarial networks. However, computing an equilibrium efficiently is challenging. To this end, we introduce the Gradient-based Nikaido-Isoda (GNI) function, which (i) serves as a merit function, vanishing only at the first-order stationary points of each player's optimization problem, and (ii) provides error bounds to a stationary Nash point. Gradient descent is shown to converge sublinearly to a first-order stationary point of the GNI function. For the particular case of bilinear min-max games and multi-player quadratic games, the GNI function is convex. Hence, the application of gradient descent in this case yields linear convergence to an NE (when one exists). In our numerical experiments, we observe that the GNI formulation always converges to the first-order stationary point of each player's optimization problem.
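A small sketch may help make the merit-function idea in the abstract above concrete. It assumes the GNI function has the form V(x) = sum_i [f_i(x) - f_i(x_i - eta * grad_i f_i(x), x_{-i})], that is, the total improvement each player would obtain from one gradient step of size eta on its own variable. That form, the bilinear example game f1 = x*y, f2 = -x*y, and the name `gni_value` are assumptions for illustration and are not stated in the abstract.

```python
def f1(x, y):
    """Cost of player 1, who controls x (hypothetical bilinear example)."""
    return x * y


def f2(x, y):
    """Cost of player 2, who controls y (hypothetical bilinear example)."""
    return -x * y


def gni_value(x, y, eta):
    """Assumed GNI form: summed gain from one gradient step per player."""
    grad1 = y    # d f1 / d x
    grad2 = -x   # d f2 / d y
    x_step = x - eta * grad1
    y_step = y - eta * grad2
    gain1 = f1(x, y) - f1(x_step, y)
    gain2 = f2(x, y) - f2(x, y_step)
    return gain1 + gain2


if __name__ == "__main__":
    # For this game the value simplifies to eta * (x**2 + y**2).
    print(gni_value(1.0, 2.0, 0.1))
    print(gni_value(0.0, 0.0, 0.1))
```

For this bilinear game the assumed function reduces to eta * (x^2 + y^2), which is convex and vanishes only at the origin, the game's stationary Nash point. This is consistent with the convexity claim for bilinear min-max games.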

artificial intelligence, convergence, machine learning

1905.05927

Country: North America > United States (0.46)

Genre: Research Report > New Finding (0.46)

Industry: Leisure & Entertainment > Games (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.56)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.48)

Tran-Dinh, Quoc, Pham, Nhan H., Phan, Dzung T., Nguyen, Lam M.

Hybrid Stochastic Gradient Descent Algorithms for Stochastic Nonconvex Optimization

We introduce a hybrid stochastic estimator to design stochastic gradient algorithms for solving stochastic optimization problems. Such a hybrid estimator is a convex combination of two existing estimators, one biased and one unbiased, and has a useful variance property. We limit our consideration to a hybrid SARAH-SGD estimator for nonconvex expectation problems. However, our idea can be extended to handle a broader class of estimators in both convex and nonconvex settings. We propose a new single-loop stochastic gradient descent algorithm that achieves an $O(\max\{\sigma^3\varepsilon^{-1},\sigma\varepsilon^{-3}\})$ complexity bound to obtain an $\varepsilon$-stationary point under smoothness and $\sigma^2$-bounded variance assumptions. This complexity is better than the $O(\sigma^2\varepsilon^{-4})$ often obtained in state-of-the-art SGDs when $\sigma < O(\varepsilon^{-3})$. We also consider different extensions of our method, including constant and adaptive step-size with single-loop, double-loop, and mini-batch variants. We compare our algorithms with existing methods on several datasets using two nonconvex models.
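The convex combination described in the abstract above can be sketched as follows. This is an illustrative reading, not the paper's algorithm: it assumes the biased part is the usual SARAH recursion (previous estimate plus a gradient difference on a shared sample), the unbiased part is a plain SGD gradient on an independent sample, and a fixed weight `beta`. The toy objective and all function names are hypothetical.

```python
import random


def sarah_estimate(v_prev, grad_new, grad_old):
    """Recursive (biased) SARAH-style estimate from a shared sample."""
    return v_prev + grad_new - grad_old


def hybrid_estimate(v_prev, grad_new, grad_old, grad_indep, beta):
    """Convex combination of the SARAH-style term and an independent SGD gradient."""
    return beta * sarah_estimate(v_prev, grad_new, grad_old) + (1.0 - beta) * grad_indep


def sample_grad(x, xi):
    """Gradient of the toy loss 0.5 * (x - xi)**2 with respect to x."""
    return x - xi


def run_hybrid_sgd(steps=500, lr=0.1, beta=0.9, seed=0):
    """Single-loop descent on E[0.5 * (x - xi)**2] with xi ~ N(1, 0.1**2)."""
    rng = random.Random(seed)
    x_prev = 5.0
    v = sample_grad(x_prev, rng.gauss(1.0, 0.1))
    x = x_prev - lr * v
    for _ in range(steps):
        xi = rng.gauss(1.0, 0.1)     # shared sample for the gradient difference
        zeta = rng.gauss(1.0, 0.1)   # independent sample for the SGD term
        v = hybrid_estimate(
            v, sample_grad(x, xi), sample_grad(x_prev, xi), sample_grad(x, zeta), beta
        )
        x_prev, x = x, x - lr * v
    return x


if __name__ == "__main__":
    print(run_hybrid_sgd())  # should end near the minimiser x = 1
```

Setting `beta = 0` recovers plain SGD and `beta = 1` recovers the purely recursive estimator, so the weight interpolates between the two.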

accuracy, artificial intelligence, machine learning

1905.0592

Country: North America (0.46)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)

Golkar, Siavash, Cho, Kyunghyun

Task-Driven Data Verification via Gradient Descent

We introduce a novel algorithm for the detection of possible sample corruption, such as mislabeled samples, in a training dataset given a small clean validation set. We use a set of inclusion variables which determine whether or not any element of the noisy training set should be included in the training of a network. We compute these inclusion variables by optimizing the performance of the network on the clean validation set via "gradient descent on gradient descent" based learning. The inclusion variables as well as the network trained in such a way form the basis of our methods, which we call Corruption Detection via Gradient Descent (CDGD). This algorithm can be applied to any supervised machine learning task and is not limited to classification problems. We provide a quantitative comparison of these methods on synthetic and real-world datasets.

artificial intelligence, dataset, machine learning

1905.05843

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Robust Neural Network Training using Periodic Sampling over Model Weights

Tripathi, Samarth, Liu, Jiayi, Kurup, Unmesh, Shah, Mohak

Deep neural networks provide best-in-class performance for a number of computer vision problems. However, training these networks is computationally intensive and requires fine-tuning various hyperparameters. In addition, performance swings widely as the network converges, making it hard to decide when to stop training. In this paper, we introduce a trio of techniques (PSWA, PWALKS, and PSWM) centered around periodic sampling of model weights that provide consistent and more robust convergence on a variety of vision problems (classification, detection, segmentation) and gradient update methods (vanilla SGD, Momentum, Adam) with marginal additional computation time. Our techniques use existing optimal training policies but converge in a less volatile fashion with performance improvements that are approximately monotonic. Our analysis of the loss surface shows that these techniques also produce minima that are deeper and wider than those found by SGD.

artificial intelligence, machine learning, pswa

1905.05774

Country: North America > United States > California > Santa Clara County > Santa Clara (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.66)

arXiv.org Machine Learning, May-13-2019

A Stochastic Gradient Method with Biased Estimation for Faster Nonconvex Optimization

Bi, Jia, Gunn, Steve R.

A number of optimization approaches have been proposed for optimizing nonconvex objectives (e.g. deep learning models), such as batch gradient descent, stochastic gradient descent and stochastic variance reduced gradient descent. Theory shows these optimization methods can converge by using an unbiased gradient estimator. However, in practice biased gradient estimation can allow more efficient convergence to the vicinity of a solution, since an unbiased approach is computationally more expensive. Fast convergence therefore involves two trade-offs in these optimization strategies: stochastic versus batch, and biased versus unbiased. This paper proposes an integrated approach which can control the nature of the stochastic element in the optimizer and can balance the estimator between biased and unbiased by using a hyper-parameter. It is shown theoretically and experimentally that this hyper-parameter can be configured to provide an effective balance to improve the convergence rate.

artificial intelligence, estimator, machine learning

1905.05185

Country: Europe (0.28)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.87)

arXiv.org Machine Learning, May-13-2019

Differentiable Game Mechanics

Letcher, Alistair, Balduzzi, David, Racaniere, Sebastien, Martens, James, Foerster, Jakob, Tuyls, Karl, Graepel, Thore

Deep learning is built on the foundational guarantee that gradient descent on an objective function converges to local minima. Unfortunately, this guarantee fails in settings, such as generative adversarial nets, that exhibit multiple interacting losses. The behavior of gradient-based methods in games is not well understood -- and is becoming increasingly important as adversarial and multi-objective architectures proliferate. In this paper, we develop new tools to understand and control the dynamics in n-player differentiable games. The key result is to decompose the game Jacobian into two components. The first, symmetric component, is related to potential games, which reduce to gradient descent on an implicit function. The second, antisymmetric component, relates to Hamiltonian games, a new class of games that obey a conservation law akin to conservation laws in classical mechanical systems. The decomposition motivates Symplectic Gradient Adjustment (SGA), a new algorithm for finding stable fixed points in differentiable games. Basic experiments show SGA is competitive with recently proposed algorithms for finding stable fixed points in GANs -- while at the same time being applicable to, and having guarantees in, much more general cases.
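The Jacobian decomposition described in the abstract above can be illustrated on a two-player bilinear game. The sketch assumes the adjusted update direction is xi + lam * A^T xi, where xi is the simultaneous gradient and A is the antisymmetric part of its Jacobian. That formula, the example game f1 = x*y, f2 = -x*y, and all function names are assumptions for illustration and are not given in the abstract.

```python
import math


def simultaneous_gradient(x, y):
    """xi = (d f1/dx, d f2/dy) for the example game f1 = x*y, f2 = -x*y."""
    return [y, -x]


def split_jacobian(J):
    """Return the symmetric part S and antisymmetric part A of a square matrix."""
    n = len(J)
    S = [[0.5 * (J[i][j] + J[j][i]) for j in range(n)] for i in range(n)]
    A = [[0.5 * (J[i][j] - J[j][i]) for j in range(n)] for i in range(n)]
    return S, A


def adjusted_gradient(xi, A, lam):
    """Assumed adjustment: xi + lam * A^T xi."""
    n = len(xi)
    at_xi = [sum(A[j][i] * xi[j] for j in range(n)) for i in range(n)]
    return [xi[i] + lam * at_xi[i] for i in range(n)]


def final_distance(lam, steps=200, lr=0.1):
    """Distance from the fixed point (0, 0) after `steps` simultaneous updates."""
    J = [[0.0, 1.0], [-1.0, 0.0]]  # Jacobian of xi; constant for this game
    _, A = split_jacobian(J)
    x, y = 1.0, 1.0
    for _ in range(steps):
        d = adjusted_gradient(simultaneous_gradient(x, y), A, lam)
        x, y = x - lr * d[0], y - lr * d[1]
    return math.hypot(x, y)


if __name__ == "__main__":
    print(final_distance(lam=0.0))  # plain simultaneous gradient descent
    print(final_distance(lam=1.0))  # with the antisymmetric adjustment
```

In this game the symmetric part is zero, so the dynamics are purely rotational. With `lam = 0` the iterates spiral away from the fixed point, while a positive `lam` adds a component pointing towards it and the iterates contract.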

converge, machine learning, reinforcement learning

1905.04926

Genre: Research Report (0.40)

Industry: Leisure & Entertainment > Games > Computer Games (0.83)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.77)

Park, Daniel S., Sohl-Dickstein, Jascha, Le, Quoc V., Smith, Samuel L.

The Effect of Network Width on Stochastic Gradient Descent and Generalization: an Empirical Study

arXiv.org Machine Learning, May-9-2019

We investigate how the final parameters found by stochastic gradient descent are influenced by over-parameterization. We generate families of models by increasing the number of channels in a base network, and then perform a large hyper-parameter search to study how the test error depends on learning rate, batch size, and network width. We find that the optimal SGD hyper-parameters are determined by a "normalized noise scale," which is a function of the batch size, learning rate, and initialization conditions. In the absence of batch normalization, the optimal normalized noise scale is directly proportional to width. Wider networks, with their higher optimal noise scale, also achieve higher test accuracy. These observations hold for MLPs, ConvNets, and ResNets, and for two different parameterization schemes ("Standard" and "NTK"). We observe a similar trend with batch normalization for ResNets. Surprisingly, since the largest stable learning rate is bounded, the largest batch size consistent with the optimal normalized noise scale decreases as the width increases.

artificial intelligence, batch normalization, machine learning

1905.03776

Country: North America > United States (0.28)

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)