AITopics

At the working heart of policy iteration algorithms commonly used and studied in the discounted setting of reinforcement learning, the policy evaluation step estimates the value of state with samples from a Markov reward process induced by following a Markov policy in a Markov decision process. We propose a simple and efficient estimator called \emph{loop estimator} that exploits the regenerative structure of Markov reward processes without explicitly estimating a full model. Our method enjoys a space complexity of $O(1)$ when estimating the value of a single positive recurrent state $s$ unlike TD (with $O(S)$) or model-based methods (with $O(S^2)$). Moreover, the regenerative structure enables us to show, without relying on the generative model approach, that the estimator has an instance-dependent convergence rate of $\widetilde{O}(\sqrt{\tau_s/T})$ over steps $T$ on a single sample path, where $\tau_s$ is the maximal expected hitting time to state $s$. In preliminary numerical experiments, the loop estimator outperforms model-free methods, such as TD(k), and is competitive with the model-based estimator.

estimator, loop estimator, probability, (13 more...)

2002.06299

Country:

North America > United States > Massachusetts > Middlesex County > Belmont (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.96)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.50)

Bamler, Robert, Mandt, Stephan

Extreme Classification via Adversarial Softmax Approximation

Training a classifier over a large number of classes, known as 'extreme classification', has become a topic of major interest with applications in technology, science, and e-commerce. Traditional softmax regression induces a gradient cost proportional to the number of classes $C$, which often is prohibitively expensive. A popular scalable softmax approximation relies on uniform negative sampling, which suffers from slow convergence due a poor signal-to-noise ratio. In this paper, we propose a simple training method for drastically enhancing the gradient signal by drawing negative samples from an adversarial model that mimics the data distribution. Our contributions are three-fold: (i) an adversarial sampling mechanism that produces negative samples at a cost only logarithmic in $C$, thus still resulting in cheap gradient updates; (ii) a mathematical proof that this adversarial sampling minimizes the gradient variance while any bias due to non-uniform sampling can be removed; (iii) experimental results on large scale data sets that show a reduction of the training time by an order of magnitude relative to several competitive baselines.

auxiliary model, classification, negative sample, (17 more...)

2002.06298

Country: North America > United States > California > Orange County > Irvine (0.04)

Genre: Research Report (1.00)

Industry: Information Technology > Services (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.48)

Gopalakrishnan, Sriram, Soni, Utkarsh

Let Me At Least Learn What You Really Like: Dealing With Noisy Humans When Learning Preferences

Learning the preferences of a human improves the quality of the interaction with the human. The number of queries available to learn preferences maybe limited especially when interacting with a human, and so active learning is a must. One approach to active learning is to use uncertainty sampling to decide the informativeness of a query. In this paper, we propose a modification to uncertainty sampling which uses the expected output value to help speed up learning of preferences. We compare our approach with the uncertainty sampling baseline, as well as conduct an ablation study to test the validity of each component of our approach.

query, relevant feature, variance, (15 more...)

2002.06288

Country: North America > United States > Wisconsin > Dane County > Madison (0.04)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Non-asymptotic Convergence of Adam-type Reinforcement Learning Algorithms under Markovian Sampling

Xiong, Huaqing, Xu, Tengyu, Liang, Yingbin, Zhang, Wei

Despite the wide applications of Adam in reinforcement learning (RL), the theoretical convergence of Adam-type RL algorithms has not been established. This paper provides the first such convergence analysis for two fundamental RL algorithms of policy gradient (PG) and temporal difference (TD) learning that incorporate AMSGrad updates (a standard alternative of Adam in theoretical analysis), referred to as PG-AMSGrad and TD-AMSGrad, respectively. Moreover, our analysis focuses on Markovian sampling for both algorithms. We show that under general nonlinear function approximation, PG-AMSGrad with a constant stepsize converges to a neighborhood of a stationary point at the rate of $\mathcal{O}(1/T)$ (where $T$ denotes the number of iterations), and with a diminishing stepsize converges exactly to a stationary point at the rate of $\mathcal{O}(\log^2 T/\sqrt{T})$. Furthermore, under linear function approximation, TD-AMSGrad with a constant stepsize converges to a neighborhood of the global optimum at the rate of $\mathcal{O}(1/T)$, and with a diminishing stepsize converges exactly to the global optimum at the rate of $\mathcal{O}(\log T/\sqrt{T})$. Our study develops new techniques for analyzing the Adam-type RL algorithms under Markovian sampling.

algorithm, function approximation, stepsize, (14 more...)

2002.06286

Country:

North America > United States > Ohio (0.04)
North America > United States > Massachusetts (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.81)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Higher order co-occurrence tensors for hypergraphs via face-splitting

Bischof, Bryan

A popular trick for computing a pairwise co-occurrence matrix is the product of an incidence matrix and its transpose. We present an analog for higher order tuple co-occurrences using the face-splitting product, or alternately known as the transpose Khatri-Rao product. These higher order co-occurrences encode the commonality of tokens in the company of other tokens, and thus generalize the mutual information commonly studied. We demonstrate this tensor's use via a popular NLP model, and hypergraph models of similarity. Studying an implicit meaning of a collection of things via their relationships with other things of the same type is a popular technique.

co-occurrence tensor, matrix, tensor, (15 more...)

2002.06285

Country: Africa > Senegal > Kolda Region > Kolda (0.05)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)

Why Do Deep Residual Networks Generalize Better than Deep Feedforward Networks? -- A Neural Tangent Kernel Perspective

Huang, Kaixuan, Wang, Yuqing, Tao, Molei, Zhao, Tuo

Deep residual networks (ResNets) have demonstrated better generalization performance than deep feedforward networks (FFNets). However, the theory behind such a phenomenon is still largely unknown. This paper studies this fundamental problem in deep learning from a so-called "neural tangent kernel" perspective. Specifically, we first show that under proper conditions, as the width goes to infinity, training deep ResNets can be viewed as learning reproducing kernel functions with some kernel function. We then compare the kernel of deep ResNets with that of deep FFNets and discover that the class of functions induced by the kernel of FFNets is asymptotically not learnable, as the depth goes to infinity. In contrast, the class of functions induced by the kernel of ResNets does not exhibit such degeneracy. Our discovery partially justifies the advantages of deep ResNets over deep FFNets in generalization abilities. Numerical results are provided to support our claim.

neural network, probability, resnet, (12 more...)

2002.06262

Country:

North America > United States > New York (0.04)
North America > Canada > Ontario > Toronto (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Genre: Research Report > New Finding (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Grand-Clement, Julien, Chan, Carri W., Goyal, Vineet, Escobar, Gabriel

Robust Policies For Proactive ICU Transfers

Patients whose transfer to the Intensive Care Unit (ICU) is unplanned are prone to higher mortality rates than those who were admitted directly to the ICU. Recent advances in machine learning to predict patient deterioration have introduced the possibility of \emph{proactive transfer} from the ward to the ICU. In this work, we study the problem of finding \emph{robust} patient transfer policies which account for uncertainty in statistical estimates due to data limitations when optimizing to improve overall patient care. We propose a Markov Decision Process model to capture the evolution of patient health, where the states represent a measure of patient severity. Under fairly general assumptions, we show that an optimal transfer policy has a threshold structure, i.e., that it transfers all patients above a certain severity level to the ICU (subject to available capacity). As model parameters are typically determined based on statistical estimations from real-world data, they are inherently subject to misspecification and estimation errors. We account for this parameter uncertainty by deriving a robust policy that optimizes the worst-case reward across all plausible values of the model parameters. We show that the robust policy also has a threshold structure under fairly general assumptions. Moreover, it is more aggressive in transferring patients than the optimal nominal policy, which does not take into account parameter uncertainty. We present computational experiments using a dataset of hospitalizations at 21 KNPC hospitals, and present empirical evidence of the sensitivity of various hospital metrics (mortality, length-of-stay, average ICU occupancy) to small changes in the parameters. Our work provides useful insights into the impact of parameter uncertainty on deriving simple policies for proactive ICU transfer that have strong empirical performance and theoretical guarantees.

matrix, severity score, threshold policy, (12 more...)

2002.06247

Country:

North America > United States > California (0.14)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > Experimental Study (0.46)

Industry:

Health & Medicine > Health Care Providers & Services (1.00)
Health & Medicine > Therapeutic Area > Immunology (0.92)
Health & Medicine > Therapeutic Area > Oncology (0.67)
Health & Medicine > Therapeutic Area > Endocrinology (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
Information Technology > Data Science (0.87)

Samadianfard, Saeed, Hashemi, Sajjad, Kargar, Katayoun, Izadyar, Mojtaba, Mostafaeipour, Ali, Mosavi, Amir, Nabipour, Narjes, Shamshirband, Shahaboddin

Wind speed prediction using a hybrid model of the multi-layer perceptron and whale optimization algorithm

Wind power as a renewable source of energy, has numerous economic, environmental and social benefits. In order to enhance and control renewable wind power, it is vital to utilize models that predict wind speed with high accuracy. Due to neglecting of requirement and significance of data preprocessing and disregarding the inadequacy of using a single predicting model, many traditional models have poor performance in wind speed prediction. In the current study, for predicting wind speed at target stations in the north of Iran, the combination of a multi-layer perceptron model (MLP) with the Whale Optimization Algorithm (WOA) used to build new method (MLP-WOA) with a limited set of data (2004-2014). Then, the MLP-WOA model was utilized at each of the ten target stations, with the nine stations for training and tenth station for testing (namely: Astara, Bandar-E-Anzali, Rasht, Manjil, Jirandeh, Talesh, Kiyashahr, Lahijan, Masuleh, and Deylaman) to increase the accuracy of the subsequent hybrid model. The capability of the hybrid model in wind speed forecasting at each target station was compared with the MLP model without the WOA optimizer. To determine definite results, numerous statistical performances were utilized. For all ten target stations, the MLP-WOA model had precise outcomes than the standalone MLP model. The hybrid model had acceptable performances with lower amounts of the RMSE, SI and RE parameters and higher values of NSE, WI, and KGE parameters. It was concluded that the WOA optimization algorithm can improve the prediction accuracy of MLP model and may be recommended for accurate wind speed prediction.

algorithm, prediction, wind speed, (16 more...)

2002.06226

Country:

North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.06)
Asia > Middle East > Oman (0.05)
Asia > Middle East > Iran > East Azerbaijan Province > Tabriz (0.05)
(11 more...)

Genre: Research Report > New Finding (0.49)

Industry: Energy > Renewable > Wind (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (1.00)

Sinha, Samarth, Goyal, Anirudh, Raffel, Colin, Odena, Augustus

Top-K Training of GANs: Improving Generators by Making Critics Less Critical

We introduce a simple (one line of code) modification to the Generative Adversarial Network (GAN) training algorithm that materially improves results with no increase in computational cost: When updating the generator parameters, we simply zero out the gradient contributions from the elements of the batch that the critic scores as `least realistic'. Through experiments on many different GAN variants, we show that this `top-k update' procedure is a generally applicable improvement. In order to understand the nature of the improvement, we conduct extensive analysis on a simple mixture-of-Gaussians dataset and discover several interesting phenomena. Among these is that, when gradient updates are computed using the worst-scoring batch elements, samples can actually be pushed further away from the their nearest mode.

arxiv preprint arxiv, gan, generator, (11 more...)

2002.06224

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > Canada > Quebec > Montreal (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.49)

Finardi, Paulo, Campiotti, Israel, Plensack, Gustavo, de Souza, Rafael Derradi, Nogueira, Rodrigo, Pinheiro, Gustavo, Lotufo, Roberto

Electricity Theft Detection with self-attention

In this work we propose a novel self-attention mechanism model to address electricity theft detection on an imbalanced realistic dataset that presents a daily electricity consumption provided by State Grid Corporation of China. Our key contribution is the introduction of a multi-head self-attention mechanism concatenated with dilated convolutions and unified by a convolution of kernel size $1$. Moreover, we introduce a binary input channel (Binary Mask) to identify the position of the missing values, allowing the network to learn how to deal with these values. Our model achieves an AUC of $0.926$ which is an improvement in more than $17\%$ with respect to previous baseline work. The code is available on GitHub at https://github.com/neuralmind-ai/electricity-theft-detection-with-self-attention.

dataset, electricity theft detection, transformation, (13 more...)

2002.06219

Country:

Asia > China (0.24)
Europe > Ireland (0.14)
South America > Brazil > São Paulo (0.04)
(3 more...)

Genre: Research Report > New Finding (0.46)

Industry: Energy > Power Industry (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Data Science (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.69)