
Efficient Reinforcement Learning in Factored MDPs with Application to Constrained RL

Chen, Xiaoyu, Hu, Jiachen, Li, Lihong, Wang, Liwei

arXiv.org Machine Learning

Reinforcement learning (RL) in episodic, factored Markov decision processes (FMDPs) is studied. We propose an algorithm called FMDP-BF, which leverages the factored structure of FMDPs. The regret of FMDP-BF is shown to be exponentially smaller than that of optimal algorithms designed for non-factored MDPs, and improves on the best previous result for FMDPs~\citep{osband2014near} by a factor of $\sqrt{H|\mathcal{S}_i|}$, where $|\mathcal{S}_i|$ is the cardinality of the factored state subspace and $H$ is the planning horizon. To show the optimality of our bounds, we also provide a lower bound for FMDPs, which indicates that our algorithm is near-optimal w.r.t. timestep $T$, horizon $H$ and factored state-action subspace cardinality. Finally, as an application, we study a new formulation of constrained RL, known as RL with knapsack constraints (RLwK), and provide the first sample-efficient algorithm for it based on FMDP-BF.
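The key property an FMDP algorithm exploits is that the transition distribution factors across state components, each depending only on a small scope of the state-action pair. The following toy sketch illustrates that structure and why it shrinks the parameter count; the scoping and variable names are illustrative, not the paper's notation or algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

n_factors = 3      # state decomposes as s = (s_1, s_2, s_3)
factor_size = 2    # each factored subspace has |S_i| = 2 values
n_actions = 2

# Each factor i transitions via its own conditional P_i(s_i' | s_i, a);
# here the scope of factor i is just factor i itself (the simplest case).
# Shape: (factor, current factor value, action, next factor value).
P = rng.dirichlet(np.ones(factor_size),
                  size=(n_factors, factor_size, n_actions))

def step(state, action):
    """Sample the next state factor-by-factor using the factored model."""
    return tuple(
        int(rng.choice(factor_size, p=P[i, state[i], action]))
        for i in range(n_factors)
    )

# A flat model over the joint space needs on the order of |S|^2 |A|
# parameters (|S| = factor_size**n_factors = 8, so 128 here), while the
# factored model above stores only 3 * 2 * 2 * 2 = 24 — the gap grows
# exponentially with the number of factors.
next_state = step((0, 1, 0), action=1)
```

Because each conditional is over a small factored subspace, a model-based learner can estimate every $P_i$ from far fewer samples than a flat model would need, which is the intuition behind regret bounds that scale with $|\mathcal{S}_i|$ rather than $|\mathcal{S}|$.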


Andile Ngcaba's inq Wants to be Africa's Number one AI Service Provider.

#artificialintelligence

ICT industry veteran Andile Ngcaba's inq., a pan-African digital service provider, wants to be Africa's number one artificial intelligence (AI) service provider. The company has points of contact in 12 African cities: Johannesburg, Gaborone, Lusaka, Ndola, Blantyre, Lilongwe, Mzuzu, Lagos, Abuja, Port Harcourt, Kano and Abidjan. It has concluded the 100% acquisition of Vodacom Business Africa's operations in Nigeria, Zambia and Cote d'Ivoire, with a further planned acquisition in Cameroon pending regulatory approval. At the time the transaction was announced last June, inq. said the deal represented a significant milestone in its vision to be a leading provider of cloud and digitally based services in key markets across sub-Saharan Africa, and provided additional vital assets in the build-out of its regional footprint. Today, inq. said the landmark transaction grows its regional footprint to 13 cities in 7 countries across Africa, including its existing operations in Botswana, Malawi and Mozambique.


Strong Asymptotic Optimality in General Environments

Cohen, Michael K., Catt, Elliot, Hutter, Marcus

arXiv.org Artificial Intelligence

Reinforcement Learning agents are expected to eventually perform well. Typically, this takes the form of a guarantee about the asymptotic behavior of an algorithm given some assumptions about the environment. We present an algorithm for a policy whose value approaches the optimal value with probability 1 in all computable probabilistic environments, provided the agent has a bounded horizon. This is known as strong asymptotic optimality, and it was previously unknown whether it was possible for a policy to be strongly asymptotically optimal in the class of all computable probabilistic environments. Our agent, Inquisitive Reinforcement Learner (Inq), is more likely to explore the more it expects an exploratory action to reduce its uncertainty about which environment it is in, hence the term inquisitive. Exploring inquisitively is a strategy that can be applied generally; for more manageable environment classes, inquisitiveness is tractable. We conducted experiments in "grid-worlds" to compare the Inquisitive Reinforcement Learner to other weakly asymptotically optimal agents.
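The inquisitive strategy can be pictured with a toy Bayesian example: prefer the action whose outcome is expected to shrink the agent's uncertainty (posterior entropy) over which environment it is in. This is only a hypothetical illustration of the expected-information-gain idea, not Inq's actual construction over all computable environments.

```python
import numpy as np

def entropy(p):
    """Binary entropy of a belief p in [0, 1] (nats)."""
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -(p * np.log(p) + (1 - p) * np.log(1 - p))

# Two candidate environments, each giving Bernoulli rewards; rows are
# environments, columns are the reward probabilities of actions 0 and 1.
envs = np.array([[0.9, 0.5],    # environment A
                 [0.1, 0.5]])   # environment B
posterior = np.array([0.5, 0.5])  # current belief over {A, B}

def expected_info_gain(action):
    """Expected drop in posterior entropy from observing this action."""
    h_now = entropy(posterior[0])
    gain = 0.0
    for outcome in (0, 1):
        # Likelihood of this outcome under each environment.
        lik = np.where(outcome, envs[:, action], 1 - envs[:, action])
        p_outcome = float(posterior @ lik)
        post = posterior * lik / p_outcome   # Bayes update
        gain += p_outcome * (h_now - entropy(post[0]))
    return gain

# Action 0 separates the environments (0.9 vs 0.1 reward probability),
# action 1 is uninformative (0.5 in both), so an inquisitive agent
# explores with action 0.
gains = [expected_info_gain(a) for a in (0, 1)]
```

Action 1 yields exactly zero expected information gain, since both environments predict the same outcome distribution, while action 0's gain is strictly positive.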


Incremental Network Quantization: Towards Lossless CNNs with Low-Precision Weights

Zhou, Aojun, Yao, Anbang, Guo, Yiwen, Xu, Lin, Chen, Yurong

arXiv.org Artificial Intelligence

This paper presents incremental network quantization (INQ), a novel method that efficiently converts any pre-trained full-precision convolutional neural network (CNN) model into a low-precision version whose weights are constrained to be either powers of two or zero. Unlike existing methods, which struggle with noticeable accuracy loss, INQ has the potential to resolve this issue thanks to two innovations. On one hand, we introduce three interdependent operations, namely weight partition, group-wise quantization and re-training. A well-proven measure is employed to divide the weights in each layer of a pre-trained CNN model into two disjoint groups. The weights in the first group are responsible for forming a low-precision base, so they are quantized by a variable-length encoding method. The weights in the other group are responsible for compensating for the accuracy loss from quantization, so they are the ones to be re-trained. On the other hand, these three operations are repeated on the latest re-trained group in an iterative manner until all the weights are converted into low-precision ones, acting as an incremental network quantization and accuracy enhancement procedure. Extensive experiments on the ImageNet classification task using almost all known deep CNN architectures, including AlexNet, VGG-16, GoogleNet and ResNets, testify to the efficacy of the proposed method. Specifically, at 5-bit quantization, our models achieve better accuracy than their 32-bit floating-point references. Taking ResNet-18 as an example, we further show that our quantized models with 4-bit, 3-bit and 2-bit ternary weights have improved or very similar accuracy compared with their 32-bit floating-point baseline. Besides, impressive results from the combination of network pruning and INQ are also reported. The code is available at https://github.com/Zhouaojun/Incremental-Network-Quantization.
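The core constraint, mapping each weight to zero or a signed power of two, can be sketched as below. This is only the quantizer step under simplified assumptions (exponent range chosen from the largest weight magnitude, no weight partition or re-training), not the paper's full procedure.

```python
import numpy as np

def quantize_pow2(w, n_bits=5):
    """Snap each weight to the nearest value in {0} U {±2^k} for a small
    range of exponents k, so every quantized weight is a power of two or
    zero. The exponent-range rule here is a simplification for
    illustration, not the paper's exact encoding."""
    w = np.asarray(w, dtype=np.float64)
    # Top exponent picked so the largest weights round up, not clip.
    max_exp = int(np.floor(np.log2(np.max(np.abs(w)) * 4.0 / 3.0)))
    # n_bits codes cover the zero codeword plus 2^(n_bits-1) - 1 signed
    # power-of-two magnitudes.
    min_exp = max_exp - (2 ** (n_bits - 1) - 2)
    levels = np.concatenate(([0.0], 2.0 ** np.arange(min_exp, max_exp + 1)))
    # Nearest-level assignment on magnitudes; signs restored afterwards.
    idx = np.argmin(np.abs(np.abs(w)[..., None] - levels), axis=-1)
    return np.sign(w) * levels[idx]

w = np.array([0.31, -0.07, 0.0009, -0.52])
q = quantize_pow2(w)   # every entry is zero or a signed power of two
```

Constraining weights to powers of two is attractive in hardware because multiplication by a weight reduces to a bit shift; the re-training of the remaining full-precision group is what recovers the accuracy this rounding loses.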