elim
- Europe > Austria (0.04)
- North America > United States > California > San Diego County > San Diego (0.04)
- Europe > United Kingdom > Wales > Cardiff (0.04)
- (2 more...)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Data Science > Data Mining > Big Data (0.35)
- Europe > Austria (0.04)
- North America > United States > California > San Diego County > San Diego (0.04)
- Europe > United Kingdom > Wales > Cardiff (0.04)
- (2 more...)
Scalable Representation Learning in Linear Contextual Bandits with Constant Regret Guarantees
Tirinzoni, Andrea, Papini, Matteo, Touati, Ahmed, Lazaric, Alessandro, Pirotta, Matteo
We study the problem of representation learning in stochastic contextual linear bandits. While the primary concern in this domain is usually to find realizable representations (i.e., those that allow predicting the reward function at any context-action pair exactly), it has been recently shown that representations with certain spectral properties (called HLS) may be more effective for the exploration-exploitation task, enabling LinUCB to achieve constant (i.e., horizon-independent) regret. In this paper, we propose BanditSRL, a representation learning algorithm that combines a novel constrained optimization problem to learn a realizable representation with good spectral properties with a generalized likelihood ratio test to exploit the recovered representation and avoid excessive exploration. We prove that BanditSRL can be paired with any no-regret algorithm and achieve constant regret whenever an HLS representation is available. Furthermore, BanditSRL can be easily combined with deep neural networks and we show how regularizing towards HLS representations is beneficial in standard benchmarks.
$\sqrt{n}$-Regret for Learning in Markov Decision Processes with Function Approximation and Low Bellman Rank
Dong, Kefan, Peng, Jian, Wang, Yining, Zhou, Yuan
In this paper, we consider the problem of online learning of Markov decision processes (MDPs) with very large state spaces. Under the assumptions of realizable function approximation and low Bellman ranks, we develop an online learning algorithm that learns the optimal value function while at the same time achieving very low cumulative regret during the learning process. Our learning algorithm, Adaptive Value-function Elimination (AVE), is inspired by the policy elimination algorithm proposed in (Jiang et al., 2017), known as OLIVE. One of our key technical contributions in AVE is to formulate the elimination steps in OLIVE as contextual bandit problems. This technique enables us to apply the active elimination and expert weighting methods from (Dudik et al., 2011), instead of the random action exploration scheme used in the original OLIVE algorithm, for more efficient exploration and better control of the regret incurred in each policy elimination step. To the best of our knowledge, this is the first $\sqrt{n}$-regret result for reinforcement learning in stochastic MDPs with general value function approximation.
- North America > United States > Illinois > Champaign County > Urbana (0.14)
- North America > United States > Florida > Alachua County > Gainesville (0.14)
- Europe > Sweden > Skåne County > Malmö (0.04)
- (3 more...)
- Education > Educational Setting > Online (0.54)
- Leisure & Entertainment > Games (0.45)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.81)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.70)
Argumentative Approaches to Reasoning with Maximal Consistency
Arieli, Ofer (The Academic College of Tel-Aviv) | Strasser, Christian (Ruhr University Bochum)
Reasoning with the maximally consistent subsets (MCS) of the premises is awell-known approach for handling contradictory information. We introduce two argumentation-based methods for doing so: a declarative approach that is related to Dung-style semantics for abstract argumentation, and a computational approach that is based on extensions of Gentzen-type proofs systems. This brings about a new perspective on reasoning with MCS which shows a strong link between the latter and argumentation systems, and which can be extended to related formalisms. A by-product of this is the introduction of a dynamic proof system for classical logic and rebuttal attacks, which is sound and complete with respect to Dung's stable semantics for the associated argumentation framework.
- Europe > Germany (0.04)
- Asia > Middle East > Israel > Tel Aviv District > Tel Aviv (0.04)
Modeling Diffusion in Social Networks Using Network Properties
Luu, Duc Minh (Singapore Management University) | Lim, Ee-Peng (Singapore Management University) | Hoang, Tuan-Anh (Singapore Management University) | Chua, Freddy Chong Tat (Singapore Management University)
Diffusion of items occurs in social networks due to spreading of items through word of mouth and exogenous factors. These items may be news, products, videos, advertisements or contagious viruses. Previous research has studied diffusion process at both the macro and micro levels. The former models the number of item adopters in the diffusion process while the latter determines which individuals adopt item. In this paper, we establish a general probabilistic framework, which can be used to derive macro-level diffusion models, including the well known Bass Model (BM). Using this framework, we develop several other models considering the social network’s degree distribution coupled with the assumption of linear influence by neighboring adopters in the diffusion process. Through some evaluation on synthetic data, this paper shows that degree distribution actually changes during the diffusion process. We therefore introduce a multi-stage diffusion model to cope with variable degree distribution. By conducting experiments on both synthetic and real datasets, we show that our proposed diffusion models can recover the diffusion parameters from the observed diffusion data, which allows us to model diffusion with high accuracy.
- Asia > Singapore (0.05)
- North America > United States > New York (0.04)