AITopics

1907.09949

Country: North America > United States (0.28)

Genre: Research Report > New Finding (0.34)

Industry:

Telecommunications (0.46)
Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

arXiv.org Artificial IntelligenceJul-20-2019

Arena: a toolkit for Multi-Agent Reinforcement Learning

Wang, Qing, Xiong, Jiechao, Han, Lei, Fang, Meng, Sun, Xinghai, Zheng, Zhuobin, Sun, Peng, Zhang, Zhengyou

We introduce Arena, a toolkit for multi-agent reinforcement learning (MARL) research. In MARL, it usually requires customizing observations, rewards and actions for each agent, changing cooperative-competitive agent-interaction, and playing with/against a third-party agent, etc. We provide a novel modular design, called Interface, for manipulating such routines in essentially two ways: 1) Different interfaces can be concatenated and combined, which extends the OpenAI Gym Wrappers concept to MARL scenarios. 2) During MARL training or testing, interfaces can be embedded in either wrapped OpenAI Gym compatible Environments or raw environment compatible Agents. We offer off-the-shelf interfaces for several popular MARL platforms, including StarCraft II, Pommerman, ViZDoom, Soccer, etc. The interfaces effectively support self-play RL and cooperative-competitive hybrid MARL. Also, Arena can be conveniently extended to your own favorite MARL platform.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

1907.09467

Genre: Research Report (0.64)

Industry: Leisure & Entertainment > Games > Computer Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.47)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.46)

Lee, Taehee, Lisiecki, Lorraine E., Rand, Devin, Gebbie, Geoffrey, Lawrence, Charles E.

Dual Proxy Gaussian Process Stack: Integrating Benthic ${\delta}^{18}{\rm{O}}$ and Radiocarbon Proxies for Inferring Ages on Ocean Sediment Cores

arXiv.org Machine LearningJul-19-2019

Ages in ocean sediment cores are often inferred using either benthic ${\delta}^{18}{\rm{O}}$ or planktonic ${}^{14}{\rm{C}}$ of foraminiferal calcite. Existing probabilistic dating methods infer ages in two distinct approaches: ages are either inferred directly using radionuclides, e.g. Bacon [Blaauw and Christen (2011)]; or indirectly based on the alignment of records, e.g. HMM-Match [Lin et al. (2014)]. In this paper, we introduce a novel algorithm for integrating these two approaches by constructing Dual Proxy Gaussian Process (DPGP) stacks, which represent a probabilistic model of benthic ${\delta}^{18}{\rm{O}}$ change (and its timing) based on a set of cores. While a previous stack construction algorithm, HMM-Match, uses a discrete age inference model based on Hidden Markov models (HMMs) [Durbin et al. (1998)] and requires a number of records enough to sufficiently cover all its ages, DPGP stacks with time-varying variances are constructed with continuous ages obtained by particle smoothing [Doucet et al. (2001); Klaas et al. (2006)] and Markov-chain Monte Carlo (MCMC) [Peters (2008)] algorithms, and can be derived from a small number of records by applying the Gaussian process regression [Rasmussen and Williams (2005)]. As an example of the stacking method, we construct a local stack from 6 cores in the deep northeastern Atlantic Ocean and compare it to a deterministically constructed ${\delta}^{18}{\rm{O}}$ stack of 58 cores from the deep North Atlantic [Lisiecki and Stern (2016)]. We also provide two examples of how dual proxy alignment ages can be inferred by aligning additional cores to the stack.

algorithm, artificial intelligence, machine learning, (17 more...)

1907.08738

Country: North America > United States (1.00)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Hassaine, Abdelaali, Canoy, Dexter, Solares, Jose Roberto Ayala, Zhu, Yajie, Rao, Shishir, Li, Yikuan, Rahimi, Kazem, Salimi-Khorshidi, Gholamreza

Learning Multimorbidity Patterns from Electronic Health Records Using Non-negative Matrix Factorisation

arXiv.org Machine LearningJul-19-2019

Multimorbidity, or the presence of several medical conditions in the same individual, have been increasing in the population both in absolute and relative terms. However, multimorbidity remains poorly understood, and the evidence from existing research to describe its burden, determinants and consequences have been limited. Many of these studies are often cross-sectional and do not explicitly account for multimorbidity patterns' evolution over time. Some studies were based on small datasets, used arbitrary or narrow age range, or lacked appropriate clinical validations. In this study, we applied Non-negative Matrix Factorisation (NMF) in a novel way to one of the largest electronic health records (EHR) databases in the world (with 4 million patients), for simultaneously modelling disease clusters and their role in one's multimorbidity over time. Furthermore, we demonstrated how the temporal characteristics that our model associates with each disease cluster can help mine disease trajectories/networks and generate new hypotheses for the formation of multimorbidity clusters as a function of time/ageing. Our results suggest that our method's ability to learn the underlying dynamics of diseases can provide the field with a novel data-driven / exploratory way of learning the patterns of multimorbidity and their interactions over time.

artificial intelligence, disorder, machine learning, (14 more...)

1907.08577

Country: Europe > United Kingdom > England > Oxfordshire > Oxford (0.28)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

arXiv.org Machine LearningJul-19-2019

Delegative Reinforcement Learning: learning to avoid traps with a little help

Kosoy, Vanessa

Most known regret bounds for reinforcement learning are either episodic or assume an environment without traps. We derive a regret bound without making either assumption, by allowing the algorithm to occasionally delegate an action to an external advisor. We thus arrive at a setting of active one-shot model-based reinforcement learning that we call DRL (delegative reinforcement learning.) The algorithm we construct in order to demonstrate the regret bound is a variant of Posterior Sampling Reinforcement Learning supplemented by a subroutine that decides which actions should be delegated. The algorithm is not anytime, since the parameters must be adjusted according to the target time discount. Currently, our analysis is limited to Markov decision processes with finite numbers of hypotheses, states and actions.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

1907.08461

Country: North America > United States (0.93)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

Nalivajevs, Olegs, Karapetyan, Daniel

Conditional Markov Chain Search for the Generalised Travelling Salesman Problem for Warehouse Order Picking

arXiv.org Artificial IntelligenceJul-19-2019

The Generalised Travelling Salesman Problem (GTSP) is a well-known problem that, among other applications, arises in warehouse order picking, where each stock is distributed between several locations -- a typical approach in large modern warehouses. However, the instances commonly used in the literature have a completely different structure, and the methods are designed with those instances in mind. In this paper, we give a new pseudo-random instance generator that reflects the warehouse order picking and publish new benchmark testbeds. We also use the Conditional Markov Chain Search framework to automatically generate new GTSP metaheuristics trained specifically for warehouse order picking. Finally, we report the computational results of our metaheuristics to enable further competition between solvers.

artificial intelligence, generalised travelling salesman problem, machine learning, (2 more...)

1907.08647

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.60)

arXiv.org Artificial IntelligenceJul-19-2019

Empowering A* Search Algorithms with Neural Networks for Personalized Route Recommendation

Wang, Jingyuan, Wu, Ning, Zhao, Wayne Xin, Peng, Fanzhang, Lin, Xin

Personalized Route Recommendation (PRR) aims to generate user-specific route suggestions in response to users' route queries. Early studies cast the PRR task as a pathfinding problem on graphs, and adopt adapted search algorithms by integrating heuristic strategies. Although these methods are effective to some extent, they require setting the cost functions with heuristics. In addition, it is difficult to utilize useful context information in the search procedure. To address these issues, we propose using neural networks to automatically learn the cost functions of a classic heuristic algorithm, namely A* algorithm, for the PRR task. Our model consists of two components. First, we employ attention-based Recurrent Neural Networks (RNN) to model the cost from the source to the candidate location by incorporating useful context information. Instead of learning a single cost value, the RNN component is able to learn a time-varying vectorized representation for the moving state of a user. Second, we propose to use a value network for estimating the cost from a candidate location to the destination. For capturing structural characteristics, the value network is built on top of improved graph attention networks by incorporating the moving state of a user and other context information. The two components are integrated in a principled way for deriving a more accurate cost of a candidate location. Extensive experiment results on three real-world datasets have shown the effectiveness and robustness of the proposed model.

artificial intelligence, information, machine learning, (17 more...)

1907.08489

Country:

Asia (0.29)
North America > United States (0.28)

Genre: Research Report (1.00)

Industry:

Leisure & Entertainment > Games (0.67)
Transportation > Ground > Road (0.31)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Belousov, Boris, Peters, Jan

Entropic Regularization of Markov Decision Processes

arXiv.org Machine LearningJul-18-2019

An optimal feedback controller for a given Markov decision process (MDP) can in principle be synthesized by value or policy iteration. However, if the system dynamics and the reward function are unknown, a learning agent must discover an optimal controller via direct interaction with the environment. Such interactive data gathering commonly leads to divergence towards dangerous or uninformative regions of the state space unless additional regularization measures are taken. Prior works proposed bounding the information loss measured by the Kullback-Leibler (KL) divergence at every policy improvement step to eliminate instability in the learning dynamics. In this paper, we consider a broader family of $f$-divergences, and more concretely $\alpha$-divergences, which inherit the beneficial property of providing the policy improvement step in closed form at the same time yielding a corresponding dual objective for policy evaluation. Such entropic proximal policy optimization view gives a unified perspective on compatible actor-critic architectures. In particular, common least-squares value function estimation coupled with advantage-weighted maximum likelihood policy improvement is shown to correspond to the Pearson $\chi^2$-divergence penalty. Other actor-critic pairs arise for various choices of the penalty-generating function $f$. On a concrete instantiation of our framework with the $\alpha$-divergence, we carry out asymptotic analysis of the solutions for different values of $\alpha$ and demonstrate the effects of the divergence function choice on common standard reinforcement learning problems.

artificial intelligence, machine learning, optimization problem, (17 more...)

doi: 10.3390/e21070674

1907.04214

Country:

North America > United States (0.93)
Europe > United Kingdom > England (0.28)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.85)

Rajendran, Janarthanan, Ganhotra, Jatin, Polymenakos, Lazaros

Learning End-to-End Goal-Oriented Dialog with Maximal User Task Success and Minimal Human Agent Use

arXiv.org Artificial IntelligenceJul-17-2019

Neural end-to-end goal-oriented dialog systems showed promise to reduce the workload of human agents for customer service, as well as reduce wait time for users. However, their inability to handle new user behavior at deployment has limited their usage in real world. In this work, we propose an end-to-end trainable method for neural goal-oriented dialog systems which handles new user behaviors at deployment by transferring the dialog to a human agent intelligently. The proposed method has three goals: 1) maximize user's task success by transferring to human agents, 2) minimize the load on the human agents by transferring to them only when it is essential and 3) learn online from the human agent's responses to reduce human agents load further. We evaluate our proposed method on a modified-bAbI dialog task that simulates the scenario of new user behaviors occurring at test time. Experimental results show that our proposed method is effective in achieving the desired goals.

deployment, dialog task, user behavior, (15 more...)

doi: 10.1162/tacl_a_00274

1907.07638

Country: North America > United States > Michigan (0.04)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

arXiv.org Machine LearningJul-16-2019

Adversarial Security Attacks and Perturbations on Machine Learning and Deep Learning Methods

Siddiqi, Arif

Cybersecurity also benefits from ML and DL methods for various types of applications. These methods however are susceptible to security attacks. The adversaries can exploit the training and testing data of the learning models or can explore the workings of those models for launching advanced future attacks. The topic of adversarial security attacks and perturbations within the ML and DL domains is a recent exploration and a great interest is expressed by the security researchers and practitioners. The literature covers different adversarial security attacks and perturbations on ML and DL methods and those have their own presentation styles and merits. A need to review and consolidate knowledge that is comprehending of this increasingly focused and growing topic of research; however, is the current demand of the research communities. In this review paper, we specifically aim to target new researchers in the cybersecurity domain who may seek to acquire some basic knowledge on the machine learning and deep learning models and algorithms, as well as some of the relevant adversarial security attacks and perturbations.

artificial intelligence, machine learning, southern new hampshire university, (11 more...)

1907.07291

Country:

North America > United States > California (1.00)
Europe (1.00)
Asia (0.92)

Genre:

Research Report (1.00)
Overview (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Military > Cyberwarfare (0.68)
Transportation > Ground > Road (0.45)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)