AITopics

2201.05673

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > Spain > Aragón > Zaragoza Province > Zaragoza (0.04)
Europe > Austria > Vienna (0.04)
Asia > Middle East > Israel > Haifa District > Haifa (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.49)

Bertran, Martin, Talbott, Walter, Srivastava, Nitish, Susskind, Joshua

Efficient Embedding of Semantic Similarity in Control Policies via Entangled Bisimulation

arXiv.org Machine LearningJan-28-2022

Learning generalizeable policies from visual input in the presence of visual distractions is a challenging problem in reinforcement learning. Recently, there has been renewed interest in bisimulation metrics as a tool to address this issue; these metrics can be used to learn representations that are, in principle, invariant to irrelevant distractions by measuring behavioural similarity between states. An accurate, unbiased, and scalable estimation of these metrics has proved elusive in continuous state and action scenarios. We propose entangled bisimulation, a bisimulation metric that allows the specification of the distance function between states, and can be estimated without bias in continuous state and action spaces. We show how entangled bisimulation can meaningfully improve over previous methods on the Distracting Control Suite (DCS), even when added on top of data augmentation techniques.

bisimulation, efficient embedding, semantic similarity, (12 more...)

2201.123

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Vision (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.42)

Kajino, Hiroshi, Miyaguchi, Kohei, Osogami, Takayuki

Biases in In Silico Evaluation of Molecular Optimization Methods and Bias-Reduced Evaluation Methodology

arXiv.org Machine LearningJan-28-2022

Molecular optimization aims to discover novel molecules with improved properties, which is often formulated as a reinforcement learning problem by modeling the construction of a molecule using a Markov decision process. The performance of such agents is measured by the quality of generated molecules. In the community of machine learning, most of the molecular optimization methods have been verified in silico, i.e., in computer simulation. Since most of the generated molecules are novel, their properties are unknown and we have to resort to a predictor to estimate the properties. However, little attention has been paid to how reliable such estimates are, which makes the existing performance estimates less reliable. In this paper, we study the statistical performance of such performance estimators to enhance our understanding of the evaluation protocol and we discuss several directions to improve it. Let us first introduce a common practice to estimate the performance in silico.

estimator, molecule, performance estimator, (14 more...)

2201.12163

Country:

North America > United States > New York > New York County > New York City (0.04)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)

Genre: Research Report (0.64)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.84)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

arXiv.org Artificial IntelligenceJan-27-2022

Online Planning in POMDPs with Self-Improving Simulators

He, Jinke, Suau, Miguel, Baier, Hendrik, Kaisers, Michael, Oliehoek, Frans A.

How can we plan efficiently in a large and complex environment when the time budget is limited? However, there are three main limitations of this "twophase" Given the original simulator of the environment, paradigm, where a simulator is learned offline and which may be computationally very demanding, we then used as-is for online simulation and planning. First, no propose to learn online an approximate but much planning is possible until the offline learning phase finishes, faster simulator that improves over time. To plan which can take a long time. Second, the separation of learning reliably and efficiently while the approximate simulator and planning raises a question on what data collection policy is learning, we develop a method that adaptively should be used during training to ensure good online prediction decides which simulator to use for every simulation, during planning. We empirically demonstrate that when based on a statistic that measures the accuracy the training data is collected by a uniform random policy, the of the approximate simulator. This allows us to learned influence predictors can perform poorly during online use the approximate simulator to replace the original planning, due to distribution shift. Third, completely replacing simulator for faster simulations when it is accurate the original simulator with the approximate one after enough under the current context, thus trading training implies a risk of poor planning performance in certain off simulation speed and accuracy. Experimental situations, which is hard to detect in advance.

influence predictor, simulation, simulator, (14 more...)

2201.11404

Country: Europe > Netherlands > South Holland > Delft (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.83)

arXiv.org Machine LearningJan-27-2022

Generalization Error Bounds on Deep Learning with Markov Datasets

Truong, Lan V.

In this paper, we derive upper bounds on generalization errors for deep neural networks with Markov datasets. These bounds are developed based on Koltchinskii and Panchenko's approach for bounding the generalization error of combined classifiers with i.i.d. datasets. The development of new symmetrization inequalities in high-dimensional probability for Markov chains is a key element in our extension, where the pseudo-spectral gap of the infinitesimal generator of the Markov chain plays as a key parameter in these inequalities. We also propose a simple method to convert these bounds and other similar ones in traditional deep learning and machine learning to Bayesian counterparts for both i.i.d. and Markov datasets. Extensions to $m$-order homogeneous Markov chains such as AR and ARMA models and mixtures of several Markov data services are given, where the spectral method in functional analysis is used to derive these results.

generalization error, loglog 2 2, markov chain, (15 more...)

2201.11059

Country:

North America > United States > New York (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Tse, Hon Tik, Leung, Ho-fung

Exploiting Semantic Epsilon Greedy Exploration Strategy in Multi-Agent Reinforcement Learning

arXiv.org Artificial IntelligenceJan-26-2022

Multi-agent reinforcement learning (MARL) can model many real world applications. However, many MARL approaches rely on epsilon greedy for exploration, which may discourage visiting advantageous states in hard scenarios. In this paper, we propose a new approach QMIX(SEG) for tackling MARL. It makes use of the value function factorization method QMIX to train per-agent policies and a novel Semantic Epsilon Greedy (SEG) exploration strategy. SEG is a simple extension to the conventional epsilon greedy exploration strategy, yet it is experimentally shown to greatly improve the performance of MARL. We first cluster actions into groups of actions with similar effects and then use the groups in a bi-level epsilon greedy exploration hierarchy for action selection. We argue that SEG facilitates semantic exploration by exploring in the space of groups of actions, which have richer semantic meanings than atomic actions. Experiments show that QMIX(SEG) largely outperforms QMIX and leads to strong performance competitive with current state-of-the-art MARL approaches on the StarCraft Multi-Agent Challenge (SMAC) benchmark.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

2201.10803

Genre: Research Report (1.00)

Industry: Leisure & Entertainment > Games > Computer Games (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Wagenmaker, Andrew, Chen, Yifang, Simchowitz, Max, Du, Simon S., Jamieson, Kevin

Reward-Free RL is No Harder Than Reward-Aware RL in Linear Markov Decision Processes

arXiv.org Machine LearningJan-26-2022

Reward-free reinforcement learning (RL) considers the setting where the agent does not have access to a reward function during exploration, but must propose a near-optimal policy for an arbitrary reward function revealed only after exploring. In the the tabular setting, it is well known that this is a more difficult problem than PAC RL -- where the agent has access to the reward function during exploration -- with optimal sample complexities in the two settings differing by a factor of $|\mathcal{S}|$, the size of the state space. We show that this separation does not exist in the setting of linear MDPs. We first develop a computationally efficient algorithm for reward-free RL in a $d$-dimensional linear MDP with sample complexity scaling as $\mathcal{O}(d^2/\epsilon^2)$. We then show a matching lower bound of $\Omega(d^2/\epsilon^2)$ on PAC RL. To our knowledge, our approach is the first computationally efficient algorithm to achieve optimal $d$ dependence in linear MDPs, even in the single-reward PAC setting. Our algorithm relies on a novel procedure which efficiently traverses a linear MDP, collecting samples in any given "feature direction", and enjoys a sample complexity scaling optimally in the (linear MDP equivalent of the) maximal state visitation probability. We show that this exploration procedure can also be applied to solve the problem of obtaining "well-conditioned" covariates in linear MDPs.

algorithm, linear mdp, reward function, (12 more...)

2201.11206

Country:

North America > United States > Washington > King County > Seattle (0.04)
Asia > Middle East > Jordan (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.50)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.40)

Nesterov, Alexander, Umerenkov, Dmitry

Distantly supervised end-to-end medical entity extraction from electronic health records with human-level quality

arXiv.org Artificial IntelligenceJan-25-2022

Medical entity extraction (EE) is a standard procedure used as a first stage in medical texts processing. Usually Medical EE is a two-step process: named entity recognition (NER) and named entity normalization (NEN). We propose a novel method of doing medical EE from electronic health records (EHR) as a single-step multi-label classification task by fine-tuning a transformer model pretrained on a large EHR dataset. Our model is trained end-to-end in an distantly supervised manner using targets automatically extracted from medical knowledge base. We show that our model learns to generalize for entities that are present frequently enough, achieving human-level classification quality for most frequent entities. Our work demonstrates that medical entity extraction can be done end-to-end without human supervision and with human quality given the availability of a large enough amount of unlabeled EHR and a medical knowledge base.

entity extraction, extraction, medical entity extraction, (12 more...)

2201.10463

Country:

Europe > Russia > Central Federal District > Moscow Oblast > Moscow (0.04)
Asia > Russia (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > United States (0.04)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area (1.00)
Health & Medicine > Health Care Technology > Medical Record (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.47)

arXiv.org Artificial IntelligenceJan-25-2022

Probability estimation and structured output prediction for learning preferences in last mile delivery

Canoy, Rocsildes, Bucarey, Victor, Molenbruch, Yves, Mulamba, Maxime, Mandi, Jayanta, Guns, Tias

We study the problem of learning the preferences of drivers and planners in the context of last mile delivery. Given a data set containing historical decisions and delivery locations, the goal is to capture the implicit preferences of the decision-makers. We consider two ways to use the historical data: one is through a probability estimation method that learns transition probabilities between stops (or zones). This is a fast and accurate method, recently studied in a VRP setting. Furthermore, we explore the use of machine learning to infer how to best balance multiple objectives such as distance, probability and penalties. Specifically, we cast the learning problem as a structured output prediction problem, where training is done by repeatedly calling the TSP solver. Another important aspect we consider is that for last-mile delivery, every address is a potential client and hence the data is very sparse. Hence, we propose a two-stage approach that first learns preferences at the zone level in order to compute a zone routing; after which a penalty-based TSP computes the stop routing. Results show that the zone transition probability estimation performs well, and that the structured output prediction learning can improve the results further. We hence showcase a successful combination of both probability estimation and machine learning, all the while using standard TSP solvers, both during learning and to compute the final solution; this means the methodology is applicable to other, real-life, TSP variants, or proprietary solvers.

matrix, probability, transition probability, (12 more...)

2201.10269

Country:

South America > Chile > O'Higgins Region > Cachapoal Province > Rancagua (0.04)
North America > United States > Oregon > Multnomah County > Portland (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(3 more...)

Genre: Research Report > New Finding (0.66)

Industry:

Transportation (0.47)
Education (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning (0.47)
(3 more...)

arXiv.org Artificial IntelligenceJan-24-2022

Safety-Aware Multi-Agent Apprenticeship Learning

Zhao, Junchen

As the rapid development of Artifical Intelligence in the current technology field, Reinforcement Learning has been proven as a powerful technique that allows autonomous agents to learn optimal behaviors (called policies) in unknown and complex environments through models of rewards and penalization. However, in order to make this technique (Reinforcement Learning) work correctly and get the precise reward function, which returns the feedback to the learning agent about when the agent behaves correctly or not, the reward function needs to be thoroughly specified. As a result, in real-world complex environments, such as autonomous driving, specifying a correct reward function could be one of the hard tasks to tackle for the Reinforcement Learning model designers. To this end, Apprenticeship Learning techniques, in which the agent can infer a reward function from expert behaviors, are of high interest due to the fact that they could result in highly specified reward function efficiently. However, for critical tasks such as autonomous driving, we need to critically consider about the safety-related issues, so as to we need to build techniques to automatically check and ensure that the inferred rewards functions and policies resulted from the Reinforcement Learning model fulfill the needed safety requirements of the critical tasks that we have mentioned previously. In order to have a well-designed Reinforcement Learning model, which is able to generate the highly-specified reward function satisfying the safety-related considerations, the technique called "Safety-Aware Apprenticeship Learning" was built in 2018[23], which would be introduced in detail in the later sections. Although the technique "Safety-Aware Apprenticeship Learning" has been built, it only considers Single-Agent scenario. In the other word, the current "Safety-Aware Apprenticeship Learning" technique can only be applied to one agent running in an isolated environment, a fact which limits the potential implementation of this technique.

agent, apprenticeship learning, learning, (14 more...)

2201.08111

Genre: Research Report (0.50)

Industry:

Education (0.94)
Information Technology (0.69)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)