AITopics

2110.13998

Country:

North America > United States > New York > New York County > New York City (0.14)
North America > United States > Georgia > Fulton County > Atlanta (0.04)
North America > United States > Kansas > Douglas County > Lawrence (0.04)
(3 more...)

Genre: Research Report > New Finding (0.67)

Industry: Health & Medicine > Therapeutic Area > Neurology > Alzheimer's Disease (0.55)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

arXiv.org Artificial IntelligenceOct-26-2021

Iterative Teacher-Aware Learning

Yuan, Luyao, Zhou, Dongruo, Shen, Junhong, Gao, Jingdong, Chen, Jeffrey L., Gu, Quanquan, Wu, Ying Nian, Zhu, Song-Chun

In human pedagogy, teachers and students can interact adaptively to maximize communication efficiency. The teacher adjusts her teaching method for different students, and the student, after getting familiar with the teacher's instruction mechanism, can infer the teacher's intention to learn faster. Recently, the benefits of integrating this cooperative pedagogy into machine concept learning in discrete spaces have been proved by multiple works. However, how cooperative pedagogy can facilitate machine parameter learning hasn't been thoroughly studied. In this paper, we propose a gradient optimization based teacher-aware learner who can incorporate teacher's cooperative intention into the likelihood function and learn provably faster compared with the naive learning algorithms used in previous machine teaching works. We give theoretical proof that the iterative teacher-aware learning (ITAL) process leads to local and global improvements. We then validate our algorithms with extensive experiments on various tasks including regression, classification, and inverse reinforcement learning using synthetic and real data. We also show the advantage of modeling teacher-awareness when agents are learning from human teachers.

experiment, learner, teacher-aware learner, (12 more...)

2110.00137

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
North America > Canada > Ontario > Toronto (0.14)
Asia > China > Beijing > Beijing (0.04)
(2 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.93)

Industry: Education > Educational Setting (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.93)
(4 more...)

Campbell, Andrew, Shi, Yuyang, Rainforth, Tom, Doucet, Arnaud

Online Variational Filtering and Parameter Learning

arXiv.org Machine LearningOct-26-2021

We present a variational method for online state estimation and parameter learning in state-space models (SSMs), a ubiquitous class of latent variable models for sequential data. As per standard batch variational techniques, we use stochastic gradients to simultaneously optimize a lower bound on the log evidence with respect to both model parameters and a variational approximation of the states' posterior distribution. However, unlike existing approaches, our method is able to operate in an entirely online manner, such that historic observations do not require revisitation after being incorporated and the cost of updates at each time step remains constant, despite the growing dimensionality of the joint posterior distribution of the states. This is achieved by utilizing backward decompositions of this joint posterior distribution and of its variational approximation, combined with Bellman-type recursions for the evidence lower bound and its gradients. We demonstrate the performance of this methodology across several examples, including high-dimensional SSMs and sequential Variational Auto-Encoders.

approximation, recursion, time step, (15 more...)

arXiv.org Machine Learning

2110.13549

Country:

North America > United States (0.14)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(2 more...)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

#artificialintelligenceOct-25-2021, 22:45:21 GMT

Introduction to Markov Chains

Observe how in the example, the probability distribution is obtained solely by observing transitions from the current day to the next. This illustrates the Markov property, the unique characteristic of Markov processes that renders them memoryless. This typically leaves them unable to successfully produce sequences in which some underlying trend would be expected to occur. For example, while a Markov chain may be able to mimic the writing style of an author based on word frequencies, it would be unable to produce text that contains deep meaning or thematic significance since these are developed over much longer sequences of text. They therefore lack the ability to produce context-dependent content since they cannot take into account the full chain of prior states.

markov chain, sequence

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

#artificialintelligenceOct-25-2021, 08:26:58 GMT

Machine Learning A-Z : Hands-On Python & R In Data Science

Free Coupon Discount - Machine Learning A-Z: Hands-On Python & R In Data Science, Learn to create Machine Learning Algorithms in Python and R from two Data Science experts. Code templates included Created by Kirill Eremenko, Hadelin de Ponteves, SuperDataScience Team, SuperDataScience Support Students also bought Advanced AI: Deep Reinforcement Learning in Python Deep Learning: Convolutional Neural Networks in Python Deep Learning: Recurrent Neural Networks in Python Unsupervised Machine Learning Hidden Markov Models in Python Bayesian Machine Learning in Python: A/B Testing Preview this Udemy Course GET COUPON CODE Description Interested in the field of Machine Learning? Then this course is for you! This course has been designed by two professional Data Scientists so that we can share our knowledge and help you learn complex theory, algorithms and coding libraries in a simple way. We will walk you step-by-step into the World of Machine Learning.

learning, machine learning, neural network, (12 more...)

#artificialintelligence

Genre: Instructional Material > Course Syllabus & Notes (1.00)

Industry:

Education > Educational Setting > Online (1.00)
Education > Educational Technology > Educational Software > Computer Based Training (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Yang, Zhihan, Nguyen, Hai

Recurrent Off-policy Baselines for Memory-based Continuous Control

When the environment is partially observable (PO), a deep reinforcement learning (RL) agent must learn a suitable temporal representation of the entire history in addition to a strategy to control. This problem is not novel, and there have been model-free and model-based algorithms proposed for this problem. However, inspired by recent success in model-free image-based RL, we noticed the absence of a model-free baseline for history-based RL that (1) uses full history and (2) incorporates recent advances in off-policy continuous control. Therefore, we implement recurrent versions of DDPG, TD3, and SAC (RDPG, RTD3, and RSAC) in this work, evaluate them on short-term and long-term PO domains, and investigate key design choices. Our experiments show that RDPG and RTD3 can surprisingly fail on some domains and that RSAC is the most reliable, reaching near-optimal performance on nearly all domains. However, one task that requires systematic exploration still proved to be difficult, even for RSAC. These results show that model-free RL can learn good temporal representation using only reward signals; the primary difficulty seems to be computational cost and exploration.

machine learning, reinforcement learning, rsac, (18 more...)

2110.12628

Country:

North America > United States > Minnesota > Rice County > Northfield (0.04)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)

Genre: Research Report (0.70)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.48)

What Would Jiminy Cricket Do? Towards Agents That Behave Morally

Hendrycks, Dan, Mazeika, Mantas, Zou, Andy, Patel, Sahil, Zhu, Christine, Navarro, Jesus, Song, Dawn, Li, Bo, Steinhardt, Jacob

When making everyday decisions, people are guided by their conscience, an internal sense of right and wrong. By contrast, artificial agents are not currently endowed with a moral sense. As a consequence, they may unknowingly act immorally, especially when trained on environments that disregard moral concerns such as violent video games. With the advent of generally capable agents that pretrain on many environments, it will become necessary to mitigate inherited biases from such environments that teach immoral behavior. To facilitate the development of agents that avoid causing wanton harm, we introduce Jiminy Cricket, an environment suite of 25 text-based adventure games with thousands of diverse, morally salient scenarios. By annotating every possible game state, the Jiminy Cricket environments robustly evaluate whether agents can act morally while maximizing reward. Using models with commonsense moral knowledge, we create an elementary artificial conscience that assesses and guides agents. In extensive experiments, we find that the artificial conscience approach can steer agents towards moral behavior without sacrificing performance.

agent, annotation, jiminy cricket, (16 more...)

2110.13136

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > New Mexico (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Europe > Germany > Berlin (0.04)

Genre: Research Report (0.63)

Industry:

Law (1.00)
Leisure & Entertainment > Games > Computer Games (0.86)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.45)

Learning What to Memorize: Using Intrinsic Motivation to Form Useful Memory in Partially Observable Reinforcement Learning

Demir, Alper

Reinforcement Learning faces an important challenge in partial observable environments that has long-term dependencies. In order to learn in an ambiguous environment, an agent has to keep previous perceptions in a memory. Earlier memory based approaches use a fixed method to determine what to keep in the memory, which limits them to certain problems. In this study, we follow the idea of giving the control of the memory to the agent by allowing it to have memory-changing actions. This learning mechanism is supported by an intrinsic motivation to memorize rare observations that can help the agent to disambiguate its state in the environment. Our approach is experimented and analyzed on several partial observable tasks with long-term dependencies and compared with other memory based methods.

agent, intrinsic motivation, sarsa, (13 more...)

2110.1281

Country:

Asia > Middle East > Republic of Türkiye > İzmir Province > İzmir (0.04)
North America > United States > New York > Monroe County > Rochester (0.04)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Lexicographic Optimisation of Conditional Value at Risk and Expected Value for Risk-Averse Planning in MDPs

Rigter, Marc, Duckworth, Paul, Lacerda, Bruno, Hawes, Nick

Planning in Markov decision processes (MDPs) typically optimises the expected cost. However, optimising the expectation does not consider the risk that for any given run of the MDP, the total cost received may be unacceptably high. An alternative approach is to find a policy which optimises a risk-averse objective such as conditional value at risk (CVaR). In this work, we begin by showing that there can be multiple policies which obtain the optimal CVaR. We formulate the lexicographic optimisation problem of minimising the expected cost subject to the constraint that the CVaR of the total cost is optimal. We present an algorithm for this problem and evaluate our approach on three domains, including a road navigation domain based on real traffic data. Our experimental results demonstrate that our lexicographic approach attains improved expected cost while maintaining the optimal CVaR.

cvar, history, probability, (16 more...)

2110.12746

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
North America > United States > California > Los Angeles County > Los Angeles (0.04)
North America > United States > Virginia (0.04)

Genre: Research Report (0.70)

Industry: Transportation > Ground > Road (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.89)

Golowich, Noah, Moitra, Ankur

Can Q-Learning be Improved with Advice?

arXiv.org Machine LearningOct-25-2021

Despite rapid progress in theoretical reinforcement learning (RL) over the last few years, most of the known guarantees are worst-case in nature, failing to take advantage of structure that may be known a priori about a given RL problem at hand. In this paper we address the question of whether worst-case lower bounds for regret in online learning of Markov decision processes (MDPs) can be circumvented when information about the MDP, in the form of predictions about its optimal $Q$-value function, is given to the algorithm. We show that when the predictions about the optimal $Q$-value function satisfy a reasonably weak condition we call distillation, then we can improve regret bounds by replacing the set of state-action pairs with the set of state-action pairs on which the predictions are grossly inaccurate. This improvement holds for both uniform regret bounds and gap-based ones. Further, we are able to achieve this property with an algorithm that achieves sublinear regret when given arbitrary predictions (i.e., even those which are not a distillation). Our work extends a recent line of work on algorithms with predictions, which has typically focused on simple online problems such as caching and scheduling, to the more complex and general problem of reinforcement learning.

distillation, lemma 6, qlearningpred, (14 more...)

arXiv.org Machine Learning

2110.13052

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > Washington > King County > Seattle (0.04)
Asia > Middle East > Jordan (0.04)

Genre:

Workflow (0.68)
Research Report > New Finding (0.45)

Industry:

Health & Medicine (0.92)
Education > Educational Setting > Online (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.48)