AITopics | Undirected Networks


Undirected Networks


Estimating Disentangled Belief about Hidden State and Hidden Task for Meta-RL

Akuzawa, Kei, Iwasawa, Yusuke, Matsuo, Yutaka

arXiv.org Artificial Intelligence, May-14-2021

There is considerable interest in designing meta-reinforcement learning (meta-RL) algorithms, which enable autonomous agents to adapt to new tasks from a small amount of experience. In meta-RL, the specification (such as the reward function) of the current task is hidden from the agent. In addition, states are hidden within each task owing to sensor noise or limitations in realistic environments. Therefore, the meta-RL agent faces the challenge of specifying both the hidden task and the hidden states based on a small amount of experience. To address this, we propose estimating disentangled belief about task and states, leveraging an inductive bias that the task and states can be regarded as global and local features of each task. Specifically, we train a hierarchical state-space model (HSSM) parameterized by deep neural networks as an environment model, whose global and local latent variables correspond to task and states, respectively. Because the HSSM does not allow analytical computation of the posterior distribution, i.e., the belief, we employ amortized inference to approximate it. After the belief is obtained, we can augment the observations of a model-free policy with the belief to train the policy efficiently. Moreover, because task and state information are factorized and interpretable, the downstream policy training is facilitated compared with prior methods that did not consider the hierarchical nature. Empirical validations on a GridWorld environment confirm that the HSSM can separate the hidden task and state information. We then compare the meta-RL agent with the HSSM to prior meta-RL methods in MuJoCo environments, and confirm that our agent requires less training data and reaches higher final performance.

information, international conference, task information, (15 more...)

arXiv.org Artificial Intelligence

2105.0666

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
Oceania > Australia > New South Wales > Sydney (0.04)
North America > United States > New York > New York County > New York City (0.04)
(2 more...)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.74)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)


A cell type-specific cortico-subcortical brain circuit for investigatory and novelty-seeking behavior

Science, May-13-2021, 17:39:48 GMT

Curiosity is what drives organisms to investigate each other and their environment. It is considered by many to be as intrinsic as hunger and thirst, but the neurobiological mechanisms behind curiosity have remained elusive. In mice, Ahmadlou et al. found that a specific population of genetically identified γ-aminobutyric acid (GABA)-ergic neurons in a brain region called the zona incerta receive excitatory input in the form of novelty and/or arousal information from the prelimbic cortex, and these neurons send inhibitory projections to the periaqueductal gray region (see the Perspective by Farahbakhsh and Siciliano). This circuitry is necessary for the exploration of new objects and conspecifics.

Science, this issue p. [eabe9681][1]; see also p. [684][2]

### INTRODUCTION

Motivational drives are internal states that can be different even in similar interactions with external stimuli. Curiosity as the motivational drive for novelty-seeking and investigating the surrounding environment is for survival as essential and intrinsic as hunger. Curiosity, hunger, and appetitive aggression drive three different goal-directed behaviors (novelty seeking, food eating, and hunting), but these behaviors are composed of similar actions in animals. This similarity of actions has made it challenging to study novelty seeking and distinguish it from eating and hunting in nonarticulating animals. The brain mechanisms underlying this basic survival drive, curiosity, and novelty-seeking behavior have remained unclear.

### RATIONALE

In spite of having well-developed techniques to study mouse brain circuits, there are many controversial and different results in the field of motivational behavior. This has left the functions of motivational brain regions such as the zona incerta (ZI) still uncertain. Not having a transparent, nonreinforced, and easily replicable paradigm is one of the main causes of this uncertainty. Therefore, we chose a simple solution to conduct our research: giving the mouse freedom to choose what it wants (double free-access choice). By examining mice in an experimental battery of object free-access double-choice (FADC) and social interaction tests (using optogenetics, chemogenetics, calcium fiber photometry, multichannel recording electrophysiology, and multicolor mRNA in situ hybridization), we uncovered a cell type–specific cortico-subcortical brain circuit of the curiosity and novelty-seeking behavior.

### RESULTS

We analyzed the transitions within action sequences in object FADC and social interaction tests. Frequency and hidden Markov model analyses showed that mice choose different action sequences in interaction with novel objects and in early periods of interaction with novel conspecifics compared with interaction with familiar objects or later periods of interaction with conspecifics, which we categorized as deep and shallow investigation, respectively. This finding helped us to define a measure of depth of investigation that indicates how much a mouse prefers deep over shallow investigation and reflects the mouse’s motivational level to investigate, regardless of total duration of investigation. Optogenetic activation of inhibitory neurons in medial ZI (ZIm), ZImGAD2 neurons, showed a dramatic increase in positive arousal level, depth of investigation, and duration of interaction with conspecifics and novel objects compared with familiar objects, crickets, and food. Optogenetic or chemogenetic deactivation of these neurons decreased depth and duration of investigation. Moreover, we found that ZImGAD2 neurons are more active during deep investigation as compared with during shallow investigation. We found that activation of prelimbic cortex (PL) axons into ZIm increases arousal level, and chemogenetic deactivation of these axons decreases the duration and depth of investigation. Calcium fiber photometry of these axons showed no difference in activity between shallow and deep investigation, suggesting a nonspecific motivation. Optogenetic activation of ZImGAD2 axons into lateral periaqueductal gray (lPAG) increases the arousal level, whereas chemogenetic deactivation of these axons decreases duration and depth of investigation. Calcium fiber photometry of these axons showed high activity during deep investigation and no significant activity during shallow investigation, suggesting a thresholding mechanism. Last, we found a new subpopulation of inhibitory neurons in ZIm expressing tachykinin 1 (TAC1) that monosynaptically receive PL inputs and project to lPAG. Optogenetic activation and deactivation of these neurons, respectively, increased and decreased depth and duration of investigation.

### CONCLUSION

Our experiments revealed different action sequences based on the motivational level of novelty seeking. Moreover, we uncovered a new brain circuit underlying curiosity and novelty-seeking behavior, connecting excitatory neurons of PL to lPAG through TAC1+ inhibitory neurons of ZIm.

Figure: Brain mechanism of curiosity. (A) How we mapped motivational level to action sequences. (B) Experimental battery to distinguish novelty-seeking behavior from food eating and hunting in mice with photoactivation of ZImGAD2 neurons. (C) Schematic of calcium activity in PL→ZIm, ZIm, and ZIm→PAG during shallow and deep investigation. (D) TAC1+ neurons as a subpopulation of ZImGAD2 neurons receive input from PL and project to PAG. HMM, hidden Markov model.

Exploring the physical and social environment is essential for understanding the surrounding world. We do not know how novelty-seeking motivation initiates the complex sequence of actions that make up investigatory behavior. We found in mice that inhibitory neurons in the medial zona incerta (ZIm), a subthalamic brain region, are essential for the decision to investigate an object or a conspecific. These neurons receive excitatory input from the prelimbic cortex to signal the initiation of exploration. This signal is modulated in the ZIm by the level of investigatory motivation. Increased activity in the ZIm instigates deep investigative action by inhibiting the periaqueductal gray region. A subpopulation of inhibitory ZIm neurons expressing tachykinin 1 (TAC1) modulates the investigatory behavior.

[1]: /lookup/doi/10.1126/science.abe9681
[2]: /lookup/doi/10.1126/science.abi7270

artificial intelligence, investigation, machine learning, (13 more...)

Science

Industry: Health & Medicine > Therapeutic Area > Neurology (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Neuroscience (1.00)


Reinforcement Learning Based Safe Decision Making for Highway Autonomous Driving

Mohammadhasani, Arash, Mehrivash, Hamed, Lynch, Alan, Shu, Zhan

arXiv.org Artificial Intelligence, May-13-2021

In this paper, we develop a safe decision-making method for self-driving cars in a multi-lane, single-agent setting. The proposed approach utilizes deep reinforcement learning (RL) to achieve a high-level policy for safe tactical decision-making. We address two major challenges that arise solely in autonomous navigation. First, the proposed algorithm ensures that collisions never happen, and therefore accelerates the learning process. Second, the proposed algorithm takes into account the unobservable states in the environment. These states appear mainly due to the unpredictable behavior of other agents, such as cars and pedestrians, and make the Markov Decision Process (MDP) problematic when dealing with autonomous navigation. Simulations from a well-known self-driving car simulator demonstrate the applicability of the proposed method.

agent, autonomous navigation, vehicle, (14 more...)

arXiv.org Artificial Intelligence

2105.06517

Country:

North America > United States > California > San Mateo County > Menlo Park (0.04)
North America > Canada > Alberta > Census Division No. 11 > Edmonton Metropolitan Region > Edmonton (0.04)
Europe > Austria > Vienna (0.04)
(2 more...)

Genre: Research Report (0.64)

Industry:

Transportation > Ground > Road (1.00)
Information Technology > Robotics & Automation (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.35)


Identity testing of reversible Markov chains

Fried, Sela, Wolfer, Geoffrey

arXiv.org Machine Learning, May-13-2021

We consider the problem of identity testing of Markov chains based on a single trajectory of observations under the distance notion introduced by Daskalakis et al. [2018a] and further analyzed by Cherapanamjeri and Bartlett [2019]. Both works made the restrictive assumption that the Markov chains under consideration are symmetric. In this work we relax the symmetry assumption to the more natural assumption of reversibility, still assuming that both the reference and the unknown Markov chains share the same stationary distribution.
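The difference between symmetry and reversibility mentioned in this abstract can be illustrated with a small check of the detailed-balance condition pi[i] * P[i][j] == pi[j] * P[j][i]. The following is a minimal sketch, not the testing procedure of the paper: the function names, the power-iteration estimate of the stationary distribution, and the two example matrices are all assumptions made for illustration.

```python
def stationary_distribution(P, iterations=1000):
    """Approximate the stationary distribution of a row-stochastic matrix P
    by power iteration (assumes the chain is irreducible and aperiodic)."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iterations):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi


def is_reversible(P, pi, tol=1e-9):
    """Check detailed balance: pi[i] * P[i][j] == pi[j] * P[j][i] for all i, j."""
    n = len(P)
    return all(
        abs(pi[i] * P[i][j] - pi[j] * P[j][i]) <= tol
        for i in range(n)
        for j in range(n)
    )


# Hypothetical birth-death chain: not symmetric, but reversible.
birth_death = [
    [0.50, 0.50, 0.00],
    [0.25, 0.50, 0.25],
    [0.00, 0.50, 0.50],
]

# Hypothetical chain with a strong clockwise drift: uniform stationary
# distribution, but detailed balance fails.
cyclic = [
    [0.1, 0.8, 0.1],
    [0.1, 0.1, 0.8],
    [0.8, 0.1, 0.1],
]

pi_bd = stationary_distribution(birth_death)
pi_cyc = stationary_distribution(cyclic)
print(is_reversible(birth_death, pi_bd))
print(is_reversible(cyclic, pi_cyc))
```

Every symmetric transition matrix has a uniform stationary distribution and satisfies detailed balance, so symmetric chains are a special case of reversible ones; the birth-death example shows that the converse does not hold.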

lemma 4, markov chain, probability, (16 more...)

arXiv.org Machine Learning

2105.06347

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States (0.04)
Asia > India (0.04)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)


Grad-TTS: A Diffusion Probabilistic Model for Text-to-Speech

Popov, Vadim, Vovk, Ivan, Gogoryan, Vladimir, Sadekova, Tasnima, Kudinov, Mikhail

arXiv.org Machine Learning, May-13-2021

Recently, denoising diffusion probabilistic models and generative score matching have shown high potential in modelling complex data distributions, while stochastic calculus has provided a unified point of view on these techniques, allowing for flexible inference schemes. In this paper we introduce Grad-TTS, a novel text-to-speech model with a score-based decoder producing mel-spectrograms by gradually transforming noise predicted by the encoder and aligned with the text input by means of Monotonic Alignment Search. The framework of stochastic differential equations helps us to generalize conventional diffusion probabilistic models to the case of reconstructing data from noise with different parameters, and allows us to make this reconstruction flexible by explicitly controlling the trade-off between sound quality and inference speed. Subjective human evaluation shows that Grad-TTS is competitive with state-of-the-art text-to-speech approaches in terms of Mean Opinion Score. We will make the code publicly available shortly.

diffusion probabilistic model, grad-tts, reverse diffusion, (13 more...)

arXiv.org Machine Learning

2105.06337

Country:

Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
Europe > Russia > Central Federal District > Moscow Oblast > Moscow (0.04)
Europe > France > Hauts-de-France > Nord > Lille (0.04)
Asia > Russia (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Synthesis (0.91)
Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.90)
(2 more...)


Recent Advances in Deep Learning-based Dialogue Systems

Ni, Jinjie, Young, Tom, Pandelea, Vlad, Xue, Fuzhao, Adiga, Vinay, Cambria, Erik

arXiv.org Artificial Intelligence, May-13-2021

Dialogue systems are a popular Natural Language Processing (NLP) task because they are promising in real-life applications. It is also a complicated task, since many NLP tasks deserving study are involved. As a result, a multitude of novel works on this task have been carried out, and most of them are deep learning-based due to the outstanding performance. In this survey, we mainly focus on deep learning-based dialogue systems. We comprehensively review state-of-the-art research outcomes in dialogue systems and analyze them from two angles: model type and system type. Specifically, from the angle of model type, we discuss the principles, characteristics, and applications of different models that are widely used in dialogue systems. This will help researchers become acquainted with these models and see how they are applied in state-of-the-art frameworks, which is rather helpful when designing a new dialogue system. From the angle of system type, we discuss task-oriented and open-domain dialogue systems as two streams of research, providing insight into the related hot topics. Furthermore, we comprehensively review the evaluation methods and datasets for dialogue systems to pave the way for future research. Finally, some possible research trends are identified based on the recent research outcomes. To the best of our knowledge, this survey is the most comprehensive and up-to-date one at present in the area of dialogue systems and dialogue-related tasks, extensively covering the popular frameworks, topics, and datasets. Keywords: Dialogue Systems, Chatbots, Conversational AI, Task-oriented, Open Domain, Chit-chat, Question Answering, Artificial Intelligence, Natural Language Processing, Information Retrieval, Deep Learning, Neural Networks, CNN, RNN, Hierarchical Recurrent Encoder-Decoder, Memory Networks, Attention, Transformer, Pointer Net, CopyNet, Reinforcement Learning, GANs, Knowledge Graph, Survey, Review

aaai conference, artificial intelligence, dialogue generation, (15 more...)

arXiv.org Artificial Intelligence

2105.04387

Country:

Asia > Middle East > Jordan (0.04)
Europe > Switzerland > Zürich > Zürich (0.04)
Asia > China > Hong Kong (0.04)
(12 more...)

Genre: Overview (1.00)

Industry:

Leisure & Entertainment (1.00)
Health & Medicine > Therapeutic Area (0.67)
Information Technology (0.67)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)


MMGET: A Markov model for generalized evidence theory

He, Yuanpeng

arXiv.org Artificial Intelligence, May-12-2021

In real life, large amounts of information merge over time. Many theories have been proposed to describe such situations appropriately. Among them, Dempster-Shafer evidence theory is a very useful tool for managing uncertain information. To better adapt to the complex situations of an open world, a generalized evidence theory has been designed. However, events occur in sequence and have underlying relationships with each other. To further capture the details of the information and better conform to real-world situations, a Markov model is introduced into the generalized evidence theory, which helps extract the complete information volume from the evidence provided. In addition, some numerical examples are offered to verify the correctness and rationality of the proposed method.
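As background for the Dempster-Shafer evidence theory that this abstract builds on, the classical Dempster rule of combination can be sketched in a few lines. This is only the standard closed-world rule, not the generalized evidence theory or the Markov extension proposed in the paper; the function name `combine` and the two example mass functions are assumptions made for illustration.

```python
def combine(m1, m2):
    """Classical Dempster rule: combine two mass functions whose focal
    elements are frozensets over the same frame of discernment."""
    combined = {}
    conflict = 0.0
    for a, weight_a in m1.items():
        for b, weight_b in m2.items():
            intersection = a & b
            if intersection:
                combined[intersection] = (
                    combined.get(intersection, 0.0) + weight_a * weight_b
                )
            else:
                # Mass assigned to contradictory pairs of focal elements.
                conflict += weight_a * weight_b
    if conflict >= 1.0:
        raise ValueError("total conflict: the rule is undefined")
    # Renormalize so that the remaining masses sum to one.
    return {focal: mass / (1.0 - conflict) for focal, mass in combined.items()}


# Hypothetical bodies of evidence over the frame {"a", "b"}.
A, B, AB = frozenset("a"), frozenset("b"), frozenset("ab")
m1 = {A: 0.6, AB: 0.4}
m2 = {B: 0.5, AB: 0.5}

fused = combine(m1, m2)
for focal, mass in sorted(fused.items(), key=lambda item: sorted(item[0])):
    print(sorted(focal), round(mass, 4))
```

In this example the conflicting mass is 0.6 * 0.5 = 0.3, so the surviving products are divided by 0.7.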

probability, proposition, tex class file, (15 more...)

arXiv.org Artificial Intelligence

2105.07952

Country:

Asia > China > Chongqing Province > Chongqing (0.04)
Europe > Italy (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)


Characterizing Uniform Convergence in Offline Policy Evaluation via model-based approach: Offline Learning, Task-Agnostic and Reward-Free

Yin, Ming, Wang, Yu-Xiang

arXiv.org Artificial Intelligence, May-12-2021

We study the statistical limits of uniform convergence for offline policy evaluation (OPE) problems (uniform OPE for short) with model-based methods under the episodic MDP setting. Uniform OPE $\sup_\Pi|Q^\pi-\hat{Q}^\pi|<\epsilon$ (initiated by Yin et al. 2021) is a stronger measure than the point-wise (fixed policy) OPE and ensures offline policy learning when $\Pi$ contains all policies (we call it the global policy class). In this paper, we establish an $\Omega(H^2 S/d_m\epsilon^2)$ lower bound (over the model-based family) for the global uniform OPE, where $d_m$ is the minimal state-action distribution induced by the behavior policy. The order $S/d_m\epsilon^2$ reveals that the global uniform OPE task is intrinsically harder than offline policy learning due to the extra $S$ factor. Next, our main result establishes an episode complexity of $\tilde{O}(H^2/d_m\epsilon^2)$ for \emph{local} uniform convergence that applies to all \emph{near-empirically optimal} policies for MDPs with \emph{stationary} transition. The result implies the optimal sample complexity for offline learning and separates local uniform OPE from the global case. Most importantly, the model-based method combined with our new analysis technique (singleton absorbing MDP) can be adapted to the new settings: offline task-agnostic and offline reward-free, with optimal complexity $\tilde{O}(H^2\log(K)/d_m\epsilon^2)$ ($K$ is the number of tasks) and $\tilde{O}(H^2S/d_m\epsilon^2)$ respectively, which provides a unified framework for simultaneously solving different offline RL problems.
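To make the phrase "model-based method" concrete, a plug-in estimator for a tabular MDP with stationary transitions could look roughly as follows: fit empirical transitions and rewards from logged data, then evaluate a fixed policy in the fitted model by backward induction. This is a sketch under assumptions and not the estimator analysed in the paper: the function names, the data layout `(s, a, r, s_next)`, the stationary deterministic policy, and the convention that unvisited state-action pairs contribute zero value are all illustrative choices.

```python
from collections import defaultdict


def fit_tabular_model(transitions):
    """Empirical transition probabilities and mean rewards from logged
    (state, action, reward, next_state) tuples."""
    next_counts = defaultdict(lambda: defaultdict(int))
    reward_sums = defaultdict(float)
    visits = defaultdict(int)
    for s, a, r, s_next in transitions:
        next_counts[(s, a)][s_next] += 1
        reward_sums[(s, a)] += r
        visits[(s, a)] += 1
    p_hat = {
        sa: {s2: count / visits[sa] for s2, count in successors.items()}
        for sa, successors in next_counts.items()
    }
    r_hat = {sa: reward_sums[sa] / visits[sa] for sa in visits}
    return p_hat, r_hat


def evaluate_policy(p_hat, r_hat, policy, states, horizon):
    """Finite-horizon value of `policy` (a dict state -> action) in the
    fitted model, computed by backward induction over `horizon` steps."""
    value = {s: 0.0 for s in states}
    for _ in range(horizon):
        updated = {}
        for s in states:
            sa = (s, policy[s])
            expected_next = sum(
                prob * value[s2] for s2, prob in p_hat.get(sa, {}).items()
            )
            # Assumption: unvisited pairs get reward 0 and no successors.
            updated[s] = r_hat.get(sa, 0.0) + expected_next
        value = updated
    return value


# Hypothetical logged data for a two-state problem.
logged = [(0, "a", 1.0, 1), (0, "a", 1.0, 1), (1, "a", 0.0, 0), (1, "b", 2.0, 1)]
p_hat, r_hat = fit_tabular_model(logged)
values = evaluate_policy(p_hat, r_hat, {0: "a", 1: "b"}, states=[0, 1], horizon=2)
print(values)
```

Uniform OPE asks that such an estimate be accurate for every policy in a class at once, which is why the required number of episodes depends on how well the behavior policy covers each state-action pair.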

arxiv preprint arxiv, probability 1, slog, (11 more...)

arXiv.org Artificial Intelligence

2105.06029

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > Washington > King County > Seattle (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.70)
Information Technology > Artificial Intelligence > Representation & Reasoning > Model-Based Reasoning (0.50)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.48)
Information Technology > Data Science > Data Mining > Big Data (0.46)


Online POMDP Planning via Simplification

Sztyglic, Ori, Indelman, Vadim

arXiv.org Artificial Intelligence, May-11-2021

In this paper, we consider online planning in partially observable domains. Solving the corresponding POMDP problem is a very challenging task, particularly in an online setting. Our key contribution is a novel algorithmic approach, Simplified Information Theoretic Belief Space Planning (SITH-BSP), which aims to speed up POMDP planning considering belief-dependent rewards, without compromising on the solution's accuracy. We do so by mathematically relating the simplified elements of the problem to the corresponding counterparts of the original problem. Specifically, we focus on belief simplification and use it to formulate bounds on the corresponding original belief-dependent rewards. These bounds in turn are used to perform branch pruning over the belief tree, in the process of calculating the optimal policy. We further introduce the notion of adaptive simplification, re-using calculations between different simplification levels, and exploit it to prune, at each level in the belief tree, all branches but one. Therefore, our approach is guaranteed to find the optimal solution of the original problem but with substantial speedup. As a second key contribution, we derive novel analytical bounds for differential entropy, considering a sampling-based belief representation, which we believe are of interest on their own. We validate our approach in simulation using these bounds, with simplification corresponding to a reduction in the number of samples, and observe a significant computational speedup while still obtaining the optimal solution.
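The pruning idea in this abstract, where bounds on each branch are tightened until a single candidate remains, can be sketched generically. The code below is not SITH-BSP: it assumes the bounds at each simplification level are already given as `(lower, upper)` intervals per action, and the names `prune`, `choose_action`, and the numbers in `levels` are hypothetical.

```python
def prune(bounds):
    """Keep only the actions whose upper bound reaches the best lower bound.
    A pruned action cannot be optimal as long as the bounds are valid."""
    best_lower = max(lower for lower, _ in bounds.values())
    return {action for action, (_, upper) in bounds.items() if upper >= best_lower}


def choose_action(bounds_per_level):
    """Walk from the coarsest to the finest simplification level, pruning
    after each refinement, and stop once a single action survives."""
    if not bounds_per_level:
        raise ValueError("at least one level of bounds is required")
    candidates = set(bounds_per_level[0])
    for level, bounds in enumerate(bounds_per_level):
        # Only surviving candidates need their bounds refined at this level.
        active = {action: bounds[action] for action in candidates}
        candidates = prune(active)
        if len(candidates) == 1:
            return next(iter(candidates)), level
    # Bounds never separated the candidates: fall back to the best lower bound.
    final = bounds_per_level[-1]
    return max(candidates, key=lambda action: final[action][0]), len(bounds_per_level) - 1


# Hypothetical value bounds for three actions at three simplification levels.
levels = [
    {"left": (0.0, 5.0), "right": (1.0, 4.0), "stay": (-3.0, 0.5)},
    {"left": (2.0, 3.0), "right": (2.5, 3.5), "stay": (-1.0, 0.0)},
    {"left": (2.2, 2.4), "right": (3.0, 3.1), "stay": (-0.5, -0.4)},
]

action, level = choose_action(levels)
print(action, level)
```

Here "stay" is discarded at the coarsest level, so its bounds never need to be refined, which is where the computational saving of this kind of scheme comes from.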

belief tree, simplification, simplification level, (16 more...)

arXiv.org Artificial Intelligence

2105.05296

Country:

Europe > Austria > Vienna (0.14)
Asia > Middle East > Israel > Haifa District > Haifa (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)


Boltzmann machines as two-dimensional tensor networks

Li, Sujie, Pan, Feng, Zhou, Pengfei, Zhang, Pan

arXiv.org Machine Learning, May-10-2021

Restricted Boltzmann machines (RBM) and deep Boltzmann machines (DBM) are important models in machine learning, and have recently found numerous applications in quantum many-body physics. We show that there are fundamental connections between them and tensor networks. In particular, we demonstrate that any RBM and DBM can be exactly represented as a two-dimensional tensor network. This representation gives an understanding of the expressive power of RBM and DBM using entanglement structures of the tensor networks, and also provides an efficient tensor network contraction algorithm for computing the partition function of RBM and DBM. Using numerical experiments, we demonstrate that the proposed algorithm is much more accurate than the state-of-the-art machine learning methods in estimating the partition function of restricted Boltzmann machines and deep Boltzmann machines, and has potential applications in training deep Boltzmann machines for general machine learning tasks.
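For readers unfamiliar with the quantity being estimated, the partition function of a very small RBM can be computed exactly by enumeration. The sketch below is not the tensor network contraction algorithm of the paper. It assumes binary units in {0, 1} and the common energy convention E(v, h) = -a·v - b·h - vᵀWh, and it compares a full enumeration with the standard trick of summing out the hidden units analytically; all names and parameter values are illustrative.

```python
import itertools
import math


def partition_brute_force(W, a, b):
    """Sum exp(-E(v, h)) over every visible and hidden configuration."""
    n_visible, n_hidden = len(a), len(b)
    total = 0.0
    for v in itertools.product([0, 1], repeat=n_visible):
        for h in itertools.product([0, 1], repeat=n_hidden):
            energy = -sum(a[i] * v[i] for i in range(n_visible))
            energy -= sum(b[j] * h[j] for j in range(n_hidden))
            energy -= sum(
                v[i] * W[i][j] * h[j]
                for i in range(n_visible)
                for j in range(n_hidden)
            )
            total += math.exp(-energy)
    return total


def partition_hidden_summed_out(W, a, b):
    """Enumerate visible units only; each hidden unit contributes the
    factor 1 + exp(b_j + sum_i v_i W_ij)."""
    n_visible, n_hidden = len(a), len(b)
    total = 0.0
    for v in itertools.product([0, 1], repeat=n_visible):
        term = math.exp(sum(a[i] * v[i] for i in range(n_visible)))
        for j in range(n_hidden):
            activation = b[j] + sum(v[i] * W[i][j] for i in range(n_visible))
            term *= 1.0 + math.exp(activation)
        total += term
    return total


# Hypothetical RBM with 2 visible and 3 hidden units.
W = [[0.5, -0.3, 0.2], [0.1, 0.4, -0.6]]
a = [0.2, -0.1]
b = [0.0, 0.3, -0.2]

z_full = partition_brute_force(W, a, b)
z_marginal = partition_hidden_summed_out(W, a, b)
print(z_full, z_marginal)
```

Both routines scale exponentially in the number of enumerated units, which is why approximate or structured methods are needed for models of practical size.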

tensor, tensor network, two-dimensional tensor network, (12 more...)

arXiv.org Machine Learning

2105.0413

Country:

Asia > China > Zhejiang Province > Hangzhou (0.04)
Asia > China > Beijing > Beijing (0.04)
Africa > Senegal > Kolda Region > Kolda (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
