Doya, Kenji
Emergence of Hierarchy via Reinforcement Learning Using a Multiple Timescale Stochastic RNN
Han, Dongqi, Doya, Kenji, Tani, Jun
Although recurrent neural networks (RNNs) for reinforcement learning (RL) have demonstrated unique advantages in various aspects, e.g., solving memory-dependent tasks and meta-learning, very few studies have shown how RNNs can solve the problem of hierarchical RL by autonomously developing hierarchical control. In this paper, we propose a novel model-free RL framework called ReMASTER, which combines an off-policy actor-critic algorithm with a multiple timescale stochastic recurrent neural network for solving memory-dependent and hierarchical tasks. We performed experiments using a challenging continuous control task and showed that: (1) the internal representation necessary for achieving hierarchical control develops autonomously through exploratory learning; (2) stochastic neurons in the RNN enable faster relearning when adapting to a new task that is a recomposition of previously learned sub-goals.
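As a rough illustration of the kind of recurrent layer the abstract refers to, the following sketch combines two neuron groups with different time constants and additive Gaussian noise on the hidden state. The architecture, time constants, and noise model here are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

class MultipleTimescaleStochasticRNN:
    """Illustrative MTRNN-style layer: two neuron groups with different
    time constants, plus Gaussian noise on the hidden state (the
    'stochastic neurons'). Not the paper's exact architecture."""

    def __init__(self, n_in, n_fast, n_slow, tau_fast=2.0, tau_slow=10.0, noise_std=0.1):
        n_hid = n_fast + n_slow
        self.tau = np.concatenate([np.full(n_fast, tau_fast),
                                   np.full(n_slow, tau_slow)])
        self.W_in = rng.normal(0, 0.1, (n_hid, n_in))
        self.W_rec = rng.normal(0, 0.1, (n_hid, n_hid))
        self.b = np.zeros(n_hid)
        self.noise_std = noise_std

    def step(self, u, x):
        """One leaky-integration update of the internal state u given input x."""
        pre = self.W_rec @ np.tanh(u) + self.W_in @ x + self.b
        u_new = (1.0 - 1.0 / self.tau) * u + (1.0 / self.tau) * pre
        # additive Gaussian noise makes the hidden dynamics stochastic
        u_new += self.noise_std * rng.normal(size=u_new.shape)
        return u_new

# roll the layer over a short random input sequence
layer = MultipleTimescaleStochasticRNN(n_in=3, n_fast=8, n_slow=4)
u = np.zeros(12)
for t in range(20):
    u = layer.step(u, rng.normal(size=3))
print(u[:4])
```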
PIPPS: Flexible Model-Based Policy Search Robust to the Curse of Chaos
Parmas, Paavo, Rasmussen, Carl Edward, Peters, Jan, Doya, Kenji
Previously, the exploding gradient problem has been explained as central in deep learning and model-based reinforcement learning because it causes numerical issues and instability in optimization. Our experiments in model-based reinforcement learning imply that the problem is not just a numerical issue, but may be caused by a fundamental chaos-like nature of long chains of nonlinear computations: not only do the magnitudes of the gradients become large, but their direction also becomes essentially random. We show that reparameterization gradients suffer from this problem, while likelihood ratio gradients are robust. Using these insights, we develop a model-based policy search framework, Probabilistic Inference for Particle-Based Policy Search (PIPPS), which is easily extensible and allows for almost arbitrary models and policies, while matching the performance of previous data-efficient learning algorithms. Finally, we invent the total propagation algorithm, which efficiently computes a union over all pathwise derivative depths during a single backwards pass, automatically giving greater weight to estimators with lower variance, sometimes improving over reparameterization gradients by $10^6$ times.
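The contrast between the two gradient estimators can be made concrete on a one-dimensional toy problem: both estimate the derivative of an expectation over a Gaussian variable, one by differentiating through the samples (reparameterization) and one by weighting function values with the score function (likelihood ratio). The objective f and all constants below are arbitrary choices for illustration, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

def f(x):
    # some nonlinear objective whose expectation we differentiate w.r.t. mu
    return np.sin(3.0 * x) + 0.5 * x ** 2

mu, sigma, n = 0.3, 0.5, 100_000
eps = rng.normal(size=n)
x = mu + sigma * eps                          # reparameterized samples

# Reparameterization (pathwise) estimator: differentiate through x = mu + sigma * eps
f_prime = 3.0 * np.cos(3.0 * x) + x           # df/dx, written out analytically here
grad_rp = f_prime.mean()                      # dx/dmu = 1

# Likelihood-ratio (score function) estimator; the batch mean acts as a simple baseline
score = (x - mu) / sigma ** 2                 # d log N(x; mu, sigma^2) / d mu
grad_lr = ((f(x) - f(x).mean()) * score).mean()

print(grad_rp, grad_lr)                       # both estimate d/dmu E[f(x)]
```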
Multiple co-clustering based on nonparametric mixture models with heterogeneous marginal distributions
Tokuda, Tomoki, Yoshimoto, Junichiro, Shimizu, Yu, Toki, Shigeru, Okada, Go, Takamura, Masahiro, Yamamoto, Tetsuya, Yoshimura, Shinpei, Okamoto, Yasumasa, Yamawaki, Shigeto, Doya, Kenji
We propose a novel method for multiple clustering that assumes a co-clustering structure (partitions in both rows and columns of the data matrix) in each view. The new method is applicable to high-dimensional data. It is based on a nonparametric Bayesian approach in which the number of views and the numbers of feature and subject clusters are inferred in a data-driven manner. We simultaneously model different distribution families, such as Gaussian, Poisson, and multinomial distributions, in each cluster block. This makes our method applicable to datasets consisting of both numerical and categorical variables, as is typical of biomedical data. Clustering solutions are obtained by variational inference with a mean-field approximation. We apply the proposed method to synthetic and real data, and show that it outperforms other multiple clustering methods both in recovering true cluster structures and in computation time. Finally, we apply our method to a depression dataset with no true cluster structure available, from which useful inferences are drawn about possible clustering structures of the data.
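For readers unfamiliar with the setting, the toy snippet below only generates a data matrix with the kind of structure the method is designed to recover: two subject (row) clusters and two feature (column) blocks whose marginals come from different families (Gaussian and Poisson). The nonparametric Bayesian inference itself is not reproduced here; all sizes and parameters are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy data with a co-clustering structure: 2 subject (row) clusters and
# 2 feature (column) blocks, one Gaussian and one of Poisson counts,
# mimicking mixed numerical/count biomedical data.
n_per_cluster, n_gauss, n_pois = 50, 4, 4
rows = np.repeat([0, 1], n_per_cluster)            # true row-cluster labels

gauss_mean = np.array([0.0, 3.0])                  # block mean per row cluster
pois_rate = np.array([8.0, 2.0])                   # block rate per row cluster

X_gauss = rng.normal(gauss_mean[rows, None], 1.0, (rows.size, n_gauss))
X_pois = rng.poisson(pois_rate[rows, None], (rows.size, n_pois))
X = np.hstack([X_gauss, X_pois.astype(float)])     # (100, 8) mixed-type data matrix

print(X.shape)
print(X[[0, -1]].round(2))                         # one subject from each cluster
```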
A Generalized Natural Actor-Critic Algorithm
Morimura, Tetsuro, Uchibe, Eiji, Yoshimoto, Junichiro, Doya, Kenji
Policy gradient Reinforcement Learning (RL) algorithms have received substantial attention, seeking stochastic policies that maximize the average (or discounted cumulative) reward. In addition, extensions based on the concept of the Natural Gradient (NG) show promising learning efficiency because they take the metric of the parameter space into account. Although there are two candidate metrics, Kakade's Fisher Information Matrix (FIM) for the policy (action) distribution and Morimura's FIM for the state-action joint distribution, all RL algorithms with NG have followed Kakade's approach. In this paper, we describe a generalized Natural Gradient (gNG) that linearly interpolates the two FIMs and propose an efficient implementation of gNG learning based on the theory of estimating functions: the generalized Natural Actor-Critic (gNAC) algorithm. The gNAC algorithm involves a near-optimal auxiliary function to reduce the variance of the gNG estimates. Interestingly, gNAC can be regarded as a natural extension of the current state-of-the-art NAC algorithm [1] when the interpolating parameter is appropriately selected. Numerical experiments showed that the proposed gNAC algorithm can estimate the gNG efficiently and outperformed the NAC algorithm.
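Independent of the estimating-function machinery, the core idea of a generalized natural gradient can be sketched as preconditioning the vanilla policy gradient with a linear interpolation of the two candidate metrics. The sketch below assumes the two FIM estimates and the gradient are already available; the interpolation parameterization and damping are illustrative choices, not the paper's.

```python
import numpy as np

def generalized_natural_gradient(grad, fim_policy, fim_joint, lam, damping=1e-3):
    """Natural-gradient direction using a linear interpolation of Kakade's
    policy-distribution FIM and the state-action joint-distribution FIM.
    lam = 0 uses only the policy FIM; lam = 1 uses only the joint FIM."""
    fim = (1.0 - lam) * fim_policy + lam * fim_joint
    fim = fim + damping * np.eye(fim.shape[0])     # keep the metric invertible
    return np.linalg.solve(fim, grad)

# toy usage with random symmetric PSD matrices standing in for FIM estimates
rng = np.random.default_rng(3)
A, B = rng.normal(size=(2, 4, 4))
fim_pi, fim_sa = A @ A.T, B @ B.T
grad = rng.normal(size=4)
print(generalized_natural_gradient(grad, fim_pi, fim_sa, lam=0.5))
```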
Responding to Modalities with Different Latencies
Bissmarck, Fredrik, Nakahara, Hiroyuki, Doya, Kenji, Hikosaka, Okihide
Motor control depends on sensory feedback in multiple modalities with different latencies. In this paper, we consider, within the framework of reinforcement learning, how different sensory modalities can be combined and selected for real-time, optimal movement control. We propose an actor-critic architecture with multiple modules whose outputs are combined using a softmax function. We tested our architecture in a simulation of a sequential reaching task. Reaching was initially guided by visual feedback with a long latency. Our learning scheme allowed the agent to utilize the shorter-latency somatosensory feedback when the hand was near the experienced trajectory. In simulations with different latencies for visual and somatosensory feedback, we found that the agent depended more on the feedback with shorter latency.
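A minimal sketch of the combination step described above: each module proposes an action and a reliability signal, and the proposals are blended with softmax weights. The module names, reliability signals, and inverse temperature below are assumptions for illustration, not the paper's actual controller.

```python
import numpy as np

def softmax(z, beta=1.0):
    z = beta * (z - z.max())
    e = np.exp(z)
    return e / e.sum()

def combine_modules(actions, reliabilities, beta=5.0):
    """Blend the action proposals of several modules with softmax weights
    derived from each module's reliability signal (e.g., its value estimate)."""
    w = softmax(np.asarray(reliabilities), beta)
    return w @ np.asarray(actions), w

# e.g., a short-latency somatosensory module and a long-latency visual module
actions = [np.array([0.9, 0.1]),       # somatosensory module's proposal
           np.array([0.5, 0.4])]       # visual module's proposal
reliabilities = [1.2, 0.3]             # the somatosensory estimate is trusted more here
action, weights = combine_modules(actions, reliabilities)
print(action, weights)
```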
Different Cortico-Basal Ganglia Loops Specialize in Reward Prediction at Different Time Scales
Tanaka, Saori C., Doya, Kenji, Okada, Go, Ueda, Kazutaka, Okamoto, Yasumasa, Yamawaki, Shigeto
To understand the brain mechanisms involved in reward prediction on different time scales, we developed a Markov decision task that requires prediction of both immediate and future rewards, and analyzed subjects' brain activities using functional MRI. We estimated the time course of reward prediction and reward prediction error on different time scales from subjects' performance data, and used them as the explanatory variables for SPM analysis. We found topographic maps of different time scales in medial frontal cortex and striatum. The result suggests that different cortico-basal ganglia loops are specialized for reward prediction on different time scales.
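One way to picture the model-derived regressors mentioned above is to run tabular TD(0) learning on a behavioural sequence with several discount factors and record the value and prediction-error traces for each. The toy task, discount factors, and learning rate below are illustrative assumptions and are not the regressors used in the study.

```python
import numpy as np

rng = np.random.default_rng(4)

def timescale_regressors(states, rewards, n_states, gammas, alpha=0.1):
    """TD(0) value and prediction-error time courses at several discount
    factors (time scales); such model-derived traces can serve as
    explanatory variables in an fMRI regression analysis."""
    regressors = {}
    for gamma in gammas:
        V = np.zeros(n_states)
        v_trace, delta_trace = [], []
        for t in range(len(rewards)):
            s, s_next = states[t], states[t + 1]
            delta = rewards[t] + gamma * V[s_next] - V[s]   # prediction error
            v_trace.append(V[s])                            # prediction
            delta_trace.append(delta)
            V[s] += alpha * delta
        regressors[gamma] = (np.array(v_trace), np.array(delta_trace))
    return regressors

# toy episode: random walk over 5 states, reward only in the last state
states = rng.integers(0, 5, size=201)
rewards = (states[1:] == 4).astype(float)
regs = timescale_regressors(states, rewards, n_states=5, gammas=[0.3, 0.6, 0.9, 0.99])
print({g: d.round(2)[:5] for g, (v, d) in regs.items()})
```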
Estimating Internal Variables and Parameters of a Learning Agent by a Particle Filter
Samejima, Kazuyuki, Doya, Kenji, Ueda, Yasumasa, Kimura, Minoru
When we model higher-order functions, such as learning and memory, we face the difficulty of comparing neural activities with hidden variables that depend on the history of sensory and motor signals and on the dynamics of the network. Here, we propose a novel method for estimating the hidden variables of a learning agent, such as connection weights, from sequences of observable variables. Bayesian estimation is a method for estimating the posterior probability of hidden variables from an observable data sequence using a dynamic model of the hidden and observable variables. In this paper, we apply a particle filter to estimate the internal parameters and metaparameters of a reinforcement learning model. We verified the effectiveness of the method using both artificial data and real animal behavioral data.
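A minimal sketch of the approach, under assumed specifics: a bootstrap particle filter that tracks a Q-learning agent's hidden learning rate (a metaparameter) and action values (internal variables) from its observed choices and rewards. The bandit task, softmax temperature, and random-walk proposal on the learning rate are illustrative choices, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(5)

def softmax_prob(q, beta=3.0):
    e = np.exp(beta * (q - q.max()))
    return e / e.sum()

# --- generate behaviour of a Q-learning agent with a hidden learning rate ---
true_alpha, beta, T = 0.3, 3.0, 300
p_reward = np.array([0.8, 0.2])                  # 2-armed bandit
Q = np.zeros(2)
actions, rewards = [], []
for t in range(T):
    a = rng.choice(2, p=softmax_prob(Q, beta))
    r = float(rng.random() < p_reward[a])
    Q[a] += true_alpha * (r - Q[a])
    actions.append(a); rewards.append(r)

# --- particle filter over the hidden learning rate and Q-values ---
n_particles = 2000
alphas = rng.uniform(0.0, 1.0, n_particles)      # particles for the metaparameter
Qs = np.zeros((n_particles, 2))                  # each particle's internal Q-values

for a, r in zip(actions, rewards):
    # small random walk on alpha keeps particle diversity after resampling
    alphas = np.clip(alphas + 0.01 * rng.normal(size=n_particles), 0.0, 1.0)
    # weight particles by the likelihood of the observed choice
    logits = beta * Qs
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)
    w = probs[:, a]
    w /= w.sum()
    # resample, then propagate each particle's Q-values with its own alpha
    idx = rng.choice(n_particles, size=n_particles, p=w)
    alphas, Qs = alphas[idx], Qs[idx]
    Qs[:, a] += alphas * (r - Qs[:, a])

print("true alpha:", true_alpha, " estimated:", round(float(alphas.mean()), 3))
```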