AITopics

We develop a protocol for optimizing dynamic behavior of a network of simple electronic components, such as a sensor network, an ad hoc network of mobile devices, or a network of communication switches. This protocol requires only local communication and simple computations which are distributed among devices. The protocol is scalable to large networks. As a motivating example, we discuss a problem involving optimization of power consumption, delay, and buffer overflow in a sensor network. Our approach builds on policy gradient methods for optimization of Markov decision processes. The protocol can be viewed as an extension of policy gradient methods to a context involving a team of agents optimizing aggregate performance through asynchronous distributed communication and computation. We establish that the dynamics of the protocol approximate the solution to an ordinary differential equation that follows the gradient of the performance objective.

ith sensor, protocol, sensor

Country:

North America > United States > California > Santa Clara County > Stanford (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
North America > United States > New York > New York County > New York City (0.04)
Asia > Middle East > Jordan (0.04)

Industry: Semiconductors & Electronics (0.34)

Technology:

Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.48)

Bererton, Curt, Gordon, Geoffrey J., Thrun, Sebastian

Auction Mechanism Design for Multi-Robot Coordination

The design of cooperative multi-robot systems is a highly active research area in robotics. Two lines of research in particular have generated interest: the solution of large, weakly coupled MDPs, and the design and implementation of market architectures. We propose a new algorithm that joins these two lines of research. For a class of coupled MDPs, our algorithm automatically designs a market architecture that causes a decentralized multi-robot system to converge to a consistent policy. We can show that this policy is the same as the one that would be produced by a particular centralized planning algorithm. We demonstrate the new algorithm on three simulation examples: multi-robot towing, multi-robot path planning with a limited fuel resource, and coordinating behaviors in a game of paintball.

algorithm, constraint, robot

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)
Asia > South Korea > Seoul > Seoul (0.04)

Industry: Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
Information Technology > Artificial Intelligence > Robots > Robot Planning & Action (0.70)

Bagnell, J. A., Kakade, Sham M., Schneider, Jeff G., Ng, Andrew Y.

Policy Search by Dynamic Programming

We consider the policy search approach to reinforcement learning. We show that if a "baseline distribution" is given (indicating roughly how often we expect a good policy to visit each state), then we can derive a policy search algorithm that terminates in a finite number of steps, and for which we can provide nontrivial performance guarantees. We also demonstrate this algorithm on several grid-world POMDPs, a planar biped walking robot, and a double-pole balancing problem.

algorithm, non-stationary policy, psdp

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)
North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
North America > United States > California > Santa Clara County > Stanford (0.04)
Asia > Middle East > Jordan (0.04)

Industry: Government > Regional Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)

Kim, H. J., Jordan, Michael I., Sastry, Shankar, Ng, Andrew Y.

Autonomous Helicopter Flight via Reinforcement Learning

Autonomous helicopter flight represents a challenging control problem, with complex, noisy dynamics. In this paper, we describe a successful application of reinforcement learning to autonomous helicopter flight.

controller, helicopter, trajectory

Country:

North America > United States > California > Santa Clara County > Stanford (0.14)
North America > United States > California > Alameda County > Berkeley (0.14)
Asia > Middle East > Jordan (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)

Industry:

Transportation > Air (1.00)
Aerospace & Defense > Aircraft (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.71)

Pineau, Joelle, Gordon, Geoffrey J., Thrun, Sebastian

Applying Metric-Trees to Belief-Point POMDPs

Recent developments in grid-based and point-based approximation algorithms for POMDPs have greatly improved the tractability of POMDP planning. These approaches operate on sets of belief points by individually learning a value function for each point. In reality, belief points exist in a highly structured metric simplex, but current POMDP algorithms do not exploit this property. This paper presents a new metric-tree algorithm that can be used in POMDP planning to sort belief points spatially and then perform fast value function updates over groups of points. We present results showing that this approach can reduce computation in point-based POMDP algorithms for a wide range of problems.

algorithm, belief point, node

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)
North America > United States > California > Santa Clara County > Stanford (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
Asia > China > Hong Kong (0.04)

Genre: Research Report > New Finding (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Dornhege, Guido, Blankertz, Benjamin, Curio, Gabriel, Müller, Klaus-Robert

Increase Information Transfer Rates in BCI by CSP Extension to Multi-class

Brain-Computer Interfaces (BCI) are an emerging technology driven by the goal of developing an effective communication interface that translates human intentions into a control signal for devices such as computers or neuroprostheses. If this can be done while bypassing the usual human output pathways, such as peripheral nerves and muscles, it can ultimately become a valuable tool for paralyzed patients.

algorithm, classification, experiment

Country:

Europe > Germany > Brandenburg > Potsdam (0.04)
Europe > Germany > Berlin (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Genre: Research Report (0.68)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.47)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

Different Cortico-Basal Ganglia Loops Specialize in Reward Prediction at Different Time Scales

Tanaka, Saori C., Doya, Kenji, Okada, Go, Ueda, Kazutaka, Okamoto, Yasumasa, Yamawaki, Shigeto

To understand the brain mechanisms involved in reward prediction on different time scales, we developed a Markov decision task that requires prediction of both immediate and future rewards, and analyzed subjects' brain activity using functional MRI. We estimated the time course of reward prediction and reward prediction error on different time scales from subjects' performance data, and used these estimates as the explanatory variables for SPM analysis. We found topographic maps of different time scales in medial frontal cortex and striatum. The results suggest that different cortico-basal ganglia loops are specialized for reward prediction on different time scales.

different time scale, reward prediction, time scale

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
Asia > Japan > Honshū > Chūgoku > Hiroshima Prefecture > Hiroshima (0.05)
North America > Canada > Quebec > Montreal (0.04)

Genre:

Research Report > New Finding (0.69)
Research Report > Experimental Study (0.49)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology:

Information Technology > Artificial Intelligence > Cognitive Science > Neuroscience (0.72)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.48)

Suzuki, Jun, Sasaki, Yutaka, Maeda, Eisaku

Kernels for Structured Natural Language Data

This paper devises a novel kernel function for structured natural language data. In the field of Natural Language Processing, feature extraction consists of the following two steps: (1) syntactically and semantically analyzing raw data, i.e., character strings, then representing the results as discrete structures, such as parse trees and dependency graphs with part-of-speech tags; (2) creating (possibly high-dimensional) numerical feature vectors from the discrete structures. The new kernels, called Hierarchical Directed Acyclic Graph (HDAG) kernels, directly accept DAGs whose nodes can contain DAGs. HDAG data structures are needed to fully reflect the syntactic and semantic structures that natural language data inherently have. In this paper, we define the kernel function and show how it permits efficient calculation. Experiments demonstrate that the proposed kernels are superior to existing kernel functions, e.g., sequence kernels, tree kernels, and bag-of-words kernels.

information, kernel, node

Country: Asia > Japan > Honshū > Kansai > Kyoto Prefecture > Kyoto (0.04)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Modeling User Rating Profiles For Collaborative Filtering

Marlin, Benjamin M.

In this paper we present a generative latent variable model for rating-based collaborative filtering called the User Rating Profile model (URP). The generative process which underlies URP is designed to produce complete user rating profiles, an assignment of one rating to each item for each user. Our model represents each user as a mixture of user attitudes, and the mixing proportions are distributed according to a Dirichlet random variable. The rating for each item is generated by selecting a user attitude for the item, and then selecting a rating according to the preference pattern associated with that attitude. URP is related to several models including a multinomial mixture model, the aspect model [7], and LDA [1], but has clear advantages over each.
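The generative process described in this abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the function names, the parameter layout (`alpha` as Dirichlet parameters, `beta[z][item]` as a rating distribution per attitude and item), and the example values are all assumptions made for illustration, and no learning or inference is shown.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw from Dirichlet(alpha) by normalizing independent Gamma(alpha_k, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def sample_user_profile(alpha, beta, rng):
    """Generate one complete rating profile: one rating per item.

    alpha: Dirichlet parameters, one per user attitude (hypothetical values).
    beta:  beta[z][item] is a list of probabilities over ratings 1..V for
           attitude z on that item (hypothetical layout).
    """
    # Mixing proportions over user attitudes for this user.
    theta = sample_dirichlet(alpha, rng)
    attitudes = range(len(alpha))
    profile = []
    for item in range(len(beta[0])):
        # Select a user attitude for this item.
        z = rng.choices(attitudes, weights=theta)[0]
        # Select a rating according to that attitude's preference pattern.
        probs = beta[z][item]
        rating = rng.choices(range(1, len(probs) + 1), weights=probs)[0]
        profile.append(rating)
    return profile


if __name__ == "__main__":
    rng = random.Random(0)
    # Two attitudes over three items: one always rates 1, the other always rates 5.
    low = [1.0, 0.0, 0.0, 0.0, 0.0]
    high = [0.0, 0.0, 0.0, 0.0, 1.0]
    beta = [[low] * 3, [high] * 3]
    print(sample_user_profile([1.0, 1.0], beta, rng))
```

Because an attitude is drawn separately for each item, a single user's profile can mix ratings from several preference patterns, which is the sense in which each user is a mixture of attitudes.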

aspect model, experiment, user attitude

Country:

North America > Canada > Ontario > Toronto (0.15)
North America > United States > North Carolina > Orange County > Chapel Hill (0.04)
North America > United States > Minnesota (0.04)
Asia > Middle East > Jordan (0.04)

Industry:

Media (0.46)
Banking & Finance > Credit (0.35)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Fast Embedding of Sparse Similarity Graphs

Platt, John C.

This paper applies fast sparse multidimensional scaling (MDS) to a large graph of music similarity, with 267K vertices that represent artists, albums, and tracks; and 3.22M edges that represent similarity between those entities. Once vertices are assigned locations in a Euclidean space, the locations can be used to browse music and to generate playlists. MDS on very large sparse graphs can be effectively performed by a family of algorithms called Rectangular Dijkstra (RD) MDS algorithms. These RD algorithms operate on a dense rectangular slice of the distance matrix, created by calling Dijkstra a constant number of times. Two RD algorithms are compared: Landmark MDS, which uses the Nyström approximation to perform MDS; and a new algorithm called Fast Sparse Embedding, which uses FastMap. These algorithms compare favorably to Laplacian Eigenmaps, both in terms of speed and embedding quality.
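The "dense rectangular slice of the distance matrix" can be illustrated with a short sketch: run Dijkstra's algorithm once from each of a fixed number of source vertices and stack the resulting distance rows. This is only an illustration under assumptions. The adjacency-list format, the function names, and the choice of source vertices are hypothetical, and the embedding step itself (Landmark MDS or Fast Sparse Embedding) is not shown.

```python
import heapq


def dijkstra(adj, source):
    """Shortest-path distances from source in a graph with non-negative weights.

    adj maps every vertex to a list of (neighbor, weight) pairs (hypothetical format).
    """
    dist = {v: float("inf") for v in adj}
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry; a shorter path was already found
        for v, w in adj[u]:
            nd = d + w
            if nd < dist[v]:
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def rectangular_slice(adj, sources):
    """Build a len(sources) x len(vertices) slice of the graph distance matrix.

    One Dijkstra call per source vertex, so the number of calls stays constant
    as the graph grows. How the sources are chosen is left to the caller.
    """
    vertices = sorted(adj)
    rows = []
    for s in sources:
        dist = dijkstra(adj, s)
        rows.append([dist[v] for v in vertices])
    return vertices, rows


if __name__ == "__main__":
    # Small undirected example graph, written with both edge directions.
    adj = {
        "a": [("b", 1.0), ("c", 5.0)],
        "b": [("a", 1.0), ("c", 2.0)],
        "c": [("a", 5.0), ("b", 2.0)],
    }
    vertices, rows = rectangular_slice(adj, ["a", "c"])
    print(vertices)
    for row in rows:
        print(row)
```

The slice has one row per source and one column per vertex, so its size grows linearly with the number of vertices rather than quadratically as the full distance matrix would.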

algorithm, graph, mds algorithm

Country:

North America > United States > Washington > King County > Redmond (0.04)
Asia > Middle East > Jordan (0.04)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)