AITopics

We consider the scaling of the number of examples necessary to achieve good performance in distributed, cooperative, multi-agent reinforcement learning, as a function of the the number of agents n. We prove a worstcase lower bound showing that algorithms that rely solely on a global reward signal to learn policies confront a fundamental limit: They require a number of real-world examples that scales roughly linearly in the number of agents. For settings of interest with a very large number of agents, this is impractical. We demonstrate, however, that there is a class of algorithms that, by taking advantage of local reward signals in large distributed Markov Decision Processes, are able to ensure good performance with a number of samples that scales as O(log n). This makes them applicable even in settings with a very large number of agents n.

agent, algorithm, neigh, (14 more...)

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)
North America > United States > California > Santa Clara County > Stanford (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

Policy-Gradient Methods for Planning

Aberdeen, Douglas

Probabilistic temporal planning attempts to find good policies for acting in domains with concurrent durative tasks, multiple uncertain outcomes, and limited resources. These domains are typically modelled as Markov decision problems and solved using dynamic programming methods. This paper demonstrates the application of reinforcement learning -- in the form of a policy-gradient method -- to these domains. Our emphasis is large domains that are infeasible for dynamic programming. Our approach is to construct simple policies, or agents, for each planning task. The result is a general probabilistic temporal planner, named the Factored Policy-Gradient Planner (FPG-Planner), which can handle hundreds of tasks, optimising for probability of success, duration, and resource use.

agent, decision tree, probability, (17 more...)

Country: Oceania > Australia > Australian Capital Territory > Canberra (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Yu, Jin, Aberdeen, Douglas, Schraudolph, Nicol N.

Fast Online Policy Gradient Learning with SMD Gain Vector Adaptation

The stochastic meta--descent (SMD) gain adaptation algorithm [3, 4] can considerably accelerate the convergence of stochastic gradient descent.

artificial intelligence, gradient, machine learning, (14 more...)

Country:

Oceania > Australia (0.29)
Europe > United Kingdom (0.28)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.33)

Edwards, Jaety, Forsyth, David

Searching for Character Models

We introduce a method to automatically improve character models for a handwritten script without the use of transcriptions and using a minimum of document specific training data. We show that we can use searches for the words in a dictionary to identify portions of the document whose transcriptions are unambiguous. Using templates extracted from those regions, we retrain our character prediction model to drastically improve our search retrieval performance for words in the document.

machine learning, simulation of human behavior, template, (18 more...)

Technology:

Information Technology > Artificial Intelligence > Cognitive Science > Simulation of Human Behavior (0.64)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Learning Influence among Interacting Markov Chains

Zhang, Dong, Gatica-perez, Daniel, Bengio, Samy, Roy, Deb

In multi-agent systems, individuals within a group coordinate and interact to achieve a goal.

artificial intelligence, influence model, machine learning, (18 more...)

Country: North America > United States > Massachusetts (0.28)

Industry: Leisure & Entertainment > Games (0.30)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.66)

Estimating the wrong Markov random field: Benefits in the computation-limited setting

Wainwright, Martin J.

Consider the problem of joint parameter estimation and prediction in a Markov random field: i.e., the model parameters are estimated on the basis of an initial setof data, and then the fitted model is used to perform prediction (e.g., smoothing, denoising, interpolation) on a new noisy observation. Working in the computation-limited setting, we analyze a joint method in which the same convex variational relaxation is used to construct an M-estimator for fitting parameters, and to perform approximate marginalization for the prediction step. The key result ofthis paper is that in the computation-limited setting, using an inconsistent parameter estimator (i.e., an estimator that returns the "wrong" model even in the infinite data limit) is provably beneficial, since the resulting errors can partially compensatefor errors made by using an approximate prediction technique. En route to this result, we analyze the asymptotic properties of M-estimators based on convex variational relaxations, and establish a Lipschitz stability property thatholds for a broad class of variational methods. We show that joint estimation/prediction basedon the reweighted sum-product algorithm substantially outperforms a commonly used heuristic based on ordinary sum-product.

artificial intelligence, machine learning, prediction, (15 more...)

Country: North America > United States (0.28)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Verma, Deepak, Rao, Rajesh P.

Goal-Based Imitation as Probabilistic Inference over Graphical Models

Humans are extremely adept at learning new skills by imitating the actions ofothers. A progression of imitative abilities has been observed in children, ranging from imitation of simple body movements to goalbased imitationbased on inferring intent. In this paper, we show that the problem of goal-based imitation can be formulated as one of inferring goals and selecting actions using a learned probabilistic graphical model of the environment. We first describe algorithms for planning actions to achieve a goal state using probabilistic inference. We then describe how planning can be used to bootstrap the learning of goal-dependent policies byutilizing feedback from the environment. The resulting graphical model is then shown to be powerful enough to allow goal-based imitation. Usinga simple maze navigation task, we illustrate how an agent can infer the goals of an observed teacher and imitate the teacher even when the goals are uncertain and the demonstration is incomplete.

artificial intelligence, imitation, machine learning, (11 more...)

Country:

Europe (0.28)
North America > United States > Washington > King County > Seattle (0.14)

Genre: Research Report (0.46)

Industry: Education (0.48)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.95)

Taskar, Ben, Lacoste-Julien, Simon, Jordan, Michael I.

Structured Prediction via the Extragradient Method

We present a simple and scalable algorithm for large-margin estimation of structured models, including an important Class of Markov networks and combinatorial models.

artificial intelligence, inductive learning, machine learning, (19 more...)

Country: North America > Canada (0.28)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.85)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.69)

Saxena, Ashutosh, Chung, Sung H., Ng, Andrew Y.

Learning Depth from Single Monocular Images

We consider the task of depth estimation from a single monocular image. Wetake a supervised learning approach to this problem, in which we begin by collecting a training set of monocular images (of unstructured outdoorenvironments which include forests, trees, buildings, etc.) and their corresponding ground-truth depthmaps. Then, we apply supervised learningto predict the depthmap as a function of the image. Depth estimation is a challenging problem, since local features alone are insufficient to estimate depth at a point, and one needs to consider the global context of the image. Our model uses a discriminatively-trained Markov Random Field (MRF) that incorporates multiscale local-and global-image features, and models both depths at individual points as well as the relation between depths at different points. We show that, even on unstructured scenes, our algorithm is frequently able to recover fairly accurate depthmaps.

artificial intelligence, depthmap, machine learning, (18 more...)

Country: North America > United States > California > Santa Clara County (0.14)

Technology:

Information Technology > Artificial Intelligence > Vision > Image Understanding (0.55)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

Mochihashi, Daichi, Matsumoto, Yuji

Context as Filtering

Long-distance language modeling is important not only in speech recognition and machine translation, but also in high-dimensional discrete sequence modeling in general.

machine learning, modeling, natural language, (19 more...)

Country: Asia > Japan > Honshū (0.14)

Industry: Health & Medicine (0.69)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.47)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)