Collaborating Authors

Wolpert, David


Deep Reinforcement Learning for Event-Driven Multi-Agent Decision Processes

arXiv.org Artificial Intelligence

The incorporation of macro-actions (temporally extended actions) into multi-agent decision problems has the potential to address the curse of dimensionality associated with such decision problems. Since macro-actions last for stochastic durations, multiple agents executing decentralized policies in cooperative environments must act asynchronously. We present an algorithm that modifies generalized advantage estimation for temporally extended actions, allowing a state-of-the-art policy optimization algorithm to optimize policies in Dec-POMDPs in which agents act asynchronously. We show that our algorithm is capable of learning optimal policies in two cooperative domains, one involving real-time bus holding control and one involving wildfire fighting with unmanned aircraft. Our algorithm works by framing problems as "event-driven decision processes," which are scenarios in which the sequence and timing of actions and events are random and governed by an underlying stochastic process. In addition to optimizing policies with continuous state and action spaces, our algorithm also facilitates the use of event-driven simulators, which do not require time to be discretized into time-steps. We demonstrate the benefit of using event-driven simulation in the context of multiple agents taking asynchronous actions. We show that fixed time-step simulation risks obfuscating the sequence in which closely separated events occur, adversely affecting the policies learned. In addition, we show that arbitrarily shrinking the time-step scales poorly with the number of agents.
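A minimal sketch of the core modification the abstract describes: generalized advantage estimation in which the discount applied across each transition is gamma raised to the stochastic elapsed time between an agent's decisions, rather than a fixed per-step factor. All names here are illustrative, and the exact weighting used in the paper may differ:

```python
import numpy as np

def gae_variable_duration(rewards, values, durations, gamma=0.99, lam=0.95):
    """Advantage estimates when decision t is followed by a stochastic
    duration durations[t] before decision t+1 (hypothetical interface).

    rewards[t]   -- reward accumulated between decisions t and t+1
    values       -- critic estimates V(s_0..s_T), one extra bootstrap value
    durations[t] -- elapsed time between decisions t and t+1
    """
    T = len(rewards)
    advantages = np.zeros(T)
    gae = 0.0
    for t in reversed(range(T)):
        discount = gamma ** durations[t]      # time-dependent discount
        delta = rewards[t] + discount * values[t + 1] - values[t]
        gae = delta + discount * lam * gae    # lambda-weighted backward recursion
        advantages[t] = gae
    return advantages
```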


Using Machine Learning to Improve Stochastic Optimization

AAAI Conferences

In many stochastic optimization algorithms there is a hyperparameter that controls how the next sampling distribution is determined from the current data set of samples of the objective function. This hyperparameter controls the exploration/exploitation trade-off of the next sample. Typically heuristic "rules of thumb" are used to set that hyperparameter, e.g., a pre-fixed annealing schedule. We show how machine learning provides more principled alternatives to (adaptively) set that hyperparameter, and demonstrate that these alternatives can substantially improve optimization performance.
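As one plausible instantiation of the idea, the sketch below selects the temperature of a Boltzmann-style reweighting of past samples by cross-validated held-out log-likelihood instead of a fixed schedule. The criterion and all names are assumptions for illustration, not the paper's exact procedure:

```python
import numpy as np

def choose_temperature(xs, fs, temps, n_folds=5):
    """Pick temperature T for weights w_i ~ exp(-f(x_i)/T) by scoring,
    on held-out folds, a Gaussian fitted to the reweighted samples
    (illustrative criterion, 1-d case)."""
    rng = np.random.default_rng(0)
    folds = np.array_split(rng.permutation(len(xs)), n_folds)
    best_T, best_score = temps[0], -np.inf
    for T in temps:
        w = np.exp(-(fs - fs.min()) / T)
        score = 0.0
        for k in range(n_folds):
            test = folds[k]
            train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
            wt = w[train] / w[train].sum()
            mu = np.sum(wt * xs[train])                       # weighted mean
            var = np.sum(wt * (xs[train] - mu) ** 2) + 1e-12  # weighted variance
            ll = -0.5 * ((xs[test] - mu) ** 2 / var + np.log(2 * np.pi * var))
            score += np.sum(w[test] * ll) / w[test].sum()     # held-out score
        if score > best_score:
            best_T, best_score = T, score
    return best_T
```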


Using Supervised Learning to Improve Monte Carlo Integral Estimation

arXiv.org Machine Learning

Monte Carlo (MC) techniques are often used to estimate integrals of a multivariate function using randomly generated samples of the function. In light of the increasing interest in uncertainty quantification and robust design applications in aerospace engineering, the calculation of expected values of such functions (e.g. performance measures) becomes important. However, MC techniques often suffer from high variance and slow convergence as the number of samples increases. In this paper we present Stacked Monte Carlo (StackMC), a new method for post-processing an existing set of MC samples to improve the associated integral estimate. StackMC is based on the supervised learning techniques of fitting functions and cross validation. It should reduce the variance of any type of Monte Carlo integral estimate (simple sampling, importance sampling, quasi-Monte Carlo, MCMC, etc.) without adding bias. We report on an extensive set of experiments confirming that the StackMC estimate of an integral is more accurate than both the associated unprocessed Monte Carlo estimate and an estimate based on a functional fit to the MC samples. These experiments run over a wide variety of integration spaces, numbers of sample points, dimensions, and fitting functions. In particular, we apply StackMC in estimating the expected value of the fuel burn metric of future commercial aircraft and in estimating sonic boom loudness measures. We compare the efficiency of StackMC with that of more standard methods and show that for negligible additional computational cost significant increases in accuracy are gained.
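A minimal sketch of the StackMC idea in one dimension, under stated simplifications: uniform sampling on [0,1], a polynomial fitter whose integral is available in closed form, and the fitted scaling coefficient the paper uses omitted. Using only out-of-fold residuals keeps the fit from biasing the estimate:

```python
import numpy as np

def stackmc_uniform01(f, n=2000, deg=3, n_folds=5, seed=0):
    """Post-process n uniform MC samples of f on [0,1]: fit a polynomial g
    on the training folds, integrate g exactly, and correct with the mean
    out-of-fold residual f - g (simplified StackMC sketch)."""
    rng = np.random.default_rng(seed)
    x = rng.random(n)
    y = f(x)
    folds = np.array_split(rng.permutation(n), n_folds)
    est = 0.0
    for k in range(n_folds):
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        g = np.poly1d(np.polyfit(x[train], y[train], deg))
        exact = np.polyint(g)(1.0) - np.polyint(g)(0.0)   # closed-form integral of g
        est += exact + np.mean(y[test] - g(x[test]))      # control-variate correction
    return est / n_folds

# E[sin(pi x)] over [0,1] is 2/pi ~ 0.6366
print(stackmc_uniform01(lambda x: np.sin(np.pi * x)))
```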


Using Collective Intelligence to Route Internet Traffic

Neural Information Processing Systems

A COllective INtelligence (COIN) is a set of interacting reinforcement learning (RL) algorithms designed in an automated fashion so that their collective behavior optimizes a global utility function. We summarize the theory of COINs, then present experiments using that theory to design COINs to control internet traffic routing. These experiments indicate that COINs outperform all previously investigated RL-based, shortest path routing algorithms. 1 INTRODUCTION COllective INtelligences (COINs) are large, sparsely connected recurrent neural networks, whose "neurons" are reinforcement learning (RL) algorithms. The distinguishing feature of COINs is that their dynamics involves no centralized control, but only the collective effects of the individual neurons each modifying their behavior via their individual RL algorithms. This restriction holds even though the goal of the COIN concerns the system's global behavior.
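The key design lever in a COIN is each learner's private utility. Below is a minimal sketch of the difference-utility ("Wonderful Life") construction associated with this line of work; all names are illustrative, and the routing experiments plug in a domain-specific global utility:

```python
def wonderful_life_utility(global_utility, joint_action, agent, clamp_action):
    """Reward an agent by the change in global utility its action caused,
    relative to a fixed 'clamped' null action.  This keeps each
    reinforcement learner aligned with the global objective while removing
    much of the noise contributed by the other agents' simultaneous choices."""
    clamped = dict(joint_action)
    clamped[agent] = clamp_action            # replace this agent's action only
    return global_utility(joint_action) - global_utility(clamped)
```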


Stacked Density Estimation

Neural Information Processing Systems

The component densities g_j are usually relatively simple unimodal densities such as Gaussians. Density estimation with mixtures involves finding the locations, shapes, and weights of the component densities from the data (using, for example, the Expectation-Maximization (EM) procedure). Kernel density estimation can be viewed as a special case of mixture modeling where a component is centered at each data point, given a weight of 1/N, and a common covariance structure (kernel shape) is estimated from the data. The quality of a particular probabilistic model can be evaluated by an appropriate scoring rule on independent out-of-sample data, such as the test set log-likelihood (also referred to as the log-scoring rule in the Bayesian literature).
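A minimal sketch of stacking two such estimators, scored as the excerpt describes: each model's density is evaluated on held-out folds, and the convex combination weight is chosen to maximize the out-of-fold log-likelihood. The grid search here stands in for the EM-style reweighting used in the paper, and scikit-learn is an assumed dependency:

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.neighbors import KernelDensity

def stacked_density_weights(x, n_folds=5, n_components=3, bandwidth=0.3):
    """Weights for combining a Gaussian mixture and a kernel density
    estimate by out-of-fold log-likelihood (illustrative sketch)."""
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    rng = np.random.default_rng(0)
    folds = np.array_split(rng.permutation(len(x)), n_folds)
    dens = np.zeros((len(x), 2))                 # out-of-fold densities per model
    for k in range(n_folds):
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        gmm = GaussianMixture(n_components=n_components, random_state=0).fit(x[train])
        kde = KernelDensity(bandwidth=bandwidth).fit(x[train])
        dens[test, 0] = np.exp(gmm.score_samples(x[test]))
        dens[test, 1] = np.exp(kde.score_samples(x[test]))
    ws = np.linspace(0.0, 1.0, 101)
    lls = [np.sum(np.log(w * dens[:, 0] + (1 - w) * dens[:, 1] + 1e-300)) for w in ws]
    w = ws[int(np.argmax(lls))]
    return w, 1.0 - w                            # (mixture weight, kernel weight)
```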


Stacked Density Estimation

Neural Information Processing Systems

One frequently estimates density functions for which there is little prior knowledge on the shape of the density and for which one wants a flexible and robust estimator (allowing multimodality if it exists). In this context, the methods of choice tend to be finite mixture models and kernel density estimation methods. For mixture modeling, mixtures of Gaussian components are frequently assumed, and model choice reduces to the problem of choosing the number k of Gaussian components in the model (Titterington, Smith and Makov, 1986). For kernel density estimation, kernel shapes are typically chosen from a selection of simple unimodal densities such as Gaussian, triangular, or Cauchy densities, and kernel bandwidths are selected in a data-driven manner (Silverman 1986; Scott 1994). As argued by Draper (1996), model uncertainty can contribute significantly to predictive error in estimation. While usually considered in the context of supervised learning, model uncertainty is also important in unsupervised learning applications such as density estimation. Even when the model class under consideration contains the true density, if we are only given a finite data set, then there is always a chance of selecting the wrong model. Moreover, even if the correct model is selected, there will typically be estimation error in the parameters of that model.


On the Use of Evidence in Neural Networks

Neural Information Processing Systems

The Bayesian "evidence" approximation has recently been employed to determine the noise and weight-penalty terms used in back-propagation. This paper shows that for neural nets it is far easier to use the exact result than it is to use the evidence approximation. Moreover, unlike the evidence approximation, the exact result neither has to be re-calculated for every new data set, nor requires the running of computer code (the exact result is closed form). In addition, it turns out that the evidence procedure's MAP estimate for neural nets is, in toto, approximation error. Another advantage of the exact analysis is that it does not lead one to incorrect intuition, like the claim that using evidence one can "evaluate different priors in light of the data". This paper also discusses sufficiency conditions for the evidence approximation to hold, why it can sometimes give "reasonable" results, etc.
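For context, the "evidence" procedure the abstract argues against replaces an exact marginalization over hyperparameters with a point estimate. In the notation standard in this literature (alpha the weight-penalty hyperparameter, beta the noise level), the contrast is:

\[
P(w \mid D) = \int d\alpha \, d\beta \; P(w \mid \alpha, \beta, D) \, P(\alpha, \beta \mid D) \quad \text{(exact)}
\]
\[
P(w \mid D) \approx P(w \mid \hat{\alpha}, \hat{\beta}, D), \qquad (\hat{\alpha}, \hat{\beta}) = \arg\max_{\alpha, \beta} P(D \mid \alpha, \beta) \quad \text{(evidence approximation)}
\]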