Towards a Simple Approach to Multi-step Model-based Reinforcement Learning
Asadi, Kavosh, Cater, Evan, Misra, Dipendra, Littman, Michael L.
When environmental interaction is expensive, model-based reinforcement learning offers a solution by planning ahead and avoiding costly mistakes. Model-based agents typically learn a single-step transition model. In this paper, we propose a multi-step model that predicts the outcome of a variable-length action sequence. We show that this model is easy to learn and that it can make policy-conditional predictions. We report preliminary results that show a clear advantage for the multi-step model over its one-step counterpart.
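A minimal sketch of the distinction, with hypothetical interfaces and toy one-dimensional dynamics (none of this is the paper's code): a one-step model must be composed with itself across a rollout, while a multi-step model answers a variable-length query directly.

```python
# Illustrative sketch: composing a learned one-step model versus querying a
# multi-step model once. Names and the toy 1-D dynamics are hypothetical.

def rollout_one_step(one_step_model, state, actions):
    """Apply a one-step model repeatedly; prediction errors can compound."""
    for a in actions:
        state = one_step_model(state, a)
    return state

def predict_multi_step(multi_step_model, state, actions):
    """Query a multi-step model once for the whole variable-length sequence."""
    return multi_step_model(state, actions)

# Toy stand-ins for learned models on a one-dimensional state:
one_step = lambda s, a: s + a               # predicts s_{t+1} from (s_t, a_t)
multi_step = lambda s, acts: s + sum(acts)  # predicts s_{t+h} in one query

s0, plan = 0.0, [1.0, -0.5, 2.0]
assert rollout_one_step(one_step, s0, plan) == predict_multi_step(multi_step, s0, plan)
```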
Transfer with Model Features in Reinforcement Learning
Lehnert, Lucas, Littman, Michael L.
A key question in Reinforcement Learning is which representation an agent can learn to efficiently reuse knowledge between different tasks. Recently, the Successor Representation was shown to have empirical benefits for transferring knowledge between tasks with shared transition dynamics. This paper presents Model Features: a feature representation that clusters behaviourally equivalent states and that is equivalent to a Model-Reduction. Further, we present a Successor Feature model which shows that learning Successor Features is equivalent to learning a Model-Reduction. A novel optimization objective is developed, and we provide bounds showing that minimizing this objective results in an increasingly accurate approximation of a Model-Reduction. Finally, we provide transfer experiments on randomly generated MDPs that vary in their transition and reward functions but approximately preserve behavioural equivalence between states. These results demonstrate that Model Features are suitable for transfer between tasks with varying transition and reward functions.
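For reference, the behavioural equivalence underlying a Model-Reduction can be stated in its standard bisimulation-style form (textbook notation, which may differ from the paper's):

```latex
% Two states are behaviourally equivalent when they agree on immediate rewards
% and on transition probabilities into every equivalence class (Givan et al.):
\[
  s_1 \sim s_2 \iff \forall a:\;
  r(s_1, a) = r(s_2, a)
  \;\text{ and }\;
  \Pr(c \mid s_1, a) = \Pr(c \mid s_2, a)\;\; \forall c \in S/\!\sim .
\]
```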
Equivalence Between Wasserstein and Value-Aware Model-based Reinforcement Learning
Asadi, Kavosh, Cater, Evan, Misra, Dipendra, Littman, Michael L.
Learning a generative model is a key component of model-based reinforcement learning. Though learning a good model in the tabular setting is a simple task, learning a useful model in the approximate setting is challenging. Recently, Farahmand et al. (2017) proposed a value-aware model-learning (VAML) objective that captures the structure of the value function during model learning. Using tools from Lipschitz continuity, we show that minimizing the VAML objective is in fact equivalent to minimizing the Wasserstein metric.
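Schematically, and with notation simplified from the original papers, the two objectives being connected are:

```latex
% VAML scores a learned model \hat{P} by the worst-case change it induces in
% expected values over a function class \mathcal{F}:
\[
  \ell(\widehat{P})(s,a) \;=\; \sup_{V \in \mathcal{F}}
  \Big|\, \mathbb{E}_{s' \sim P(\cdot\mid s,a)}[V(s')]
        - \mathbb{E}_{s' \sim \widehat{P}(\cdot\mid s,a)}[V(s')] \,\Big| .
\]
% The Wasserstein metric has the Kantorovich--Rubinstein dual form
\[
  W_1\big(P, \widehat{P}\big) \;=\; \sup_{\|f\|_{L} \le 1}
  \mathbb{E}_{P}[f] - \mathbb{E}_{\widehat{P}}[f] ,
\]
% so the two coincide when \mathcal{F} is the class of 1-Lipschitz functions.
```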
Lipschitz Continuity in Model-based Reinforcement Learning
Asadi, Kavosh, Misra, Dipendra, Littman, Michael L.
Model-based reinforcement-learning methods learn transition and reward models and use them to guide behavior. We analyze the impact of learning models that are Lipschitz continuous---the distance between function values for two inputs is bounded by a linear function of the distance between the inputs. Our first result shows a tight bound on model errors for multi-step predictions with Lipschitz continuous models. We go on to prove an error bound for the value-function estimate arising from such models and show that the estimated value function is itself Lipschitz continuous. We conclude with empirical results that demonstrate significant benefits to enforcing Lipschitz continuity of neural net models during reinforcement learning.
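For reference, here are the Lipschitz condition and the compounding-error intuition it buys, stated in standard textbook form (the paper's exact constants and metrics may differ):

```latex
% A model T is K-Lipschitz when nearby inputs map to nearby outputs:
\[
  d\big(T(s_1), T(s_2)\big) \;\le\; K \, d(s_1, s_2) \qquad \forall\, s_1, s_2 .
\]
% If such a model additionally has one-step prediction error at most
% \varepsilon, a standard argument bounds the n-step error by a geometric sum:
\[
  \Delta_n \;\le\; \varepsilon \sum_{i=0}^{n-1} K^{i}
          \;=\; \varepsilon\, \frac{K^{n} - 1}{K - 1} \quad (K \neq 1).
\]
```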
Bandit-Based Solar Panel Control
Abel, David (Brown University) | Williams, Edward C. (Brown University) | Brawner, Stephen (Brown University) | Reif, Emily (Brown University) | Littman, Michael L. (Brown University)
Solar panels sustainably harvest energy from the sun. To improve performance, panels are often equipped with a tracking mechanism that computes the sun’s position in the sky throughout the day. Based on the tracker’s estimate of the sun’s location, a controller orients the panel to minimize the angle of incidence between solar radiant energy and the photovoltaic cells on the surface of the panel, increasing total energy harvested. Prior work has developed efficient tracking algorithms that accurately compute the sun’s location to facilitate solar tracking and control. However, always pointing a panel directly at the sun does not account for diffuse irradiance in the sky, reflected irradiance from the ground and surrounding surfaces, power required to reorient the panel, shading effects from neighboring panels and foliage, or changing weather conditions (such as clouds), all of which are contributing factors to the total energy harvested by a fleet of solar panels. In this work, we show that a bandit-based approach can increase the total energy harvested by solar panels by learning to dynamically account for such other factors. Our contribution is threefold: (1) the development of a test bed based on typical solar and irradiance models for experimenting with solar panel control using a variety of learning methods, (2) simulated validation that bandit algorithms can effectively learn to control solar panels, and (3) the design and construction of an intelligent solar panel prototype that learns to angle itself using bandit algorithms.
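To make the bandit framing concrete, here is a minimal sketch in which each candidate panel angle is a bandit arm and UCB1 selects angles by observed energy. Everything here (the angle set, the toy energy function, the choice of UCB1) is an illustrative assumption, not the paper's test bed or prototype.

```python
import math
import random

# Illustrative sketch only: candidate tilt angles as bandit arms, selected
# with UCB1 using harvested energy as reward.

ANGLES = [0, 15, 30, 45, 60, 75]            # candidate tilt angles (degrees)

def energy(angle):
    """Toy reward: noisy energy harvest that peaks near a 45-degree tilt."""
    return max(0.0, math.cos(math.radians(angle - 45))) + random.gauss(0, 0.05)

counts = [0] * len(ANGLES)                  # pulls per arm
totals = [0.0] * len(ANGLES)                # cumulative reward per arm

for t in range(1, 1001):
    if 0 in counts:                         # play each arm once first
        arm = counts.index(0)
    else:                                   # UCB1: mean + exploration bonus
        arm = max(range(len(ANGLES)),
                  key=lambda i: totals[i] / counts[i]
                  + math.sqrt(2 * math.log(t) / counts[i]))
    reward = energy(ANGLES[arm])
    counts[arm] += 1
    totals[arm] += reward

best = max(range(len(ANGLES)), key=lambda i: totals[i] / counts[i])
print("learned best tilt:", ANGLES[best], "degrees")
```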
Modeling Latent Attention Within Neural Networks
Grimm, Christopher, Arumugam, Dilip, Karamcheti, Siddharth, Abel, David, Wong, Lawson L. S., Littman, Michael L.
Deep neural networks are able to solve tasks across a variety of domains and modalities of data. Despite many empirical successes, we lack the ability to clearly understand and interpret the learned internal mechanisms that contribute to such effective behaviors or, more critically, to failure modes. In this work, we present a general method for visualizing an arbitrary neural network's inner mechanisms and for characterizing their power and limitations. Our dataset-centric method produces visualizations of how a trained network attends to components of its inputs. The computed "attention masks" support improved interpretability by highlighting which input attributes are critical in determining output. We demonstrate the effectiveness of our framework on a variety of deep neural network architectures in domains from computer vision, natural language processing, and reinforcement learning. The primary contribution of our approach is an interpretable visualization of attention that provides unique insights into the network's underlying decision-making process, irrespective of the data modality.
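As a rough illustration of what an input-attention mask conveys, the sketch below computes a finite-difference sensitivity mask over the inputs of a toy network. This is a generic saliency-style proxy chosen for brevity, not the paper's method, which trains a separate network to produce latent attention masks; the toy model and all names here are hypothetical.

```python
import numpy as np

# Generic saliency-style proxy for input attention, NOT the paper's
# latent-attention method (which trains a separate network to emit masks).

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 8))                  # stand-in for trained weights

def net(x):
    """Scalar output of a toy feed-forward network."""
    return float(np.tanh(W @ x).sum())

def attention_mask(x, eps=1e-4):
    """Per-input sensitivity |d net / d x_i| via central finite differences,
    normalized to [0, 1] so it can be read as an attention mask."""
    grads = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = eps
        grads[i] = (net(x + e) - net(x - e)) / (2 * eps)
    mask = np.abs(grads)
    return mask / mask.max()

x = rng.normal(size=8)
print(np.round(attention_mask(x), 2))        # larger = more influential input
```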
Summable Reparameterizations of Wasserstein Critics in the One-Dimensional Setting
Grimm, Christopher, Song, Yuhang, Littman, Michael L.
Generative adversarial networks (GANs) are an exciting alternative to conventional algorithms for density estimation---using data to assess how likely samples are to be drawn from the same distribution. Instead of explicitly computing these probabilities, GANs learn a generator that can match the given probabilistic source. This paper examines this matching capability in the context of problems with one-dimensional outputs. We identify a class of function decompositions with properties that make them well suited to the critic role in a leading approach to GANs known as Wasserstein GANs. We show that Taylor and Fourier series decompositions belong to our class, provide examples of these critics outperforming standard GAN approaches, and suggest how they can be scaled to higher-dimensional problems in the future.
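For context, the Wasserstein GAN critic approximates the supremum in the Kantorovich-Rubinstein dual, shown below in its standard form, and the decomposition idea can be rendered schematically as a truncated Fourier series critic (an illustrative instance, not the paper's notation):

```latex
% Wasserstein-1 distance in its dual form; the critic f plays the supremum:
\[
  W_1(P, Q) \;=\; \sup_{\|f\|_{L} \le 1}\;
  \mathbb{E}_{x \sim P}[f(x)] - \mathbb{E}_{x \sim Q}[f(x)] .
\]
% Schematic one-dimensional critic as a truncated Fourier series:
\[
  f(x) \;=\; a_0 + \sum_{k=1}^{K} \big( a_k \cos(kx) + b_k \sin(kx) \big).
\]
```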
Advantages and Limitations of using Successor Features for Transfer in Reinforcement Learning
Lehnert, Lucas, Tellex, Stefanie, Littman, Michael L.
One question central to Reinforcement Learning is how to learn a feature representation that supports algorithm scaling and the reuse of learned information across different tasks. Successor Features approach this problem by learning a feature representation that satisfies a temporal constraint. We present an approach that decouples the feature representation from the reward function, making it suitable for transferring knowledge between domains. We then assess the advantages and limitations of using Successor Features for transfer.
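The temporal constraint in question is the Bellman-style recursion that Successor Features satisfy, which is what decouples the representation from the reward (standard form from the literature, e.g. Barreto et al.; not necessarily the paper's notation):

```latex
% Successor Features obey a Bellman equation in the features \phi rather than
% the rewards:
\[
  \psi^{\pi}(s, a) \;=\; \phi(s, a)
    + \gamma\, \mathbb{E}_{s'}\big[ \psi^{\pi}\big(s', \pi(s')\big) \big],
\]
% so if rewards factor as r(s,a) = \phi(s,a)^{\top} w, the action values
% factor too, and only w must be relearned for a new task:
\[
  Q^{\pi}(s, a) \;=\; \psi^{\pi}(s, a)^{\top} w .
\]
```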
Ask Me Anything about MOOCs
Fisher, Doug (Vanderbilt University) | Isbell, Charles (Georgia Institute of Technology) | Littman, Michael L. (Brown University) | Wollowski, Michael (Rose-Hulman Institute of Technology) | Neller, Todd W. (Gettysburg College) | Boerkoel, Jim (Harvey Mudd College)
In this article, ten questions about MOOCs, crowdsourced from recipients of the AAAI and SIGCSE mailing lists, were posed by editors Michael Wollowski, Todd Neller, and James Boerkoel to Douglas H. Fisher, Charles Isbell Jr., and Michael Littman, educators whose unique and relevant experiences lend perspective on these issues.
An Alternative Softmax Operator for Reinforcement Learning
Asadi, Kavosh, Littman, Michael L.
A softmax operator applied to a set of values acts somewhat like the maximization function and somewhat like an average. In sequential decision making, softmax is often used in settings where it is necessary to maximize utility but also to hedge against problems that arise from putting all of one's weight behind a single maximum-utility decision. The Boltzmann softmax operator is the most commonly used softmax operator in this setting, but we show that this operator is prone to misbehavior. In this work, we study a differentiable softmax operator that, among other properties, is a non-expansion, ensuring convergent behavior in learning and planning. We introduce a variant of the SARSA algorithm that, by utilizing the new operator, computes a Boltzmann policy with a state-dependent temperature parameter. We show that the algorithm is convergent and that it performs favorably in practice.
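The alternative operator studied here is mellowmax, as named in the published version of the paper; for comparison, the two operators over values $x_1, \dots, x_n$ are:

```latex
% Boltzmann softmax (not a non-expansion in general) versus mellowmax:
\[
  \mathrm{boltz}_{\beta}(\mathbf{x}) =
    \frac{\sum_{i=1}^{n} x_i\, e^{\beta x_i}}{\sum_{i=1}^{n} e^{\beta x_i}},
  \qquad
  \mathrm{mm}_{\omega}(\mathbf{x}) =
    \frac{\log\!\big( \frac{1}{n} \sum_{i=1}^{n} e^{\omega x_i} \big)}{\omega}.
\]
% Mellowmax is a non-expansion under the infinity norm, which is the property
% that guarantees convergence when it replaces max in value-style updates.
```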