Boutilier, Craig
Advantage Amplification in Slowly Evolving Latent-State Environments
Mladenov, Martin, Meshi, Ofer, Ooi, Jayden, Schuurmans, Dale, Boutilier, Craig
Latent-state environments with long horizons, such as those faced by recommender systems, pose significant challenges for reinforcement learning (RL). In this work, we identify and analyze several key hurdles for RL in such environments, including belief state error and small action advantage. We develop a general principle of advantage amplification that can overcome these hurdles through the use of temporal abstraction. We propose several aggregation methods and prove they induce amplification in certain settings. We also bound the loss in optimality incurred by our methods in environments where latent state evolves slowly and demonstrate their performance empirically in a stylized user-modeling task.
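As a rough illustration of the temporal-abstraction idea described above, the sketch below wraps an environment so that each abstract action repeats a base action for k steps; the environment interface and the choice of k are assumptions for illustration, not the paper's exact aggregation scheme.

# Minimal sketch of temporal abstraction by action repetition (one simple
# aggregation scheme); interface and k are illustrative assumptions.

class RepeatedActionWrapper:
    """Expose abstract actions that hold a base action fixed for k steps.

    Holding an action longer lets its effect on a slowly evolving latent
    state accumulate, which can enlarge (amplify) the advantage gap between
    abstract actions relative to single-step actions.
    """

    def __init__(self, env, k):
        self.env = env  # assumed to provide step(action) -> (obs, reward, done)
        self.k = k

    def step(self, action):
        total_reward, done, obs = 0.0, False, None
        for _ in range(self.k):
            obs, reward, done = self.env.step(action)
            total_reward += reward
            if done:
                break
        return obs, total_reward, done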
Perturbed-History Exploration in Stochastic Linear Bandits
Kveton, Branislav, Szepesvari, Csaba, Ghavamzadeh, Mohammad, Boutilier, Craig
We propose a new online algorithm for minimizing the cumulative regret in stochastic linear bandits. The key idea is to build a perturbed history, which mixes the history of observed rewards with a pseudo-history of randomly generated i.i.d. pseudo-rewards. Our algorithm, perturbed-history exploration in a linear bandit (LinPHE), estimates a linear model from its perturbed history and pulls the arm with the highest value under that model. We prove an $\tilde{O}(d \sqrt{n})$ gap-free bound on the expected $n$-round regret of LinPHE, where $d$ is the number of features. Our analysis relies on novel concentration and anti-concentration bounds on the weighted sum of Bernoulli random variables. To show the generality of our design, we extend LinPHE to a logistic reward model. We evaluate both algorithms empirically and show that they are practical.
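A minimal sketch of the perturbed-history idea in the linear setting, assuming a standard ridge-regression estimator and Bernoulli(1/2) pseudo-rewards; the exact pseudo-reward design and constants in the paper may differ.

import numpy as np

def linphe_round(features, X_hist, y_hist, a=1.0, lam=1.0, rng=None):
    """One round of a perturbed-history style linear bandit step (sketch).

    features: (K, d) array of arm feature vectors.
    X_hist, y_hist: past pulled-arm features and observed rewards (lists).
    a: perturbation scale; pseudo-rewards are Bernoulli(1/2) here, an
       illustrative choice rather than the paper's exact design.
    Returns the index of the arm to pull.
    """
    rng = rng or np.random.default_rng()
    d = features.shape[1]
    X, y = list(X_hist), list(y_hist)
    # Mix the real history with randomly generated i.i.d. pseudo-rewards:
    # roughly `a` pseudo-observations per real observation.
    n_pseudo = int(np.ceil(a * len(X_hist)))
    for _ in range(n_pseudo):
        i = rng.integers(len(X_hist))
        X.append(X_hist[i])
        y.append(rng.integers(0, 2))  # Bernoulli(1/2) pseudo-reward
    if not X:
        return int(rng.integers(features.shape[0]))  # no history yet: pull at random
    X = np.asarray(X)
    y = np.asarray(y, dtype=float)
    # Ridge-regression estimate from the perturbed history.
    theta = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
    return int(np.argmax(features @ theta))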
Perturbed-History Exploration in Stochastic Multi-Armed Bandits
Kveton, Branislav, Szepesvari, Csaba, Ghavamzadeh, Mohammad, Boutilier, Craig
We propose an online algorithm for cumulative regret minimization in a stochastic multi-armed bandit. The algorithm adds $O(t)$ i.i.d. pseudo-rewards to its history in round $t$ and then pulls the arm with the highest estimated value in its perturbed history. Therefore, we call it perturbed-history exploration (PHE). The pseudo-rewards are designed to offset the underestimated values of arms in round $t$ with a sufficiently high probability. We analyze PHE in a $K$-armed bandit and prove an $O(K \Delta^{-1} \log n)$ bound on its $n$-round regret, where $\Delta$ is the minimum gap between the expected rewards of the optimal and suboptimal arms. The key to our analysis is a novel argument that shows that randomized Bernoulli rewards lead to optimism. We compare PHE empirically to several baselines and show that it is competitive with the best of them.
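A minimal sketch of one PHE round in a K-armed bandit with rewards in [0, 1]; the perturbation constant and the Bernoulli(1/2) pseudo-reward distribution are illustrative assumptions.

import numpy as np

def phe_choose_arm(pulls, rewards, a=2.0, rng=None):
    """Choose an arm by perturbed-history exploration (illustrative sketch).

    pulls[i]   : number of times arm i was pulled so far.
    rewards[i] : sum of observed rewards of arm i (rewards assumed in [0, 1]).
    a          : perturbation scale; for each arm with s pulls we add about
                 a * s Bernoulli(1/2) pseudo-rewards (the exact distribution
                 and constant are assumptions for illustration).
    """
    rng = rng or np.random.default_rng()
    K = len(pulls)
    values = np.empty(K)
    for i in range(K):
        s = pulls[i]
        if s == 0:
            return i  # pull each arm once before perturbing history
        n_pseudo = int(np.ceil(a * s))
        pseudo_sum = rng.binomial(n_pseudo, 0.5)  # sum of i.i.d. pseudo-rewards
        # Estimated value of arm i in its perturbed history.
        values[i] = (rewards[i] + pseudo_sum) / (s + n_pseudo)
    return int(np.argmax(values))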
Data center cooling using model-predictive control
Lazic, Nevena, Boutilier, Craig, Lu, Tyler, Wong, Eehern, Roy, Binz, Ryu, MK, Imwalle, Greg
Despite the impressive recent advances in reinforcement learning (RL) algorithms, their deployment to real-world physical systems is often complicated by unexpected events, limited data, and the potential for expensive failures. In this paper, we describe an application of RL "in the wild" to the task of regulating temperatures and airflow inside a large-scale data center (DC). Adopting a data-driven, model-based approach, we demonstrate that an RL agent with little prior knowledge is able to effectively and safely regulate conditions on a server floor after just a few hours of exploration, while improving operational efficiency relative to existing PID controllers.
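The controller itself is not described in detail here; as a rough illustration of a data-driven, model-predictive loop of this kind, the sketch below assumes a learned linear dynamics model and a random-shooting optimizer with a hypothetical temperature/energy cost, none of which are taken from the paper.

import numpy as np

def mpc_step(A, B, x0, horizon=5, n_candidates=256, u_dim=2,
             temp_limit=1.0, energy_weight=0.1, rng=None):
    """One model-predictive control step (random-shooting sketch).

    A, B define a learned linear dynamics model x_{t+1} = A x_t + B u_t.
    The cost terms, temperature limit, and energy weight are illustrative
    assumptions. Returns the first control of the best candidate sequence.
    """
    rng = rng or np.random.default_rng()
    best_cost, best_u0 = np.inf, None
    for _ in range(n_candidates):
        us = rng.uniform(0.0, 1.0, size=(horizon, u_dim))  # candidate controls
        x, cost = np.asarray(x0, dtype=float), 0.0
        for u in us:
            x = A @ x + B @ u
            # Penalize predicted temperature overshoot plus control effort.
            cost += np.sum(np.maximum(x - temp_limit, 0.0) ** 2)
            cost += energy_weight * np.sum(u ** 2)
        if cost < best_cost:
            best_cost, best_u0 = cost, us[0]
    return best_u0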
Non-delusional Q-learning and value-iteration
Lu, Tyler, Schuurmans, Dale, Boutilier, Craig
We identify a fundamental source of error in Q-learning and other forms of dynamic programming with function approximation. Delusional bias arises when the approximation architecture limits the class of expressible greedy policies. Since standard Q-updates make globally uncoordinated action choices with respect to the expressible policy class, inconsistent or even conflicting Q-value estimates can result, leading to pathological behaviour such as over/under-estimation, instability and even divergence. To solve this problem, we introduce a new notion of policy consistency and define a local backup process that ensures global consistency through the use of information sets---sets that record constraints on policies consistent with backed-up Q-values. We prove that both the model-based and model-free algorithms using this backup remove delusional bias, yielding the first known algorithms that guarantee optimal results under general conditions. These algorithms furthermore only require polynomially many information sets (from a potentially exponential support). Finally, we suggest other practical heuristics for value-iteration and Q-learning that attempt to reduce delusional bias.
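The information-set algorithms themselves are involved; as a toy illustration of what policy consistency buys, the sketch below avoids delusional bias by brute force, evaluating each policy in a small, explicitly enumerated expressible class exactly, so no backup ever mixes action choices from different expressible policies. This is a baseline for intuition only, not the paper's algorithm.

import numpy as np

def best_expressible_policy(P, R, policy_class, gamma=0.95, start=0):
    """Brute-force, delusion-free planning over a small expressible policy class.

    P[a][s, s'] : transition probabilities; R[s, a] : rewards.
    policy_class: iterable of policies, each a length-S tuple of actions,
                  e.g. the greedy policies realizable by some approximator.
    Every backup here is consistent with a single expressible policy, so no
    conflicting action choices are mixed (a toy stand-in for the paper's
    information-set mechanism).
    """
    S = R.shape[0]
    best_value, best_pi = -np.inf, None
    for pi in policy_class:
        # Exact policy evaluation: V = (I - gamma * P_pi)^{-1} r_pi
        P_pi = np.array([P[pi[s]][s] for s in range(S)])
        r_pi = np.array([R[s, pi[s]] for s in range(S)])
        V = np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)
        if V[start] > best_value:
            best_value, best_pi = V[start], pi
    return best_pi, best_value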
Seq2Slate: Re-ranking and Slate Optimization with RNNs
Bello, Irwan, Kulkarni, Sayali, Jain, Sagar, Boutilier, Craig, Chi, Ed, Eban, Elad, Luo, Xiyang, Mackey, Alan, Meshi, Ofer
Ranking is a central task in machine learning and information retrieval. In this task, it is especially important to present the user with a slate of items that is appealing as a whole. This in turn requires taking into account interactions between items, since intuitively, placing an item on the slate affects the decision of which other items should be placed alongside it. In this work, we propose a sequence-to-sequence model for ranking called seq2slate. At each step, the model predicts the next item to place on the slate given the items already selected. The recurrent nature of the model allows complex dependencies between items to be captured directly in a flexible and scalable way. We show how to learn the model end-to-end from weak supervision in the form of easily obtained click-through data. We further demonstrate the usefulness of our approach in experiments on standard ranking benchmarks as well as in a real-world recommendation system.
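A minimal sketch of the sequential re-ranking loop, with a hypothetical score_next callable standing in for the trained recurrent model; greedy decoding is used here purely for illustration.

def seq2slate_decode(items, score_next, slate_size):
    """Greedy sequential re-ranking in the spirit of seq2slate (sketch).

    items      : list of candidate item ids.
    score_next : hypothetical callable score_next(chosen, candidate) returning
                 a scalar score for placing `candidate` next, given the items
                 already on the slate (stands in for the recurrent model).
    slate_size : number of positions to fill.
    """
    chosen, remaining = [], list(items)
    for _ in range(min(slate_size, len(items))):
        # Condition on the slate built so far, as the recurrent model would.
        best = max(remaining, key=lambda cand: score_next(chosen, cand))
        chosen.append(best)
        remaining.remove(best)
    return chosen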
Planning and Learning with Stochastic Action Sets
Boutilier, Craig, Cohen, Alon, Daniely, Amit, Hassidim, Avinatan, Mansour, Yishay, Meshi, Ofer, Mladenov, Martin, Schuurmans, Dale
In many practical uses of reinforcement learning (RL), the set of actions available at a given state is a random variable, with realizations governed by an exogenous stochastic process. Somewhat surprisingly, the foundations for such sequential decision processes have remained unaddressed. In this work, we formalize and investigate MDPs with stochastic action sets (SAS-MDPs) to provide these foundations. We show that optimal policies and value functions in this model have a structure that admits a compact representation. From an RL perspective, we show that Q-learning with sampled action sets is sound. In model-based settings, we consider two important special cases: when individual actions are available with independent probabilities; and a sampling-based model for unknown distributions. We develop poly-time value and policy iteration methods for both cases; and in the first, we offer a poly-time linear programming solution.
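A minimal sketch of a Q-learning update with sampled action sets, in which the target maximizes only over the actions actually realized at the next state; the tabular setting and hyperparameters are illustrative assumptions.

import numpy as np

def sas_q_update(Q, s, a, r, s_next, available_next, alpha=0.1, gamma=0.95):
    """One Q-learning update with a sampled action set (illustrative sketch).

    Q              : (S, A) array of action-value estimates.
    available_next : the realized (random) set of actions available at s_next;
                     the max in the target ranges only over these actions.
    """
    target = r + gamma * max(Q[s_next, b] for b in available_next)
    Q[s, a] += alpha * (target - Q[s, a])
    return Q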
The Pricing War Continues: On Competitive Multi-Item Pricing
Lev, Omer, Oren, Joel, Boutilier, Craig, Rosenschein, Jeffrey S.
We study a game with strategic vendors (the agents), each owning multiple items, and a single buyer with a submodular valuation function. The vendors aim to maximize their revenue by pricing their items, given that the buyer will purchase the set of items that maximizes his net payoff (valuation minus prices). We show that this game need not have a pure Nash equilibrium, in contrast to previous results for the special case where each vendor owns a single item. We do so by relating our game to an intermediate, discrete game in which the vendors only choose which items to make available, with prices set exogenously afterwards. We further use the intermediate game to provide tight bounds on the price of anarchy for the subset games that have pure Nash equilibria; we find that the optimal price of anarchy attained in the previously studied special cases no longer holds, and only a logarithmic bound does. Finally, we show that for a special case of submodular functions, efficient pure Nash equilibria always exist.
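As a small illustration of the buyer side of the game, the sketch below computes, by brute force over bundles, which items a payoff-maximizing buyer purchases at given prices, and the resulting revenue of each vendor; the function names and brute-force approach are illustrative, not from the paper.

from itertools import chain, combinations

def buyer_choice_and_revenues(items, owner, prices, valuation):
    """Compute the buyer's purchased bundle and each vendor's revenue (sketch).

    items     : list of item ids.
    owner     : dict item -> vendor owning it.
    prices    : dict item -> price set by its owner.
    valuation : submodular set function, valuation(frozenset_of_items) -> float.
    The buyer buys the subset maximizing valuation minus total price
    (brute force over subsets, so only suitable for small instances).
    """
    def subsets(xs):
        return chain.from_iterable(combinations(xs, k) for k in range(len(xs) + 1))

    best_set, best_payoff = frozenset(), valuation(frozenset())
    for sub in subsets(items):
        bundle = frozenset(sub)
        payoff = valuation(bundle) - sum(prices[i] for i in bundle)
        if payoff > best_payoff:
            best_set, best_payoff = bundle, payoff
    revenue = {}
    for i in best_set:
        revenue[owner[i]] = revenue.get(owner[i], 0.0) + prices[i]
    return best_set, revenue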