Robust Value Function Approximation Using Bilinear Programming
Petrik, Marek, Zilberstein, Shlomo
Existing value function approximation methods have been successfully used in many applications, but they often lack useful a priori error bounds. We propose approximate bilinear programming, a new formulation of value function approximation that provides strong a priori guarantees. In particular, it provably finds an approximate value function that minimizes the Bellman residual. Solving a bilinear program optimally is NP-hard, but this is unavoidable because Bellman-residual minimization is itself NP-hard. We therefore employ and analyze a common approximate algorithm for bilinear programs.
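For concreteness, the Bellman residual that the bilinear program minimizes can be evaluated directly for a linear value function on a small tabular MDP. The Python sketch below is only an illustration of that objective, not the paper's bilinear formulation; the variable names and the random example are assumptions.

import numpy as np

def bellman_residual(w, features, P, r, gamma):
    """L_inf Bellman residual of the linear value function v = features @ w.
    features: (n_states, k) basis matrix; P: (n_actions, n_states, n_states)
    transition matrices; r: (n_actions, n_states) rewards."""
    v = features @ w
    backup = np.max(r + gamma * (P @ v), axis=0)   # greedy Bellman backup
    return np.max(np.abs(v - backup))

# Tiny random example
rng = np.random.default_rng(0)
n_s, n_a, k = 5, 2, 3
P = rng.dirichlet(np.ones(n_s), size=(n_a, n_s))
r = rng.uniform(size=(n_a, n_s))
features = rng.normal(size=(n_s, k))
print(bellman_residual(rng.normal(size=k), features, P, r, 0.95))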
High-Confidence Policy Optimization: Reshaping Ambiguity Sets in Robust MDPs
Behzadian, Bahram, Russel, Reazul Hasan, Petrik, Marek
Robust MDPs are a promising framework for computing robust policies in reinforcement learning. Ambiguity sets, which represent the plausible errors in transition probabilities, determine the trade-off between robustness and average-case performance. The standard practice of defining ambiguity sets using the $L_1$ norm leads, unfortunately, to loose and impractical guarantees. This paper describes new methods for optimizing the shape of ambiguity sets beyond the $L_1$ norm. We derive new high-confidence sampling bounds for weighted $L_1$ and weighted $L_\infty$ ambiguity sets and describe how to compute near-optimal weights from rough value function estimates. Experimental results on a diverse set of benchmarks show that optimized ambiguity sets provide significantly tighter robustness guarantees.
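For intuition, the worst-case expected value over a weighted-$L_1$ ambiguity ball around a nominal transition distribution can be computed as a small linear program. The Python sketch below does this for a single state-action pair and a fixed value estimate; it is not the paper's algorithm, and the names (p_nominal, weights, budget) are illustrative.

import numpy as np
from scipy.optimize import linprog

def worst_case_value(v, p_nominal, weights, budget):
    """min_p p @ v  s.t.  sum(p) = 1, p >= 0, sum_i w_i |p_i - p_nominal_i| <= budget.
    The weighted-L1 constraint is linearized with slacks t_i >= w_i |p_i - p_nominal_i|;
    the decision vector is x = [p (n), t (n)]."""
    n = len(v)
    c = np.concatenate([v, np.zeros(n)])
    A_ub = np.zeros((2 * n + 1, 2 * n))
    b_ub = np.zeros(2 * n + 1)
    for i in range(n):
        A_ub[2 * i, i], A_ub[2 * i, n + i] = weights[i], -1.0
        b_ub[2 * i] = weights[i] * p_nominal[i]
        A_ub[2 * i + 1, i], A_ub[2 * i + 1, n + i] = -weights[i], -1.0
        b_ub[2 * i + 1] = -weights[i] * p_nominal[i]
    A_ub[-1, n:] = 1.0                       # sum_i t_i <= budget
    b_ub[-1] = budget
    A_eq = np.concatenate([np.ones(n), np.zeros(n)])[None, :]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, None)] * (2 * n), method="highs")
    return res.fun

v = np.array([1.0, 2.0, 5.0])
p_bar = np.array([0.2, 0.3, 0.5])
print(worst_case_value(v, p_bar, weights=np.ones(3), budget=0.4))  # mass shifts to the low-value outcome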
Robust Exploration with Tight Bayesian Plausibility Sets
Russel, Reazul H., Gu, Tianyi, Petrik, Marek
Optimism about poorly understood states and actions is the main driving force of exploration for many provably efficient reinforcement learning algorithms. We propose Optimism in the Face of Sensible Value Functions (OFVF), a novel data-driven Bayesian algorithm that constructs plausibility sets for MDPs so as to explore robustly while minimizing the worst-case exploration cost. The method computes policies with tighter optimistic estimates for exploration by introducing two new ideas. First, it is based on Bayesian posterior distributions rather than distribution-free bounds. Second, OFVF does not construct plausibility sets as simple confidence intervals; using confidence intervals as plausibility sets is sufficient but not necessary. OFVF uses the structure of the value function to optimize the location and shape of the plausibility set, guaranteeing upper bounds directly without requiring the set to be a confidence interval. OFVF proceeds in an episodic manner, where the duration of each episode is fixed and known. The algorithm is inherently Bayesian and can leverage prior information. Our theoretical analysis shows the robustness of OFVF, and the empirical results demonstrate its practical promise.
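To make the posterior-based (rather than distribution-free) ingredient concrete, the Python sketch below computes an optimistic upper-quantile estimate of the next-state value for one state-action pair from Dirichlet posterior samples. It illustrates only this ingredient, not OFVF itself; the uniform prior and the 95% quantile are assumptions.

import numpy as np

def optimistic_value(counts, v, n_samples=1000, quantile=0.95, rng=None):
    """Posterior-based optimistic estimate of the next-state value p @ v for one
    state-action pair, using a Dirichlet posterior over the transition vector p."""
    rng = np.random.default_rng() if rng is None else rng
    posterior = rng.dirichlet(counts + 1.0, size=n_samples)  # +1.0 encodes a uniform prior
    return np.quantile(posterior @ v, quantile)              # optimistic (upper) estimate

counts = np.array([12.0, 3.0, 1.0])   # observed transition counts
v = np.array([0.0, 1.0, 4.0])         # current value estimates of successor states
print(optimistic_value(counts, v))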
Beyond Confidence Regions: Tight Bayesian Ambiguity Sets for Robust MDPs
Petrik, Marek, Russell, Reazul Hasan
Robust MDPs (RMDPs) can be used to compute policies with provable worst-case guarantees in reinforcement learning. The quality and robustness of an RMDP solution are determined by the ambiguity set---the set of plausible transition probabilities---which is usually constructed as a multi-dimensional confidence region. Existing methods construct ambiguity sets as confidence regions using concentration inequalities, which leads to overly conservative solutions. This paper proposes a new paradigm that can achieve better solutions with the same robustness guarantees without using confidence regions as ambiguity sets. To incorporate prior knowledge, our algorithms optimize the size and position of ambiguity sets using Bayesian inference. Our theoretical analysis shows the safety of the proposed method, and the empirical results demonstrate its practical promise.
Policy-Conditioned Uncertainty Sets for Robust Markov Decision Processes
Tirinzoni, Andrea, Petrik, Marek, Chen, Xiangli, Ziebart, Brian
What policy should be employed in a Markov decision process with uncertain parameters? Robust optimization's answer to this question is to use rectangular uncertainty sets, which independently reflect available knowledge about each state, and then to obtain a decision policy that maximizes the expected reward for the worst-case decision process parameters from these uncertainty sets. While this rectangularity is convenient computationally and leads to tractable solutions, it often produces policies that are too conservative in practice, and does not facilitate knowledge transfer between portions of the state space or across related decision processes. In this work, we propose non-rectangular uncertainty sets that bound marginal moments of state-action features defined over entire trajectories through a decision process. This enables generalization to different portions of the state space while retaining appropriate uncertainty of the decision process. We develop algorithms for solving the resulting robust decision problems, which reduce to finding an optimal policy for a mixture of decision processes, and demonstrate the benefits of our approach experimentally.
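As a rough schematic (not necessarily the paper's exact notation), such a policy-conditioned set constrains the expected cumulative feature counts of trajectories under policy $\pi$ and model $P$ to stay near an empirical estimate $\hat{\mu}$, and the robust problem then optimizes against the worst model in that set; the feature map $\phi$, the tolerance $\epsilon$, and the choice of the $\infty$-norm are assumptions made for illustration:

\[
\mathcal{U}(\pi) \;=\; \Bigl\{\, P \;:\; \bigl\lVert \mathbb{E}_{P,\pi}\bigl[\textstyle\sum_{t} \phi(s_t, a_t)\bigr] - \hat{\mu} \bigr\rVert_\infty \le \epsilon \,\Bigr\},
\qquad
\max_{\pi}\; \min_{P \in \mathcal{U}(\pi)} \; \mathbb{E}_{P,\pi}\Bigl[\textstyle\sum_{t} r(s_t, a_t)\Bigr].
\]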
Tight Bayesian Ambiguity Sets for Robust MDPs
Russel, Reazul Hasan, Petrik, Marek
Robustness is important for sequential decision making in a stochastic dynamic environment with uncertain probabilistic parameters. We address the problem of using robust MDPs (RMDPs) to compute policies with provable worst-case guarantees in reinforcement learning. The quality and robustness of an RMDP solution are determined by its ambiguity set. Existing methods construct ambiguity sets that lead to impractically conservative solutions. In this paper, we propose RSVF, which achieves less conservative solutions with the same worst-case guarantees by 1) leveraging a Bayesian prior, 2) optimizing the size and location of the ambiguity set, and, most importantly, 3) relaxing the requirement that the set be a confidence interval. Our theoretical analysis shows the safety of RSVF, and the empirical results demonstrate its practical promise.
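The spirit of steps 2) and 3) can be sketched for a single state-action pair and a fixed value estimate: choose the smallest $L_1$ radius around the posterior mean whose worst-case value still lower-bounds the desired posterior quantile of the return, instead of requiring the set to cover the true transition probabilities. The Python sketch below is a simplified illustration of that idea, not RSVF itself; the grid search over radii and the Dirichlet example are assumptions.

import numpy as np

def l1_worst_case(v, p_bar, budget):
    """Exact worst case of p @ v over {p in simplex : ||p - p_bar||_1 <= budget},
    computed greedily by moving mass from high-value states to the worst state."""
    p, worst, eps = p_bar.copy(), np.argmin(v), budget / 2.0
    for i in np.argsort(-v):                       # highest-value states first
        if i == worst or eps <= 0:
            continue
        move = min(p[i], eps)
        p[i] -= move
        p[worst] += move
        eps -= move
    return p @ v

def rsvf_style_radius(posterior_samples, v, delta=0.05):
    """Smallest L1 radius around the posterior mean whose worst-case value
    lower-bounds the delta-quantile of the posterior return p @ v."""
    p_bar = posterior_samples.mean(axis=0)
    target = np.quantile(posterior_samples @ v, delta)
    for budget in np.linspace(0.0, 2.0, 201):
        if l1_worst_case(v, p_bar, budget) <= target:
            return budget
    return 2.0

rng = np.random.default_rng(1)
samples = rng.dirichlet([10.0, 4.0, 2.0], size=2000)   # posterior over one transition vector
print(rsvf_style_radius(samples, np.array([1.0, 3.0, 0.0])))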
Proximal Gradient Temporal Difference Learning: Stable Reinforcement Learning with Polynomial Sample Complexity
Liu, Bo, Gemp, Ian, Ghavamzadeh, Mohammad, Liu, Ji, Mahadevan, Sridhar, Petrik, Marek
In this paper, we introduce proximal gradient temporal difference learning, which provides a principled way of designing and analyzing true stochastic gradient temporal difference learning algorithms. We show how gradient TD (GTD) reinforcement learning methods can be formally derived, not by starting from their original objective functions, as previously attempted, but rather from a primal-dual saddle-point objective function. We also conduct a saddle-point error analysis to obtain finite-sample bounds on their performance. Previous analyses of this class of algorithms use stochastic approximation techniques to prove asymptotic convergence, and do not provide any finite-sample analysis. We also propose an accelerated algorithm, called GTD2-MP, that uses proximal "mirror maps" to yield an improved convergence rate. The results of our theoretical analysis imply that the GTD family of algorithms is comparable to, and may indeed be preferred over, existing least-squares TD methods for off-policy learning, due to their linear computational complexity. We provide experimental results showing the improved performance of our accelerated gradient TD methods.
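For context, the Python sketch below shows the basic GTD2 update that the primal-dual saddle-point view rederives (without the proximal mirror-map acceleration of GTD2-MP); the function signature is illustrative, and importance-weight corrections for off-policy samples are omitted for brevity.

import numpy as np

def gtd2_update(theta, w, phi, phi_next, reward, gamma, alpha, beta):
    """One GTD2 step for linear value estimation: theta are the primal
    value-function weights, w the auxiliary (dual) correction weights."""
    delta = reward + gamma * (theta @ phi_next) - theta @ phi   # TD error
    theta = theta + alpha * (phi - gamma * phi_next) * (phi @ w)
    w = w + beta * (delta - phi @ w) * phi
    return theta, w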
Interpretable Reinforcement Learning with Ensemble Methods
Brown, Alexander, Petrik, Marek
While the performance of reinforcement learning systems is impressive and very useful, it is sometimes desirable to understand and interpret the actions of a reinforcement learning system, and of machine learning systems in general. Such circumstances are especially common in high-stakes applications such as healthcare, targeted advertising, or finance [6]. For example, researchers at the University of Pittsburgh Medical Center trained a variety of machine learning models, including neural networks and decision trees, to predict whether pneumonia patients might develop severe complications. The neural networks performed best on their test data, but upon examining the rules of the decision trees, the researchers found that the trees recommended sending pneumonia patients who had asthma directly home, despite the fact that asthma makes patients with pneumonia much more likely to suffer complications. Through further investigation, they discovered that the rule reflected a trend in their data: the hospital had a policy of automatically sending pneumonia patients with asthma to intensive care, and because this policy was so effective, those patients almost never developed complications.
A Practical Method for Solving Contextual Bandit Problems Using Decision Trees
Elmachtoub, Adam N., McNellis, Ryan, Oh, Sechan, Petrik, Marek
Many efficient algorithms with strong theoretical guarantees have been proposed for the contextual multi-armed bandit problem. However, applying these algorithms in practice can be difficult because they require domain expertise to build appropriate features and to tune their parameters. We propose a new method for the contextual bandit problem that is simple, practical, and can be applied with little or no domain expertise. Our algorithm relies on decision trees to model the context-reward relationship. Decision trees are non-parametric, interpretable, and work well without hand-crafted features. To guide the exploration-exploitation trade-off, we use a bootstrapping approach which abstracts Thompson sampling to non-Bayesian settings. We also discuss several computational heuristics and demonstrate the performance of our method on several datasets.
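A minimal sketch of the tree-plus-bootstrap idea follows: one regression tree per arm, refit on a bootstrap resample of that arm's history, so that resampling noise drives Thompson-sampling-style exploration. This is an illustration under stated assumptions (scikit-learn's DecisionTreeRegressor, max_depth=4, a simple cold-start rule), not the paper's exact algorithm.

import numpy as np
from sklearn.tree import DecisionTreeRegressor

class TreeBootstrapBandit:
    """One decision tree per arm; each tree is refit on a bootstrap resample
    of that arm's (context, reward) history before every decision."""

    def __init__(self, n_arms, rng=None):
        self.n_arms = n_arms
        self.history = [([], []) for _ in range(n_arms)]    # (contexts, rewards) per arm
        self.rng = np.random.default_rng() if rng is None else rng

    def select_arm(self, context):
        scores = np.zeros(self.n_arms)
        for a in range(self.n_arms):
            X, y = self.history[a]
            if len(y) < 2:                                  # cold start: force exploration
                scores[a] = np.inf
                continue
            idx = self.rng.integers(0, len(y), size=len(y)) # bootstrap resample
            tree = DecisionTreeRegressor(max_depth=4)
            tree.fit(np.asarray(X)[idx], np.asarray(y)[idx])
            scores[a] = tree.predict(np.asarray(context)[None, :])[0]
        return int(np.argmax(scores))

    def update(self, arm, context, reward):
        self.history[arm][0].append(np.asarray(context, dtype=float))
        self.history[arm][1].append(float(reward))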