Dudik, Miroslav
A structured regression approach for evaluating model performance across intersectional subgroups
Herlihy, Christine, Truong, Kimberly, Chouldechova, Alexandra, Dudik, Miroslav
Disaggregated evaluation is a central task in AI fairness assessment, with the goal of measuring an AI system's performance across subgroups defined by combinations of demographic or other sensitive attributes. The standard approach is to stratify the evaluation data across subgroups and compute performance metrics separately for each group. However, even for moderately sized evaluation datasets, sample sizes quickly become small once we consider intersectional subgroups, which greatly limits the extent to which intersectional groups are considered in many disaggregated evaluations. In this work, we introduce a structured regression approach to disaggregated evaluation that we demonstrate can yield reliable system performance estimates even for very small subgroups. We also provide corresponding inference strategies for constructing confidence intervals and explore how goodness-of-fit testing can yield insight into the structure of fairness-related harms experienced by intersectional groups. We evaluate our approach on two publicly available datasets and several variants of semi-synthetic data. The results show that our method is considerably more accurate than the standard approach, especially for small subgroups, and that goodness-of-fit testing helps identify the key factors that drive differences in performance.
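The contrast between the two estimation strategies can be illustrated with a toy numpy sketch (hypothetical data and a simple additive linear probability model, not the paper's actual estimator): naive stratification averages within each intersectional cell, while a structured main-effects regression pools data across cells and so remains stable when cells are small.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two binary sensitive attributes define 4 intersectional subgroups.
n = 400
a = rng.integers(0, 2, n)               # attribute A
b = rng.integers(0, 2, n)               # attribute B
# True per-group error rates follow an additive (main-effects) structure.
p_err = 0.1 + 0.2 * a + 0.1 * b
err = (rng.random(n) < p_err).astype(float)  # 1 = system erred on this example

# Naive stratified estimate: a separate mean per cell (noisy for small cells).
strat = {(ga, gb): err[(a == ga) & (b == gb)].mean()
         for ga in (0, 1) for gb in (0, 1)}

# Structured estimate: fit one additive model across all cells,
# then read off fitted subgroup rates (borrowing strength across groups).
X = np.column_stack([np.ones(n), a, b])
coef, *_ = np.linalg.lstsq(X, err, rcond=None)
fitted = {(ga, gb): coef[0] + coef[1] * ga + coef[2] * gb
          for ga in (0, 1) for gb in (0, 1)}
```

With only four moderately sized cells the two estimates are close; the structured model's advantage grows as the number of intersectional cells increases and per-cell counts shrink.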
A Unified Model and Dimension for Interactive Estimation
Brukhim, Nataly, Dudik, Miroslav, Pacchiano, Aldo, Schapire, Robert
We study an abstract framework for interactive learning called interactive estimation, in which the goal is to estimate a target from its "similarity" to points queried by the learner. We introduce a combinatorial measure called the dissimilarity dimension, which largely captures learnability in our model. We present a simple, general, and broadly applicable algorithm for which we obtain both regret and PAC generalization bounds that are polynomial in the new dimension. We show that our framework subsumes, and thereby unifies, two classic learning models: statistical-query learning and structured bandits. We also delineate how the dissimilarity dimension relates to well-known parameters for both frameworks, in some cases yielding significantly improved analyses.
Constrained episodic reinforcement learning in concave-convex and knapsack settings
Brantley, Kianté, Dudik, Miroslav, Lykouris, Thodoris, Miryoosefi, Sobhan, Simchowitz, Max, Slivkins, Aleksandrs, Sun, Wen
Standard reinforcement learning (RL) approaches seek to maximize a scalar reward (Sutton and Barto, 1998, 2018; Schulman et al., 2015; Mnih et al., 2015), but in many settings this is insufficient, because the desired properties of the agent behavior are better described using constraints. For example, an autonomous vehicle should not only get to the destination, but should also respect safety, fuel efficiency, and human comfort constraints along the way (Le et al., 2019); a robot should not only fulfill its task, but should also control its wear and tear, for example, by limiting the torque exerted on its motors (Tessler et al., 2019). Moreover, in many settings we wish to satisfy such constraints already during training, not only during deployment. For example, a power grid, an autonomous vehicle, or real robotic hardware should avoid costly failures, where the hardware is damaged or humans are harmed, already during training (Leike et al., 2017; Ray et al., 2020). Constraints are also key in additional sequential decision making applications, such as dynamic pricing with limited supply (e.g., Besbes and Zeevi, 2009; Babaioff et al., 2015), scheduling of resources on a computer cluster (Mao et al., 2016), and imitation learning, where the goal is to stay close to an expert behavior (Syed and Schapire, 2007; Ziebart et al., 2008; Sun et al., 2019).
Reinforcement Learning with Convex Constraints
Miryoosefi, Sobhan, Brantley, Kianté, Daumé, Hal III, Dudik, Miroslav, Schapire, Robert
In standard reinforcement learning (RL), a learning agent seeks to optimize the overall reward. However, many key aspects of a desired behavior are more naturally expressed as constraints. For instance, the designer may want to limit the use of unsafe actions, increase the diversity of trajectories to enable exploration, or approximate expert trajectories when rewards are sparse. In this paper, we propose an algorithmic scheme that can handle a wide class of constraints in RL tasks, specifically, any constraints that require expected values of some vector measurements (such as the use of an action) to lie in a convex set. This captures previously studied constraints (such as safety and proximity to an expert), but also enables new classes of constraints (such as diversity). Our approach comes with rigorous theoretical guarantees and only relies on the ability to approximately solve standard RL tasks. As a result, it can be easily adapted to work with any model-free or model-based RL algorithm. In our experiments, we show that it matches previous algorithms that enforce safety via constraints, but can also enforce new properties that these algorithms cannot incorporate, such as diversity.
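The constraint format described above can be made concrete with a small sketch (a hypothetical box-shaped convex set, not the paper's algorithm): the scheme asks that the expected vector of measurements land inside a convex set, and the natural progress measure is the Euclidean distance from the current policy's expected measurements to that set.

```python
import numpy as np

# Hypothetical convex target set: a box [lo, hi] per measurement coordinate
# (e.g., expected unsafe-action usage <= 0.1, expected fuel cost <= 5.0).
lo = np.array([0.0, 0.0])
hi = np.array([0.1, 5.0])

def dist_to_box(z):
    """Euclidean distance from a measurement vector z to the box.
    A constrained-RL scheme of this kind seeks a (mixed) policy whose
    expected measurement vector drives this distance to zero."""
    proj = np.clip(z, lo, hi)   # projection onto the box is coordinatewise clipping
    return float(np.linalg.norm(z - proj))

print(dist_to_box(np.array([0.05, 3.0])))  # inside the set: distance 0.0
print(dist_to_box(np.array([0.3, 6.0])))   # positive: constraint violated
```

A box is just one convex set; the same distance-to-set quantity is well defined for any convex set with a computable projection, which is what lets the framework cover safety, proximity-to-expert, and diversity constraints uniformly.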
Optimal and Adaptive Off-policy Evaluation in Contextual Bandits
Wang, Yu-Xiang, Agarwal, Alekh, Dudik, Miroslav
We study the off-policy evaluation problem---estimating the value of a target policy using data collected by another policy---under the contextual bandit model. We consider the general (agnostic) setting without access to a consistent model of rewards and establish a minimax lower bound on the mean squared error (MSE). The bound is matched up to constants by the inverse propensity scoring (IPS) and doubly robust (DR) estimators. This highlights the difficulty of the agnostic contextual setting, in contrast with multi-armed bandits and contextual bandits with access to a consistent reward model, where IPS is suboptimal. We then propose the SWITCH estimator, which can use an existing reward model (not necessarily consistent) to achieve a better bias-variance tradeoff than IPS and DR. We prove an upper bound on its MSE and demonstrate its benefits empirically on a diverse collection of data sets, often outperforming prior work by orders of magnitude.
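The two baseline estimators discussed above are easy to state in code. Below is a toy numpy sketch (synthetic logged data and a deliberately biased reward model, purely illustrative): IPS reweights logged rewards by the importance ratio, and DR starts from a reward model's prediction and corrects its residual with the same weights.

```python
import numpy as np

rng = np.random.default_rng(1)

n, K = 5000, 3
# Logging policy: uniform over K actions, so all propensities are 1/K.
logged_a = rng.integers(0, K, n)
prop = np.full(n, 1.0 / K)
# Toy rewards: mean reward of action a is a/K, plus noise.
rewards = logged_a / K + 0.05 * rng.standard_normal(n)

# Target policy: deterministic, always plays action K-1 (true value (K-1)/K).
target_a = np.full(n, K - 1)

# IPS: importance-weight logged rewards by pi(a|x) / mu(a|x).
w = (logged_a == target_a) / prop
ips = np.mean(w * rewards)

# DR: plug in a (biased, not necessarily consistent) reward model,
# then correct its residuals with the same importance weights.
rhat = lambda a: a / K + 0.1    # hypothetical misspecified reward model
dr = np.mean(rhat(target_a) + w * (rewards - rhat(logged_a)))
```

Both estimates land near the true value 2/3 here; DR's correction term cancels the model's constant bias, which is the bias-variance tradeoff that the paper's SWITCH estimator then tunes further.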
Contextual Semibandits via Supervised Learning Oracles
Krishnamurthy, Akshay, Agarwal, Alekh, Dudik, Miroslav
We study an online decision making problem where on each round a learner chooses a list of items based on some side information, receives a scalar feedback value for each individual item, and receives a reward that is linearly related to this feedback. These problems, known as contextual semibandits, arise in crowdsourcing, recommendation, and many other domains. This paper reduces contextual semibandits to supervised learning, allowing us to leverage powerful supervised learning methods in this partial-feedback setting. Our first reduction applies when the mapping from feedback to reward is known and leads to a computationally efficient algorithm with near-optimal regret. We show that this algorithm outperforms state-of-the-art approaches on real-world learning-to-rank datasets, demonstrating the advantage of oracle-based algorithms. Our second reduction applies to the previously unstudied setting in which the linear mapping from feedback to reward is unknown. Our regret guarantees are superior to prior techniques that ignore the feedback.
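The unknown-mapping setting can be pictured with a minimal sketch (synthetic data, not the paper's algorithm): since the reward is linear in the per-item feedback, observed (feedback, reward) pairs from played lists identify the mapping via least squares.

```python
import numpy as np

rng = np.random.default_rng(2)

L = 4                            # list length
w_true = rng.random(L)           # unknown linear feedback-to-reward mapping
F = rng.random((200, L))         # per-item feedback for 200 played lists
r = F @ w_true + 0.01 * rng.standard_normal(200)   # observed list rewards

# Recover the mapping by regressing rewards on feedback vectors.
w_hat, *_ = np.linalg.lstsq(F, r, rcond=None)
```

In the semibandit problem itself this estimation must be interleaved with exploration, which is where the regret analysis does its work; the sketch only shows why the linear structure makes the mapping learnable at all.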
Para-active learning
Agarwal, Alekh, Bottou, Leon, Dudik, Miroslav, Langford, John
Training examples are not all equally informative. Active learning strategies leverage this observation in order to massively reduce the number of examples that need to be labeled. We leverage the same observation to build a generic strategy for parallelizing learning algorithms. This strategy is effective because the search for informative examples is highly parallelizable and because we show that its performance does not deteriorate when the sifting process relies on a slightly outdated model. Parallel active learning is particularly attractive for training nonlinear models with nonlinear representations because there are few practical parallel learning algorithms for such models. We report preliminary experiments using both kernel SVMs and SGD-trained neural networks.
A Reliable Effective Terascale Linear Learning System
Agarwal, Alekh, Chapelle, Olivier, Dudik, Miroslav, Langford, John
We present a system and a set of techniques for learning linear predictors with convex losses on terascale datasets, with trillions of features (where the number of features refers to the number of non-zero entries in the data matrix), billions of training examples, and millions of parameters, in an hour using a cluster of 1000 machines. Individually, none of the component techniques are new, but the careful synthesis required to obtain an efficient implementation is. The result is, to the best of our knowledge, the most scalable and efficient linear learning system reported in the literature (as of 2011 when our experiments were conducted). We describe and thoroughly evaluate the components of the system, showing the importance of the various design choices.
Sample-efficient Nonstationary Policy Evaluation for Contextual Bandits
Dudik, Miroslav, Erhan, Dumitru, Langford, John, Li, Lihong
We present and prove properties of a new offline policy evaluator for exploration learning settings that is superior to previous evaluators. In particular, it simultaneously and correctly incorporates techniques from importance weighting, doubly robust evaluation, and nonstationary policy evaluation approaches. In addition, our approach allows generating longer histories by careful control of a bias-variance tradeoff, and further decreases variance by incorporating information about the randomness of the target policy. Empirical evidence from synthetic and real-world exploration learning problems shows that the new evaluator successfully unifies previous approaches and uses information an order of magnitude more efficiently.
First-Order Mixed Integer Linear Programming
Gordon, Geoffrey, Hong, Sue Ann, Dudik, Miroslav
Mixed integer linear programming (MILP) is a powerful representation often used to formulate decision-making problems under uncertainty. However, it lacks a natural mechanism to reason about objects, classes of objects, and relations. First-order logic (FOL), on the other hand, excels at reasoning about classes of objects, but lacks a rich representation of uncertainty. While representing propositional logic in MILP has been extensively explored, no theory exists yet for fully combining FOL with MILP. We propose a new representation, called first-order programming or FOP, which subsumes both FOL and MILP. We establish formal methods for reasoning about first-order programs, including a sound and complete lifted inference procedure for integer first-order programs. Since FOP can offer exponential savings in representation and proof size compared to FOL, and since representations and proofs are never significantly longer in FOP than in FOL, we anticipate that inference in FOP will be more tractable than inference in FOL for corresponding problems.