 Singla, Adish


Unifying Ensemble Methods for Q-learning via Social Choice Theory

arXiv.org Artificial Intelligence

Ensemble methods have been widely applied in Reinforcement Learning (RL) to enhance stability, increase convergence speed, and improve exploration. These methods typically work by employing an aggregation mechanism over the actions of different RL algorithms. We show that a variety of these methods can be unified by drawing parallels to committee voting rules in Social Choice Theory. We map the problem of designing an action aggregation mechanism in an ensemble method to a voting problem which, under different voting rules, yields popular ensemble-based RL algorithms such as Majority Voting Q-learning or Bootstrapped Q-learning. Our unification framework, in turn, allows us to design new ensemble-RL algorithms with better performance. For instance, we map two diversity-centered committee voting rules, namely the Single Non-Transferable Vote rule and the Chamberlin-Courant rule, to new RL algorithms that demonstrate excellent exploratory behavior in our experiments.
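
As a concrete illustration of the action-aggregation view, here is a minimal Python sketch (our own, not the paper's code) in which each ensemble member votes with its greedy action: plurality voting corresponds to the aggregation behind Majority Voting Q-learning, and Borda scoring is shown as one alternative committee rule. The function names and toy Q-values are hypothetical.

import numpy as np

def plurality_vote(q_values):
    # Each Q-learner votes for its greedy action; the most-voted action wins.
    # q_values: shape (n_learners, n_actions), Q-estimates for one state.
    votes = np.bincount(np.argmax(q_values, axis=1),
                        minlength=q_values.shape[1])
    return int(np.argmax(votes))

def borda_vote(q_values):
    # Each learner ranks all actions by Q-value; an action scores one point
    # for every action it beats in a learner's ranking.
    ranks = np.argsort(np.argsort(q_values, axis=1), axis=1)  # 0 = worst
    return int(np.argmax(ranks.sum(axis=0)))

# Toy Q-values for one state: rows are ensemble members, columns are actions.
q = np.array([[0.1, 0.9, 0.3],
              [0.2, 0.8, 0.4],
              [0.7, 0.1, 0.6]])
print(plurality_vote(q), borda_vote(q))  # -> 1 1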


Learning to Collaborate in Markov Decision Processes

arXiv.org Machine Learning

We consider a two-agent MDP framework in which the agents repeatedly solve a task in a collaborative setting. We study the problem of designing a learning algorithm for the first agent (A1) that facilitates successful collaboration even when the second agent (A2) is adapting its policy in an unknown way. The key challenge in our setting is that the presence of the second agent makes rewards and transitions non-stationary and non-oblivious from the first agent's perspective. We design novel online learning algorithms for agent A1 whose regret decays as $O(T^{1-\frac{3}{7} \cdot \alpha})$ over $T$ learning episodes, provided that the magnitude of agent A2's policy changes between any two consecutive episodes is upper bounded by $O(T^{-\alpha})$. Here, the parameter $\alpha$ is assumed to be strictly greater than $0$, and we show that this assumption is necessary provided that the "learning parity with noise" problem is computationally hard. We further show that sub-linear regret of agent A1 implies near-optimality of the agents' joint return for MDPs that manifest the properties of a "smooth" game.
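
To make the bound concrete, here is a worked instance under our reading of the abstract (the policy-change notation $\pi^{A2}_t$ is ours):

$\mathrm{Regret}(T) = O\big(T^{1-\frac{3}{7}\alpha}\big) \quad \text{whenever} \quad \big\|\pi^{A2}_{t+1} - \pi^{A2}_t\big\| = O(T^{-\alpha}) \text{ for all } t.$

For $\alpha = 1$ (a slowly drifting partner) this gives $O(T^{4/7})$, which is sublinear in $T$; as $\alpha \to 0$ the exponent approaches $1$ and the bound degrades to the trivial $O(T)$, consistent with the stated necessity of $\alpha > 0$.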


Enhancing the Accuracy and Fairness of Human Decision Making

Neural Information Processing Systems

Societies often rely on human experts to take a wide variety of decisions affecting their members, from jail-or-release decisions taken by judges and stop-and-frisk decisions taken by police officers to accept-or-reject decisions taken by academics. In this context, each decision is taken by an expert who is typically chosen uniformly at random from a pool of experts. However, these decisions may be imperfect due to limited experience, implicit biases, or faulty probabilistic reasoning. Can we improve the accuracy and fairness of the overall decision-making process by optimizing the assignment between experts and decisions? In this paper, we address the above problem from the perspective of sequential decision making and show that, for different fairness notions from the literature, it reduces to a sequence of (constrained) weighted bipartite matchings, which can be solved efficiently using algorithms with approximation guarantees. Moreover, these algorithms also benefit from posterior sampling to actively trade off exploitation (selecting expert assignments that lead to accurate and fair decisions) and exploration (selecting expert assignments to learn about the experts' preferences and biases). We demonstrate the effectiveness of our algorithms on both synthetic and real-world data and show that they can significantly improve both the accuracy and fairness of the decisions taken by pools of experts.
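
To illustrate the matching step, here is a minimal Python sketch (our toy, not the paper's constrained algorithm): a single round of assigning experts to decisions is an unconstrained maximum-weight bipartite matching, solvable with the Hungarian method. The quality estimates and problem sizes below are hypothetical.

import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(0)
n_experts = n_decisions = 4
# quality[i, j]: estimated probability that expert i decides case j accurately;
# the paper maintains such estimates via posterior sampling.
quality = rng.uniform(0.5, 1.0, size=(n_experts, n_decisions))

# One round: maximum-weight bipartite matching of experts to decisions.
rows, cols = linear_sum_assignment(quality, maximize=True)
for i, j in zip(rows, cols):
    print(f"expert {i} -> decision {j} (quality {quality[i, j]:.2f})")
# Adding fairness notions turns this into a constrained matching, which the
# paper solves with approximation guarantees.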


Teaching Inverse Reinforcement Learners via Features and Demonstrations

Neural Information Processing Systems

Learning near-optimal behaviour from an expert's demonstrations typically relies on the assumption that the learner knows the features that the true reward function depends on. In this paper, we study the problem of learning from demonstrations in the setting where this is not the case, i.e., where there is a mismatch between the worldviews of the learner and the expert. We introduce a natural quantity, the teaching risk, which measures the potential suboptimality of policies that look optimal to the learner in this setting. We show that bounds on the teaching risk guarantee that the learner is able to find a near-optimal policy using standard algorithms based on inverse reinforcement learning. Based on these findings, we suggest a teaching scheme in which the expert can decrease the teaching risk by updating the learner's worldview, and thus ultimately enable her to find a near-optimal policy.
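
The following toy Python example (ours; a one-step decision problem rather than the paper's full MDP setting) illustrates the worldview mismatch: a learner that observes only two of three reward-relevant features can prefer a choice that is far from optimal under the true reward, and the teaching risk bounds exactly this kind of gap. All numbers are hypothetical.

import numpy as np

w_true = np.array([1.0, 0.5, 2.0])        # expert's true reward weights
phi = np.array([[1.0, 0.0, 0.0],          # feature vectors of three choices
                [0.0, 1.0, 0.0],
                [0.2, 0.1, 0.9]])
P = np.array([[1.0, 0.0, 0.0],            # learner's worldview: a projection
              [0.0, 1.0, 0.0]])           # that drops the third feature

true_returns = phi @ w_true               # what the expert actually values
# In its 2-D worldview the learner can at best recover the first two
# components of w_true, so it ranks choices by the projected reward.
learner_returns = (phi @ P.T) @ (P @ w_true)

print("truly optimal choice:    ", int(np.argmax(true_returns)))      # -> 2
print("looks optimal to learner:", int(np.argmax(learner_returns)))   # -> 0
# The return gap between these two choices is the kind of suboptimality
# that the teaching risk quantifies.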


Understanding the Role of Adaptivity in Machine Teaching: The Case of Version Space Learners

Neural Information Processing Systems

In real-world educational applications, an effective teacher adaptively chooses the next example to teach based on the learner's current state. However, most existing work in algorithmic machine teaching focuses on the batch setting, where adaptivity plays no role. In this paper, we study the case of teaching consistent, version space learners in an interactive setting. At each time step, the teacher provides an example, the learner performs an update, and the teacher observes the learner's new state. We highlight that adaptivity does not speed up the teaching process under existing models of version space learners, such as the "worst-case" model (the learner picks the next hypothesis randomly from the version space) and the "preference-based" model (the learner picks hypotheses according to some global preference). Inspired by human teaching, we propose a new model in which the learner picks hypotheses according to a local preference defined by its current hypothesis. We show that our model exhibits several desirable properties, e.g., adaptivity plays a key role, and the learner's transitions over hypotheses are smooth and interpretable. We develop adaptive teaching algorithms and demonstrate our results via simulation and user studies.
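
A minimal Python sketch of the interactive loop (ours, with hypothetical details): the hypothesis class is 1-D thresholds, the learner keeps a version space and moves to the consistent hypothesis closest to its current one (a stand-in for the local-preference model), and the teacher adapts each example to the learner's observed state.

import numpy as np

target = 0.7                                    # concept the teacher knows
hypotheses = np.linspace(0.0, 1.0, 101)         # finite class: h(x) = 1 iff x >= h
consistent = np.ones(len(hypotheses), dtype=bool)
current = 0.0                                   # learner's current hypothesis

for step in range(30):
    # Teacher adapts: probe halfway between the learner's observed
    # hypothesis and the target concept.
    x = (current + target) / 2.0
    label = int(x >= target)
    # Learner: discard inconsistent hypotheses, then move to the consistent
    # hypothesis closest to the current one (local-preference model).
    consistent &= (hypotheses <= x) if label else (hypotheses > x)
    candidates = hypotheses[consistent]
    current = candidates[np.argmin(np.abs(candidates - current))]
    if abs(current - target) < 1e-3:
        break
print(step, round(current, 2))                  # converges to ~0.70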


Efficient learning of smooth probability functions from Bernoulli tests with guarantees

arXiv.org Machine Learning

We study the fundamental problem of learning an unknown, smooth probability function via point-wise Bernoulli tests. We provide the first scalable algorithm for efficiently solving this problem with rigorous guarantees. In particular, we prove the convergence rate of our posterior update rule to the true probability function in the $L_2$-norm. Moreover, we allow the Bernoulli tests to depend on contextual features and provide a modified inference engine with provable guarantees for this novel setting. Numerical results show that the empirical convergence rates match the theory and illustrate the superiority of our approach in handling contextual features over the state of the art.
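
As a toy illustration of the setup (ours; the paper's actual posterior update rule and guarantees are more refined), one can discretize the domain, keep a Beta posterior per grid point, and share each Bernoulli outcome with nearby points via a smoothness kernel. The kernel width, grid size, and target function below are hypothetical.

import numpy as np

rng = np.random.default_rng(0)

def p_true(x):
    return 0.5 + 0.4 * np.sin(2 * np.pi * x)            # unknown smooth target

grid = np.linspace(0.0, 1.0, 50)
alpha = np.ones_like(grid)                              # Beta(1, 1) priors,
beta = np.ones_like(grid)                               # one per grid point

for _ in range(5000):
    x = rng.uniform()
    y = float(rng.random() < p_true(x))                 # one Bernoulli test at x
    w = np.exp(-((grid - x) ** 2) / (2 * 0.05 ** 2))    # smoothness kernel
    alpha += w * y                                      # soft success counts
    beta += w * (1.0 - y)                               # soft failure counts

estimate = alpha / (alpha + beta)                       # posterior-mean estimate
print(np.sqrt(np.mean((estimate - p_true(grid)) ** 2)))  # empirical L2 error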


Iterative Classroom Teaching

arXiv.org Machine Learning

We consider the machine teaching problem in a classroom-like setting wherein the teacher has to deliver the same examples to a diverse group of students. Their diversity stems from differences in their initial internal states as well as their learning rates. We prove that a teacher with full knowledge of the students' learning dynamics can teach a target concept to the entire classroom using $O(\min\{d, N\} \log(1/\epsilon))$ examples, where $d$ is the ambient dimension of the problem, $N$ is the number of learners, and $\epsilon$ is the accuracy parameter. We show the robustness of our teaching strategy when the teacher has only limited knowledge of the learners' internal dynamics, as provided by a noisy oracle. Further, we study the trade-off between the learners' workload and the teacher's cost in teaching the target concept. Our experiments validate our theoretical results and suggest that appropriately partitioning the classroom into homogeneous groups provides a balance between these two objectives.
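
A minimal Python sketch of the classroom setting (ours, with hypothetical dynamics): gradient-descent learners all receive the same broadcast example but differ in initial states and learning rates. In this toy, greedily teaching along the worst learner's error direction converges geometrically, echoing the $\min\{d, N\} \log(1/\epsilon)$ flavor of the bound.

import numpy as np

rng = np.random.default_rng(0)
d, N = 5, 8
w_star = rng.normal(size=d)                    # target concept
W = rng.normal(size=(N, d))                    # learners' initial internal states
etas = rng.uniform(0.1, 0.5, size=N)           # heterogeneous learning rates

for t in range(1000):
    errs = W - w_star
    worst = int(np.argmax(np.linalg.norm(errs, axis=1)))
    if np.linalg.norm(errs[worst]) < 1e-3:
        break
    x = errs[worst] / np.linalg.norm(errs[worst])   # broadcast this example
    y = w_star @ x                                  # noiseless label
    # Every learner takes its own gradient step on the same (x, y) pair;
    # each step shrinks a learner's error along x by a factor (1 - eta).
    W += etas[:, None] * (y - W @ x)[:, None] * x

print(t, np.linalg.norm(W - w_star, axis=1).max())  # rounds used, worst error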