AI Grand Challenges for Education

AI Magazine

This article focuses on contributions that AI can make to address long-term educational goals. It describes five challenges that would support: (1) mentors for every learner; (2) learning twenty-first century skills; (3) interaction data to support learning; (4) universal access to global classrooms; and (5) lifelong and life-wide learning. A vision and brief research agenda are described for each challenge, along with goals that lead to access to global educational resources and the reuse and sharing of digital educational resources. Instructional systems with AI technology are described that currently support richer experiences for learners and supply researchers with new opportunities to analyze vast data sets of instructional behavior from big databases containing elements of learning, affect, motivation, and social interaction. Personalized learning is described using computational tools that enhance student and group experience, reflection, and analysis, and supply data for the development of novel theory.


Virtual Humans for Learning

AI Magazine

Virtual humans are computer-generated characters designed to look and behave like real people. Studies have shown that virtual humans can mimic many of the social effects that one finds in human-human interactions, such as creating rapport, and that people respond to virtual humans in ways that are similar to how they respond to real people. We believe that virtual humans represent a new metaphor for interacting with computers, one in which working with a computer becomes much like interacting with a person, and that this can bring social elements to the interaction that are not easily supported with conventional interfaces. We present two systems that embody these ideas. The first, the Twins, are virtual docents in the Museum of Science, Boston, designed to engage visitors and raise their awareness and knowledge of science. The second, SimCoach, uses an empathetic virtual human to provide veterans and their families with information about PTSD and depression.


Serious Games Get Smart: Intelligent Game-Based Learning Environments

AI Magazine

Intelligent game-based learning environments integrate commercial game technologies with AI methods from intelligent tutoring systems and intelligent narrative technologies. This article introduces the CRYSTAL ISLAND intelligent game-based learning environment, which has been under development in the authors’ laboratory for the past seven years. After presenting CRYSTAL ISLAND, the principal technical problems of intelligent game-based learning environments are discussed: narrative-centered tutorial planning, student affect recognition, student knowledge modeling, and student goal recognition. Solutions to these problems are illustrated with research conducted with the CRYSTAL ISLAND learning environment.


Intelligent Learning Technologies Part 2: Applications of Artificial Intelligence to Contemporary and Emerging Educational Challenges

AI Magazine

Part Two of the special issue of AI Magazine presents articles on some of the most interesting projects at the intersection of AI and education. Included are articles on integrated systems such as virtual humans, an intelligent textbook, and a game-based learning environment, as well as technology-focused components such as student models and data mining. The issue concludes with an article summarizing the contemporary and emerging challenges at the intersection of AI and education.


Online Learning with Switching Costs and Other Adaptive Adversaries

Neural Information Processing Systems

We study the power of different types of adaptive (nonoblivious) adversaries in the setting of prediction with expert advice, under both full-information and bandit feedback. We measure the player's performance using a new notion of regret, also known as policy regret, which better captures the adversary's adaptiveness to the player's behavior. In a setting where losses are allowed to drift, we characterize, in a nearly complete manner, the power of adaptive adversaries with bounded memories and switching costs. In particular, we show that with switching costs, the attainable rate with bandit feedback is $T^{2/3}$. Interestingly, this rate is significantly worse than the $\sqrt{T}$ rate attainable with switching costs in the full-information case. Via a novel reduction from experts to bandits, we also show that a bounded memory adversary can force $T^{2/3}$ regret even in the full-information case, proving that switching costs are easier to control than bounded memory adversaries. Our lower bounds rely on a new stochastic adversary strategy that generates loss processes with strong dependencies.
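
For concreteness, one common way to write regret with switching costs is sketched below. The notation is an assumption and does not come from the abstract: $x_t$ is the action played in round $t$, $\ell_t$ is the loss function of round $t$, the comparator is the best fixed action in hindsight, and each change of action is charged one unit (with $x_0 = x_1$ so that the first round incurs no switch).

$$
R_T \;=\; \sum_{t=1}^{T} \Big( \ell_t(x_t) + \mathbf{1}[x_t \neq x_{t-1}] \Big) \;-\; \min_{x} \sum_{t=1}^{T} \ell_t(x)
$$

A fixed action never switches, so the comparator pays no switching cost. Under this reading, the rates quoted above say that $R_T$ can be kept at order $\sqrt{T}$ with full-information feedback but only at order $T^{2/3}$ with bandit feedback.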


Sequential Transfer in Multi-armed Bandit with Finite Set of Models

Neural Information Processing Systems

Learning from prior tasks and transferring that experience to improve future performance is critical for building lifelong learning agents. Although results in supervised and reinforcement learning show that transfer may significantly improve learning performance, most of the literature on transfer is focused on batch learning tasks. In this paper we study the problem of sequential transfer in online learning, notably in the multi-armed bandit framework, where the objective is to minimize the cumulative regret over a sequence of tasks by incrementally transferring knowledge from prior tasks. We introduce a novel bandit algorithm based on a method-of-moments approach for the estimation of the possible tasks and derive regret bounds for it.


Optimistic policy iteration and natural actor-critic: A unifying view and a non-optimality result

Neural Information Processing Systems

Approximate dynamic programming approaches to the reinforcement learning problem are often categorized into greedy value function methods and value-based policy gradient methods. As our first main result, we show that an important subset of the latter methodology is, in fact, a limiting special case of a general formulation of the former methodology; optimistic policy iteration encompasses not only most of the greedy value function methods but also natural actor-critic methods, and permits one to directly interpolate between them. The resulting continuum adjusts the strength of the Markov assumption in policy improvement and, as such, can be seen as dual in spirit to the continuum in TD($\lambda$)-style algorithms in policy evaluation. As our second main result, we show for a substantial subset of soft-greedy value function approaches that, while having the potential to avoid policy oscillation and policy chattering, this subset can never converge toward any optimal policy, except in a certain pathological case. Consequently, in the context of approximations, the majority of greedy value function methods seem doomed to suffer either from the risk of oscillation/chattering or from the presence of systematic sub-optimality.


Learning Trajectory Preferences for Manipulators via Iterative Improvement

Neural Information Processing Systems

We consider the problem of learning good trajectories for manipulation tasks. This is challenging because the criterion defining a good trajectory varies with users, tasks and environments. In this paper, we propose a co-active online learning framework for teaching robots the preferences of their users for object manipulation tasks. The key novelty of our approach lies in the type of feedback expected from the user: the human user does not need to demonstrate optimal trajectories as training data, but merely needs to iteratively provide trajectories that slightly improve over the trajectory currently proposed by the system. We argue that this co-active preference feedback can be more easily elicited from the user than demonstrations of optimal trajectories, which are often challenging and non-intuitive to provide on high degree-of-freedom manipulators. Nevertheless, theoretical regret bounds of our algorithm match the asymptotic rates of optimal trajectory algorithms. We also formulate a score function to capture the contextual information and demonstrate the generalizability of our algorithm on a variety of household tasks, for which the preferences were influenced not only by the object being manipulated but also by the surrounding environment.
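
The feedback loop described above can be illustrated with a small sketch. The abstract does not state the update rule, so this example assumes a linear score function and a perceptron-style update of the kind commonly used in coactive learning. The names `coactive_update` and `simulate`, the toy feature vectors, and the simulated user (who always returns the next-better candidate rather than the best one) are all hypothetical and chosen only for illustration.

```python
from typing import List, Sequence

def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def coactive_update(w: Sequence[float],
                    phi_proposed: Sequence[float],
                    phi_improved: Sequence[float]) -> List[float]:
    """Perceptron-style step (assumed): shift w toward the features of the
    trajectory the user preferred and away from the one the system proposed."""
    return [wi + (pi - pp) for wi, pp, pi in zip(w, phi_proposed, phi_improved)]

def simulate(candidates: Sequence[Sequence[float]],
             w_true: Sequence[float],
             n_rounds: int) -> List[float]:
    """Run the propose/improve loop against a simulated user.

    candidates: feature vectors of a fixed, finite set of trajectories (toy data).
    w_true: the user's hidden preference weights, used only to simulate feedback.
    """
    w = [0.0] * len(w_true)
    # Candidate indices ordered from worst to best under the hidden preferences.
    order = sorted(range(len(candidates)), key=lambda i: dot(candidates[i], w_true))
    for _ in range(n_rounds):
        scores = [dot(c, w) for c in candidates]
        proposed = scores.index(max(scores))  # system proposes its current best
        pos = order.index(proposed)
        if pos == len(order) - 1:
            continue  # proposal is already the user's favourite: no feedback
        improved = order[pos + 1]  # user offers a slightly better trajectory
        w = coactive_update(w, candidates[proposed], candidates[improved])
    return w

# Toy run: three candidate trajectories described by two features each.
candidates = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
learned = simulate(candidates, w_true=[1.0, 2.0], n_rounds=10)
best = max(range(len(candidates)), key=lambda i: dot(candidates[i], learned))
print(learned, best)
```

In this toy run the simulated user never demonstrates the best trajectory directly, yet after a few rounds the learned weights rank the user's favourite candidate first. The example is not a reproduction of the paper's algorithm or its contextual score function.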