Goto

Collaborating Authors

 Undirected Networks


Yin

AAAI Conferences

Recent work has shown that the quality of work produced in a crowdsourcing working session can be influenced by the presence of performance-contingent financial incentives, such as bonuses for exceptional performance, in the session. We take an algorithmic approach to decide when to offer bonuses in a working session to improve the overall utility that a requester derives from the session. Specifically, we propose and train an input-output hidden Markov model to learn the impact of bonuses on work quality and then use this model to dynamically decide whether to offer a bonus on each task in a working session to maximize a requester's utility. Experiments on Amazon Mechanical Turk show that our approach leads to higher utility for the requester than fixed and random bonus schemes do. Simulations on synthesized data sets further demonstrate the robustness of our approach against different worker population and worker behavior in improving requester utility.


Dressel

AAAI Conferences

Partially observable Markov decision processes (POMDPs) offer a principled approach to control under uncertainty. However, POMDP solvers generally require rewards to depend only on the state and action. This limitation is unsuitable for information-gathering problems, where rewards are more naturally expressed as functions of belief. In this work, we consider target localization, an information-gathering task where an agent takes actions leading to informative observations and a concentrated belief over possible target locations. By leveraging recent theoretical and algorithmic advances, we investigate offline and online solvers that incorporate belief-dependent rewards. We extend SARSOP -- a state-of-the-art offline solver -- to handle belief-dependent rewards, exploring different reward strategies and showing how they can be compactly represented. We present an improved lower bound that greatly speeds convergence. POMDP-lite, an online solver, is also evaluated in the context of information-gathering tasks. These solvers are applied to control a hexcopter UAV searching for a radio frequency source--a challenging real-world problem.


Wang

AAAI Conferences

Making principled decisions in the presence of uncertainty is often facilitated by Partially Observable Markov Decision Processes (POMDPs). Despite tremendous advances in POMDP solvers, finding good policies with large action spaces remains difficult. To alleviate this difficulty, this paper presents an on-line approximate solver, called Quantile-Based Action Selector (QBASE). It uses quantile-statistics to adaptively evaluate a small subset of the action space without sacrificing the quality of the generated decision strategies by much. Experiments on four different robotics tasks with up to 10,000 actions indicate that QBASE can generate substantially better strategies than a state-of-the-art method.


Sunberg

AAAI Conferences

Online solvers for partially observable Markov decision processes have been applied to problems with large discrete state spaces, but continuous state, action, and observation spaces remain a challenge. This paper begins by investigating double progressive widening (DPW) as a solution to this challenge. However, we prove that this modification alone is not sufficient because the belief representations in the search tree collapse to a single particle causing the algorithm to converge to a policy that is suboptimal regardless of the computation time. This paper proposes and evaluates two new algorithms, POMCPOW and PFT-DPW, that overcome this deficiency by using weighted particle filtering. Simulation results show that these modifications allow the algorithms to be successful where previous approaches fail.


Roijers

AAAI Conferences

Iteratively solving a set of linear programs (LPs) is a common strategy for solving various decision-making problems in Artificial Intelligence, such as planning in multi-objective or partially observable Markov Decision Processes (MDPs). A prevalent feature is that the solutions to these LPs become increasingly similar as the solving algorithm converges, because the solution computed by the algorithm approaches the fixed point of a Bellman backup operator. In this paper, we propose to speed up the solving process of these LPs by bootstrapping based on similar LPs solved previously. We use these LPs to initialize a subset of relevant LP constraints, before iteratively generating the remaining constraints. The resulting algorithm is the first to consider such information sharing across iterations. We evaluate our approach on planning in Multi-Objective MDPs (MOMDPs) and Partially Observable MDPs (POMDPs), showing that it solves fewer LPs than the state of the art, which leads to a significant speed-up. Moreover, for MOMDPs we show that our method scales better in both the number of states and the number of objectives, which is vital for multi-objective planning.


Chatterjee

AAAI Conferences

Partially observable Markov decision processes (POMDPs) are widely used in probabilistic planning problems in which an agent interacts with an environment using noisy and imprecise sensors. We study a setting in which the sensors are only partially defined and the goal is to synthesize "weakest" additional sensors, such that in the resulting POMDP, there is a small-memory policy for the agent that almost-surely (with probability 1) satisfies a reachability objective. We show that the problem is NP-complete, and present a symbolic algorithm by encoding the problem into SAT instances. We illustrate trade-offs between the amount of memory of the policy and the number of additional sensors on a simple example. We have implemented our approach and consider three classical POMDP examples from the literature, and show that in all the examples the number of sensors can be significantly decreased (as compared to the existing solutions in the literature) without increasing the complexity of the policies.


Wilhelm

AAAI Conferences

The principle of maximum entropy (MaxEnt) provides a well-founded methodology for commonsense reasoning based on probabilistic conditional knowledge. We show how to calculate MaxEnt distributions in a first-order setting by using typed model counting and condensed iterative scaling. Further, we discuss the connection to Markov Logic Networks for drawing inferences.


Eskandanian

AAAI Conferences

In a variety of online settings involving interaction with end-users it is critical for the systems to adapt to changes in user preference. User preferences on items tend to change over time due to a variety of factors such as change in context, the task being performed, or other short-term or long-term external factors. Recommender systems, in particular need to be able to capture these dynamics in user preferences in order to remain tuned to the most current interests of users. In this work we present a recommendation framework which takes into account the dynamics of user preferences. We propose an approach based on Hidden Markov Models (HMM) to identify change-points in the sequence of user interactions which reflect significant changes in preference according to the sequential behavior of all the users in the data. The proposed framework leverages the identified change points to generate recommendations using a sequence-aware non-negative matrix factorization model. We empirically demonstrate the effectiveness of the HMM-based change detection method as compared to standard baseline methods. Additionally, we evaluate the performance of the proposed recommendation method and show that it compares favorably to state-of-the-art sequence-aware recommendation models.


Hartwig

AAAI Conferences

Probabilistic graphical models have been successfully applied in a lot of different fields, e.g., medical diagnosis and bio-statistics. Multiple specific extensions have been developed to handle, e.g., time-series data or Gaussian distributed random variables. In the case that handles both Gaussian variables and time-series data, downsides are that the models still have a discrete time-scale, evidence needs to be propagated through the graph and the conditional relationships between the variables are bound to be linear. This paper converts two probabilistic graphical models (the Markov chain and the hidden Markov model) into Gaussian processes by constructing covariance and mean functions, that encode the characteristics of the probabilistic graphical models. Our developed Gaussian process based formalism has the advantage of supporting a continuous time scale, direct inference from any time point to the other without propagation of evidence and flexibility to modify the covariance function if needed.


Zhang

AAAI Conferences

Solving Partially Observable Markov Decision Processes (POMDPs) generally is computationally intractable. In this paper, we study a special POMDP class, namely informative POMDPs, where each observation provides good albeit incomplete information about world states. We propose two ways to accelerate value iteration algorithm for such POMDPs. First, dynamic programming (DP) updates can be carried out over a relatively small subset of belief space. Conducting DP updates over subspace leads to two advantages: representational savings in space and computational savings in time. Second, a point-based procedure is used to cut down the number of iterations for value iteration over subspace to converge. Empirical studies are presented to demonstrate various computational gains.