Collaborating Authors: Boutilier



Decision-Aware Predictive Model Selection for Workforce Allocation

Stratman, Eric G., Boutilier, Justin J., Albert, Laura A.

arXiv.org Artificial Intelligence

Many organizations depend on human decision-makers to make subjective decisions, especially in settings where information is scarce. Although workers are often viewed as interchangeable, the specific individual assigned to a task can significantly impact outcomes due to their unique decision-making processes and risk tolerance. In this paper, we introduce a novel framework that utilizes machine learning to predict worker behavior and employs integer optimization to strategically assign workers to tasks. Unlike traditional methods that treat machine learning predictions as static inputs for optimization, in our approach, the optimal predictive model used to represent a worker's behavior is determined by how that worker is allocated within the optimization process. We present a decision-aware optimization framework that integrates predictive model selection with worker allocation. Collaborating with an auto-insurance provider and using real-world data, we evaluate the effectiveness of our proposed method by applying three different techniques to predict worker behavior. Our findings show that the proposed decision-aware framework outperforms traditional methods and offers context-sensitive and data-responsive strategies for workforce management.
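
As a toy illustration of the coupling described in the abstract (all numbers and model names below are hypothetical, and brute-force enumeration stands in for the integer program), the sketch lets the predictive model chosen for each worker depend on the task that worker is allocated to:

```python
import itertools

# Hypothetical toy instance: 3 workers, 3 tasks, and for each worker two
# candidate predictive models ("logit", "tree") of the cost incurred if
# that worker handles a given task.
pred = {
    0: {"logit": [4, 2, 7], "tree": [5, 1, 6]},
    1: {"logit": [3, 6, 2], "tree": [2, 7, 3]},
    2: {"logit": [6, 4, 3], "tree": [7, 5, 2]},
}
# Hypothetical per-task validation error of each model; lower means the
# model represents the worker's behaviour on that kind of task better.
err = {
    0: {"logit": [0.1, 0.4, 0.2], "tree": [0.3, 0.1, 0.1]},
    1: {"logit": [0.2, 0.2, 0.5], "tree": [0.1, 0.3, 0.2]},
    2: {"logit": [0.3, 0.1, 0.4], "tree": [0.2, 0.2, 0.1]},
}

def best_model(w, t):
    """Decision-aware model choice: the model used for worker w is
    determined by the task t that worker is allocated to."""
    return min(pred[w], key=lambda m: err[w][m][t])

def decision_aware_assignment(n=3):
    """Brute-force stand-in for the integer program: jointly choose the
    assignment and, implicitly, the model representing each worker."""
    best = None
    for perm in itertools.permutations(range(n)):  # perm[w] = task of worker w
        cost = sum(pred[w][best_model(w, perm[w])][perm[w]] for w in range(n))
        if best is None or cost < best[0]:
            best = (cost, perm)
    return best
```

A traditional pipeline would fix one model per worker up front; here `best_model` is re-evaluated inside the assignment search, which is the decision-aware coupling in miniature.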


Leveraging Statistical Multi-Agent Online Planning with Emergent Value Function Approximation

Phan, Thomy, Belzner, Lenz, Gabor, Thomas, Schmid, Kyrill

arXiv.org Artificial Intelligence

Making decisions is a great challenge in distributed autonomous environments due to enormous state spaces and uncertainty. Many online planning algorithms rely on statistical sampling to avoid searching the whole state space, while still being able to make acceptable decisions. However, planning often has to be performed under strict computational constraints, which severely limits online planning in multi-agent systems and can lead to poor system performance, especially in stochastic domains. In this paper, we propose Emergent Value function Approximation for Distributed Environments (EVADE), an approach that integrates global experience into multi-agent online planning in stochastic domains so that global effects are considered during local planning. For this purpose, a value function is approximated online from the emergent system behaviour using reinforcement learning methods. We empirically evaluated EVADE with two statistical multi-agent online planning algorithms in a highly complex and stochastic smart factory environment, where multiple agents need to process various items at a shared set of machines. Our experiments show that EVADE can effectively improve the performance of multi-agent online planning while remaining efficient with respect to the breadth and depth of the planning process.
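
A minimal single-agent sketch of the idea, assuming a toy stochastic corridor in place of the smart factory domain: depth-limited Monte Carlo planning bootstraps its leaf estimates from a value function that is learned online with TD(0) from the system's own emergent behaviour, rather than searching deeper.

```python
import random

random.seed(0)

N, GOAL = 6, 5                 # corridor states 0..5; entering state 5 pays 1
GAMMA, ALPHA = 0.95, 0.2

def step(s, a):
    """Stochastic transition: the intended move succeeds with prob. 0.8."""
    move = a if random.random() < 0.8 else -a
    s2 = max(0, min(N - 1, s + move))
    return s2, (1.0 if s2 == GOAL else 0.0)

V = [0.0] * N                  # value function approximated online

def plan(s, depth=3, rollouts=8):
    """Depth-limited Monte Carlo planning; instead of searching deeper,
    leaves are evaluated with the learned value function."""
    def rollout(s, d):
        if d == 0:
            return V[s]                        # bootstrap at the horizon
        s2, r = step(s, random.choice((-1, 1)))
        return r + GAMMA * rollout(s2, d - 1)
    scores = {}
    for a in (1, -1):
        total = 0.0
        for _ in range(rollouts):
            s2, r = step(s, a)
            total += r + GAMMA * rollout(s2, depth - 1)
        scores[a] = total / rollouts
    return max(scores, key=scores.get)

s = 0
for _ in range(200):           # act with the planner, then learn from experience
    a = plan(s)
    s2, r = step(s, a)
    V[s] += ALPHA * (r + GAMMA * V[s2] - V[s])   # TD(0) on observed behaviour
    s = 0 if s2 == GOAL else s2
```

The planner's effective horizon stays at `depth` even though rewards lie further away, because the learned `V` carries the global experience back into local planning.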



Gradient-based Optimization for Bayesian Preference Elicitation

Vendrov, Ivan, Lu, Tyler, Huang, Qingqing, Boutilier, Craig

arXiv.org Artificial Intelligence

Effective techniques for eliciting user preferences have taken on added importance as recommender systems (RSs) become increasingly interactive and conversational. A common and conceptually appealing Bayesian criterion for selecting queries is expected value of information (EVOI). Unfortunately, it is computationally prohibitive to construct queries with maximum EVOI in RSs with large item spaces. We tackle this issue by introducing a continuous formulation of EVOI as a differentiable network that can be optimized using gradient methods available in modern machine learning (ML) computational frameworks (e.g., TensorFlow, PyTorch). We exploit this to develop a novel Monte Carlo method for EVOI optimization that scales to large item spaces by avoiding the explicit enumeration of items required by existing methods. While we emphasize the use of this approach for pairwise (or k-wise) comparisons of items, we also demonstrate how our method can be adapted to queries involving subsets of item attributes or "partial items," which are often more cognitively manageable for users. Experiments show that our gradient-based EVOI technique achieves state-of-the-art performance across several domains while scaling to large item spaces.
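
A hand-rolled sketch of the idea under strong simplifying assumptions (not the paper's implementation): a two-attribute item space relaxed to the unit ball, a logistic response model, a Monte Carlo sample standing in for the Bayesian prior over utility weights, and finite-difference gradients in place of the autodiff gradients TensorFlow/PyTorch would provide.

```python
import math, random

random.seed(1)
BETA = 5.0                     # hypothetical logistic response noise parameter

# Monte Carlo sample standing in for the Bayesian prior over the user's
# utility weights (two hypothetical item attributes).
W = [(random.gauss(0.6, 0.3), random.gauss(0.2, 0.3)) for _ in range(200)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def norm(v):
    return math.sqrt(sum(c * c for c in v))

def evoi(q):
    """EVOI of the pairwise query 'x vs y' (q packs both items). Items
    live in the unit ball, so the best achievable expected utility under
    a belief with mean m is simply ||m||."""
    x, y = q[:2], q[2:]
    like = [sigmoid(BETA * sum(w[i] * (x[i] - y[i]) for i in range(2))) for w in W]
    px = sum(like) / len(W)
    mx = [sum(l * w[i] for l, w in zip(like, W)) / sum(like) for i in range(2)]
    my = [sum((1 - l) * w[i] for l, w in zip(like, W)) / (len(W) - sum(like))
          for i in range(2)]
    prior = [sum(w[i] for w in W) / len(W) for i in range(2)]
    return px * norm(mx) + (1 - px) * norm(my) - norm(prior)

def optimize_query(q, steps=60, lr=0.5, h=1e-4):
    """Gradient ascent on EVOI over the continuous query parameters,
    using central finite differences as a stand-in for autodiff."""
    best_q, best_v = list(q), evoi(q)
    for _ in range(steps):
        g = []
        for i in range(4):
            qp, qm = list(q), list(q)
            qp[i] += h
            qm[i] -= h
            g.append((evoi(qp) - evoi(qm)) / (2 * h))
        q = [qi + lr * gi for qi, gi in zip(q, g)]
        for lo in (0, 2):                    # project items back to the unit ball
            n = norm(q[lo:lo + 2])
            if n > 1:
                q[lo:lo + 2] = [c / n for c in q[lo:lo + 2]]
        if evoi(q) > best_v:
            best_q, best_v = list(q), evoi(q)
    return best_q, best_v
```

Because both query items are continuous parameters, the search never enumerates an item catalogue; in a real RS the optimized vectors would be mapped back to nearby concrete items.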


Focus on new faculty: Boutilier bolsters global health through optimization - College of Engineering - University of Wisconsin-Madison

#artificialintelligence

Justin Boutilier uses optimization and machine learning to improve healthcare access, delivery and quality, particularly in low- and middle-income settings. As a second-year PhD student at the University of Toronto, Justin Boutilier spent four weeks in Dhaka, Bangladesh, investigating ways to curb ambulance response times in the bustling capital of a developing country. He quickly got a firsthand look at the scope of the challenge: The roughly 10-mile trip from his hotel to meetings in the city took about three hours. "You could walk faster," he says, "but there's no sidewalk, so it's kind of dangerous." Boutilier, who has joined the Department of Industrial and Systems Engineering at the University of Wisconsin-Madison as an assistant professor, uses optimization and machine learning to improve healthcare access, delivery and quality, particularly in low- and middle-income settings.


A General Interactive Approach for Solving Multi-Objective Combinatorial Optimization Problems with Imprecise Preferences

Benabbou, Nawal (LIP6) | Lust, Thibaut (Sorbonne University)

AAAI Conferences

In this paper, we develop a general interactive method to solve multi-objective combinatorial optimization problems with imprecise preferences. Assuming that preferences can be represented by a parameterized scalarizing function, we iteratively ask preference queries to the decision maker in order to reduce the uncertainty over the preference parameters until we can determine her preferred solution. To produce informative preference queries at each step, we generate promising solutions using the extreme points of the polyhedron representing the admissible preference parameters, and we then ask the decision maker to compare two of these solutions (we propose different selection strategies). These extreme points are also used to provide a stopping criterion guaranteeing that the returned solution is optimal (or near-optimal) according to the decision maker's preferences. For the multi-objective spanning tree problem with a linear aggregation function, we provide numerical results demonstrating the practical efficiency of our approach, and we compare our results to a recent approach based on minimax regret, where preference queries are asked during the construction of a solution. We show that our method achieves better results in terms of both running time and number of questions.
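
The sketch below illustrates the interaction loop on a hypothetical bi-objective instance with a linear aggregation function, using a grid of sampled admissible weights in place of the exact extreme points of the preference polyhedron; the regret-based stopping rule and the simulated decision maker are likewise simplifications.

```python
# Hypothetical bi-objective items (both components scaled so larger is better).
ITEMS = [(0.9, 0.1), (0.7, 0.5), (0.5, 0.6), (0.3, 0.9), (0.1, 0.95)]

# The decision maker's true (hidden) trade-off; used only to simulate answers.
TRUE_W = (0.35, 0.65)

def value(w, item):
    """Linear scalarizing function."""
    return w[0] * item[0] + w[1] * item[1]

# Sampled admissible weights stand in for the polyhedron's extreme points.
weights = [(p, 1 - p) for p in [i / 100 for i in range(101)]]

def max_regret(weights):
    """Worst-case loss of recommending the best-on-average item over the
    remaining admissible weights; small regret certifies near-optimality."""
    rec = max(ITEMS, key=lambda it: sum(value(w, it) for w in weights))
    worst = max(max(value(w, it) for it in ITEMS) - value(w, rec)
                for w in weights)
    return worst, rec

queries = 0
while True:
    regret, rec = max_regret(weights)
    if regret < 0.01:                # stopping criterion: near-optimality
        break
    # Promising solutions: optima at the two most opposed admissible weights.
    a = max(ITEMS, key=lambda it: value(weights[0], it))
    b = max(ITEMS, key=lambda it: value(weights[-1], it))
    if a == b:
        break
    queries += 1                     # ask the (simulated) decision maker
    prefers_a = value(TRUE_W, a) >= value(TRUE_W, b)
    winner, loser = (a, b) if prefers_a else (b, a)
    # Keep only the weights consistent with the answer.
    weights = [w for w in weights if value(w, winner) >= value(w, loser)]
```

Each answer cuts the admissible weight set with a half-space, so the uncertainty shrinks until the regret bound certifies the recommendation.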


Reinforcement Learning for LTLf/LDLf Goals

De Giacomo, Giuseppe, Iocchi, Luca, Favorito, Marco, Patrizi, Fabio

arXiv.org Artificial Intelligence

MDPs extended with LTLf/LDLf non-Markovian rewards have recently attracted interest as a way to specify rewards declaratively. In this paper, we discuss how a reinforcement learning agent can learn policies fulfilling LTLf/LDLf goals. In particular, we focus on the case where we have two separate representations of the world: one for the agent, using the (predefined, possibly low-level) features available to it, and one for the goal, expressed in terms of high-level (human-understandable) fluents. We formally define the problem and show how it can be solved. Moreover, we provide experimental evidence that keeping the RL agent's feature space separate from the goal's representation can work in practice, showing interesting cases where the agent can indeed learn a policy that fulfills the LTLf/LDLf goal using only its own features (augmented with additional memory).
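
A minimal sketch of the two-representation idea, assuming a toy corridor in place of the paper's domains: the agent's low-level feature (its position) is augmented with the state of a small goal-tracking automaton serving as memory, and ordinary Q-learning runs over the product.

```python
import random

random.seed(3)

# Low-level features: position 0..4 in a corridor. High-level fluents:
# A holds at position 0, B at position 4. The LTLf-style goal "eventually
# A, and then eventually B" is tracked by a small automaton whose state
# is the extra memory added to the agent's features:
# 0 = nothing seen yet, 1 = A seen, 2 = accepting.
def goal_step(q, pos):
    if q == 0 and pos == 0:
        return 1
    if q == 1 and pos == 4:
        return 2
    return q

ALPHA, GAMMA, EPS = 0.3, 0.95, 0.2
Q = {}                        # keyed by ((position, goal_state), action)

def qval(s, a):
    return Q.get((s, a), 0.0)

def policy(s):
    """Epsilon-greedy with random tie-breaking."""
    if random.random() < EPS:
        return random.choice((-1, 1))
    vals = {a: qval(s, a) for a in (-1, 1)}
    top = max(vals.values())
    return random.choice([a for a, v in vals.items() if v == top])

for episode in range(400):
    pos, q = 2, 0             # start mid-corridor, goal untracked
    for _ in range(50):
        s = (pos, q)
        a = policy(s)
        pos = max(0, min(4, pos + a))
        q2 = goal_step(q, pos)
        r = 1.0 if q2 == 2 else 0.0       # reward only on goal acceptance
        s2 = (pos, q2)
        best_next = max(qval(s2, -1), qval(s2, 1))
        Q[(s, a)] = qval(s, a) + ALPHA * (r + GAMMA * best_next - qval(s, a))
        q = q2
        if q == 2:                        # accepting: episode ends
            break
```

The learned policy is a function of the agent's own features plus the automaton memory only; the high-level fluents never appear in the Q-table's keys.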


An Experimental Study of Advice in Sequential Decision-Making Under Uncertainty

Benavent, Florian (Normandie Univ, UNICAEN, ENSICAEN, CNRS, GREYC, 14000 Caen) | Zanuttini, Bruno (Normandie Univ, UNICAEN, ENSICAEN, CNRS, GREYC, 14000 Caen)

AAAI Conferences

We consider sequential decision making problems under uncertainty, in which a user has a general idea of the task to achieve, and gives advice to an agent in charge of computing an optimal policy. Many different notions of advice have been proposed in somewhat different settings, especially in the field of inverse reinforcement learning and for resolution of Markov Decision Problems with Imprecise Rewards. Two key questions are whether the advice required by a specific method is natural for the user to give, and how much advice is needed for the agent to compute a good policy, as evaluated by the user. We give a unified view of a number of proposals made in the literature, and propose a new notion of advice, which corresponds to a user telling why she would take a given action in a given state. For all these notions, we discuss their naturalness for a user and the integration of advice. We then report on an experimental study of the amount of advice needed for the agent to compute a good policy. Our study shows in particular that continual interaction between the user and the agent is worthwhile, and sheds light on the pros and cons of each type of advice.


LTLf/LDLf Non-Markovian Rewards

Brafman, Ronen I. (Ben-Gurion University) | Giacomo, Giuseppe De (Sapienza University of Rome) | Patrizi, Fabio (Sapienza University of Rome)

AAAI Conferences

In Markov Decision Processes (MDPs), the reward obtained in a state is Markovian, i.e., depends on the last state and action. This dependency makes it difficult to reward more interesting long-term behaviors, such as always closing a door after it has been opened, or providing coffee only following a request. Extending MDPs to handle non-Markovian reward functions was the subject of two previous lines of work. Both use LTL variants to specify the reward function and then compile the new model back into a Markovian model. Building on recent progress in temporal logics over finite traces, we adopt LDLf for specifying non-Markovian rewards and provide an elegant automata construction for building a Markovian model, which extends that of previous work and offers strong minimality and compositionality guarantees.
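
The paper's construction handles full LDLf; as a minimal hand-built illustration of the same principle, the sketch below encodes the "providing coffee only following a request" behaviour as a two-state automaton whose state, adjoined to the MDP state, makes the reward Markovian (the -1 penalty for unsolicited coffee is an assumption added for illustration).

```python
# Two-state automaton tracking "provide coffee only following a request":
# q = 0 means no request is pending, q = 1 means a request is pending.
def automaton_step(q, obs):
    """obs is the set of propositions observed at this step."""
    if "coffee" in obs:
        # Serving coffee resets the automaton; it is rewarded only if a
        # request was pending, and (by assumption) penalised otherwise.
        return 0, (1.0 if q == 1 else -1.0)
    if "req" in obs:
        return 1, 0.0
    return q, 0.0

def trace_reward(trace):
    """Total reward of a trace in the product model: the reward at each
    step depends only on the current automaton state and observation,
    never on the full history, so the extended model is Markovian."""
    q, total = 0, 0.0
    for obs in trace:
        q, r = automaton_step(q, obs)
        total += r
    return total
```

Pairing each MDP state with `q` yields the compiled Markovian model; the two-state memory is exactly what the history-dependent reward needs, which is the minimality the construction aims for.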