Goto

Collaborating Authors

 Langbort, Cedric


Online Decision Making with History-Average Dependent Costs (Extended)

arXiv.org Artificial Intelligence

In many online sequential decision-making scenarios, a learner's choices affect not just their current costs but also the future ones. In this work, we look at one particular case of such a situation where the costs depend on the time average of past decisions over a history horizon. We first recast this problem with history dependent costs as a problem of decision making under stage-wise constraints. To tackle this, we then propose the novel Follow-The-Adaptively-Regularized-Leader (FTARL) algorithm. Our innovative algorithm incorporates adaptive regularizers that depend explicitly on past decisions, allowing us to enforce stage-wise constraints while simultaneously enabling us to establish tight regret bounds. We also discuss the implications of the length of history horizon on design of no-regret algorithms for our problem and present impossibility results when it is the full learning horizon.


Pointwise-in-Time Explanation for Linear Temporal Logic Rules

arXiv.org Artificial Intelligence

The new field of Explainable Planning (XAIP) has produced a variety of approaches to explain and describe the behavior of autonomous agents to human observers. Many summarize agent behavior in terms of the constraints, or ''rules,'' which the agent adheres to during its trajectories. In this work, we narrow the focus from summary to specific moments in individual trajectories, offering a ''pointwise-in-time'' view. Our novel framework, which we define on Linear Temporal Logic (LTL) rules, assigns an intuitive status to any rule in order to describe the trajectory progress at individual time steps; here, a rule is classified as active, satisfied, inactive, or violated. Given a trajectory, a user may query for status of specific LTL rules at individual trajectory time steps. In this paper, we present this novel framework, named Rule Status Assessment (RSA), and provide an example of its implementation. We find that pointwise-in-time status assessment is useful as a post-hoc diagnostic, enabling a user to systematically track the agent's behavior with respect to a set of rules.


On Estimating Multi-Attribute Choice Preferences using Private Signals and Matrix Factorization

arXiv.org Machine Learning

Revealed preference theory studies the possibility of modeling an agent's revealed preferences and the construction of a consistent utility function. However, modeling agent's choices over preference orderings is not always practical and demands strong assumptions on human rationality and data-acquisition abilities. Therefore, we propose a simple generative choice model where agents are assumed to generate the choice probabilities based on latent factor matrices that capture their choice evaluation across multiple attributes. Since the multi-attribute evaluation is typically hidden within the agent's psyche, we consider a signaling mechanism where agents are provided with choice information through private signals, so that the agent's choices provide more insight about his/her latent evaluation across multiple attributes. We estimate the choice model via a novel multi-stage matrix factorization algorithm that minimizes the average deviation of the factor estimates from choice data. Simulation results are presented to validate the estimation performance of our proposed algorithm.