Goto

Collaborating Authors

 disutility


Aligning Machiavellian Agents: Behavior Steering via Test-Time Policy Shaping

arXiv.org Artificial Intelligence

The deployment of decision-making AI agents presents a critical challenge in maintaining alignment with human values or guidelines while operating in complex, dynamic environments. Agents trained solely to achieve their objectives may adopt harmful behavior, exposing a key trade-off between maximizing the reward function and maintaining alignment. For pre-trained agents, ensuring alignment is particularly challenging, as retraining can be a costly and slow process. This is further complicated by the diverse and potentially conflicting attributes representing the ethical values for alignment. To address these challenges, we propose a test-time alignment technique based on model-guided policy shaping. Our method allows precise control over individual behavioral attributes, generalizes across diverse reinforcement learning (RL) environments, and facilitates a principled trade-off between ethical alignment and reward maximization without requiring agent retraining. We evaluate our approach using the MACHIAVELLI benchmark, which comprises 134 text-based game environments and thousands of annotated scenarios involving ethical decisions. The RL agents are first trained to maximize the reward in their respective games. At test time, we apply policy shaping via scenario-action attribute classifiers to ensure decision alignment with ethical attributes. We compare our approach against prior training-time methods and general-purpose agents, as well as study several types of ethical violations and power-seeking behavior. Our results demonstrate that test-time policy shaping provides an effective and scalable solution for mitigating unethical behavior across diverse environments and alignment attributes.


Market share maximizing strategies of CAV fleet operators may cause chaos in our cities

arXiv.org Artificial Intelligence

We study the dynamics and equilibria of a new kind of routing games, where players - drivers of future autonomous vehicles - may switch between individual (HDV) and collective (CAV) routing. In individual routing, just like today, drivers select routes minimizing expected travel costs, whereas in collective routing an operator centrally assigns vehicles to routes. The utility is then the average experienced travel time discounted with individually perceived attractiveness of automated driving. The market share maximising strategy amounts to offering utility greater than for individual routing to as many drivers as possible. Our theoretical contribution consists in developing a rigorous mathematical framework of individualized collective routing and studying algorithms which fleets of CAVs may use for their market-share optimization. We also define bi-level CAV - HDV equilibria and derive conditions which link the potential marketing behaviour of CAVs to the behavioural profile of the human population. Practically, we find that the fleet operator may often be able to equilibrate at full market share by simply mimicking the choices HDVs would make. In more realistic heterogenous human population settings, however, we discover that the market-share maximizing fleet controller should use highly variable mixed strategies as a means to attract or retain customers. The reason is that in mixed routing the powerful group player can control which vehicles are routed via congested and uncongested alternatives. The congestion pattern generated by CAVs is, however, not known to HDVs before departure and so HDVs cannot select faster routes and face huge uncertainty whichever alternative they choose. Consequently, mixed market-share maximising fleet strategies resulting in unpredictable day-to-day driving conditions may, alarmingly, become pervasive in our future cities.


Welfare-Centric Clustering

arXiv.org Artificial Intelligence

Fair clustering has traditionally focused on ensuring equitable group representation or equalizing group-specific clustering costs. However, Dickerson et al. (2025) recently showed that these fairness notions may yield undesirable or unintuitive clustering outcomes and advocated for a welfare-centric clustering approach that models the utilities of the groups. In this work, we model group utilities based on both distances and proportional representation and formalize two optimization objectives based on welfare-centric clustering: the Rawlsian (Egalitarian) objective and the Utilitarian objective. We introduce novel algorithms for both objectives and prove theoretical guarantees for them. Empirical evaluations on multiple real-world datasets demonstrate that our methods significantly outperform existing fair clustering baselines.


Not in My Backyard! Temporal Voting Over Public Chores

arXiv.org Artificial Intelligence

We study a temporal voting model where voters have dynamic preferences over a set of public chores -- projects that benefit society, but impose individual costs on those affected by their implementation. We investigate the computational complexity of optimizing utilitarian and egalitarian welfare. Our results show that while optimizing the former is computationally straightforward, minimizing the latter is computationally intractable, even in very restricted cases. Nevertheless, we identify several settings where this problem can be solved efficiently, either exactly or by an approximation algorithm. We also examine the effects of enforcing temporal fairness and its impact on social welfare, and analyze the competitive ratio of online algorithms. We then explore the strategic behavior of agents, providing insights into potential malfeasance in such decision-making environments. Finally, we discuss a range of fairness measures and their suitability for our setting.


Participatory Budgeting With Multiple Degrees of Projects And Ranged Approval Votes

arXiv.org Artificial Intelligence

In an indivisible participatory budgeting (PB) framework, we have a limited budget that is to be distributed among a set of projects, by aggregating the preferences of voters for the projects. All the prior work on indivisible PB assumes that each project has only one possible cost. In this work, we let each project have a set of permissible costs, each reflecting a possible degree of sophistication of the project. Each voter approves a range of costs for each project, by giving an upper and lower bound on the cost that she thinks the project deserves. The outcome of a PB rule selects a subset of projects and also specifies their corresponding costs. We study different utility notions and prove that the existing positive results when every project has exactly one permissible cost can also be extended to our framework where a project has several permissible costs. We also analyze the fixed parameter tractability of the problem. Finally, we propose some important and intuitive axioms and analyze their satisfiability by different PB rules. We conclude by making some crucial remarks.


A multi-objective constrained POMDP model for breast cancer screening

arXiv.org Artificial Intelligence

Breast cancer is a common and deadly disease, but it is often curable when diagnosed early. While most countries have large-scale screening programs, there is no consensus on a single globally accepted guideline for breast cancer screening. The complex nature of the disease; the limited availability of screening methods such as mammography, magnetic resonance imaging (MRI), and ultrasound; and public health policies all factor into the development of screening policies. Resource availability concerns necessitate the design of policies which conform to a budget, a problem which can be modelled as a constrained partially observable Markov decision process (CPOMDP). In this study, we propose a multi-objective CPOMDP model for breast cancer screening which allows for supplemental screening methods to accompany mammography. The model has two objectives: maximize the quality-adjusted life years (QALYs) and minimize lifetime breast cancer mortality risk (LBCMR). We identify the Pareto frontier of optimal solutions for average and high-risk patients at different budget levels, which can be used by decision-makers to set policies in practice. We find that the policies obtained by using a weighted objective are able to generate well-balanced QALYs and LBCMR values. In contrast, the single-objective models generally sacrifice a substantial amount in terms of QALYs/LBCMR for a minimal gain in LBCMR/QALYs. Additionally, our results show that, with the baseline cost values for supplemental screenings as well as the additional disutility that they incur, they are rarely recommended in CPOMDP policies, especially in a budget-constrained setting. A sensitivity analysis reveals the thresholds on cost and disutility values at which supplemental screenings become advantageous to prescribe.


Indivisible Participatory Budgeting under Weak Rankings

arXiv.org Artificial Intelligence

Participatory budgeting (PB) has attracted much attention in recent times due to its wide applicability in social choice settings. In this paper, we consider indivisible PB which involves allocating an available, limited budget to a set of indivisible projects, each having a certain cost, based on the preferences of agents over projects. The specific, important, research gap that we address in this paper is to propose classes of rules for indivisible PB with weak rankings (i.e., weak ordinal preferences) and investigate their key algorithmic and axiomatic issues. We propose two classes of rules having distinct significance and motivation. The first is layered approval rules which enable weak rankings to be studied by carefully translating them into approval votes. The second is need-based rules which enable to capture fairness issues. Under layered approval rules, we study two natural families of rules: greedy-truncation rules and cost-worthy rules. The paper has two parts. In the first part, we investigate algorithmic and complexity related issues for the proposed rules. In the second part, we present a detailed axiomatic analysis of these rules, for which, we examine and generalize axioms in the literature and also introduce a new axiom, pro-affordability. The paper helps to highlight the trade-offs among practical appeal, computational complexity, and axiomatic compliance of these rules.


Chore division on a graph

arXiv.org Artificial Intelligence

The paper considers fair allocation of indivisible nondisposable items that generate disutility (chores). We assume that these items are placed in the vertices of a graph and each agent's share has to form a connected subgraph of this graph. Although a similar model has been investigated before for goods, we show that the goods and chores settings are inherently different. In particular, it is impossible to derive the solution of the chores instance from the solution of its naturally associated fair division instance. We consider three common fair division solution concepts, namely proportionality, envy-freeness and equitability, and two individual disutility aggregation functions: additive and maximum based. We show that deciding the existence of a fair allocation is hard even if the underlying graph is a path or a star. We also present some efficiently solvable special cases for these graph topologies.