AITopics

Country: North America > Canada > Ontario > Toronto (0.14)

Industry: Government > Voting & Elections (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Neural Information Processing Systems, Feb-12-2026, 23:01:01 GMT

f978c8f3b5f399cae464e85f72e28503-Paper-Conference.pdf

agreement, consensus statement, participant, (14 more...)

Country:

Asia > Middle East > Jordan (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Genre:

Research Report > Experimental Study (0.68)
Research Report > New Finding (0.46)

Industry: Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.73)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)

Neural Information Processing Systems, Oct-10-2025, 01:23:50 GMT

499b12df1531fe8ee0febcf08381f3a4-Paper-Conference.pdf

algorithm, complexity, log 2, (15 more...)

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Israel > Jerusalem District > Jerusalem (0.04)

Genre: Research Report > Experimental Study (0.93)

Industry:

Government (0.92)
Health & Medicine (0.67)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)

Neural Information Processing Systems, Aug-22-2025, 02:07:36 GMT

Fine-tuning language models to find agreement among humans with diverse preferences

Further, our best model's consensus statements are preferred

large language model, machine learning, natural language, (18 more...)

Country:

Asia > Middle East > Jordan (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Genre:

Research Report > Experimental Study (0.68)
Research Report > New Finding (0.46)

Industry: Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.73)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)

Rosenfeld, Nir, Xu, Haifeng

Machine Learning Should Maximize Welfare, Not (Only) Accuracy

arXiv.org Artificial Intelligence, Feb-17-2025

Decades of research in machine learning have given us powerful tools for making accurate predictions. But when used in social settings and on human inputs, better accuracy does not immediately translate to better social outcomes. This may not be surprising given that conventional learning frameworks are not designed to express societal preferences -- let alone promote them. This position paper argues that machine learning is currently missing, and can gain much from incorporating, a proper notion of social welfare. The field of welfare economics asks: how should we allocate limited resources to self-interested agents in a way that maximizes social benefit? We argue that this perspective applies to many modern applications of machine learning in social contexts, and advocate for its adoption. Rather than disposing of prediction, we aim to leverage this forte of machine learning for promoting social welfare. We demonstrate this idea by proposing a conceptual framework that gradually transitions from accuracy maximization (with awareness of welfare) to welfare maximization (via accurate prediction). We detail applications and use-cases for which our framework can be effective, identify technical challenges and practical opportunities, and highlight future avenues worth pursuing.

artificial intelligence, machine learning, prediction, (13 more...)

2502.11981

Country: North America > United States (0.93)

Genre: Research Report (0.84)

Industry:

Banking & Finance (0.93)
Law (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

arXiv.org Artificial Intelligence, Feb-13-2025

Navigating the Social Welfare Frontier: Portfolios for Multi-objective Reinforcement Learning

Kim, Cheol Woo, Moondra, Jai, Verma, Shresth, Pollack, Madeleine, Kong, Lingkai, Tambe, Milind, Gupta, Swati

In this paper, we study a reinforcement learning (RL) setting where a deployed policy impacts multiple stakeholders in different ways. Each stakeholder is associated with a unique reward function, and the goal is to train a policy that adequately aggregates their preferences. This setting, which is often modeled using multi-objective reinforcement learning (MORL), arises in many RL applications, such as fair resource allocation in healthcare [32], cloud computing [27, 18] and communication networks [36, 7]. Recently, with the rise of large language models (LLMs), reinforcement learning from human feedback (RLHF) techniques that reflect the preferences of heterogeneous individuals have also been explored [6, 38, 26]. Preference aggregation in such scenarios is often achieved by choosing a social welfare function, which takes the utilities of multiple stakeholders as input and outputs a scalar value representing the overall welfare [37, 9, 32, 13, 38, 26, 6]. However, selecting the appropriate social welfare function is a nontrivial task, as each function embodies a different notion of social welfare and can lead to vastly different outcomes for the involved stakeholders. In this work, we focus on generalized p-means, a class of social welfare functions widely used in algorithmic fairness and social choice theory.
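
As a rough illustration of why the choice of welfare function matters, the sketch below computes the generalized p-mean of a utility vector using the standard textbook definition (p = 1 is the arithmetic mean, p = 0 the geometric mean, p = -inf the minimum). The function name `p_mean` and the two example utility vectors are assumptions made for illustration; they do not come from the paper.

```python
import math

def p_mean(utilities, p):
    """Generalized p-mean of strictly positive utilities (textbook definition)."""
    if any(u <= 0 for u in utilities):
        raise ValueError("utilities must be strictly positive")
    n = len(utilities)
    if p == 0:
        # Limit p -> 0: geometric mean (Nash welfare up to a monotone transform).
        return math.exp(sum(math.log(u) for u in utilities) / n)
    if p == -math.inf:
        # Limit p -> -inf: egalitarian welfare (worst-off stakeholder).
        return min(utilities)
    if p == math.inf:
        return max(utilities)
    return (sum(u ** p for u in utilities) / n) ** (1.0 / p)

# Hypothetical utilities of two stakeholders under two candidate policies.
unequal = [1.0, 9.0]
equal = [4.0, 4.0]

for p in (1, 0, -math.inf):
    better = "unequal" if p_mean(unequal, p) > p_mean(equal, p) else "equal"
    print(f"p={p}: unequal={p_mean(unequal, p):.2f}, "
          f"equal={p_mean(equal, p):.2f}, preferred={better}")
```

With these made-up numbers the preferred policy flips as p decreases, which is the kind of sensitivity the abstract describes.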

machine learning, portfolio, reinforcement learning, (18 more...)

2502.09724

Country:

North America > United States > New York > New York County > New York City (0.04)
Europe > Switzerland (0.04)
Europe > Netherlands > South Holland > Dordrecht (0.04)
Asia > India (0.04)

Genre: Research Report (1.00)

Industry:

Health & Medicine (1.00)
Transportation > Passenger (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.54)

Sharma, Vibhhu, Wilder, Bryan

Comparing Targeting Strategies for Maximizing Social Welfare with Limited Resources

arXiv.org Machine Learning, Nov-11-2024

Machine learning is increasingly used to select which individuals receive limited-resource interventions in domains such as human services, education, development, and more. However, it is often not apparent what the right quantity is for models to predict. In particular, policymakers rarely have access to data from a randomized controlled trial (RCT) that would enable accurate estimates of treatment effects -- which individuals would benefit more from the intervention. Observational data is more likely to be available, creating a substantial risk of bias in treatment effect estimates. Practitioners instead commonly use a technique termed "risk-based targeting" where the model is just used to predict each individual's status quo outcome (an easier, non-causal task). Those with higher predicted risk are offered treatment. There is currently almost no empirical evidence to inform which choices lead to the most effective machine learning-informed targeting strategies in social domains. In this work, we use data from 5 real-world RCTs in a variety of domains to empirically assess such choices. We find that risk-based targeting is almost always inferior to targeting based on even biased estimates of treatment effects. Moreover, these results hold even when the policymaker has strong normative preferences for assisting higher-risk individuals. Our results imply that, despite the widespread use of risk prediction models in applied settings, practitioners may be better off incorporating even weak evidence about heterogeneous causal effects to inform targeting.
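
The two targeting rules contrasted above can be sketched on synthetic data as follows. Everything here is an assumption for illustration: the population, the bias term `0.3 * r`, the noise level, and the helper names `top_k` and `total_benefit` are hypothetical, and true effects are drawn independently of baseline risk, which favors effect-based targeting by construction. The output therefore says nothing about the paper's RCT results; it only shows how the comparison is set up.

```python
import random

def top_k(scores, k):
    """Indices of the k individuals with the highest scores."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]

def total_benefit(selected, true_effect):
    """Sum of true treatment effects over the treated individuals."""
    return sum(true_effect[i] for i in selected)

rng = random.Random(0)
n, budget = 1000, 100

# Hypothetical population: baseline risk and true effect, drawn independently.
baseline_risk = [rng.random() for _ in range(n)]
true_effect = [rng.random() for _ in range(n)]

# Hypothetical biased, noisy effect estimate (e.g. from observational data).
estimated_effect = [
    t + 0.3 * r + rng.gauss(0.0, 0.2)
    for t, r in zip(true_effect, baseline_risk)
]

risk_pick = top_k(baseline_risk, budget)       # risk-based targeting
effect_pick = top_k(estimated_effect, budget)  # effect-based targeting
oracle_pick = top_k(true_effect, budget)       # upper bound, unknowable in practice

for name, pick in [("risk-based", risk_pick),
                   ("effect-based", effect_pick),
                   ("oracle", oracle_pick)]:
    print(f"{name:>12}: total benefit = {total_benefit(pick, true_effect):.1f}")
```

Making the true effect depend on baseline risk, or increasing the bias, changes the ranking of the two rules, so the sketch is best read as a template for the comparison rather than evidence for either rule.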

baseline risk, dataset, treatment effect, (15 more...)

2411.07414

Country:

North America > United States > Tennessee (0.05)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
Asia > India (0.04)

Genre:

Research Report > Strength High (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine (1.00)
Government (1.00)
Education > Educational Setting (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Verma, Shresth, Boehmer, Niclas, Kong, Lingkai, Tambe, Milind

Balancing Act: Prioritization Strategies for LLM-Designed Restless Bandit Rewards

arXiv.org Artificial Intelligence, Sep-15-2024

LLMs are increasingly used to design reward functions based on human preferences in Reinforcement Learning (RL). We focus on LLM-designed rewards for Restless Multi-Armed Bandits, a framework for allocating limited resources among agents. In applications such as public health, this approach empowers grassroots health workers to tailor automated allocation decisions to community needs. In the presence of multiple agents, altering the reward function based on human preferences can impact subpopulations very differently, leading to complex tradeoffs and a multi-objective resource allocation problem. We are the first to present a principled method termed Social Choice Language Model for dealing with these tradeoffs for LLM-designed rewards for multiagent planners in general and restless bandits in particular. The novel part of our model is a transparent and configurable selection component, called an adjudicator, external to the LLM that controls complex tradeoffs via a user-selected social welfare function. Our experiments demonstrate that our model reliably selects more effective, aligned, and balanced reward functions compared to purely LLM-based approaches.
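
A minimal sketch of the selection step described above, assuming each candidate reward function has already been evaluated into one utility score per subpopulation. The `adjudicate` helper, the candidate names, and the scores are hypothetical and are not taken from the paper; the only idea illustrated is that the choice is made outside the LLM by a welfare function the user selects.

```python
def utilitarian(utilities):
    """Total utility across subpopulations."""
    return sum(utilities)

def egalitarian(utilities):
    """Utility of the worst-off subpopulation."""
    return min(utilities)

def adjudicate(candidates, welfare_fn):
    """Return the name of the candidate whose outcomes maximize welfare_fn."""
    return max(candidates, key=lambda name: welfare_fn(candidates[name]))

# Hypothetical per-subpopulation outcomes for two LLM-proposed reward functions.
candidates = {
    "reward_a": [0.9, 0.2, 0.3],
    "reward_b": [0.5, 0.4, 0.45],
}

print("utilitarian choice:", adjudicate(candidates, utilitarian))
print("egalitarian choice:", adjudicate(candidates, egalitarian))
```

Because the welfare function is an explicit argument, the trade-off being made is visible and can be changed without re-prompting the model.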

agent, objective, reward function, (16 more...)

2408.12112

Country:

Asia > India (0.04)
Asia > Singapore (0.04)
Africa > Rwanda > Kigali > Kigali (0.04)

Genre: Research Report (0.64)

Industry: Health & Medicine > Therapeutic Area (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Pardeshi, Kanad Shrikar, Shapira, Itai, Procaccia, Ariel D., Singh, Aarti

Learning Social Welfare Functions

arXiv.org Artificial Intelligence, May-27-2024

Is it possible to understand or imitate a policy maker's rationale by looking at past decisions they made? We formalize this question as the problem of learning social welfare functions belonging to the well-studied family of power mean functions. We focus on two learning tasks: in the first, the input is vectors of utilities of an action (decision or policy) for individuals in a group and their associated social welfare as judged by a policy maker, whereas in the second, the input is pairwise comparisons between the welfares associated with a given pair of utility vectors. We show that power mean functions are learnable with polynomial sample complexity in both cases, even if the comparisons or the social welfare information are noisy. Finally, we design practical algorithms for these tasks and evaluate their performance.
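
The first learning task (utility vectors paired with welfare values) can be illustrated with a simple grid search over the exponent p. This is only a sketch under simplifying assumptions: equal weights for all individuals, noiseless synthetic labels, and a brute-force grid instead of the authors' algorithms. The names `power_mean` and `fit_p` and the data are hypothetical.

```python
import math
import random

def power_mean(utilities, p):
    """Equal-weight power mean of strictly positive utilities."""
    n = len(utilities)
    if p == 0:
        return math.exp(sum(math.log(u) for u in utilities) / n)
    return (sum(u ** p for u in utilities) / n) ** (1.0 / p)

def fit_p(samples, grid):
    """Pick the exponent on the grid that minimizes squared error on the samples."""
    def squared_error(p):
        return sum((power_mean(u, p) - w) ** 2 for u, w in samples)
    return min(grid, key=squared_error)

rng = random.Random(0)
true_p = -1.0  # hypothetical exponent used by the simulated policy maker

# Synthetic data: random utility vectors labeled with the true welfare value.
utility_vectors = [[rng.uniform(0.1, 1.0) for _ in range(4)] for _ in range(50)]
samples = [(u, power_mean(u, true_p)) for u in utility_vectors]

grid = [i / 4 for i in range(-20, 21)]  # candidate exponents from -5 to 5
estimated_p = fit_p(samples, grid)
print("estimated p:", estimated_p)
```

Adding noise to the welfare labels, or replacing them with pairwise comparisons, would turn this into a rough analogue of the noisy settings the abstract mentions.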

complexity, log 2, vector, (14 more...)

2405.177

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Israel > Jerusalem District > Jerusalem (0.04)

Genre: Research Report (0.64)

Industry:

Government (0.68)
Health & Medicine (0.67)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

arXiv.org Machine Learning, Mar-7-2024

Provable Multi-Party Reinforcement Learning with Diverse Human Feedback

Zhong, Huiying, Deng, Zhun, Su, Weijie J., Wu, Zhiwei Steven, Zhang, Linjun

Reinforcement learning with human feedback (RLHF) is an emerging paradigm to align models with human preferences. Typically, RLHF aggregates preferences from multiple individuals who have diverse viewpoints that may conflict with each other. Our work initiates the theoretical study of multi-party RLHF that explicitly models the diverse preferences of multiple individuals. We show how traditional RLHF approaches can fail since learning a single reward function cannot capture and balance the preferences of multiple individuals. To overcome such limitations, we incorporate meta-learning to learn multiple preferences and adopt different social welfare functions to aggregate the preferences across multiple parties. We focus on the offline learning setting and establish sample complexity bounds, along with efficiency and fairness guarantees, for optimizing diverse social welfare functions such as Nash, Utilitarian, and Leximin welfare functions. Our results show a separation between the sample complexities of multi-party RLHF and traditional single-party RLHF. Furthermore, we consider a reward-free setting, where each individual's preference is no longer consistent with a reward model, and give pessimistic variants of the von Neumann Winner based on offline preference data. Taken together, our work showcases the advantage of multi-party RLHF but also highlights its more demanding statistical complexity.
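
To make the three welfare notions named above concrete, the sketch below ranks a few candidate policies by utilitarian welfare (sum of utilities), Nash welfare (product of utilities), and leximin (compare the worst-off party first, then the next worst, and so on). The policy names and per-party utilities are made up for illustration, and nothing here reflects the paper's offline learning procedure.

```python
import math

def utilitarian(utilities):
    """Sum of utilities across parties."""
    return sum(utilities)

def nash(utilities):
    """Product of (non-negative) utilities across parties."""
    return math.prod(utilities)

def leximin_key(utilities):
    """Sorted utilities; comparing these tuples lexicographically gives leximin."""
    return tuple(sorted(utilities))

# Hypothetical expected utilities of three parties under three candidate policies.
policies = {
    "pi_1": [6.0, 1.0, 1.0],
    "pi_2": [2.0, 2.0, 2.0],
    "pi_3": [3.0, 3.0, 1.0],
}

def best(criterion):
    """Name of the policy that maximizes the given criterion."""
    return max(policies, key=lambda name: criterion(policies[name]))

print("utilitarian:", best(utilitarian))
print("nash:       ", best(nash))
print("leximin:    ", best(leximin_key))
```

With these numbers each criterion selects a different policy, which shows why the choice of aggregation rule is a substantive decision and not a technical detail.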

probability, social welfare function, welfare function, (15 more...)

2403.05006

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > New Finding (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.84)