Goto

Collaborating Authors

 Lackner, Martin


Humanity's Last Exam

arXiv.org Artificial Intelligence

Benchmarks are important tools for tracking the rapid advancements in large language model (LLM) capabilities. However, benchmarks are not keeping pace in difficulty: LLMs now achieve over 90\% accuracy on popular benchmarks like MMLU, limiting informed measurement of state-of-the-art LLM capabilities. In response, we introduce Humanity's Last Exam (HLE), a multi-modal benchmark at the frontier of human knowledge, designed to be the final closed-ended academic benchmark of its kind with broad subject coverage. HLE consists of 3,000 questions across dozens of subjects, including mathematics, humanities, and the natural sciences. HLE is developed globally by subject-matter experts and consists of multiple-choice and short-answer questions suitable for automated grading. Each question has a known solution that is unambiguous and easily verifiable, but cannot be quickly answered via internet retrieval. State-of-the-art LLMs demonstrate low accuracy and calibration on HLE, highlighting a significant gap between current LLM capabilities and the expert human frontier on closed-ended academic questions. To inform research and policymaking upon a clear understanding of model capabilities, we publicly release HLE at https://lastexam.ai.


Preferences Single-Peaked on a Circle

Journal of Artificial Intelligence Research

We introduce the domain of preferences that are single-peaked on a circle, which is a generalization of the well-studied single-peaked domain. This preference restriction is useful, e.g., for scheduling decisions, certain facility location problems, and for one-dimensional decisions in the presence of extremist preferences. We give a fast recognition algorithm of this domain, provide a characterisation by finitely many forbidden subprofiles, and show that many popular single- and multi-winner voting rules are polynomial-time computable on this domain. In particular, we prove that Proportional Approval Voting can be computed in polynomial time for profiles that are single-peaked on a circle. In contrast, Kemeny's rule remains hard to evaluate, and several impossibility results from social choice theory can be proved using only profiles in this domain.


A Quantitative Analysis of Multi-Winner Rules

arXiv.org Artificial Intelligence

To choose a suitable multi-winner rule, i.e., a voting rule for selecting a subset of $k$ alternatives based on a collection of preferences, is a hard and ambiguous task. Depending on the context, it varies widely what constitutes the choice of an "optimal" subset. In this paper, we offer a new perspective to measure the quality of such subsets and---consequently---multi-winner rules. We provide a quantitative analysis using methods from the theory of approximation algorithms and estimate how well multi-winner rules approximate two extreme objectives: diversity as captured by the (Approval) Chamberlin--Courant rule and individual excellence as captured by Multi-winner Approval Voting. With both theoretical and experimental methods we classify multi-winner rules in terms of their quantitative alignment with these two opposing objectives.


Effective Heuristics for Committee Scoring Rules

AAAI Conferences

Committee scoring rules form an important class of multiwinner voting rules. As computing winning committees under such rules is generally intractable, in this paper we investigate efficient heuristics for this task. We design two novel heuristics for computing approximate results of multiwinner elections under arbitrary committee scoring rules; notably, one of these heuristics uses concepts from cooperative game theory. We then provide an experimental evaluation of our heuristics (and two others, known from the literature): we compare the scores of the committees output by our algorithms to the scores of the optimal committees, and also use the two-dimensional Euclidean domain to compare the visual representations of the outputs of our algorithms.


On the Complexity of Extended and Proportional Justified Representation

AAAI Conferences

We consider the problem of selecting a fixed-size committee based on approval ballots. It is desirable to have a committee in which all voters are fairly represented. Aziz et al. (2015a; 2017) proposed an axiom called extended justified representation (EJR), which aims to capture this intuition; subsequently, Sanchez-Fernandez et al. (2017) proposed a weaker variant of this axiom called proportional justified representation (PJR). It was shown that it is coNP-complete to check whether a given committee provides EJR, and it was conjectured that it is hard to find a committee that provides EJR. In contrast, there are polynomial-time computable voting rules that output committees providing PJR, but the complexity of checking whether a given committee provides PJR was an open problem. In this paper, we answer open questions from prior work by showing that EJR and PJR have the same worst-case complexity: we provide two polynomial-time algorithms that output committees providing EJR, yet we show that it is coNP-complete to decide whether a given committee provides PJR. We complement the latter result by fixed-parameter tractability results.


Multiwinner Elections With Diversity Constraints

AAAI Conferences

We develop a model of multiwinner elections that combines performance-based measures of the quality of the committee (such as, e.g., Borda scores of the committee members) with diversity constraints. Specifically, we assume that the candidates have certain attributes (such as being a male or a female, being junior or senior, etc.) and the goal is to elect a committee that, on the one hand, has as high a score regarding a given performance measure, but that, on the other hand, meets certain requirements (e.g., of the form "at least 30% of the committee members are junior candidates and at least 40% are females"). We analyze the computational complexity of computing winning committees in this model, obtaining polynomial-time algorithms (exact and approximate) and NP-hardness results. We focus on several natural classes of voting rules and diversity constraints.


Preferences Single-Peaked on a Circle

AAAI Conferences

We introduce the domain of preferences that are single-peaked on a circle, which is a generalization of the well-studied single-peaked domain. This preference restriction is useful, e.g., for scheduling decisions, and for one-dimensional decisions in the presence of extremist preferences. We give a fast recognition algorithm of this domain, provide a characterisation by finitely many forbidden subprofiles, and show that many popular single- and multi-winner voting rules are polynomial-time computable on this domain. In contrast, Kemeny's rule remains hard to evaluate, and several impossibility results from social choice theory can be proved using only profiles that are single-peaked on a circle


Phragmén’s Voting Methods and Justified Representation

AAAI Conferences

In the late 19th century, Lars Edvard Phragmén proposed a load-balancing approach for selecting committees based on approval ballots. We consider three committee voting rules resulting from this approach: two optimization variants one minimizing the maximal load and one minimizing the variance of loads —and a sequential variant. We study Phragmén's methods from an axiomatic point of view, focussing on justified representation and related properties that have recently been introduced by Aziz et al. (2015a) and Sánchez-Fernández et al. (2017). We show that the sequential variant satisfies proportional justified representation, making it the first known polynomial-time computable method with this property. Moreover, we show that the optimization variants satisfy perfect representation. We also analyze the com- putational complexity of Phragmén's methods and provide mixed-integer programming based algorithms for computing them.


Proportional Justified Representation

AAAI Conferences

The goal of multi-winner elections is to choose a fixed-size committee based on voters’ preferences. An important concern in this setting is representation: large groups of voters with cohesive preferences should be adequately represented by the election winners. Recently, Aziz et al. proposed two axioms that aim to capture this idea: justified representation (JR) and its strengthening extended justified representation (EJR). In this paper, we extend the work of Aziz et al. in several directions. First, we answer an open question of Aziz et al., by showing that Reweighted Approval Voting satisfies JR for k = 3; 4; 5, but fails it for k >= 6. Second, we observe that EJR is incompatible with the Perfect Representation criterion, which is important for many applications of multi-winner voting, and propose a relaxation of EJR, which we call Proportional Justified Representation (PJR). PJR is more demanding than JR, but, unlike EJR, it is compatible with perfect representation, and a committee that provides PJR can be computed in polynomial time if the committee size divides the number of voters. Moreover, just like EJR, PJR can be used to characterize the classic PAV rule in the class of weighted PAV rules. On the other hand, we show that EJR provides stronger guarantees with respect to average voter satisfaction than PJR does.


Winner Determination in Huge Elections with MapReduce

AAAI Conferences

In computational social choice, we are concerned with the development of methods for joint decision making. A central problem in this field is the winner determination problem, which aims at identifying the most preferred alternative(s). With the rise of modern e-business platforms, processing of huge amounts of preference data has become an issue. In this work, we apply the MapReduce framework - which has been specifically designed for dealing with big data - to various versions of the winner determination problem. We obtain efficient and highly parallel algorithms and provide a theoretical analysis and experimental evaluation.