 contestant


'I didn't have anything to prove': what Traitors finalist Jade Scott learned about survival from video games

The Guardian

'Minecraft was my way in' The Traitors 2026 finalist Jade. The latest series of The Traitors, which ended last week on a nail-biting finale, featured some of the usual characters - from guileless extroverts to wannabe Columbos endlessly observing fellow contestants for the slightest flicker of treachery. But one faithful stood out for her quiet determination, despite a ceaseless onslaught of suspicion and accusation. That person was Jade Scott, and I wasn't at all surprised when, quite early on in the series, she revealed she was a keen gamer.


Aggregating Quantitative Relative Judgments: From Social Choice to Ranking Prediction

Neural Information Processing Systems

Quantitative Relative Judgment Aggregation (QRJA) is a new research topic in (computational) social choice. In the QRJA model, agents provide judgments on the relative quality of different candidates, and the goal is to aggregate these judgments across all agents. In this work, our main conceptual contribution is to explore the interplay between QRJA in a social choice context and its application to ranking prediction. We observe that in QRJA, judges do not have to be people with subjective opinions; for example, a race can be viewed as a "judgment" on the contestants' relative abilities. This allows us to aggregate results from multiple races to evaluate the contestants' true qualities. At a technical level, we introduce new aggregation rules for QRJA and study their structural and computational properties. We evaluate the proposed methods on data from various real races and show that QRJA-based methods offer effective and interpretable ranking predictions.


The Most Dangerous Genre

The New Yorker

Our obsession with deadly game shows--from "The Running Man" and "Squid Game" to MrBeast's real-life reënactments--reflects a shift in the national mood to something increasingly zero-sum. It seems we can't get enough of game shows in which the losers die. "The Hunger Games" became a multibillion-dollar media franchise over the past decade, with audiences returning to the theatre, time and time again, to watch adolescents try to kill one another in an enormous arena--a contest devised by the leaders of a society rife with inequality. Netflix's "Squid Game" followed four hundred and fifty-six desperate individuals into an underworld where they play lethal versions of children's games in the hope of winning a life-changing amount of money. Four weeks after its release, the show had become Netflix's most-watched series ever; to date, the first season has been viewed more than two hundred and sixty-five million times.



Taskmaster Deconstructed: A Quantitative Look at Tension, Volatility, and Viewer Ratings

Silver, David H.

arXiv.org Artificial Intelligence

Taskmaster is a British television show that combines comedic performance with a formal scoring system. Despite the appearance of structured competition, it remains unclear whether scoring dynamics contribute meaningfully to audience engagement. We conducted a statistical analysis of 162 episodes across 18 series, using fifteen episode-level metrics to quantify rank volatility, point spread, lead changes, and winner dominance. None of these metrics showed a significant association with IMDb ratings, even after controlling for series effects. Long-term trends suggest that average points have increased over time, while volatility has slightly declined and rank spread has remained stable. These patterns indicate an attempt to enhance competitive visibility without altering the show's structural equilibrium. We also analyzed contestant rank trajectories and identified five recurring archetypes describing performance styles. These patterns suggest that viewer interest is shaped more by contestant behavior than by game mechanics.


AutoBench: Automating LLM Evaluation through Reciprocal Peer Assessment

Loi, Dario, Muià, Elena Maria, Siciliano, Federico, Trappolini, Giovanni, Crisà, Vincenzo, Kruger, Peter, Silvestri, Fabrizio

arXiv.org Artificial Intelligence

We present AutoBench, a fully automated and self-sustaining framework for evaluating Large Language Models (LLMs) through reciprocal peer assessment. This paper provides a rigorous scientific validation of the AutoBench methodology, originally developed as an open-source project by eZecute S.R.L. Unlike static benchmarks that suffer from test-set contamination and limited adaptability, AutoBench dynamically generates novel evaluation tasks while models alternately serve as question generators, contestants, and judges across diverse domains. An iterative weighting mechanism amplifies the influence of consistently reliable evaluators, aggregating peer judgments into consensus-based rankings that reflect collective model agreement. Our experiments demonstrate strong correlations with established benchmarks including MMLU-Pro and GPQA (respectively 78% and 63%), validating this peer-driven evaluation paradigm. The multi-judge design significantly outperforms single-judge baselines, confirming that distributed evaluation produces more robust and human-consistent assessments. AutoBench offers a scalable, contamination-resistant alternative to static benchmarks for the continuous evaluation of evolving language models.


Think You're Smarter Than a What Next Producer? Find Out With This Week's News Quiz.

Slate



The Indian woman who stood up to moral policing - and won a pageant

BBC News

Muskan Sharma stood up to men who tried to bully her over her clothes - and went on to win hearts and a beauty pageant. The 23-year-old, who was crowned Miss Rishikesh 2025 last week in the northern Indian state of Uttarakhand, told the BBC that even though it was a small local pageant, "it made me feel like Miss Universe". Sharma's win has made headlines in India as it came after a viral video that showed her spiritedly arguing with a man who barged into their rehearsals just a day before the 4 October contest. Sharma, who wanted to be a model and participate in a pageant "since I was in school", said the intruders came in just as they broke for lunch. "We were sitting around, chilling, having a laugh when they walked in," she said.



Putnam-like dataset summary: LLMs as mathematical competition contestants

Bieganowski, Bartosz, Strzelecki, Daniel, Skiba, Robert, Topolewski, Mateusz

arXiv.org Artificial Intelligence

In this paper we summarize the results of the Putnam-like benchmark published by Google DeepMind. This dataset consists of 96 original problems in the spirit of the Putnam Competition and 576 solutions generated by LLMs. We analyse the performance of models on this set of problems to verify their ability to solve problems from mathematical contests.