SciArena: An Open Evaluation Platform for Non-Verifiable Scientific Literature-Grounded Tasks

Neural Information Processing Systems

Unlike traditional benchmarks for scientific literature understanding and synthesis, SciArena engages the research community directly, following the Chatbot Arena evaluation approach of community voting on model comparisons. By leveraging collective intelligence, SciArena offers a community-driven evaluation of model performance on open-ended scientific tasks that demand literature-grounded, long-form responses. The platform currently supports 47 foundation models and has collected over 20,000 votes from human researchers across diverse scientific domains. Our analysis of the data collected so far confirms its high quality. We discuss the results and insights based on the model ranking leaderboard. To further promote research in building model-based automated evaluation systems for literature tasks, we release SciArena-Eval, a meta-evaluation benchmark based on collected preference data. It measures the accuracy of models in judging answer quality by comparing their pairwise assessments with human votes. Our experiments highlight the benchmark's challenges and emphasize the need for more reliable automated evaluation methods.
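The accuracy measure described in the abstract, agreement between a model judge's pairwise preference and the human vote, can be illustrated with a minimal sketch. The function name `judge_accuracy`, the "A"/"B" labels, and the sample data are assumptions made for illustration only; they are not taken from SciArena-Eval or its released code.

```python
from typing import Sequence


def judge_accuracy(judge_prefs: Sequence[str], human_votes: Sequence[str]) -> float:
    """Fraction of pairwise comparisons where the judge agrees with the human vote.

    Hypothetical helper: each entry is "A" or "B", naming the preferred response.
    """
    if len(judge_prefs) != len(human_votes):
        raise ValueError("judge_prefs and human_votes must have the same length")
    if not human_votes:
        raise ValueError("at least one comparison is required")
    # Count comparisons where both picked the same response.
    agreements = sum(j == h for j, h in zip(judge_prefs, human_votes))
    return agreements / len(human_votes)


# Made-up example: the judge agrees with the human voter on 3 of 4 comparisons.
human = ["A", "B", "A", "A"]
judge = ["A", "B", "B", "A"]
print(judge_accuracy(judge, human))  # 0.75
```

This sketch ignores ties and any filtering of votes; how the actual benchmark treats those cases is not stated in the abstract.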


Clustering via Hedonic Games: New Concepts and Algorithms

Neural Information Processing Systems

We study fundamental connections between coalition formation games and clustering, illustrating the cross-disciplinary relevance of these concepts. We focus on graphical hedonic games where agents' preferences are compactly represented by a friendship graph and an enmity graph. In the context of clustering, friendship relations naturally align with data point similarities, whereas enmity corresponds to dissimilarities. We consider two stability notions based on single-agent deviations: local popularity and local stability.


SciArena: An Open Evaluation Platform for Non-Verifiable Scientific Literature-Grounded Tasks

Neural Information Processing Systems

Unlike traditional benchmarks for scientific literature understanding and synthesis, SciArena engages the research community directly, following the Chatbot Arena evaluation approach of community voting on model comparisons. By leveraging collective intelligence, SciArena offers a community-driven evaluation of model performance on open-ended scientific tasks that demand literature-grounded, long-form responses. The platform currently supports 44 open-source and proprietary foundation models and has collected over 19,000 votes from human researchers across diverse scientific domains. Our analysis of the data collected so far confirms its high quality. We discuss the results and insights based on the model ranking leaderboard. To further promote research in building model-based automated evaluation systems for literature tasks, we release SciArena-Eval, a meta-evaluation benchmark based on our collected preference data. The benchmark measures the accuracy of models in judging answer quality by comparing their pairwise assessments with human votes. Our experiments highlight the benchmark's challenges and emphasize the need for more reliable automated evaluation methods.


Trump Risks Key Surveillance Authority Over 'Unqualified' Spy-Chief Pick

WIRED

US lawmakers are alarmed that Bill Pulte, a housing official with no intelligence experience, is poised to take charge of one of the government's most powerful surveillance tools. A sweeping warrantless surveillance authority remains on track to expire Friday, with no clear path to a deal, after President Donald Trump refused this week to abandon his pick of housing official Bill Pulte to temporarily lead the US intelligence community--even tasking Pulte with gutting the Office of the Director of National Intelligence in a DOGE-style "downsizing" before a permanent director is named. In a Truth Social post after his second White House meeting in two days with House speaker Mike Johnson, Trump called Section 702 of the Foreign Intelligence Surveillance Act "very important to our military, and keeping the American people safe" and asked Congress for a short-term extension to give him time to find a permanent director of national intelligence. Section 702 lets the government collect the communications of foreign targets abroad without a warrant, sweeping in an unknown volume of Americans' messages that the FBI can later search. It faces a first-ever lapse in its legal authorization if Congress does not act by the end of Friday, June 12.


India's communists once ruled millions. What happened to them?

BBC News

For the first time since 1957, India no longer has a single communist-led state government. The defeat of the Communist Party of India (Marxist)-led Left Democratic Front (LDF) in Kerala this month, after a decade in power, marked the end - at least for now - of one of the world's most enduring experiments in democratic communism. At their peak, India's communist parties ruled states stretching from West Bengal to Kerala and Tripura. They impacted the lives of more than 100 million people through trade unions, peasant organisations, student wings and disciplined cadre networks.




GenAI Arena: An Open Evaluation Platform for Generative Models

Neural Information Processing Systems

Generative AI has made remarkable strides to revolutionize fields such as image and video generation. These advancements are driven by innovative algorithms, architecture, and data. However, the rapid proliferation of generative models has highlighted a critical gap: the absence of trustworthy evaluation metrics. Current automatic assessments such as FID, CLIP, and FVD often fail to capture the nuanced quality and user satisfaction associated with generative outputs. This paper proposes an open platform, GenAI-Arena, to evaluate different image and video generative models, where users can actively participate in evaluating these models.


Welcome to the dark side of crypto's permissionless dream

MIT Technology Review

Jean-Paul Thorbjornsen is a leader of THORChain, a blockchain that is not supposed to have any leaders--and is reeling from a series of expensive controversies. "We can do whatever we want," Jean-Paul Thorbjornsen tells me from the pilot's seat of his Aston Martin helicopter. As we fly over suburbs outside Melbourne, Australia, it's becoming clear that doing whatever he wants is Thorbjornsen's MO. Upper-middle-class homes give way to vineyards, and Thorbjornsen points out our landing spot outside a winery. "They're going to ask for a shot now," he says, used to the attention drawn by his luxury helicopter, emblazoned with the tail letters "BTC" for bitcoin (the price tag of $5 million in Australian dollars--$3.5 million in US dollars today--was perhaps reasonable for someone who claims a previous crypto project made more than AU$400 million, although he also says those funds were tied up in the company). Thorbjornsen is a founder of THORChain, a blockchain through which users can swap ...