
A Unified Representation Underlying the Judgment of Large Language Models

arXiv.org Artificial Intelligence

A central architectural question for both biological and artificial intelligence is whether judgment relies on specialized modules or a unified, domain-general resource. While the discovery of decodable neural representations for distinct concepts in Large Language Models (LLMs) has suggested a modular architecture, whether these representations are truly independent systems remains an open question. Here we provide evidence for a convergent architecture for evaluative judgment. Across a range of LLMs, we find that diverse evaluative judgments are computed along a dominant dimension, which we term the Valence-Assent Axis (VAA). This axis jointly encodes subjective valence ("what is good") and the model's assent to factual claims ("what is true"). Through direct interventions, we demonstrate that this axis drives a critical mechanism, which we identify as the subordination of reasoning: the VAA functions as a control signal that steers the generative process to construct a rationale consistent with its evaluative state, even at the cost of factual accuracy. Our discovery offers a mechanistic account for response bias and hallucination, revealing how an architecture that promotes coherent judgment can systematically undermine faithful reasoning.


Charting the European LLM Benchmarking Landscape: A New Taxonomy and a Set of Best Practices

arXiv.org Artificial Intelligence

While new benchmarks for large language models (LLMs) are being developed continuously to catch up with the growing capabilities of new models and AI in general, using and evaluating LLMs in non-English languages remains a little-charted landscape. We give a concise overview of recent developments in LLM benchmarking, and then propose a new taxonomy for the categorization of benchmarks that is tailored to multilingual or non-English use scenarios. We further propose a set of best practices and quality standards that could lead to a more coordinated development of benchmarks for European languages. Among other recommendations, we advocate for a higher language and culture sensitivity of evaluation methods.


When Annotators Disagree, Topology Explains: Mapper, a Topological Tool for Exploring Text Embedding Geometry and Ambiguity

arXiv.org Artificial Intelligence

Language models are often evaluated with scalar metrics like accuracy, but such measures fail to capture how models internally represent ambiguity, especially when human annotators disagree. We propose a topological perspective to analyze how fine-tuned models encode ambiguity and, more generally, individual instances. Applied to RoBERTa-Large on the MD-Offense dataset, Mapper, a tool from topological data analysis, reveals that fine-tuning restructures embedding space into modular, non-convex regions aligned with model predictions, even for highly ambiguous cases. Over $98\%$ of connected components exhibit $\geq 90\%$ prediction purity, yet alignment with ground-truth labels drops in ambiguous data, surfacing a hidden tension between structural confidence and label uncertainty. Unlike traditional tools such as PCA or UMAP, Mapper captures this geometry directly, uncovering decision regions, boundary collapses, and overconfident clusters. Our findings position Mapper as a powerful diagnostic tool for understanding how models resolve ambiguity. Beyond visualization, it also enables topological metrics that may inform proactive modeling strategies in subjective NLP tasks.
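
As a rough illustration of the purity figure quoted above, the sketch below treats a component's prediction purity as the share of its points that carry the component's most common predicted label. This definition, the function name `prediction_purity`, and the toy components are assumptions made for illustration; the paper may define or compute purity differently.

```python
from collections import Counter

def prediction_purity(component_labels):
    """Share of a component's points that carry its majority predicted label."""
    if not component_labels:
        raise ValueError("component must contain at least one point")
    counts = Counter(component_labels)
    majority_count = counts.most_common(1)[0][1]
    return majority_count / len(component_labels)

# Hypothetical toy data: predicted label of every point in each connected component.
components = {
    "A": [1, 1, 1, 1, 1, 1, 1, 1, 1, 0],
    "B": [0, 0, 0, 0],
    "C": [1, 0, 1, 0],
}

purities = {name: prediction_purity(labels) for name, labels in components.items()}

# Fraction of components whose purity reaches the 90% threshold.
share_pure = sum(p >= 0.9 for p in purities.values()) / len(purities)
```

Comparing the same quantity computed on ground-truth labels instead of predictions would be one way to surface the gap between structural confidence and label uncertainty that the abstract describes.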


Reflections from Research Roundtables at the Conference on Health, Inference, and Learning (CHIL) 2025

arXiv.org Artificial Intelligence

The 6th annual Conference on Health, Inference, and Learning (CHIL 2025), hosted by the Association for Health Learning and Inference (AHLI), was held in person on June 25-27, 2025, at the University of California, Berkeley, in Berkeley, California, USA. As part of this year's program, we hosted Research Roundtables to catalyze collaborative, small-group dialogue around critical, timely topics at the intersection of machine learning and healthcare. Each roundtable was moderated by a team of senior and junior chairs who fostered open exchange, intellectual curiosity, and inclusive engagement. The sessions emphasized rigorous discussion of key challenges, creative exploration of emerging opportunities, and collective ideation toward actionable directions in the field. Overall, the Research Roundtables brought together a diverse mix of participants, including academic researchers, clinicians, industry professionals, and policy experts. In total, eight roundtables were held across two 30-minute sessions, with a brief transition break to allow participants to join multiple discussions.


Hey, wait a minute: on at-issue sensitivity in Language Models

arXiv.org Artificial Intelligence

Evaluating the naturalness of dialogue in language models (LMs) is not trivial: notions of 'naturalness' vary, and scalable quantitative metrics remain limited. This study leverages the linguistic notion of 'at-issueness' to assess dialogue naturalness and introduces a new method: Divide, Generate, Recombine, and Compare (DGRC). DGRC (i) divides a dialogue as a prompt, (ii) generates continuations for subparts using LMs, (iii) recombines the dialogue and continuations, and (iv) compares the likelihoods of the recombined sequences. This approach mitigates bias in linguistic analyses of LMs and enables systematic testing of discourse-sensitive behavior. Applying DGRC, we find that LMs prefer to continue dialogue on at-issue content, with this effect enhanced in instruct-tuned models. They also reduce their at-issue preference when relevant cues (e.g., "Hey, wait a minute") are present. Although instruct-tuning does not further amplify this modulation, the pattern reflects a hallmark of successful dialogue dynamics.
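
The four DGRC steps can be sketched as a small loop. This is only one possible reading of the abstract, not the authors' implementation: the function `dgrc`, the way the dialogue is split, and the `generate` and `log_likelihood` callables are all hypothetical, and the toy stand-ins below are not language models.

```python
from typing import Callable, Dict, List

def dgrc(parts: List[str],
         generate: Callable[[str], str],
         log_likelihood: Callable[[str], float]) -> Dict[str, float]:
    """Score, in the context of the full dialogue, the continuation generated from each subpart."""
    full_dialogue = " ".join(parts)
    scores = {}
    for part in parts:                                   # (i) divide: iterate over the subparts
        continuation = generate(part)                    # (ii) generate from the subpart alone
        recombined = f"{full_dialogue} {continuation}"   # (iii) recombine with the whole dialogue
        scores[part] = log_likelihood(recombined)        # (iv) score, so that subparts can be compared
    return scores

# Toy stand-ins (assumptions): a real study would call a language model here.
def toy_generate(part: str) -> str:
    return f"Tell me more about {part.split()[-1]}"

def toy_log_likelihood(text: str) -> float:
    return -float(len(text.split()))  # crude proxy: longer text counts as less likely

parts = ["Ann, who bakes bread", "moved to Paris"]
scores = dgrc(parts, toy_generate, toy_log_likelihood)
```

With real model calls in place of the toy functions, a higher score for the continuation generated from the at-issue subpart would correspond to the preference the abstract reports.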


Enhancing Phenotype Discovery in Electronic Health Records through Prior Knowledge-Guided Unsupervised Learning

arXiv.org Machine Learning

Objectives: Unsupervised learning with electronic health record (EHR) data has shown promise for phenotype discovery, but approaches typically disregard existing clinical information, limiting interpretability. We operationalize a Bayesian latent class framework for phenotyping that incorporates domain-specific knowledge to improve clinical meaningfulness of EHR-derived phenotypes and illustrate its utility by identifying an asthma sub-phenotype informed by features of Type 2 (T2) inflammation. Materials and methods: We illustrate a framework for incorporating clinical knowledge into a Bayesian latent class model via informative priors to guide unsupervised clustering toward clinically relevant subgroups. This approach models missingness, accounting for potential missing-not-at-random patterns, and provides patient-level probabilities for phenotype assignment with uncertainty. Using reusable and flexible code, we applied the model to a large asthma EHR cohort, specifying informative priors for T2 inflammation-related features and weakly informative priors for other clinical variables, allowing the data to inform posterior distributions. Results and Conclusion: Using encounter data from January 2017 to February 2024 for 44,642 adult asthma patients, we found a bimodal posterior distribution of phenotype assignment, indicating clear class separation. The T2 inflammation-informed class (38.7%) was characterized by elevated eosinophil levels and allergy markers, plus high healthcare utilization and medication use, despite weakly informative priors on the latter variables. These patterns suggest an "uncontrolled T2-high" sub-phenotype. This demonstrates how our Bayesian latent class modeling approach supports hypothesis generation and cohort identification in EHR-based studies of heterogeneous diseases without well-established phenotype definitions.
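
To make the role of informative priors concrete, here is a deliberately simplified sketch of patient-level class probabilities in a two-class latent class model with binary features. It plugs in Beta prior means instead of fitting posteriors, and it skips missing features rather than modeling missingness as the paper does. All names and numbers (`beta_mean`, `class_posterior`, the Beta parameters, the feature list) are hypothetical.

```python
def beta_mean(a: float, b: float) -> float:
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)

def class_posterior(features, class_weights, feature_probs):
    """Posterior probability of each latent class given binary features (Bayes' rule)."""
    joint = []
    for weight, probs in zip(class_weights, feature_probs):
        likelihood = 1.0
        for x, p in zip(features, probs):
            if x is None:
                continue  # simplification: missing features are ignored, not modeled
            likelihood *= p if x else 1.0 - p
        joint.append(weight * likelihood)
    total = sum(joint)
    return [j / total for j in joint]

# Hypothetical features: [elevated eosinophils, high healthcare utilization]
feature_probs = [
    # T2-informed class: informative prior on eosinophils, weakly informative on utilization
    [beta_mean(8, 2), beta_mean(1, 1)],
    # Other class: informative prior in the opposite direction, same weak prior on utilization
    [beta_mean(2, 8), beta_mean(1, 1)],
]
class_weights = [0.5, 0.5]

patient = [True, True]
posterior = class_posterior(patient, class_weights, feature_probs)
```

In this toy setup only the feature with an informative prior separates the classes; in the full Bayesian model the data would update both the informative and the weakly informative priors.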


Mortgages and AI to be added to the curriculum in English schools

BBC News

Children will be taught how to budget and how mortgages work as the government seeks to modernise the national curriculum in England's schools. They will also be taught how to spot fake news and disinformation, including AI-generated content, following the first review of what is taught in schools in over a decade. Education Secretary Bridget Phillipson said the government wanted to revitalise the curriculum but keep a firm foundation in basics like English, maths and reading. Head teachers said the review's recommendations were sensible but would require sufficient funding and teachers. The government commissioned a review of the national curriculum and assessments in England last year, in the hope of developing a cutting edge curriculum that would narrow attainment gaps between the most disadvantaged students and their classmates.


US spy jet spotted patrolling cartel stronghold off Mexico's coast for hours

Daily Mail - Science & tech

The US Navy's P-8 Poseidon, an aircraft designed for anti-submarine warfare, surveillance, and reconnaissance, was spotted circling a drug-smuggling hub off Mexico's coast. On Tuesday, flight trackers recorded the jet performing multiple loops miles offshore from Tijuana, a city long plagued by violent organized crime and considered a major corridor for cartel operations.
The P-8 took off from the Naval Air Station Whidbey Island in Washington, flew through Oregon and through California.


Black hole unleashes brightest flare ever--brighter than 10 trillion suns

Popular Science

A supermassive black hole's star snack lit up the cosmos like never before. This artist's concept depicts a supermassive black hole in the process of shredding a massive star (at least 30 times the mass of our Sun) to pieces. Scientists propose this is what happened around the distant black hole referred to as J2245+3743, which in 2018 brightened dramatically to create the brightest black hole flare ever recorded, shining with the light of 10 trillion suns. It's hard to wrap your head around the fundamentals of a black hole.


Zelensky visits troops near embattled front line town of Pokrovsk

BBC News

Ukrainian President Volodymyr Zelensky says he has visited troops near the town of Pokrovsk, where the fiercest front line battle between Russia and Ukraine is currently taking place. Zelensky posted photos showing him meeting personnel at a command post in the Dobropillya sector, some 20km (12 miles) north of Pokrovsk in the Donetsk region. Kyiv's top military commander, Oleksandr Syrskiy, said on Monday that Ukraine was increasing pressure on the Dobropillya front to force the enemy to disperse its forces and make it impossible to concentrate their main efforts in the Pokrovsk area. Russia has been trying to seize Pokrovsk - a strategic frontline town and logistic hub - for over a year. Although it has taken them months to approach the town's borders, Russian soldiers have now infiltrated it and on Friday, Zelensky said Russia had amassed 170,000 troops on its outskirts.