

faithful



'I didn't have anything to prove': what Traitors finalist Jade Scott learned about survival from video games

The Guardian

'Minecraft was my way in': The Traitors 2026 finalist Jade.

The latest series of The Traitors, which ended last week with a nail-biting finale, featured some of the usual characters, from guileless extroverts to wannabe Columbos endlessly observing fellow contestants for the slightest flicker of treachery. But one faithful stood out for her quiet determination, despite a ceaseless onslaught of suspicion and accusation. That person was Jade Scott, and I wasn't at all surprised when, quite early on in the series, she revealed she was a keen gamer.


VisFIS: Visual Feature Importance Supervision with Right-for-the-Right-Reason Objectives

Neural Information Processing Systems

Many past works aim to improve visual reasoning in models by supervising feature importance (estimated by model explanation techniques) with human annotations such as highlights of important image regions. However, recent work has shown that performance gains from feature importance (FI) supervision for Visual Question Answering (VQA) tasks persist even with random supervision, suggesting that these methods do not meaningfully align model FI with human FI. In this paper, we show that model FI supervision can meaningfully improve VQA model accuracy as well as performance on several Right-for-the-Right-Reason (RRR) metrics by optimizing for four key model objectives: (1) accurate predictions given limited but sufficient information (Sufficiency); (2) max-entropy predictions given no important information (Uncertainty); (3) invariance of predictions to changes in unimportant features (Invariance); and (4) alignment between model FI explanations and human FI explanations (Plausibility). Our best performing method, Visual Feature Importance Supervision (VisFIS), outperforms strong baselines on benchmark VQA datasets in terms of both in-distribution and out-of-distribution accuracy. While past work suggests that the mechanism for improved accuracy is through improved explanation plausibility, we show that this relationship depends crucially on explanation faithfulness (whether explanations truly represent the model's internal reasoning). Predictions are more accurate when explanations are plausible and faithful, and not when they are plausible but not faithful. Lastly, we show that, surprisingly, RRR metrics are not predictive of out-of-distribution model accuracy when controlling for a model's in-distribution accuracy, which calls into question the value of these metrics for evaluating model reasoning.
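The four objectives can be made concrete with a small sketch. This is a minimal illustration, not the VisFIS implementation. It makes several assumptions: a toy `model` maps a feature vector to class probabilities, features are removed by zeroing them, Invariance uses Gaussian noise as its perturbation, and Plausibility is measured with cosine similarity. The function name `rrr_losses` and all of these choices are hypothetical, and the paper's exact loss formulations may differ.

```python
import math
import random
from typing import Callable, Dict, Sequence, Set

# Hypothetical interface: a model maps a feature vector to class probabilities.
Model = Callable[[Sequence[float]], Sequence[float]]


def kl(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for discrete distributions; q is clipped to avoid log(0)."""
    return sum(pi * math.log(pi / max(qi, 1e-12)) for pi, qi in zip(p, q) if pi > 0)


def keep_only(x: Sequence[float], keep: Set[int]) -> list:
    """Remove features by zeroing every index not in `keep` (an assumed masking scheme)."""
    return [v if i in keep else 0.0 for i, v in enumerate(x)]


def perturb(x: Sequence[float], idx: Set[int], rng: random.Random) -> list:
    """Add standard Gaussian noise to the features in `idx` only."""
    return [v + rng.gauss(0.0, 1.0) if i in idx else v for i, v in enumerate(x)]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def rrr_losses(model: Model, x: Sequence[float], label: int, important: Sequence[int],
               model_fi: Sequence[float], human_fi: Sequence[float],
               seed: int = 0) -> Dict[str, float]:
    imp = set(important)
    unimp = set(range(len(x))) - imp
    rng = random.Random(seed)

    # (1) Sufficiency: the important features alone should yield the correct label.
    sufficiency = -math.log(max(model(keep_only(x, imp))[label], 1e-12))

    # (2) Uncertainty: with the important features removed, the prediction
    # should be close to uniform (KL to uniform is zero at max entropy).
    p_none = model(keep_only(x, unimp))
    uniform = [1.0 / len(p_none)] * len(p_none)
    uncertainty = kl(p_none, uniform)

    # (3) Invariance: perturbing only unimportant features should not move the prediction.
    invariance = kl(model(x), model(perturb(x, unimp, rng)))

    # (4) Plausibility: model FI should point the same way as human FI.
    plausibility = 1.0 - cosine(model_fi, human_fi)

    return {"sufficiency": sufficiency, "uncertainty": uncertainty,
            "invariance": invariance, "plausibility": plausibility}
```

In training, the four terms would typically be combined as a weighted sum. The weights would be extra hyperparameters that the abstract does not specify.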


ConRad: Image Constrained Radiance Fields for 3D Generation from a Single Image

Neural Information Processing Systems

We present a novel method for reconstructing 3D objects from a single RGB image. Our method leverages the latest image generation models to infer the hidden 3D structure while remaining faithful to the input image. While existing methods obtain impressive results in generating 3D models from text prompts, they do not provide an easy approach for conditioning on input RGB data. Naive extensions of these methods often lead to improper alignment in appearance between the input image and the 3D reconstructions. We address these challenges by introducing Image Constrained Radiance Fields (ConRad), a novel variant of neural radiance fields. ConRad is an efficient 3D representation that explicitly captures the appearance of an input image in one viewpoint. We propose a training algorithm that leverages the single RGB image in conjunction with pretrained Diffusion Models to optimize the parameters of a ConRad representation. Extensive experiments show that ConRad representations can simplify preservation of image details while producing a realistic 3D reconstruction. Compared to existing state-of-the-art baselines, we show that our 3D reconstructions remain more faithful to the input and produce more consistent 3D models while demonstrating significantly improved quantitative performance on a ShapeNet object benchmark.
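The standard volume-rendering equation used by neural radiance fields gives one way to see why explicitly constraining the input view helps. The sketch below rests on an illustrative assumption and is not ConRad's published formulation. Suppose every sample along a ray cast from the input viewpoint is given that pixel's color. Then the rendered color is exactly the pixel color scaled by the ray's accumulated opacity. The input view is therefore reproduced whenever the ray is fully opaque, however the density is distributed along it. The helper names `composite` and `constrained_colors` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Color = Tuple[float, float, float]


def composite(densities: Sequence[float], colors: Sequence[Color],
              deltas: Sequence[float]) -> Tuple[List[float], float]:
    """Standard NeRF-style alpha compositing along a single ray.

    Returns the rendered RGB color and the accumulated opacity (sum of weights).
    """
    transmittance = 1.0
    rgb = [0.0, 0.0, 0.0]
    opacity = 0.0
    for sigma, color, delta in zip(densities, colors, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this ray segment
        weight = transmittance * alpha          # contribution of this sample
        rgb = [acc + weight * c for acc, c in zip(rgb, color)]
        opacity += weight
        transmittance *= 1.0 - alpha
    return rgb, opacity


def constrained_colors(pixel: Color, n_samples: int) -> List[Color]:
    """Hypothetical constraint: tie every sample on an input-view ray to its pixel color."""
    return [pixel] * n_samples
```

Because the compositing weights sum to the accumulated opacity, rendering with `constrained_colors` returns `opacity * pixel`. The densities remain free parameters to be optimized.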


Steering LVLMs via Sparse Autoencoder for Hallucination Mitigation

Hua, Zhenglin, He, Jinghan, Yao, Zijun, Han, Tianxu, Guo, Haiyun, Jia, Yuheng, Fang, Junfeng

arXiv.org Artificial Intelligence

Large vision-language models (LVLMs) have achieved remarkable performance on multimodal tasks. However, they still suffer from hallucinations, generating text inconsistent with visual input, posing significant risks in real-world applications. Existing approaches to address this issue focus on incorporating external knowledge bases, alignment training, or decoding strategies, all of which require substantial computational cost and time. Recent works try to explore more efficient alternatives by adjusting LVLMs' internal representations. Although promising, these methods may cause hallucinations to be insufficiently suppressed or lead to excessive interventions that negatively affect normal semantics. In this work, we leverage sparse autoencoders (SAEs) to identify semantic directions closely associated with faithfulness or hallucination, extracting more precise and disentangled hallucination-related representations. Our analysis demonstrates that interventions along the identified faithful direction can mitigate hallucinations, while those along the hallucinatory direction can exacerbate them. Building on these insights, we propose Steering LVLMs via SAE Latent Directions (SSL), a plug-and-play method based on SAE-derived latent directions to mitigate hallucinations in LVLMs. Extensive experiments demonstrate that SSL significantly outperforms existing decoding approaches in mitigating hallucinations, while maintaining transferability across different model architectures with negligible additional time overhead. The code is available at https://github.com/huazhenglin2003/SSL.
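Below is a minimal sketch of steering along an SAE latent direction. It assumes the common convention that each row of the SAE decoder matrix is the direction a latent writes into the model's hidden state. Several inputs are hypothetical: which latent counts as "faithful" or "hallucinatory", the layer at which to intervene, and the strength `alpha`. The sketch does not reproduce SSL's latent-selection procedure or its exact intervention rule; the linked repository is the authoritative source for those.

```python
import math
from typing import List, Sequence


def unit(v: Sequence[float]) -> List[float]:
    """Normalize a direction to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot steer along a zero vector")
    return [x / norm for x in v]


def steer(hidden: Sequence[float], decoder_rows: Sequence[Sequence[float]],
          latent_idx: int, alpha: float) -> List[float]:
    """Shift a hidden state by alpha along the unit decoder direction of one SAE latent.

    alpha > 0 pushes toward the latent's feature (e.g. a 'faithful' latent);
    alpha < 0 pushes away from it (e.g. a 'hallucinatory' latent).
    """
    direction = unit(decoder_rows[latent_idx])
    return [h + alpha * d for h, d in zip(hidden, direction)]


def projection(hidden: Sequence[float], direction: Sequence[float]) -> float:
    """Scalar component of `hidden` along `direction`."""
    d = unit(direction)
    return sum(h * x for h, x in zip(hidden, d))
```

Because the direction is normalized, steering with `alpha` changes the hidden state's component along that direction by `alpha` and leaves orthogonal components untouched. This is one reason a single, disentangled direction is attractive for a plug-and-play intervention.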


A PC Algorithm for Max-Linear Bayesian Networks

Améndola, Carlos, Hollering, Benjamin, Nowell, Francesco

arXiv.org Machine Learning

Max-linear Bayesian networks (MLBNs) are a relatively recent class of structural equation models that arise when the random variables involved have heavy-tailed distributions. Unlike most directed graphical models, MLBNs are typically not faithful to d-separation, and thus classical causal discovery algorithms such as the PC algorithm or greedy equivalence search cannot be used to accurately recover the true graph structure. In this paper, we begin the study of constraint-based discovery algorithms for MLBNs given an oracle for testing conditional independence (CI) in the true, unknown graph. We show that if the oracle is given by the $\ast$-separation criterion in the true graph, then the PC algorithm remains consistent despite the presence of additional CI statements implied by $\ast$-separation. We also introduce a new causal discovery algorithm named "PCstar", which assumes faithfulness to $C^\ast$-separation and is able to orient additional edges that cannot be oriented with only d- or $\ast$-separation.
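Below is a minimal sketch of the two core phases of the PC algorithm, for readers unfamiliar with it. The first phase removes edges using conditional-independence queries of increasing order. The second orients unshielded colliders (v-structures). The oracle `indep(x, y, S)` is a hypothetical stand-in for the oracle the paper assumes; per the result above, it could answer according to $\ast$-separation in the true graph. The sketch omits the Meek orientation rules and implements neither PCstar nor $C^\ast$-separation.

```python
from itertools import combinations
from typing import Callable, Dict, FrozenSet, List, Set, Tuple

# Hypothetical CI oracle: indep(x, y, S) is True iff x and y are independent given S.
Oracle = Callable[[str, str, FrozenSet[str]], bool]


def pc_skeleton(nodes: List[str], indep: Oracle):
    """Phase 1: start from the complete graph and delete edges using CI tests
    whose conditioning sets are drawn from current neighbours, of size 0, 1, 2, ..."""
    adj: Dict[str, Set[str]] = {v: set(nodes) - {v} for v in nodes}
    sepset: Dict[FrozenSet[str], FrozenSet[str]] = {}
    level = 0
    # Continue while some node still has enough other neighbours to form a set of this size.
    while any(len(adj[x]) - 1 >= level for x in nodes):
        for x in nodes:
            for y in sorted(adj[x]):
                candidates = sorted(adj[x] - {y})
                for s in combinations(candidates, level):
                    if indep(x, y, frozenset(s)):
                        # Remove the edge and remember which set separated the pair.
                        adj[x].discard(y)
                        adj[y].discard(x)
                        sepset[frozenset((x, y))] = frozenset(s)
                        break
        level += 1
    return adj, sepset


def orient_v_structures(adj: Dict[str, Set[str]],
                        sepset: Dict[FrozenSet[str], FrozenSet[str]]) -> Set[Tuple[str, str]]:
    """Phase 2: for each unshielded triple x - z - y (x, y non-adjacent),
    orient x -> z <- y when z is not in the set that separated x and y."""
    arrows: Set[Tuple[str, str]] = set()
    for z in adj:
        for x, y in combinations(sorted(adj[z]), 2):
            # Non-adjacent pairs always have a separating set, since every pair started adjacent.
            if y not in adj[x] and z not in sepset[frozenset((x, y))]:
                arrows.add((x, z))
                arrows.add((y, z))
    return arrows
```

With an oracle that reports only "a independent of b given the empty set", the sketch recovers the collider a -> c <- b. With one that reports only "a independent of b given {c}", it recovers the same skeleton but leaves it unoriented, since c lies in the separating set.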



Localized Cultural Knowledge is Conserved and Controllable in Large Language Models

Veselovsky, Veniamin, Argin, Berke, Stroebl, Benedikt, Wendler, Chris, West, Robert, Evans, James, Griffiths, Thomas L., Narayanan, Arvind

arXiv.org Artificial Intelligence

Just as humans display language patterns influenced by their native tongue when speaking new languages, LLMs often default to English-centric responses even when generating in other languages. Nevertheless, we observe that local cultural information persists within the models and can be readily activated for cultural customization. We first demonstrate that explicitly providing cultural context in prompts significantly improves the models' ability to generate culturally localized responses. We term the disparity in model performance with versus without explicit cultural context the explicit-implicit localization gap, indicating that while cultural knowledge exists within LLMs, it may not naturally surface in multilingual interactions if cultural context is not explicitly provided. Despite this benefit, explicitly prompted answers are less diverse and tend toward stereotypes. Second, we identify an explicit cultural customization vector, conserved across all non-English languages we explore, which enables LLMs to be steered from the synthetic English cultural world-model toward each non-English cultural world. Steered responses retain the diversity of implicit prompting and reduce stereotypes to dramatically improve the potential for customization. We discuss the implications of explicit cultural customization for understanding the conservation of alternative cultural world models within LLMs, and their controllable utility for translation, cultural customization, and the possibility of making the explicit implicit through soft control for expanded LLM function and appeal.
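The abstract does not spell out how the customization vector is computed. A common construction for steering vectors, assumed here purely for illustration, takes the difference between mean hidden activations on prompts with explicit cultural context and on the same prompts without it. Comparing vectors obtained in different languages by cosine similarity is one way to check that such a vector is conserved across them. The layer choice, the prompt sets, and the `scale` factor are all hypothetical.

```python
import math
from typing import List, Sequence


def mean(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise mean of equal-length activation vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def steering_vector(explicit_acts: Sequence[Sequence[float]],
                    implicit_acts: Sequence[Sequence[float]]) -> List[float]:
    """Mean activation with explicit cultural context minus mean without it."""
    return [e - i for e, i in zip(mean(explicit_acts), mean(implicit_acts))]


def apply_steering(hidden: Sequence[float], vector: Sequence[float],
                   scale: float = 1.0) -> List[float]:
    """Add the scaled steering vector to a hidden state at inference time."""
    return [h + scale * v for h, v in zip(hidden, vector)]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity, e.g. between vectors computed in two different languages."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))
```

In this framing, adding the vector at inference time stands in for placing explicit cultural context in the prompt.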


The rise of end times fascism

The Guardian

The movement for corporate city states cannot believe its good luck. For years, it has been pushing the extreme notion that wealthy, tax-averse people should up and start their own high-tech fiefdoms, whether new countries on artificial islands in international waters ("seasteading") or pro-business "freedom cities" such as Próspera, a glorified gated community combined with a wild west med spa on a Honduran island. Yet despite backing from the heavy-hitter venture capitalists Peter Thiel and Marc Andreessen, their extreme libertarian dreams kept bogging down: it turns out most self-respecting rich people don't actually want to live on floating oil rigs, even if it means lower taxes, and while Próspera might be nice for a holiday and some body "upgrades", its extra-national status is currently being challenged in court. Now, all of a sudden, this once-fringe network of corporate secessionists finds itself knocking on open doors at the dead center of global power. The first sign that fortunes were shifting came in 2023, when a campaigning Donald Trump, seemingly out of nowhere, promised to hold a contest that would lead to the creation of 10 "freedom cities" on federal lands. The trial balloon barely registered at the time, lost in the daily deluge of outrageous claims. Since the new administration took office, however, would-be country starters have been on a lobbying blitz, determined to turn Trump's pledge into reality. "The energy in DC is absolutely electric," Trey Goff, the chief of staff of Próspera, recently enthused after a trip to Capitol Hill.


Block-busted: why homemade Minecraft movies are the real hits

The Guardian

By any estimation, Minecraft is impossibly successful. The bestselling video game ever, as of last December it had 204 million monthly active players. Since it was first released in 2011, it has generated over $3bn (£2.3bn) in revenue. What's more, its players have always been eager to demonstrate their fandom outside the boundaries of the game itself. In 2021, YouTube calculated that videos related to the game – tutorials, walk-throughs, homages, parodies – had collectively been viewed 1tn times. In short, it is a phenomenon.