
Collaborating Authors

rogers


The US Military Wants to Fix Its Own Equipment. Defense Contractors Are Trying to Shoot That Down

WIRED

A push by military contractors could alter pending legislation that would empower servicemembers to repair their own equipment. Lobbyists are pitching a subscription service instead. Right to repair provisions in the National Defense Authorization Act, which would secure funding for the US military in 2026, are likely to be struck from the final language of the bill despite enjoying broad bipartisan support, sources familiar with ongoing negotiations tell WIRED. They say that provisions in the act enabling servicemembers to repair their own equipment are likely to be removed entirely and replaced with a data-as-a-service subscription plan that benefits defense contractors. The right to repair has become a thorny issue in the military.


Young Mormons Built an App to Help Men Quit Gooning

WIRED

The Relay app allows users to track their porn-free streaks and get group support. Its creators say they're taking a stand against porn and AI erotica. Jamie would meticulously schedule his days around finding time alone to watch porn and masturbate--often up to five times a day. The 32-year-old Michigan engineer, who did not want to use his real name due to privacy concerns, first watched porn at the impressionable age of 12, but never realized he had a problem until just after his father's funeral three years ago. "I didn't shed a single tear," he says.


Uncovering the Computational Ingredients of Human-Like Representations in LLMs

Studdiford, Zach, Rogers, Timothy T., Mukherjee, Kushin, Suresh, Siddharth

arXiv.org Artificial Intelligence

The ability to translate diverse patterns of inputs into structured patterns of behavior has been thought to rest on both humans' and machines' ability to learn robust representations of relevant concepts. The rapid advancement of transformer-based large language models (LLMs) has led to a diversity of computational ingredients -- architectures, fine-tuning methods, and training datasets among others -- but it remains unclear which of these ingredients are most crucial for building models that develop human-like representations. Further, most current LLM benchmarks are not suited to measuring representational alignment between humans and models, making benchmark scores unreliable for assessing whether current LLMs are making progress towards becoming useful cognitive models. We address these limitations by first evaluating a set of over 70 models that widely vary in their computational ingredients on a triplet similarity task, a method well established in the cognitive sciences for measuring human conceptual representations, using concepts from the THINGS database. Comparing human and model representations, we find that models that undergo instruction-finetuning and which have larger dimensionality of attention heads are among the most human-aligned, while multimodal pretraining and parameter size have limited bearing on alignment. Correlations between alignment scores and scores on existing benchmarks reveal that while some benchmarks (e.g., MMLU) are better suited than others (e.g., MUSR) for capturing representational alignment, no existing benchmark fully accounts for the variance of alignment scores, demonstrating their insufficiency in capturing human-AI alignment. Taken together, our findings help highlight the computational ingredients most essential for advancing LLMs towards models of human conceptual representation and address a key benchmarking gap in LLM evaluation.
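The triplet (odd-one-out) task the abstract describes can be illustrated in a few lines: a model's embeddings pick the least-similar item in each triplet, and alignment is the fraction of triplets where that pick matches the human choice. The embeddings and human answers below are invented toy placeholders, not THINGS data or the paper's models.

```python
import numpy as np

# Toy concept embeddings (made-up 3-d vectors, purely illustrative).
embeddings = {
    "dog":    np.array([0.9, 0.1, 0.2]),
    "wolf":   np.array([0.8, 0.2, 0.1]),
    "banana": np.array([0.1, 0.9, 0.7]),
    "apple":  np.array([0.2, 0.8, 0.8]),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def odd_one_out(triplet, emb):
    """Return the item least similar to the other two under the model:
    the one left over after taking the most-similar pair."""
    sims = {}
    for i in range(3):
        for j in range(i + 1, 3):
            sims[(triplet[i], triplet[j])] = cosine(emb[triplet[i]], emb[triplet[j]])
    x, y = max(sims, key=sims.get)
    (odd,) = set(triplet) - {x, y}
    return odd

# Hypothetical human odd-one-out choices for each triplet.
human = {
    ("dog", "wolf", "banana"): "banana",
    ("apple", "banana", "wolf"): "wolf",
    ("dog", "apple", "banana"): "dog",
}

# Alignment = fraction of triplets where model and human agree.
alignment = float(np.mean([odd_one_out(t, embeddings) == h
                           for t, h in human.items()]))
print(f"triplet alignment: {alignment:.2f}")
```

Scaling the same loop over many triplets and many models is, in essence, how an alignment score per model could be computed and then correlated with benchmark scores.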


GPT-5's modest gains suggest AI progress is slowing down

New Scientist

GPT-5 is the latest version of OpenAI's large language model

OpenAI has released its newest AI model, GPT-5, two years after rolling out GPT-4, whose success has driven ChatGPT towards world domination. But despite promises of a similar jump in capability, GPT-5 appears to show little improvement over other leading AI models, hinting that the industry may need a fresh approach to build more intelligent AI systems. OpenAI's own pronouncements hail GPT-5 as a "significant leap in intelligence" from the company's previous models, showing apparent improvements in programming, mathematics, writing, health information and visual understanding. It also promises less frequent hallucinations, which is when an AI presents false information as true. On an internal benchmark measuring "performance on complex, economically valuable knowledge work", OpenAI says GPT‑5 is "comparable to or better than experts in roughly half the cases… across tasks spanning over 40 occupations including law, logistics, sales, and engineering."


Clio-X: A Web3 Solution for Privacy-Preserving AI Access to Digital Archives

Lemieux, Victoria L., Gil, Rosa, Molosiwa, Faith, Zhou, Qihong, Li, Binming, Garcia, Roberto, Cubillo, Luis De La Torre, Wang, Zehua

arXiv.org Artificial Intelligence

As archives turn to artificial intelligence to manage growing volumes of digital records, privacy risks inherent in current AI data practices raise critical concerns about data sovereignty and ethical accountability. This paper explores how privacy-enhancing technologies (PETs) and Web3 architectures can help archives preserve control over sensitive content while still making it available to researchers. We present Clio-X, a decentralized, privacy-first Web3 solution designed to embed PETs into archival workflows and support AI-enabled reference and access. Drawing on a user evaluation of a medium-fidelity prototype, the study reveals both interest in the potential of the solution and significant barriers to adoption related to trust, system opacity, economic concerns, and governance. Using Rogers' Diffusion of Innovation theory, we analyze the sociotechnical dimensions of these barriers and propose a path forward centered on participatory design and decentralized governance through a Clio-X Decentralized Autonomous Organization. By integrating technical safeguards with community-based oversight, Clio-X offers a novel model for ethically deploying AI in cultural heritage contexts.


Evaluating Steering Techniques using Human Similarity Judgments

Studdiford, Zach, Rogers, Timothy T., Suresh, Siddharth, Mukherjee, Kushin

arXiv.org Artificial Intelligence

Current evaluations of Large Language Model (LLM) steering techniques focus on task-specific performance, overlooking how well steered representations align with human cognition. Using a well-established triadic similarity judgment task, we assessed steered LLMs on their ability to flexibly judge similarity between concepts based on size or kind. We found that prompt-based steering methods outperformed other methods both in terms of steering accuracy and model-to-human alignment. We also found LLMs were biased towards 'kind' similarity and struggled with 'size' alignment. This evaluation approach, grounded in human cognition, adds further support to the efficacy of prompt-based steering and reveals privileged representational axes in LLMs prior to steering.
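The flexible judgment the abstract describes can be sketched concretely: the same three concepts yield a different "most similar pair" depending on whether similarity is steered towards size or kind. The feature values below are illustrative placeholders, not the paper's stimuli or its steering methods.

```python
import itertools

# Toy concept features (invented): a scalar size and a categorical kind.
features = {
    "whale":    {"size": 0.95, "kind": "mammal"},
    "goldfish": {"size": 0.05, "kind": "fish"},
    "wolf":     {"size": 0.40, "kind": "mammal"},
}

def most_similar_pair(items, axis):
    """Judge similarity along the steered axis: closeness in size,
    or shared category for kind."""
    def sim(a, b):
        if axis == "size":
            return 1.0 - abs(features[a]["size"] - features[b]["size"])
        return 1.0 if features[a]["kind"] == features[b]["kind"] else 0.0
    return max(itertools.combinations(items, 2), key=lambda p: sim(*p))

trio = ("whale", "goldfish", "wolf")
print(most_similar_pair(trio, "kind"))  # whale and wolf: both mammals
print(most_similar_pair(trio, "size"))  # goldfish and wolf: closest in size
```

A steered LLM answering the triadic question can then be scored on whether it recovers the axis-appropriate pair; the paper's finding is that models steered by prompts do this best, but drift towards the "kind" answer even when asked about size.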


AI-enhanced semantic feature norms for 786 concepts

Suresh, Siddharth, Mukherjee, Kushin, Giallanza, Tyler, Yu, Xizheng, Patil, Mia, Cohen, Jonathan D., Rogers, Timothy T.

arXiv.org Artificial Intelligence

Semantic feature norms have been foundational in the study of human conceptual knowledge, yet traditional methods face trade-offs between concept/feature coverage and verifiability of quality due to the labor-intensive nature of norming studies. Here, we introduce a novel approach that augments a dataset of human-generated feature norms with responses from large language models (LLMs) while verifying the quality of norms against reliable human judgments. We find that our AI-enhanced feature norm dataset, NOVA: Norms Optimized Via AI, shows much higher feature density and overlap among concepts while outperforming a comparable human-only norm dataset and word-embedding models in predicting people's semantic similarity judgments. Taken together, we demonstrate that human conceptual knowledge is richer than captured in previous norm datasets and show that, with proper validation, LLMs can serve as powerful tools for cognitive science research.
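The prediction step the abstract mentions can be sketched minimally: each concept becomes a binary feature vector over the union of all listed features, and the cosine similarity of two vectors is compared against human similarity ratings. The feature sets below are invented examples, not entries from NOVA or any human norm dataset.

```python
import numpy as np

# Invented feature norms: concept -> set of listed features.
norms = {
    "robin": {"has_wings", "can_fly", "has_feathers", "is_small"},
    "eagle": {"has_wings", "can_fly", "has_feathers", "is_large"},
    "whale": {"lives_in_water", "is_large", "is_a_mammal"},
}

# Fixed feature vocabulary: union of every feature, in sorted order.
vocab = sorted(set().union(*norms.values()))

def vec(concept):
    """Binary indicator vector: 1 where the concept has the feature."""
    return np.array([f in norms[concept] for f in vocab], dtype=float)

def feature_sim(a, b):
    """Cosine similarity between two concepts' feature vectors."""
    va, vb = vec(a), vec(b)
    return float(va @ vb / (np.linalg.norm(va) * np.linalg.norm(vb)))

print(f"robin-eagle: {feature_sim('robin', 'eagle'):.2f}")  # many shared features
print(f"eagle-whale: {feature_sim('eagle', 'whale'):.2f}")  # only 'is_large'
print(f"robin-whale: {feature_sim('robin', 'whale'):.2f}")  # nothing shared
```

Denser norms (more features per concept, more overlap between concepts) give these vectors more resolution, which is why the AI-enhanced dataset can predict human similarity judgments better than a sparser human-only one.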


'It was just the perfect game': Henk Rogers on buying Tetris and foiling the KGB

The Guardian

When game designer and entrepreneur Henk Rogers first encountered Tetris at the 1988 Las Vegas Consumer Electronics Show, he immediately knew it was special. "It was just the perfect game," he recalls. "It looked so simple, so rudimentary, but I wanted to play it again and again and again … There was no other game demo that ever did that to me." Rogers is now co-owner of the Tetris Company, which manages and licenses the Tetris brand. Over the past 30 years, he has become almost as famous as the game itself. The escapades surrounding his deal to buy its distribution rights from Russian agency Elektronorgtechnica (Elorg) were dramatised in an Apple TV film starring Taron Egerton.


When Should We Orchestrate Multiple Agents?

Bhatt, Umang, Kapoor, Sanyam, Upadhyay, Mihir, Sucholutsky, Ilia, Quinzan, Francesco, Collins, Katherine M., Weller, Adrian, Wilson, Andrew Gordon, Zafar, Muhammad Bilal

arXiv.org Artificial Intelligence

Strategies for orchestrating the interactions between multiple agents, both human and artificial, can wildly overestimate performance and underestimate the cost of orchestration. We design a framework to orchestrate agents under realistic conditions, such as inference costs or availability constraints. We show theoretically that orchestration is only effective if there are performance or cost differentials between agents. We then empirically demonstrate how orchestration between multiple agents can be helpful for selecting agents in a simulated environment, picking a learning strategy in the infamous Rogers' Paradox from social science, and outsourcing tasks to other agents during a question-answer task in a user study.
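The theoretical claim that orchestration only helps when agents differ can be shown with a toy value calculation: routing to a second agent pays off only if its expected value net of cost beats the first agent's. The accuracies, costs, and value scale below are hypothetical, not the paper's framework.

```python
def should_orchestrate(acc_a, cost_a, acc_b, cost_b, value_per_correct=1.0):
    """Route a task from agent A to agent B only when B's expected net
    value (accuracy * value - inference cost) exceeds A's."""
    net_a = acc_a * value_per_correct - cost_a
    net_b = acc_b * value_per_correct - cost_b
    return net_b > net_a

# Identical agents: no performance or cost differential, so
# orchestration cannot help.
print(should_orchestrate(0.8, 0.1, 0.8, 0.1))  # False

# Equally accurate but cheaper second agent: routing is worthwhile.
print(should_orchestrate(0.8, 0.3, 0.8, 0.1))  # True

# More accurate but much more expensive agent: still not worth it.
print(should_orchestrate(0.8, 0.1, 0.9, 0.5))  # False
```

Realistic versions of this decision must also handle availability constraints and uncertainty about each agent's accuracy, which is where the paper's framework goes beyond this two-agent sketch.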


Judge denies Musk's initial bid to halt OpenAI's for-profit shift but sets trial for fall

The Guardian

A US judge on Tuesday denied Elon Musk's request for a preliminary injunction to pause OpenAI's transition to a for-profit model but set a trial for the fall of this year, the latest turn in the high-stakes legal fight. The tech billionaire does not meet "the high burden required for a preliminary injunction" to block the conversion of OpenAI, said Yvonne Gonzalez Rogers, a US district judge in Oakland, California. But Rogers wrote in the order that she wanted to resolve the lawsuit quickly given "the public interest at stake and potential for harm if a conversion contrary to law occurred". Musk and OpenAI, which he co-founded as a non-profit in 2015 but left before it took off, have been embroiled in a yearlong legal battle. The CEO of Tesla and X, formerly Twitter, accuses OpenAI of straying from its founding mission to develop artificial intelligence for the good of humanity, not corporate profit.