 Interview


Meta's AI Workers Are Revolting, Peter Thiel's Secret Society, and SBF's Plea to Trump

WIRED

On today's episode, we dive into the dysfunction in Meta's newly formed AI unit and why it's been driving already-low employee morale even further into the ground. This week, our hosts discuss the meltdown that has recently been unfolding at Meta and what it says about the company's relentless ambitions in the AI race. They also dive into the leaked messages and names of an invite-only group cofounded by billionaire tech founder Peter Thiel, and how Sam Bankman-Fried is now actively seeking a pardon from the Trump administration. Plus, they share their impressions of SpaceX acquiring Cursor and the latest on the negotiations between Anthropic and the government. 'Tell Him He's a Piece of Shit': Meta's New AI Unit Is a Total Mess. Write to us at [email protected]. Before we start, two quick things. If you've been enjoying listening to the show, we would appreciate it if you took a second to rate it in your podcast app of choice. It really helps us reach more people. And second, if you have any questions related to tech, privacy, or politics that you would like me, Zoë, and Leah to take on, now is the time to submit them to [email protected]. It doesn't matter how big or how small, we want to hear from you and get you answers. Today on the show, we're talking about the dysfunction in Meta's newly formed AI unit and why it's been driving employee morale, which was already very, very low, even further into the ground.
We'll also break down the recent online leak that shed light on Peter Thiel's invite-only group, Dialog. More than 200 names of high-profile people in government, tech, academia, and beyond are listed in the documents as members and guests of this secretive society, and the leak also offers a look at what they talk about behind closed doors.


BenchmarkCards: Standardized Documentation for Large Language Model Benchmarks

Neural Information Processing Systems

Large language models (LLMs) are powerful tools capable of handling diverse tasks. Comparing and selecting appropriate LLMs for specific tasks requires systematic evaluation methods, as models exhibit varying capabilities across different domains. However, finding suitable benchmarks is difficult given the many available options. This complexity not only increases the risk of benchmark misuse and misinterpretation but also demands substantial effort from LLM users seeking the most suitable benchmarks for their specific needs. To address these issues, we introduce BenchmarkCards, an intuitive and validated documentation framework that standardizes critical benchmark attributes such as objectives, methodologies, data sources, and limitations. Through user studies involving benchmark creators and users, we show that BenchmarkCards can simplify benchmark selection and enhance transparency, facilitating informed decision-making in evaluating LLMs.


Appendix

Neural Information Processing Systems

The DeceptionBench is designed as a research benchmark to systematically study deception behaviors in LLMs, fostering a deeper understanding of their decision-making processes in real-world scenarios. Our primary intent is to provide a standardized, transparent tool for the research community to evaluate and improve LLMs' ethical alignment, not to enable or encourage deceptive practices. To prevent potential misuse by malicious actors, we commit to publicly releasing all evaluation data under an open license. This transparency ensures that DeceptionBench's methodology and outcomes are subject to scrutiny, replication, and improvement by the research community, reducing the risk of hidden exploitation. By prioritizing openness, we aim to advance responsible AI development while safeguarding against misuse in harmful contexts. The field of Large Language Models (LLMs) has undergone remarkable evolution in recent years, reshaping the landscape of natural language processing.


Consistently Simulating Human Personas with Multi-Turn Reinforcement Learning

Neural Information Processing Systems

Large Language Models (LLMs) are increasingly used to simulate human users in interactive settings such as therapy, education, and social role-play. While these simulations enable scalable training and evaluation of AI agents, off-the-shelf LLMs often drift from their assigned personas, contradict earlier statements, or abandon role-appropriate behavior. We introduce a unified framework for evaluating and improving persona consistency in LLM-generated dialogue. We define three automatic metrics (prompt-to-line consistency, line-to-line consistency, and Q&A consistency) that capture different types of persona drift and validate each against human annotations. Using these metrics as reward signals, we apply multi-turn reinforcement learning to fine-tune LLMs for three user roles: a patient, a student, and a social chat partner. Our method reduces inconsistency by over 55%, resulting in more coherent, faithful, and trustworthy simulated users.


AIhub monthly digest: June 2026 – biodiversity, resource allocation, and color metaphors

AIHub

Welcome to our monthly digest, where you can catch up with any AIhub stories you may have missed, peruse the latest news, recap recent events, and more. This month, we found out how foundation models are being used for conservation efforts, how AI can help with scarce resource allocation, and how color metaphors and LLMs can teach us about human cognition. We also went to ICRA and captured some footage of cutting-edge robots. In this latest interview in our AAAI Fellow series, we found out about Tanya Berger-Wolf's research developing a foundation model for biology, the insights this model can provide for conservation and protecting ecosystems, interesting collaborations over the years, and what the future has in store. In this interview, we chat to Sanmay Das, who was elected as a Fellow "for development of multiagent interaction mechanisms and learning techniques in the public interest, and for leadership service to the profession".


MMPB: It's Time for Multi-Modal Personalization

Neural Information Processing Systems

Visual personalization is essential in user-facing AI systems such as smart homes and healthcare, where aligning model behavior with user-centric concepts is critical. However, recent large Vision-Language Models (VLMs), despite their broad applicability, remain underexplored in their ability to adapt to individual users. In this paper, we introduce MMPB, the first extensive benchmark for evaluating VLMs on personalization. MMPB comprises 10k image-query pairs and includes 111 personalizable concepts across four categories: humans, animals, objects, and characters, with the human category enriched with preference-grounded queries.


Rivian's CEO on Tesla's Cybertruck, Ferrari's Luce, and What Happens If the R2 Fails

WIRED

RJ Scaringe, the CEO of Rivian Automotive, joined us for a wide-ranging interview about how his company's new electric SUV fits into the current EV industry, and what comes next. RJ Scaringe got his PhD from MIT studying internal combustion engines. Then he founded a company to make them obsolete. In 2009, fresh out of grad school, he launched what would become Rivian. The company spent nearly a decade in stealth mode before arriving at the 2018 LA Auto Show with two electric rides nobody had seen coming. The road, however, hasn't been easy. Rivian lost $3.6 billion in 2025, and has burned through nearly $25 billion in the past eight years. It has spent more money over the same period than almost every other pure EV maker. Rivian's IPO was the largest worldwide in 2021, and one of the largest in US history, within days valuing the company at over $100 billion. Its stock has dropped from a high of $130 to around $16. Since the R1 went on sale in 2021, Rivian has sold 175,000 cars.


Interview with AAAI Fellow Tanya Berger-Wolf: AI for ecology, biodiversity, and conservation

AIHub

Each year the AAAI recognizes a group of individuals who have made significant, sustained contributions to the field of artificial intelligence by appointing them as Fellows. Over the course of the next few months, we'll be talking to some of the 2026 AAAI Fellows. In this interview, we met with Tanya Berger-Wolf, who was elected as a Fellow. We found out about her latest research developing a foundation model for biology, the insights this model can provide, interesting collaborations over the years, and what the future has in store. Could you start with a quick introduction and tell us about the broad area that you're working in? My area of research is in AI for ecology, biodiversity, and conservation.


Statistical or embodied? Comparing people and LLMs in their processing of color metaphors: an interview with Douglas Guilbeault

AIHub

We sat down with Douglas Guilbeault to discuss his paper, "Comparing Colorseeing, Colorblind, Painters, and Large Language Models in Their Processing of Color Metaphors". The results have interesting implications for how we model human cognition, and in turn, how the concept of synaesthesia could be integrated to develop more intelligent AI models. A color metaphor is the use of color to describe something in a way that is not immediately literal. For example, to say "green with envy" would be a color metaphor, because envy doesn't have an immediate visual structure to it - we're evoking a broader, more flexible notion of what green conveys, beyond just its visible properties. What makes metaphors very interesting is that they often use past experience or cultural associations in new ways to talk about something beyond our current perception - either something imagined or in the future, which are many steps of abstraction away from the present. Metaphors provide an alternative pathway to get there.


Interview with AAAI Fellow Sanmay Das: multiagent systems

AIHub

Each year the AAAI recognizes a group of individuals who have made significant, sustained contributions to the field of artificial intelligence by appointing them as Fellows. We're talking to some of the 2026 AAAI Fellows to find out more about their work. In this interview, we chat to Sanmay Das, who was elected as a Fellow. Could you start with a quick introduction, where you work, and your general area of research? Broadly speaking, I work in multiagent systems. I've done a lot of work at the intersection of AI and economics, and over the last decade or so I've thought a lot about projects in the AI for social impact and social good space. In particular, my interest has been in the allocation of scarce societal resources, thinking about how AI can be integrated, and what it tells us about systems where we don't necessarily want full free market resource allocation.