 Large Language Model




Detecting Any Human-Object Interaction Relationship: Universal HOI Detector with Spatial Prompt Learning on Foundation Models

Neural Information Processing Systems

Human-object interaction (HOI) detection aims to comprehend the intricate relationships between humans and objects, predicting <human, action, object> triplets, and serves as the foundation for numerous computer vision tasks. The complexity and diversity of human-object interactions in the real world, however, pose significant challenges for both annotation and recognition, particularly in recognizing interactions within an open-world context. This study explores universal interaction recognition in an open-world setting through the use of Vision-Language (VL) foundation models and large language models (LLMs). The proposed method is dubbed UniHOI. We conduct a deep analysis of the three hierarchical features inherent in visual HOI detectors and propose a method for high-level relation extraction aimed at VL foundation models, which we call HO prompt-based learning. Our design includes an HOPrompt-guided Decoder (HOPD), which facilitates the association of high-level relation representations in the foundation model with various HO pairs within the image. Furthermore, we utilize an LLM (i.e.


Large language models transition from integrating across position-yoked, exponential windows to structure-yoked, power-law windows

Neural Information Processing Systems

Modern language models excel at integrating across long temporal scales needed to encode linguistic meaning and show non-trivial similarities to biological neural systems. Prior work suggests that human brain responses to language exhibit hierarchically organized "integration windows" that substantially constrain the overall influence of an input token (e.g., a word) on the neural response. However, little prior work has attempted to use integration windows to characterize computations in large language models (LLMs). We developed a simple word-swap procedure for estimating integration windows from black-box language models that does not depend on access to gradients or knowledge of the model architecture (e.g., attention weights). Using this method, we show that trained LLMs exhibit stereotyped integration windows that are well-fit by a convex combination of an exponential and a power-law function, with a partial transition from exponential to power-law dynamics across network layers. We then introduce a metric for quantifying the extent to which these integration windows vary with structural boundaries (e.g., sentence boundaries), and using this metric, we show that integration windows become increasingly yoked to structure at later network layers. None of these findings were observed in an untrained model, which as expected integrated uniformly across its input. These results suggest that LLMs learn to integrate information in natural language using a stereotyped pattern: integrating across position-yoked, exponential windows at early layers, followed by structure-yoked, power-law windows at later layers. The methods we describe in this paper provide a general-purpose toolkit for understanding temporal integration in language models, facilitating cross-disciplinary research at the intersection of biological and artificial intelligence.
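As a rough illustration of the two ideas in this abstract, the sketch below swaps one word at a time in the input of a black-box model and records how much the response at a later position changes, then defines a window shape as a convex combination of an exponential and a power-law decay. Everything here is an assumption made for illustration, not the paper's implementation: `toy_model` is a hypothetical stand-in for a real language model, and the names `swap_effect`, `estimate_window`, and `mixed_window`, along with the parameters `weight`, `tau`, and `alpha`, do not come from the source.

```python
import math
from typing import Callable, List, Sequence

# A black-box model maps a token sequence to one scalar response per position.
Model = Callable[[Sequence[str]], List[float]]


def toy_model(tokens: Sequence[str], decay: float = 0.5) -> List[float]:
    """Hypothetical stand-in for a model: a causal, exponentially decaying
    running sum of token lengths (not a real language model)."""
    responses = []
    state = 0.0
    for tok in tokens:
        state = decay * state + len(tok)
        responses.append(state)
    return responses


def swap_effect(model: Model, tokens: Sequence[str], replacement: str,
                target: int, distance: int) -> float:
    """Change in the response at `target` when the word `distance`
    positions earlier is replaced. Needs only model outputs, no gradients."""
    source = target - distance
    if not 0 <= source <= target < len(tokens):
        raise ValueError("swapped position must lie inside the input")
    swapped = list(tokens)
    swapped[source] = replacement
    return abs(model(swapped)[target] - model(tokens)[target])


def estimate_window(model: Model, tokens: Sequence[str], replacement: str,
                    target: int) -> List[float]:
    """Swap effect at every distance from 0 back to the start of the input."""
    return [swap_effect(model, tokens, replacement, target, d)
            for d in range(target + 1)]


def mixed_window(distance: float, weight: float, tau: float,
                 alpha: float) -> float:
    """Convex combination of exponential and power-law decay.
    This parametrization is an assumption chosen for illustration."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be in [0, 1] for a convex combination")
    exponential = math.exp(-distance / tau)
    power_law = (1.0 + distance) ** (-alpha)
    return weight * exponential + (1.0 - weight) * power_law


if __name__ == "__main__":
    window = estimate_window(toy_model, ["aa"] * 4, "aaaa", target=3)
    print(window)
```

Because the toy model decays by half at each step, the estimated window halves with each additional word of distance, which is the purely exponential, position-yoked case. Fitting `weight`, `tau`, and `alpha` to a window measured from a real model would call for a curve-fitting routine, which is left out here.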


China's DeepSeek unveils latest models a year after upending global tech

Al Jazeera

China's DeepSeek has unveiled the latest versions of its signature artificial intelligence-powered chatbot, a year after its flagship model sent shockwaves through the global tech scene. The Chinese startup launched preview versions of DeepSeek-V4-Pro and DeepSeek-V4-Flash on Friday as it touted its ability to go toe-to-toe with US rivals such as OpenAI and Google. The "flash" model has similar reasoning abilities to the "pro" version, while offering faster response times and more cost-effective pricing, the Hangzhou-based startup said. Like DeepSeek's previous chatbots, V4-Pro and V4-Flash follow an open-source model, meaning developers are free to use and modify them at will. The release comes after DeepSeek-R1 stunned the tech sector upon its launch in January last year with capabilities broadly comparable with those of ChatGPT and Gemini.


Bootstrapping Vision-Language Learning with Decoupled Language Pre-training

Neural Information Processing Systems

We present a novel methodology aimed at optimizing the application of frozen large language models (LLMs) for resource-intensive vision-language (VL) pre-training. The current paradigm uses visual features as prompts to guide language models, with a focus on determining the most relevant visual features for corresponding text. Our approach diverges by concentrating on the language component, specifically identifying the optimal prompts to align with visual features. We introduce the Prompt-Transformer (P-Former), a model that predicts these ideal prompts, which is trained exclusively on linguistic data, bypassing the need for image-text pairings.


Apple's Next Chapter, SpaceX and Cursor Strike a Deal, and Palantir's Controversial Manifesto

WIRED

In this week's episode, we talk about Tim Cook's legacy as CEO at Apple and what his long-rumored departure means for the future of one of the world's biggest companies. We also go into the reasoning behind SpaceX and Cursor's surprising deal, and why Palantir's self-published manifesto drew a lot of heat online. Also, we discuss why some conspiracy theorists are leaving Trump's side, and how a scammer created an AI-generated woman to attract and grift MAGA men.

Tim Cook's Legacy Is Turning Apple Into a Subscription
This Scammer Used an AI-Generated MAGA Girl to Grift 'Super Dumb' Men

Zoë, Leah, and I have really enjoyed being your new hosts these past few weeks, and we want to hear from you. If you like the show and have a minute, please leave us a review in the podcast app of your choice. It really helps us reach more people, and for any questions and comments, you can always reach us at [email protected].

I missed you so much. And I missed you the exact same amount. I'm going to go away more often. Absence makes the heart go fonder, as we all know, and I'm thrilled to be here. This week on the show, we're saying goodbye to Apple CEO Tim Cook, who announced that he is stepping down from the top gig at the company. And, more than just talking about his legacy at Apple, we'll be looking into what this long-awaited shift actually means for the future of one of the world's biggest companies. We'll also get into why SpaceX and Cursor's potential $60 billion deal announced this week is pretty staggering, and we'll get into Palantir's controversial 22-point manifesto. I feel like manifestos are inherently controversial, otherwise they'd be memos that they posted on X this week.


At 'AI Coachella,' Stanford Students Line Up to Learn From Silicon Valley Royalty

WIRED

CS 153 has gone viral on the Palo Alto campus--and on X. Not everyone is happy about it. As thousands of influencers descended on southern California earlier this month for the annual Coachella Music Festival, a very Silicon Valley program dubbed "AI Coachella" was taking shape a few hundred miles north in Palo Alto. The class, CS 153, is one of Stanford's buzziest offerings this semester, and like the music festival, it features a star-studded lineup of celebrities--in this case, not pop artists, but Big Tech CEOs. The course is co-taught by Anjney Midha, a former Andreessen Horowitz general partner, and Michael Abbott, Apple's former VP of engineering for cloud services.


The Guardian view on Anthropic's Claude Mythos: when AI finds every flaw, who controls the internet? Editorial

The Guardian

'The US government's embrace of Anthropic marks a shift.' Anthropic announced its latest AI model, Claude Mythos, this month but said it would not be released publicly, because it turns computers into crime scenes. The company claimed that it could find previously unknown "zero-day" flaws, exploit them and, in principle, link these weaknesses in order to take over major operating systems and web browsers. Mythos did so autonomously, writing code and obtaining privileges.


The Download: introducing the Nature issue

MIT Technology Review

Plus: Trump signaled he's open to reversing the Anthropic ban. When we talk about "nature," we usually mean something untouched by humans. But little of that world exists today. From microplastics in rainforest wildlife to artificial light in the Arctic Ocean, human influence now reaches every corner of Earth. In this context, what even is nature? And should we employ technology to try to make the world more "natural"?