Disentanglement Beyond Static vs. Dynamic: A Benchmark and Evaluation Framework for Multi-Factor Sequential Representations

arXiv.org Artificial Intelligence

Learning disentangled representations in sequential data is a key goal in deep learning, with broad applications in vision, audio, and time series. While real-world data involves multiple interacting semantic factors over time, prior work has mostly focused on simpler two-factor static and dynamic settings, primarily because such settings make data collection easier, thereby overlooking the inherently multi-factor nature of real-world data. We introduce the first standardized benchmark for evaluating multi-factor sequential disentanglement across six diverse datasets spanning video, audio, and time series. Our benchmark includes modular tools for dataset integration, model development, and evaluation metrics tailored to multi-factor analysis. We additionally propose a post-hoc Latent Exploration Stage to automatically align latent dimensions with semantic factors, and introduce a Koopman-inspired model that achieves state-of-the-art results. Moreover, we show that Vision-Language Models can automate dataset annotation and serve as zero-shot disentanglement evaluators, removing the need for manual labels and human intervention. Together, these contributions provide a robust and scalable foundation for advancing multi-factor sequential disentanglement. Our code is available on GitHub, and the datasets and trained models are available on Hugging Face.


Automatic Music Sample Identification with Multi-Track Contrastive Learning

arXiv.org Artificial Intelligence

Sampling, the technique of reusing pieces of existing audio tracks to create new music content, is a very common practice in modern music production. In this paper, we tackle the challenging task of automatic sample identification, that is, detecting such sampled content and retrieving the material from which it originates. To do so, we adopt a self-supervised learning approach that leverages a multi-track dataset to create positive pairs of artificial mixes, and design a novel contrastive learning objective. We show that this method significantly outperforms previous state-of-the-art baselines, is robust to various genres, and scales well when increasing the number of noise songs in the reference database. In addition, we extensively analyze the contribution of the different components of our training pipeline and highlight, in particular, the need for high-quality separated stems for this task.


Cultivating Pluralism In Algorithmic Monoculture: The Community Alignment Dataset

arXiv.org Artificial Intelligence

How can large language models (LLMs) serve users with varying preferences that may conflict across cultural, political, or other dimensions? To advance this challenge, this paper establishes four key results. First, we demonstrate, through a large-scale multilingual human study with representative samples from five countries (N=15,000), that humans exhibit significantly more variation in preferences than the responses of 21 state-of-the-art LLMs. Second, we show that existing methods for preference dataset collection are insufficient for learning the diversity of human preferences even along two of the most salient dimensions of variability in global values, due to the underlying homogeneity of candidate responses. Third, we argue that this motivates the need for negatively-correlated sampling when generating candidate sets, and we show that simple prompt-based techniques for doing so significantly enhance the performance of alignment methods in learning heterogeneous preferences. Fourth, based on this novel candidate sampling approach, we collect and open-source Community Alignment, the largest and most representative multilingual and multi-turn preference dataset to date, featuring almost 200,000 comparisons from annotators spanning five countries. We hope that the Community Alignment dataset will be a valuable resource for improving the effectiveness of LLMs for a diverse global population.


Tesla revives 'Mad Max' mode in Full Self-Driving

FOX News

Tesla brings back Mad Max mode in its Full Self-Driving system update, allowing more aggressive driving amid ongoing regulatory investigations.


Cancer cures could be in reach with cutting-edge medical tech, doctor predicts

FOX News

Fox News senior medical analyst Dr. Marc Siegel predicts that artificial intelligence will help cure cancer within five to 10 years through early detection and personalized treatments.


Your eyes can only handle so much HDTV

Popular Science

More pixels doesn't always mean a better screen. Every year, tech and television companies boast about their products' latest and greatest, highest-resolution displays. The 4K display--a screen with a horizontal resolution of approximately 4,000 pixels--first became widely available around 2014. Barely a decade later, you can purchase a TV with double the resolution.


Half of all uncontacted Indigenous tribes may disappear by 2036

Popular Science

Survival International's new report illustrates the dangers they face--and their resilience. This photo of an Awa Guajá couple was taken only five days before their first contact with outsiders in 1992. Half of the world's remaining uncontacted Indigenous groups may disappear within a decade without concerted conservation efforts. The dire assessment is detailed in a new report published on October 27 by the nonprofit advocacy group Survival International, and is based on years of field research, interviews, and information-gathering expeditions.


Jennifer Lawrence Goes Dark

The New Yorker

She has been cast in maternal roles since her teens. Now, playing a mother for the first time since becoming one, she has chosen the part of a woman pushed past the edge of sanity. In "Die My Love," Lawrence, as Grace, vibrates with boredom and fury. The novel "Die, My Love," by the Argentinean writer Ariana Harwicz, is narrated by a wife and new mother who is living in rural France and seems to be losing her mind. Motherhood has inserted an immersion blender into her psyche: lust, repulsion, pleasure, and doom swirl into a single mess. She calls herself a "sodomising rodent" with "bullet-wounds for eyes," and thinks, "When I masturbate I desecrate crypts, and when I rock my baby I say amen, and when I smile I unplug an iron lung." One night, standing in the cold, staring at her family through a sliding door, she thinks, "I'll stop trying to draw blood from a stone. I'll contain my madness, I'll use the bathroom. I'll put my baby to sleep, jerk off my man and postpone my rebellion in favor of a better life." Martin Scorsese saw a brief review of the novel in the some years ago and decided to pick up a copy. He found it to be a "powerful mosaic of the mind," he told me recently. Scorsese is a member of a book club of sorts, with a few other filmmakers, who read with an eye toward adaptation. For "Die, My Love," he imagined casting Jennifer Lawrence in the lead. He'd been amazed by her performance in Darren Aronofsky's bewildering 2017 fantasia, "Mother!" In that surreal film--it's like an allegory set inside an oil painting--Lawrence plays a woman living with her poet husband in an old farmhouse, which is gradually, then apocalyptically, invaded by strangers. "She really is feeling everything that's happening, in what appears to be a dream of some kind," Scorsese said. He and Lawrence had discussed adaptations before. 
They considered "The Awakening," Kate Chopin's 1899 novel of female liberation, which ends with the protagonist, Edna Pontellier, walking into the sea. "Die, My Love" was like "The Awakening" if it began with Edna already underwater.


Some People Can't See Mental Images. The Consequences Are Profound

The New Yorker

Ebeyer published posts about famous people who had realized that they were aphantasic: Glen Keane, one of the leading Disney animators on "The Little Mermaid" and "Beauty and the Beast"; John Green, the author of "The Fault in Our Stars," whose books had sold more than fifty million copies; J. Craig Venter, the biologist who led the first team to sequence the human genome; Blake Ross, who co-created the Mozilla Firefox web browser when he was nineteen. Ebeyer also wanted the Aphantasia Network to be a place where aphantasics could find recent scientific research. For instance, estimating the strength of a person's imagery had been thoroughly subjective until Joel Pearson, a cognitive neuroscientist at the University of New South Wales, in Australia, devised tests to measure it more precisely. In a paper from 2022, Pearson reported that when people with imagery visualized a bright object their pupils contracted, as though they were seeing a bright object in real life, but the pupils of aphantasics imagining a bright object stayed the same. Another study of his had shown that, although aphantasics had the same fear response (sweating) as typical imagers to a frightening image shown on a screen, when exposed to a frightening story they barely responded at all.


The Argument for Letting AI Burn It All Down

WIRED

When the AI bubble bursts, the nerds will do their best work. Suddenly, and not long ago, our dearest tech industry leaders began to suggest caution. Sam Altman said that AI is in a bubble "for sure," albeit one formed around "a kernel of truth." Mark Zuckerberg said an AI bubble "is quite possible," though "if the models keep on growing in capability year over year and demand keeps growing, then maybe there is no collapse, or something." Even Eric Schmidt is saying to calm down about artificial general intelligence and focus on competing with China.