 Media


Is "Six Seven" Really Brain Rot?

The New Yorker

The viral phrase is easy to dismiss, but its ubiquity suggests something crucial about human nature. Recently, my wife was texting with a friend who lives in Singapore. The news from the other side of the world turned out to be that kids there had discovered "six seven." On Halloween, our friend reported, a boy with a handmade "six seven" jersey had earned applause as he made his way through her neighborhood--a place that's a long way from Sixty-seventh Street in Philadelphia, which the rapper Skrilla may have been referencing in his song "Doot Doot (6 7)," which came out last December.


Did Women Really Ruin the Workplace?

The New Yorker

On Thursday, November 6, the New York Times published an op-ed video criticizing the effects of feminism on institutions and warning of the dangers of "toxic femininity." It briefly ran with the title "Did Women Ruin the Workplace?" I can answer that question: yes. Specifically, me--I'm the woman who ruined the workplace.


"Sirāt" Is a Harrowing, Exhilarating Dance of Death

The New Yorker

At one point, Luis assumes that he and Esteban have been abandoned, only to realize, with a start, that their newfound friends are actually circling back to help. In such moments, we grasp the source of the story's mysterious power: a tough-minded understanding that kindness is rare yet persistent, and quite possibly an affront to the laws of nature. "Sirāt" is a chain of defiantly compassionate acts--noble human improbabilities that take on, in retrospect, an air of fatalistic inevitability. Laxe, a restless wanderer himself, knows Morocco well. He shot his first feature, "You All Are Captains" (2011), in Tangier, where he'd spent several years working at a shelter for disadvantaged children. Several of these children appeared in the movie--a formally playful collision of fiction and documentary in which Laxe, also making an appearance, slyly interrogated his European outsider-artist role. Next came "Mimosas" (2016), an elusive, arrestingly gorgeous drama about a caravan bearing a dying sheikh across Morocco's Atlas Mountains to his homeland. The film had the beauty of a travelogue and the opacity of a parable. Its most dynamic character was a fiery Muslim preacher who warned his fellow-travellers not to stray, geographically or morally.


AI poses threat to journalism in Japan, news association chair says

The Japan Times

The Asahi, along with the Nikkei and the Yomiuri Shimbun, filed a lawsuit with the Tokyo District Court against Perplexity in August. "Journalism should not tolerate freeloading," said Shiro Nakamura, who is also the chair of the Japan Newspaper Publishers & Editors Association (Nihon Shinbun Kyokai, or NSK), during a news conference at the Foreign Correspondents' Club of Japan on Friday. Nakamura said Japan's publishers across the board were concerned about the impact generative AI is having on the news business.


Actor George Clooney claims the rise of AI technology is dangerous, says 'genie is out of the bottle'

FOX News

Actor George Clooney expressed alarm over artificial intelligence's rise in Hollywood, citing concerns about realistic AI-generated videos and the Sora 2 model in a new Variety interview.


Will AI mean better adverts or 'creepy slop'?

BBC News

Imagine one night, you're scrolling through social media on your phone, and the ads start to look remarkably familiar. They're decked out in your favourite colours, are featuring your favourite music and the wording sounds like phrases you regularly use. Welcome to the future of advertising, which is already here thanks to AI. Advertising company Cheil UK, for example, has been working with startup Spotlight on using large language AI models to understand people's online activity, and adapt that content based on what the AI interprets an individual's personality to be. The technology can then mirror how someone talks in terms of tone, phrase and pace to change the text of an ad accordingly, and insert music and colours to match, say, whether the AI deems someone to be introverted or extroverted, or have specific preferences for loud or calm music, or light or dark colours.


PITA: Preference-Guided Inference-Time Alignment for LLM Post-Training

arXiv.org Artificial Intelligence

Inference-time alignment enables large language models (LLMs) to generate outputs aligned with end-user preferences without further training. Recent post-training methods achieve this by using small guidance models to modify token generation during inference. These methods typically optimize a reward function KL-regularized by the original LLM taken as the reference policy. A critical limitation, however, is their dependence on a pre-trained reward model, which requires fitting to human preference feedback--a potentially unstable process. In contrast, we introduce PITA, a novel framework that integrates preference feedback directly into the LLM's token generation, eliminating the need for a reward model. PITA learns a small preference-based guidance policy to modify token probabilities at inference time without LLM fine-tuning, reducing computational cost and bypassing the pre-trained reward model dependency. The problem is framed as identifying an underlying preference distribution, solved through stochastic search and iterative refinement of the preference-based guidance model. We evaluate PITA across diverse tasks, including mathematical reasoning and sentiment classification, demonstrating its effectiveness in aligning LLM outputs with user preferences.
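
The KL-regularized objective mentioned in the abstract has a well-known closed-form solution: the optimal policy reweights the reference distribution by the exponentiated reward, so that p*(y) is proportional to p_ref(y) * exp(r(y) / beta). The sketch below applies that reweighting to a single next-token distribution. It is a generic illustration of guidance at inference time, not PITA's algorithm; the function name `guided_distribution`, the `guidance_scores` input, and the `beta` value are assumptions made for this example.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def guided_distribution(ref_logits, guidance_scores, beta=1.0):
    """Reweight a reference next-token distribution with per-token scores.

    Implements p(y) proportional to p_ref(y) * exp(score(y) / beta), the
    closed-form optimum of a KL-regularized reward objective. `guidance_scores`
    stands in for the output of a small guidance model (hypothetical here).
    """
    if len(ref_logits) != len(guidance_scores):
        raise ValueError("ref_logits and guidance_scores must have equal length")
    if beta <= 0:
        raise ValueError("beta must be positive")
    # Adding score / beta in logit space equals multiplying by exp(score / beta).
    combined = [l + s / beta for l, s in zip(ref_logits, guidance_scores)]
    return softmax(combined)


if __name__ == "__main__":
    ref_logits = [2.0, 1.0, 0.5]       # reference LLM logits for three tokens
    guidance_scores = [0.0, 2.0, 0.0]  # guidance favours the second token
    print(guided_distribution(ref_logits, guidance_scores, beta=1.0))
```

A larger `beta` keeps the result closer to the reference model, while a smaller `beta` lets the guidance scores dominate.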


Depth-Consistent 3D Gaussian Splatting via Physical Defocus Modeling and Multi-View Geometric Supervision

arXiv.org Artificial Intelligence

Three-dimensional reconstruction in scenes with extreme depth variations remains challenging due to inconsistent supervisory signals between near-field and far-field regions. Existing methods fail to simultaneously address inaccurate depth estimation in distant areas and structural degradation in close-range regions. This paper proposes a novel computational framework that integrates depth-of-field supervision and multi-view consistency supervision to advance 3D Gaussian Splatting. Our approach comprises two core components: (1) Depth-of-field Supervision employs a scale-recovered monocular depth estimator (e.g., Metric3D) to generate depth priors, leverages defocus convolution to synthesize physically accurate defocused images, and enforces geometric consistency through a novel depth-of-field loss, thereby enhancing depth fidelity in both far-field and near-field regions; (2) Multi-View Consistency Supervision employs LoFTR-based semi-dense feature matching to minimize cross-view geometric errors and enforces depth consistency via least squares optimization of reliable matched points. By unifying defocus physics with multi-view geometric constraints, our method achieves superior depth fidelity, demonstrating a 0.8 dB PSNR improvement over the state-of-the-art method on the Waymo Open Dataset. This framework bridges physical imaging principles and learning-based depth regularization, offering a scalable solution for complex depth stratification in urban environments.


Music Flamingo: Scaling Music Understanding in Audio Language Models

arXiv.org Artificial Intelligence

We introduce Music Flamingo, a novel large audio-language model designed to advance music (including song) understanding in foundational audio models. While audio-language research has progressed rapidly, music remains challenging due to its dynamic, layered, and information-dense nature. Progress has been further limited by the difficulty of scaling open audio understanding models, primarily because of the scarcity of high-quality music data and annotations. As a result, prior models are restricted to producing short, high-level captions, answering only surface-level questions, and showing limited generalization across diverse musical cultures. To address these challenges, we curate MF-Skills, a large-scale dataset labeled through a multi-stage pipeline that yields rich captions and question-answer pairs covering harmony, structure, timbre, lyrics, and cultural context. We fine-tune an enhanced Audio Flamingo 3 backbone on MF-Skills and further strengthen multiple skills relevant to music understanding. To improve the model's reasoning abilities, we introduce a post-training recipe: we first cold-start with MF-Think, a novel chain-of-thought dataset grounded in music theory, followed by GRPO-based reinforcement learning with custom rewards. Music Flamingo achieves state-of-the-art results across 10+ benchmarks for music understanding and reasoning, establishing itself as a generalist and musically intelligent audio-language model. Beyond strong empirical results, Music Flamingo sets a new standard for advanced music understanding by demonstrating how models can move from surface-level recognition toward layered, human-like perception of songs. We believe this work provides both a benchmark and a foundation for the community to build the next generation of models that engage with music as meaningfully as humans do.
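
GRPO, the reinforcement learning method named in the abstract, is commonly described as scoring each sampled response relative to the other responses drawn for the same prompt, rather than against a learned value function. A minimal sketch of that group-relative normalization follows. It is not Music Flamingo's training code; the function name, the example rewards, the use of the population standard deviation, and the `eps` constant are assumptions for illustration.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of responses to the same prompt.

    Each advantage is (reward - group mean) / (group std + eps), so responses
    that beat the group average get positive advantages and the rest negative.
    """
    if len(rewards) < 2:
        raise ValueError("need at least two responses per group")
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; an assumption here
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    # Hypothetical rewards for four sampled answers to one music question.
    rewards = [1.0, 0.0, 0.5, 0.5]
    print(group_relative_advantages(rewards))
```

The `eps` term only guards against division by zero when every response in a group receives the same reward, in which case all advantages come out as zero.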


FactGuard: Event-Centric and Commonsense-Guided Fake News Detection

arXiv.org Artificial Intelligence

Fake news detection methods based on writing style have achieved remarkable progress. However, as adversaries increasingly imitate the style of authentic news, the effectiveness of such approaches is gradually diminishing. Recent research has explored incorporating large language models (LLMs) to enhance fake news detection. Yet, despite their transformative potential, LLMs remain an untapped goldmine for fake news detection, with their real-world adoption hampered by shallow functionality exploration, ambiguous usability, and prohibitive inference costs. In this paper, we propose a novel fake news detection framework, dubbed FactGuard, that leverages LLMs to extract event-centric content, thereby reducing the impact of writing style on detection performance. Furthermore, our approach introduces a dynamic usability mechanism that identifies contradictions and ambiguous cases in factual reasoning, adaptively incorporating LLM advice to improve decision reliability. To ensure efficiency and practical deployment, we employ knowledge distillation to derive FactGuard-D, enabling the framework to operate effectively in cold-start and resource-constrained scenarios. Comprehensive experiments on two benchmark datasets demonstrate that our approach consistently outperforms existing methods in both robustness and accuracy, effectively addressing the challenges of style sensitivity and LLM usability in fake news detection.