Media
Intent Factored Generation: Unleashing the Diversity in Your Language Model
Ahmed, Eltayeb, Berdica, Uljad, Elliott, Martha, Horak, Danijela, Foerster, Jakob N.
Obtaining multiple meaningfully diverse, high quality samples from Large Language Models for a fixed prompt remains an open challenge. Current methods for increasing diversity often only operate at the token-level, paraphrasing the same response. This is problematic because it leads to poor exploration on reasoning problems and to unengaging, repetitive conversational agents. To address this we propose Intent Factored Generation (IFG), factorising the sampling process into two stages. First, we sample a semantically dense intent, e.g., a summary or keywords. Second, we sample the final response conditioning on both the original prompt and the intent from the first stage. This allows us to use a higher temperature during the intent step to promote conceptual diversity, and a lower temperature during the final generation to ensure the outputs are coherent and self-consistent. Additionally, we find that prompting the model to explicitly state its intent for each step of the chain-of-thought before generating the step is beneficial for reasoning tasks. We demonstrate our method's effectiveness across a diverse set of tasks. We show this method improves both pass@k and Reinforcement Learning from Verifier Feedback on maths and code tasks. For instruction-tuning, we combine IFG with Direct Preference Optimisation to increase conversational diversity without sacrificing reward. Finally, we achieve higher diversity while maintaining the quality of generations on a general language modelling task, using a new dataset of reader comments and news articles that we collect and open-source. In summary, we present a simple method of increasing the sample diversity of LLMs while maintaining performance. This method can be implemented by changing the prompt and varying the temperature during generation, making it easy to integrate into many algorithms for gains across various applications.
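The two-stage sampling described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `generate` callable, the prompt templates, and the default temperatures (1.2 and 0.5) are all assumptions made for the example. Any text-generation backend that accepts a prompt and a temperature could stand in for `generate`.

```python
import random
from typing import Callable, Optional, Tuple

# Hypothetical backend signature: (prompt, temperature, rng) -> completion text.
Generator = Callable[[str, float, random.Random], str]


def intent_factored_generate(
    prompt: str,
    generate: Generator,
    intent_temperature: float = 1.2,    # assumed value: higher, for conceptual diversity
    response_temperature: float = 0.5,  # assumed value: lower, for coherent output
    rng: Optional[random.Random] = None,
) -> Tuple[str, str]:
    """Sample an intent first, then a response conditioned on prompt and intent."""
    rng = rng or random.Random()

    # Stage 1: sample a short, semantically dense intent at the higher temperature.
    intent_prompt = f"{prompt}\n\nIntent (keywords or a short summary):"
    intent = generate(intent_prompt, intent_temperature, rng)

    # Stage 2: sample the final response at the lower temperature,
    # conditioning on both the original prompt and the sampled intent.
    response_prompt = f"{prompt}\n\nIntent: {intent}\n\nResponse:"
    response = generate(response_prompt, response_temperature, rng)
    return intent, response


def toy_generate(prompt: str, temperature: float, rng: random.Random) -> str:
    """Stand-in for a real model call; it only reports the temperature it received."""
    return f"<sample at T={temperature}>"


if __name__ == "__main__":
    print(intent_factored_generate("Write a reader comment on the article.", toy_generate))
```

Keeping the two temperatures as separate parameters is the point of the factorisation: the intent step can be made more exploratory without loosening the step that produces the final text.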
Brian Wilson, musical genius behind the Beach Boys, dies at 82
Brian Wilson, the musical savant who scripted a defining Southern California soundtrack with a run of hit songs with the Beach Boys before being pulled down a rabbit hole of despair and depression when his highly anticipated masterwork was shelved unfinished, has died. Wilson's family announced his death Wednesday morning on Facebook. "We are at a loss for words right now," the post said. "Please respect our privacy at this time as our family is grieving. We realize we are sharing our grief with the world," said the statement, also shared on Instagram and the musician's website. The statement didn't reveal a cause of death. Wilson died more than a year after it was revealed he was diagnosed with dementia and placed under a conservatorship in May 2024.
Disney and Universal sue AI image creator Midjourney, alleging copyright infringement
In their lawsuit, the entertainment giants called Midjourney's popular AI-powered image generator a "bottomless pit of plagiarism" for its alleged reproductions of the studios' best-known characters. The suit, filed in federal court in Los Angeles, claims Midjourney pirated the libraries of the two Hollywood studios, making and distributing without permission "innumerable" copies of their marquee characters such as Darth Vader from Star Wars, Elsa from Frozen, and the Minions from Despicable Me. Midjourney did not immediately respond to a request for comment. Horacio Gutierrez, Disney's chief legal officer, said in a statement: "We are bullish on the promise of AI technology and optimistic about how it can be used responsibly as a tool to further human creativity, but piracy is piracy, and the fact that it's done by an AI company does not make it any less infringing." NBCUniversal's executive vice-president and general counsel, Kim Harris, said the company was suing to "protect the hard work of all the artists whose work entertains and inspires us and the significant investment we make in our content". Instead, the studios argue, Midjourney continued to release new versions of its AI image service that boast higher-quality infringing images.
AbstentionBench: Reasoning LLMs Fail on Unanswerable Questions
Kirichenko, Polina, Ibrahim, Mark, Chaudhuri, Kamalika, Bell, Samuel J.
For Large Language Models (LLMs) to be reliably deployed in both everyday and high-stakes domains, knowing when not to answer is as critical as answering correctly. Real-world user queries, which can be underspecified, ill-posed, or fundamentally unanswerable, require LLMs to reason about uncertainty and selectively abstain -- i.e., refuse to answer definitively. However, abstention remains understudied, without a systematic evaluation framework for modern LLMs. In this work, we introduce AbstentionBench, a large-scale benchmark for holistically evaluating abstention across 20 diverse datasets, including questions with unknown answers, underspecification, false premises, subjective interpretations, and outdated information. Evaluating 20 frontier LLMs reveals abstention is an unsolved problem, and one where scaling models is of little use. While recent reasoning LLMs have shown impressive results in complex problem solving, surprisingly, we find that reasoning fine-tuning degrades abstention (by $24\%$ on average), even for math and science domains on which reasoning models are explicitly trained. We find that while a carefully crafted system prompt can boost abstention in practice, it does not resolve models' fundamental inability to reason about uncertainty. We release AbstentionBench to foster research into advancing LLM reliability.
Advancing STT for Low-Resource Real-World Speech
D'Intino, Flavio, Hutter, Hans-Peter
Swiss German is a low-resource language represented by diverse dialects that differ significantly from Standard German and from each other, lacking a standardized written form. As a result, transcribing Swiss German involves translating into Standard German. Existing datasets have been collected in controlled environments, yielding effective speech-to-text (STT) models, but these models struggle with spontaneous conversational speech. This paper, therefore, introduces the new SRB-300 dataset, a 300-hour annotated speech corpus featuring real-world long-audio recordings from 39 Swiss German radio and TV stations. It captures spontaneous speech across all major Swiss dialects recorded in various realistic environments and overcomes the limitation of prior sentence-level corpora. We fine-tuned multiple OpenAI Whisper models on the SRB-300 dataset, achieving notable enhancements over previous zero-shot performance metrics. Improvements in word error rate (WER) ranged from 19% to 33%, while BLEU scores increased between 8% and 40%. The best fine-tuned model, large-v3, achieved a WER of 17.1% and a BLEU score of 74.8. This advancement is crucial for developing effective and robust STT systems for Swiss German and other low-resource languages in real-world contexts.
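For readers unfamiliar with the metric, word error rate is conventionally the word-level edit distance between reference and hypothesis, divided by the number of reference words. The sketch below assumes plain whitespace tokenisation and no text normalisation; the paper's actual scoring tool and preprocessing are not stated in the abstract, so the function name and these choices are illustrative only.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(
                min(
                    prev[j] + 1,                      # deletion of a reference word
                    curr[j - 1] + 1,                  # insertion of a hypothesis word
                    prev[j - 1] + substitution_cost,  # match or substitution
                )
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion (the) over six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Because the denominator is the reference length, WER can exceed 100% when the hypothesis contains many inserted words.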
ATI: Any Trajectory Instruction for Controllable Video Generation
Wang, Angtian, Huang, Haibin, Fang, Jacob Zhiyuan, Yang, Yiding, Ma, Chongyang
We propose a unified framework for motion control in video generation that seamlessly integrates camera movement, object-level translation, and fine-grained local motion using trajectory-based inputs. In contrast to prior methods that address these motion types through separate modules or task-specific designs, our approach offers a cohesive solution by projecting user-defined trajectories into the latent space of pre-trained image-to-video generation models via a lightweight motion injector. Users can specify keypoints and their motion paths to control localized deformations, entire object motion, virtual camera dynamics, or combinations of these. The injected trajectory signals guide the generative process to produce temporally consistent and semantically aligned motion sequences. Our framework demonstrates superior performance across multiple video motion control tasks, including stylized motion effects (e.g., motion brushes), dynamic viewpoint changes, and precise local motion manipulation. Experiments show that our method provides significantly better controllability and visual quality compared to prior approaches and commercial solutions, while remaining broadly compatible with various state-of-the-art video generation backbones.
Instruction-Tuned Video-Audio Models Elucidate Functional Specialization in the Brain
Oota, Subba Reddy, Pahwa, Khushbu, Jindal, Prachi, Namburi, Satya Sai Srinath, Singh, Maneesh, Chakraborty, Tanmoy, Raju, Bapi S., Gupta, Manish
Recent voxel-wise multimodal brain encoding studies have shown that multimodal large language models (MLLMs) exhibit a higher degree of brain alignment compared to unimodal models in both unimodal and multimodal stimulus settings. More recently, instruction-tuned multimodal models have been shown to generate task-specific representations that align strongly with brain activity. However, prior work evaluating the brain alignment of MLLMs has primarily focused on unimodal settings or relied on non-instruction-tuned multimodal models for multimodal stimuli. To address this gap, we investigated brain alignment, that is, measuring the degree of predictivity of neural activity recorded while participants were watching naturalistic movies (video along with audio) with representations derived from MLLMs. We utilized instruction-specific embeddings from six video and two audio instruction-tuned MLLMs. Experiments with 13 video task-specific instructions show that instruction-tuned video MLLMs significantly outperform non-instruction-tuned multimodal (by 15%) and unimodal models (by 20%). Our evaluation of MLLMs for both video and audio tasks using language-guided instructions shows clear disentanglement in task-specific representations from MLLMs, leading to precise differentiation of multimodal functional processing in the brain. We also find that MLLM layers align hierarchically with the brain, with early sensory areas showing strong alignment with early layers, while higher-level visual and language regions align more with middle to late layers. These findings provide clear evidence for the role of task-specific instructions in improving the alignment between brain activity and MLLMs, and open new avenues for mapping joint information processing in both systems. We make the code publicly available [https://github.com/subbareddy248/mllm_videos].
Can Artificial Intelligence Write Like Borges? An Evaluation Protocol for Spanish Microfiction
Manzanarez, Gerardo Aleman, Arana, Nora de la Cruz, Flores, Jorge Garcia, Medina, Yobany Garcia, Monroy, Raul, Pernelle, Nathalie
Automated story writing has been a subject of study for over 60 years. Large language models can generate narratively consistent and linguistically coherent short fiction texts. Despite these advancements, rigorous assessment of such outputs for literary merit - especially concerning aesthetic qualities - has received scant attention. In this paper, we address the challenge of evaluating AI-generated microfictions and argue that this task requires consideration of literary criteria across various aspects of the text, such as thematic coherence, textual clarity, interpretive depth, and aesthetic quality. To facilitate this, we present GrAImes: an evaluation protocol grounded in literary theory, specifically drawing from a literary perspective, to offer an objective framework for assessing AI-generated microfiction. Furthermore, we report the results of our validation of the evaluation protocol, as answered by both literature experts and literary enthusiasts. This protocol will serve as a foundation for evaluating automatically generated microfictions and assessing their literary value.
AI Chatbots Are Making LA Protest Disinformation Worse
Disinformation about the Los Angeles protests is spreading on social media networks and is being made worse by users turning to AI chatbots like Grok and ChatGPT to perform fact-checking. As residents of the LA area took to the streets in recent days to protest increasingly frequent Immigration and Customs Enforcement (ICE) raids, conservative posters on social media platforms like X and Facebook flooded their feeds with inaccurate information. In addition to well-worn tactics like repurposing old protest footage or clips from video games and movies, posters have claimed that the protesters are little more than paid agitators being directed by shadowy forces--something for which there is no evidence. In the midst of fast-moving and divisive news stories like the LA protests, and as companies like X and Meta have stepped back from moderating the content on their platforms, users have been turning to AI chatbots for answers--which in many cases have been completely inaccurate. On Monday, the San Francisco Chronicle published images of National Guard troops sleeping on floors.
Unstoppable force loses battle with immovable object: Elon bows to Trump
Elon Musk and Donald Trump are no longer friends. Tension between the two exploded into public view in the middle of last week, with each leveling sharp barbs at the other. Four days into the public feud between the world's most powerful person and the world's richest person, though, I declare Musk the loser. An unstoppable force has lost its battle with an immovable object. From my colleagues Hugo Lowell and Andrew Roth: On Thursday, Elon Musk called for Donald Trump's impeachment and mocked his connections to the convicted sex offender Jeffrey Epstein, as the US president threatened to cancel federal contracts and tax subsidies for Musk's companies, in an extraordinary social media feud that erupted between the former allies.