Media
Context Length Alone Hurts LLM Performance Despite Perfect Retrieval
Du, Yufeng, Tian, Minyang, Ronanki, Srikanth, Rongali, Subendhu, Bodapati, Sravan, Galstyan, Aram, Wells, Azton, Schwartz, Roy, Huerta, Eliu A, Peng, Hao
Large language models (LLMs) often fail to scale their performance on long-context tasks in line with the context lengths they support. This gap is commonly attributed to retrieval failures: the models' inability to identify relevant information in long inputs. Accordingly, recent efforts often focus on evaluating and improving LLMs' retrieval performance. If retrieval is perfect, a model should, in principle, perform just as well on a long input as on a short one. Or should it? This paper presents findings suggesting that the answer may be no. Our systematic experiments across 5 open- and closed-source LLMs on math, question answering, and coding tasks reveal that, even when models can perfectly retrieve all relevant information, their performance still degrades substantially (by 13.9% to 85%) as input length increases, even though the inputs remain well within the models' claimed context lengths. This failure occurs even when the irrelevant tokens are replaced with minimally distracting whitespace and, more surprisingly, when they are all masked so that the models can attend only to the relevant tokens. A similar performance drop is observed when all relevant evidence is placed immediately before the question. Our findings reveal a previously unrecognized limitation: the sheer length of the input alone can hurt LLM performance, independent of retrieval quality and without any distraction. They motivate a simple, model-agnostic mitigation strategy that transforms a long-context task into a short-context one by prompting the model to recite the retrieved evidence before attempting to solve the problem. On RULER, this yields consistent improvements for GPT-4o of up to 4% over an already strong baseline.
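Two manipulations from the abstract, inflating input length with whitespace while placing all evidence immediately before the question, and prompting the model to recite the evidence before solving, can be sketched as prompt-construction helpers. This is a minimal sketch under stated assumptions, not the authors' code: the function names and the instruction wording are hypothetical, and padding is counted in characters rather than tokens, because how many tokens a whitespace run produces depends on the tokenizer.

```python
from typing import List

# Minimal sketch (not the paper's code): hypothetical helpers that build
# (1) a length-inflated prompt whose filler is pure whitespace, with all
# evidence placed immediately before the question, and (2) a variant that
# asks the model to recite the evidence before solving the problem.


def build_padded_prompt(evidence: List[str], question: str, pad_chars: int) -> str:
    """Whitespace filler first, then the evidence, then the question.

    Padding is measured in characters, not tokens (an assumption: the token
    count of a whitespace run depends on the tokenizer).
    """
    filler = " " * pad_chars  # minimally distracting stand-in for irrelevant text
    evidence_block = "\n".join(evidence)
    return f"{filler}\n{evidence_block}\n\nQuestion: {question}"


def add_recitation_instruction(prompt: str) -> str:
    """Append a hypothetical 'recite, then solve' instruction to a prompt."""
    instruction = (
        "Before answering, recite the parts of the context that are relevant "
        "to the question. Then solve the problem using only the recited evidence."
    )
    return f"{prompt}\n\n{instruction}"


if __name__ == "__main__":
    facts = ["Alice has 3 apples.", "Bob gives Alice 2 more apples."]
    q = "How many apples does Alice have now?"
    short = build_padded_prompt(facts, q, pad_chars=0)
    padded = build_padded_prompt(facts, q, pad_chars=20_000)
    # Same evidence and question; only the amount of whitespace differs.
    print(len(padded) - len(short))  # 20000
    print(add_recitation_instruction(padded).splitlines()[-1])
```

Comparing a model's accuracy on `short` versus `padded` isolates the effect of length alone, since the evidence and the question are identical in both prompts.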
Plug-and-Play Dramaturge: A Divide-and-Conquer Approach for Iterative Narrative Script Refinement via Collaborative LLM Agents
Xie, Wenda, Guo, Chao, Jing, Yanqing, Wang, Junle, Lv, Yisheng, Wang, Fei-Yue
Although LLMs have been widely adopted for creative content generation, a single-pass process often struggles to produce high-quality long narratives. Revising and improving long narrative scripts the way scriptwriters do remains a significant challenge: it demands a comprehensive understanding of the entire context to identify both global structural issues and local, detailed flaws, as well as coordinated revisions at multiple granularities and locations. Direct modifications by LLMs typically introduce inconsistencies between local edits and the overall narrative requirements. To address these issues, we propose Dramaturge, a task- and feature-oriented divide-and-conquer approach powered by multiple hierarchically organized LLM agents. It consists of a Global Review stage that grasps the overall storyline and structural issues, a Scene-level Review stage that pinpoints detailed scene- and sentence-level flaws, and a Hierarchical Coordinated Revision stage that coordinates and integrates structural and detailed improvements throughout the script. The top-down task flow ensures that high-level strategies guide local modifications, maintaining contextual consistency. The review and revision workflow follows a coarse-to-fine iterative process, continuing for multiple rounds until no further substantive improvements can be made. Comprehensive experiments show that Dramaturge significantly outperforms all baselines in both overall script-level quality and scene-level detail. Our approach is plug-and-play and can be easily integrated into existing methods to improve the generated scripts.
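The top-down, coarse-to-fine review and revision loop can be sketched as plain control flow. This is a hedged illustration, not the authors' implementation: `refine`, the type aliases, and the toy reviewer and reviser functions are hypothetical, and in Dramaturge each callable would be backed by LLM agents rather than the string checks used here. Global notes are passed into the scene-level review so that structural decisions guide local ones; the loop stops when a round produces no notes, with a round limit added as a safeguard.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical representation: a script is a list of scene texts.
Script = List[str]
GlobalReview = Callable[[Script], List[str]]  # structural notes for the whole script
SceneReview = Callable[[Script, List[str]], Dict[int, List[str]]]  # scene index -> notes
Revise = Callable[[Script, List[str], Dict[int, List[str]]], Script]


def refine(
    script: Script,
    global_review: GlobalReview,
    scene_review: SceneReview,
    revise: Revise,
    max_rounds: int = 5,
) -> Tuple[Script, int]:
    """Review globally, then per scene, then revise; repeat until no notes remain.

    Returns the final script and the number of revision rounds applied.
    """
    for round_no in range(max_rounds):
        global_notes = global_review(script)
        # Top-down: the scene-level review sees the global notes first.
        scene_notes = scene_review(script, global_notes)
        if not global_notes and not any(scene_notes.values()):
            return script, round_no  # no substantive improvements left
        script = revise(script, global_notes, scene_notes)
    return script, max_rounds


# Toy stand-ins for the LLM agents, for illustration only.
def toy_global_review(script: Script) -> List[str]:
    return ["add a closing scene"] if len(script) < 3 else []


def toy_scene_review(script: Script, global_notes: List[str]) -> Dict[int, List[str]]:
    return {i: ["replace placeholder"] for i, s in enumerate(script) if "TODO" in s}


def toy_revise(script: Script, global_notes: List[str], scene_notes: Dict[int, List[str]]) -> Script:
    revised = [s.replace("TODO", "dialogue") if i in scene_notes else s for i, s in enumerate(script)]
    if global_notes:
        revised.append("Closing scene.")
    return revised


if __name__ == "__main__":
    final, rounds = refine(["Opening. TODO", "Conflict."], toy_global_review, toy_scene_review, toy_revise)
    print(rounds, final)  # 1 ['Opening. dialogue', 'Conflict.', 'Closing scene.']
```

Keeping the reviewers and the reviser as separate callables mirrors the separation of review and revision stages described above, so each could be swapped for a different agent without changing the loop.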
Supporting Creative Ownership through Deep Learning-Based Music Variation
Krol, Stephen James, Llano, Maria Teresa, McCormack, Jon
This paper investigates the importance of personal ownership in musical AI design, examining how practising musicians can maintain creative control over the compositional process. Through a four-week ecological evaluation, we examined how a music variation tool that relies on the skill of musicians functioned within a composition setting. Our findings demonstrate that the tool's dependence on the musician's ability to provide a strong initial musical input, and to turn moments into complete musical ideas, promoted ownership of both the process and the artefact. Qualitative interviews further revealed the importance of this personal ownership, exposing tensions between technological capability and artistic identity. These findings provide insight into how musical AI can support rather than replace human creativity, and underline the importance of designing tools that preserve the humanness of musical expression.
FedFlex: Federated Learning for Diverse Netflix Recommendations
Lankester, Sven, Bertoli, Gustavo de Carvalho, Vizcaino, Matias, Beauxis-Aussalet, Emmanuelle, Slokom, Manel
The drive for personalization in recommender systems creates a tension between user privacy and the risk of "filter bubbles". Although federated learning offers a promising paradigm for privacy-preserving recommendation, its impact on diversity remains unclear. We introduce FedFlex, a two-stage framework that combines local, on-device fine-tuning of matrix factorization models (SVD and BPR) with a lightweight Maximal Marginal Relevance (MMR) re-ranking step to promote diversity. We conducted the first live user study of a federated recommender, collecting behavioral data and feedback during a two-week online deployment. Our results show that FedFlex successfully engages users, with BPR outperforming SVD in click-through rate. Re-ranking with MMR consistently improved ranking quality (nDCG) for both models, with statistically significant gains, particularly for BPR. Diversity effects varied: MMR increased coverage for both models and improved intra-list diversity for BPR but slightly reduced it for SVD, suggesting that personalization and diversification interact differently across models. Responses to our exit questionnaire indicated that most users had no clear preference between re-ranked and unprocessed lists, implying that increased diversity did not substantially reduce user satisfaction.
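The MMR re-ranking step and the two diversity measures can be illustrated with the textbook greedy formulation, which repeatedly picks the candidate i maximizing λ · rel(i) - (1 - λ) · max_{j ∈ S} sim(i, j), where S is the list selected so far. This is a sketch under assumptions, not FedFlex's code: relevance scores are taken as given (for example, from a locally fine-tuned SVD or BPR model), similarity is cosine similarity between hypothetical item vectors such as latent factors, λ = 0.7 is an arbitrary default, and intra-list diversity and coverage use common textbook definitions that may differ from the paper's.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def _mmr_score(item: str, selected: List[str], scores: Dict[str, float],
               vectors: Dict[str, Sequence[float]], lam: float) -> float:
    """Relevance weighted by lam, minus redundancy with the most similar chosen item."""
    redundancy = max((cosine(vectors[item], vectors[s]) for s in selected), default=0.0)
    return lam * scores[item] - (1 - lam) * redundancy


def mmr_rerank(scores: Dict[str, float], vectors: Dict[str, Sequence[float]],
               k: int, lam: float = 0.7) -> List[str]:
    """Greedy MMR re-ranking of candidate items into a list of length <= k."""
    # Sort once by relevance so ties are broken deterministically (max keeps the first).
    candidates = sorted(scores, key=lambda item: scores[item], reverse=True)
    selected: List[str] = []
    while candidates and len(selected) < k:
        best = max(candidates, key=lambda item: _mmr_score(item, selected, scores, vectors, lam))
        selected.append(best)
        candidates.remove(best)
    return selected


def intra_list_diversity(items: List[str], vectors: Dict[str, Sequence[float]]) -> float:
    """Mean pairwise dissimilarity (1 - cosine) within one recommendation list."""
    pairs = [(a, b) for i, a in enumerate(items) for b in items[i + 1:]]
    if not pairs:
        return 0.0
    return sum(1.0 - cosine(vectors[a], vectors[b]) for a, b in pairs) / len(pairs)


def catalog_coverage(lists: List[List[str]], catalog_size: int) -> float:
    """Fraction of the catalog that appears in at least one user's list."""
    return len({item for lst in lists for item in lst}) / catalog_size


if __name__ == "__main__":
    scores = {"A": 0.9, "B": 0.85, "C": 0.6}  # toy relevance scores
    vectors = {"A": (1.0, 0.0), "B": (1.0, 0.1), "C": (0.0, 1.0)}  # B nearly duplicates A
    top_k = mmr_rerank(scores, vectors, k=2, lam=1.0)  # lam=1 reduces to plain top-k
    diverse = mmr_rerank(scores, vectors, k=2, lam=0.7)
    print(top_k, diverse)  # ['A', 'B'] ['A', 'C']
    print(intra_list_diversity(top_k, vectors) < intra_list_diversity(diverse, vectors))  # True
```

Because SVD and BPR scores are not on the same scale as cosine similarity, normalizing relevance per user (for example, min-max scaling) before applying MMR is a common design choice, so that λ controls the trade-off directly.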
Can Video Large Multimodal Models Think Like Doubters, or Double-Down: A Study on Defeasible Video Entailment
Zhang, Yue, Sun, Jilei, Guo, Yunhui, Gogate, Vibhav
Video Large Multimodal Models (VLMMs) have made impressive strides in understanding video content, but they often struggle with abstract and adaptive reasoning: the ability to revise their interpretations when new information emerges. In reality, conclusions are rarely set in stone; additional context can strengthen or weaken an initial inference. To address this, we introduce Defeasible Video Entailment (DVidE), a new task that challenges models to think like doubters, constantly updating their reasoning based on evolving evidence. In DVidE, given a video premise and a textual hypothesis, models must either determine whether a new update strengthens or weakens the hypothesis (classification version) or generate a coherent update that modifies the entailment relationship (generation version). For the classification task, we propose the Chain of Counterfactual Thought framework, which uses counterfactual reasoning, ASR-enhanced video content, and rationale refinement to reduce inference bias. For the generation task, we develop a framework that combines ASR output with a Large Language Model (LLM) to produce coherent, contextually relevant updates aligned with the intended strengthener or weakener goal. Additionally, we introduce a novel benchmark dataset with strengthener/weakener annotations, along with an LLM-based evaluation metric specifically designed for assessing generative performance. Experimental results demonstrate significant improvements, highlighting the effectiveness of our proposed methods in enhancing the dynamic reasoning capabilities of VLMMs.
Space agency breaks silence on 'foreign' interstellar object soaring past Mars: 'A rare visitor'
The European Space Agency (ESA) has finally shared new details about the mysterious interstellar visitor, days after its closest approach to Mars. The object, dubbed 3I/ATLAS, came within 18.6 million miles of the Red Planet on October 3, and while NASA quickly uploaded images captured by its Perseverance rover on the Martian surface, ESA had remained quiet until now. ESA's ExoMars Trace Gas Orbiter (TGO) captured a series of images in which the object appears as a tiny, blurry white dot.
The object's icy nucleus and its surrounding halo of gas and dust, called a coma, could not be resolved separately, but the faint glow was clearly visible against the blackness of space.
Do you have one of these gathering dust in your attic? Experts reveal the forgotten gadgets that could be worth a fortune - including answering machines for landlines
It was only a couple of decades ago that homes and offices were filled with answering machines and BlackBerry phones. And although they've been obsolete for years, they're now among the retro gadgets that could make you a fortune. Brits are sitting on a hidden goldmine of old forgotten tech devices that may be gathering dust in the attic, according to a new report from Gumtree.
'Meteor' streaks through Britain's skies tonight leaving lucky gazers in awe
Brits have been left in awe after spotting what is believed to be a 'meteor' glowing through the night sky. Lucky stargazers in Northfields and West Ealing, west London, have reported seeing a blue-ish green blob race through the city's sky tonight.
Top-secret US spy jet spotted circling Russia amid mounting WW3 fears
A US Air Force jet designed to collect intelligence on enemy radar systems was spotted making circles over Russia, following rising tensions with Moscow. Flight tracking data showed the RC-135U 'Combat Sent' taking off from England early Tuesday, flying over the Baltic states and looping around Kaliningrad, the Russian exclave between Poland and Lithuania, before returning to the UK.
A New Movie About George Orwell and 1984 Has a Unique Way of Telling Its Story. It May Haunt You.