NexusSum: Hierarchical LLM Agents for Long-Form Narrative Summarization

arXiv.org Artificial Intelligence

Summarizing long-form narratives--such as books, movies, and TV scripts--requires capturing intricate plotlines, character interactions, and thematic coherence, a task that remains challenging for existing LLMs. We introduce NexusSum, a multi-agent LLM framework for narrative summarization that processes long-form text through a structured, sequential pipeline--without requiring fine-tuning. Our approach introduces two key innovations: (1) Dialogue-to-Description Transformation: A narrative-specific preprocessing method that standardizes character dialogue and descriptive text into a unified format, improving coherence. (2) Hierarchical Multi-LLM Summarization: A structured summarization pipeline that optimizes chunk processing and controls output length for accurate, high-quality summaries. Our method establishes a new state-of-the-art in narrative summarization, achieving up to a 30.0% improvement in BERTScore (F1) across books, movies, and TV scripts. These results demonstrate the effectiveness of multi-agent LLMs in handling long-form content, offering a scalable approach for structured summarization in diverse storytelling domains.
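The hierarchical, length-controlled idea in the abstract above can be illustrated with a small sketch. This is not the NexusSum implementation: the function names, `chunk_size`, `target_len`, and the truncating `stub_summarize` (a stand-in for an LLM call) are all assumptions made for illustration.

```python
def chunk_words(words, size):
    """Split a list of words into consecutive chunks of at most `size` words."""
    return [words[i:i + size] for i in range(0, len(words), size)]


def stub_summarize(words, ratio=0.5):
    """Hypothetical placeholder for an LLM summarizer: keeps the start of the chunk."""
    keep = max(1, int(len(words) * ratio))
    return words[:keep]


def hierarchical_summarize(text, chunk_size=8, target_len=10, summarize=stub_summarize):
    """Summarize chunk by chunk, then re-summarize the result until it fits target_len."""
    words = text.split()
    while len(words) > target_len:
        reduced = []
        for chunk in chunk_words(words, chunk_size):
            reduced.extend(summarize(chunk))
        if len(reduced) >= len(words):
            # The summarizer made no progress; stop rather than loop forever.
            break
        words = reduced
    return " ".join(words)


if __name__ == "__main__":
    story = " ".join(f"w{i}" for i in range(32))
    print(hierarchical_summarize(story))
```

In a real pipeline the `summarize` argument would wrap a model call; the loop only shows how repeated passes over chunks can bring the output under a length budget.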


Stress-testing Machine Generated Text Detection: Shifting Language Models Writing Style to Fool Detectors

arXiv.org Artificial Intelligence

Recent advancements in Generative AI and Large Language Models (LLMs) have enabled the creation of highly realistic synthetic content, raising concerns about the potential for malicious use, such as misinformation and manipulation. Moreover, detecting Machine-Generated Text (MGT) remains challenging due to the lack of robust benchmarks that assess generalization to real-world scenarios. In this work, we present a pipeline to test the resilience of state-of-the-art MGT detectors (e.g., Mage, Radar, LLM-DetectAIve) to linguistically informed adversarial attacks. To challenge the detectors, we fine-tune language models using Direct Preference Optimization (DPO) to shift the MGT style toward human-written text (HWT). This exploits the detectors' reliance on stylistic clues, making new generations more challenging to detect. Additionally, we analyze the linguistic shifts induced by the alignment and which features detectors rely on to identify MGT. Our results show that detectors can be easily fooled with relatively few examples, resulting in a significant drop in detection performance. This highlights the importance of improving detection methods and making them robust to unseen in-domain texts.


Leveraging Knowledge Graphs and LLMs for Structured Generation of Misinformation

arXiv.org Artificial Intelligence

The rapid spread of misinformation, further amplified by recent advances in generative AI, poses significant threats to society, impacting public opinion, democratic stability, and national security. Understanding and proactively assessing these threats requires exploring methodologies that enable structured and scalable misinformation generation. In this paper, we propose a novel approach that leverages knowledge graphs (KGs) as structured semantic resources to systematically generate fake triplets. By analyzing the structural properties of KGs, such as the distance between entities and their predicates, we identify plausibly false relationships. These triplets are then used to guide large language models (LLMs) in generating misinformation statements with varying degrees of credibility. By utilizing structured semantic relationships, our deterministic approach produces misinformation inherently challenging for humans to detect, drawing exclusively upon publicly available KGs (e.g., WikiGraphs). Additionally, we investigate the effectiveness of LLMs in distinguishing between genuine and artificially generated misinformation. Our analysis highlights significant limitations in current LLM-based detection methods, underscoring the necessity for enhanced detection strategies and a deeper exploration of inherent biases in generative models.


RMoA: Optimizing Mixture-of-Agents through Diversity Maximization and Residual Compensation

arXiv.org Artificial Intelligence

Although multi-agent systems based on large language models show strong capabilities on multiple tasks, they are still limited by high computational overhead, information loss, and poor robustness. Inspired by ResNet's residual learning, we propose Residual Mixture-of-Agents (RMoA), integrating residual connections to optimize efficiency and reliability. To maximize information utilization from model responses while minimizing computational costs, we innovatively design an embedding-based diversity selection mechanism that greedily selects responses via vector similarity. Furthermore, to mitigate iterative information degradation, we introduce a Residual Extraction Agent to preserve cross-layer incremental information by capturing inter-layer response differences, coupled with a Residual Aggregation Agent for hierarchical information integration. Additionally, we propose an adaptive termination mechanism that dynamically halts processing based on residual convergence, further improving inference efficiency. RMoA achieves state-of-the-art performance on benchmarks spanning alignment, mathematical reasoning, code generation, and multitasking understanding, while significantly reducing computational overhead. Code is available at https://github.com/mindhunter01/RMoA.
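As a rough illustration of greedy, similarity-based diversity selection, the sketch below picks responses whose embeddings are least similar to those already chosen. It is not taken from the RMoA repository: the function names, the choice to seed with the first response, and the "lowest maximum cosine similarity" criterion are assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def greedy_diverse_select(embeddings, k):
    """Return indices of up to k embeddings chosen greedily for diversity.

    Assumption: selection is seeded with index 0, and each step adds the
    candidate whose highest similarity to any selected item is lowest.
    """
    if not embeddings or k <= 0:
        return []
    selected = [0]
    while len(selected) < min(k, len(embeddings)):
        best_idx, best_score = None, float("inf")
        for i in range(len(embeddings)):
            if i in selected:
                continue
            score = max(cosine(embeddings[i], embeddings[j]) for j in selected)
            if score < best_score:
                best_idx, best_score = i, score
        selected.append(best_idx)
    return selected


if __name__ == "__main__":
    # Toy 2-D "embeddings" standing in for response vectors.
    vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]
    print(greedy_diverse_select(vectors, 3))
```

With the toy vectors, the near-duplicate of the first response (index 1) is the one left out, which is the behavior a diversity filter is meant to produce.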


Donate or Create? Comparing Data Collection Strategies for Emotion-labeled Multimodal Social Media Posts

arXiv.org Artificial Intelligence

Accurate modeling of subjective phenomena such as emotion expression requires data annotated with authors' intentions. Commonly, such data is collected by asking study participants either to donate and label genuine content produced in the real world, or to create content fitting particular labels during the study. Asking participants to create content is often simpler to implement and presents fewer risks to participant privacy than data donation. However, it is unclear if and how study-created content may differ from genuine content, and how differences may impact models. We collect study-created and genuine multimodal social media posts labeled for emotion and compare them on several dimensions, including model performance. We find that compared to genuine posts, study-created posts are longer, rely more on their text and less on their images for emotion expression, and focus more on emotion-prototypical events. The samples of participants willing to donate versus create posts are demographically different. Study-created data is valuable to train models that generalize well to genuine data, but realistic effectiveness estimates require genuine data.


MMAFFBen: A Multilingual and Multimodal Affective Analysis Benchmark for Evaluating LLMs and VLMs

arXiv.org Artificial Intelligence

Large language models and vision-language models (which we jointly call LMs) have transformed NLP and CV, demonstrating remarkable potential across various fields. However, their capabilities in affective analysis (i.e. sentiment analysis and emotion detection) remain underexplored. This gap is largely due to the absence of comprehensive evaluation benchmarks, and the inherent complexity of affective analysis tasks. In this paper, we introduce MMAFFBen, the first extensive open-source benchmark for multilingual multimodal affective analysis. MMAFFBen encompasses text, image, and video modalities across 35 languages, covering four key affective analysis tasks: sentiment polarity, sentiment intensity, emotion classification, and emotion intensity. Moreover, we construct the MMAFFIn dataset for fine-tuning LMs on affective analysis tasks, and further develop MMAFFLM-3b and MMAFFLM-7b based on it. We evaluate various representative LMs, including GPT-4o-mini, providing a systematic comparison of their affective understanding capabilities. This project is available at https://github.com/lzw108/MMAFFBen.


Exploring Societal Concerns and Perceptions of AI: A Thematic Analysis through the Lens of Problem-Seeking

arXiv.org Artificial Intelligence

This study introduces a novel conceptual framework distinguishing problem-seeking from problem-solving to clarify the unique features of human intelligence in contrast to AI. Problem-seeking refers to the embodied, emotionally grounded process by which humans identify and set goals, while problem-solving denotes the execution of strategies aimed at achieving such predefined objectives. The framework emphasizes that while AI excels at efficiency and optimization, it lacks the orientation derived from experiential grounding and the embodiment flexibility intrinsic to human cognition. To empirically explore this distinction, the research analyzes metadata from 157 YouTube videos discussing AI. Conducting a thematic analysis combining qualitative insights with keyword-based quantitative metrics, this mixed-methods approach uncovers recurring themes in public discourse, including privacy, job displacement, misinformation, optimism, and ethical concerns. The results reveal a dual sentiment: public fascination with AI's capabilities coexists with anxiety and skepticism about its societal implications. The discussion critiques the orthogonality thesis, which posits that intelligence is separable from goal content, and instead argues that human intelligence integrates goal-setting and goal-pursuit. It underscores the centrality of embodied cognition in human reasoning and highlights how AI's limitations come from its current reliance on computational processing. The study advocates for enhancing emotional and digital literacy to foster responsible AI engagement. It calls for reframing public discourse to recognize AI as a tool that augments -- rather than replaces -- human intelligence. By positioning problem-seeking at the core of cognition and as a critical dimension of intelligence, this research offers new perspectives on ethically aligned and human-centered AI development.


How to reset your terrible streaming recommendations

Popular Science

The best streaming services have vast libraries of content, and that's where recommendations can be useful--guiding you towards the movies and shows you're most likely to enjoy, based on what you've already seen. Maybe someone else (a younger member of the family perhaps) has been using your account, and skewed the recommended titles in a direction you don't like. Maybe your recommendations aren't particularly helpful, or maybe you just want a fresh start away from everything you've watched in the past. In those scenarios and others, resetting your recommendations can help--and it's not difficult to do, no matter the streaming services you use.


The Creator of Succession Is Back With a Movie. There's a Reason He Rushed to Make It Right Away.

Slate

Outside an opulent retreat in the mountains of Utah, the world is going to hell. Thanks to disinformation-spreading tools on the world's largest social media platform, people are being executed by bloodthirsty mobs and machine-gunned by their neighbors, politicians assassinated and governments crumbling. But inside Mountainhead, the billionaire tech moguls responsible for the chaos are smoking cigars and shooting the breeze, debating whether the eruption of global chaos is a crisis to be managed or a surge of "creative destruction" that will help usher humanity into a brighter future. If the fictional setting of Mountainhead, the debut feature by Jesse Armstrong, seems a little too close to reality, that's because it's meant to be. The movie, which stars Steve Carell, Jason Schwartzman, Ramy Youssef, and Cory Michael Smith, was conceived, written, cast, shot, edited, and released in about six months, an astonishingly short timeline for any director, let alone a first-timer.


The New Movie From the Creator of Succession Is Less a Satire Than a Documentary

Slate

For the quartet of tech billionaires in Jesse Armstrong's Mountainhead, ideas are so powerful that nothing else seems real. Holed up in a resplendent snowy retreat built by meditation-app developer Hugo Van Yalk (Jason Schwartzman), they're glued to their phones as the outside world is erupting into chaos, thanks in no small part to the wildfire spread of A.I. deepfakes on the social media platform owned by the world's richest man, Venis Parish (Cory Michael Smith). People in Gujarat are being burned alive after being falsely accused of desecrating religious symbols, and Midwestern Americans are machine-gunning each other over minor disagreements, but for these four men, the widespread devastation is in some ways proof of concept that they're as important as they believe themselves to be. And besides, those bodies going up in flames are just images on a tiny screen, so distant they might as well be theoretical. As he trudges through the snow with Randall (Steve Carell), the venture capitalist who serves as the group's self-appointed philosopher king, Venis asks him, "Do you … believe in other people?"