FReM: A Flexible Reasoning Mechanism for Balancing Quick and Slow Thinking in Long-Context Question Answering

arXiv.org Artificial Intelligence

Long-context question-answering (LCQA) systems have greatly benefited from the powerful reasoning capabilities of large language models (LLMs), which can be categorized into slow and quick reasoning modes. However, both modes have their limitations. Slow thinking tends to explore every possible reasoning path, which leads to heavy overthinking and wastes time. Quick thinking usually relies on pattern matching rather than on truly understanding the query logic, so it often fails to grasp what the question actually asks. To address these issues, we propose FReM: Flexible Reasoning Mechanism, a method that adjusts reasoning depth according to the complexity of each question. Specifically, FReM leverages synthetic reference QA examples to provide an explicit chain of thought, enabling efficient handling of simple queries while allowing deeper reasoning for more complex ones. By doing so, FReM helps quick-thinking models move beyond superficial pattern matching and narrows the reasoning space for slow-thinking models to avoid unnecessary exploration. Experiments on seven QA datasets show that FReM improves reasoning accuracy and scalability, particularly for complex multihop questions, indicating its potential to advance LCQA methodologies.


Aurelia: Test-time Reasoning Distillation in Audio-Visual LLMs

arXiv.org Artificial Intelligence

Recent advancements in reasoning optimization have greatly enhanced the performance of large language models (LLMs). However, existing work fails to address the complexities of audio-visual scenarios, underscoring the need for further research. In this paper, we introduce AURELIA, a novel actor-critic based audio-visual (AV) reasoning framework that distills structured, step-by-step reasoning into AVLLMs at test time, improving their ability to process complex multi-modal inputs without additional training or fine-tuning. To further advance AVLLM reasoning skills, we present AVReasonBench, a challenging benchmark comprising 4500 audio-visual questions, each paired with detailed step-by-step reasoning. Our benchmark spans six distinct tasks, including AV-GeoIQ, which evaluates AV reasoning combined with geographical and cultural knowledge. Evaluating 18 AVLLMs on AVReasonBench reveals significant limitations in their multi-modal reasoning capabilities. Using AURELIA, we achieve up to a 100% relative improvement, demonstrating its effectiveness. This performance gain highlights the potential of reasoning-enhanced data generation for advancing AVLLMs in real-world applications. Our code and data will be publicly released at: https://github.com/schowdhury671/aurelia.


Beyond speculation: Measuring the growing presence of LLM-generated texts in multilingual disinformation

arXiv.org Artificial Intelligence

Our study makes several key contributions to understanding LLM-generated disinformation: By validation on broader datasets, our detection methods establish a robust analytical framework for examining real-world disinformation content, confirming both the increasing presence and prevalence of machine-generated texts in disinformation datasets over time. The distribution of LLM-generated content varies significantly across languages and platforms, revealing targeted patterns of misuse rather than uniform effects. This provides empirical validation for previously speculated concerns and unsupported fears about increased LLM deployment in disinformation campaigns. Most importantly, our findings underscore the urgent need for continued investigation and improved countermeasures, including enhanced detection methods and credibility assessment systems to preserve information integrity in our evolving digital landscape.


CrossMuSim: A Cross-Modal Framework for Music Similarity Retrieval with LLM-Powered Text Description Sourcing and Mining

arXiv.org Artificial Intelligence

Music similarity retrieval is fundamental for managing and exploring relevant content from large collections in streaming platforms. This paper presents a novel cross-modal contrastive learning framework that leverages the open-ended nature of text descriptions to guide music similarity modeling, addressing the limitations of traditional uni-modal approaches in capturing complex musical relationships. To overcome the scarcity of high-quality text-music paired data, this paper introduces a dual-source data acquisition approach combining online scraping and LLM-based prompting, where carefully designed prompts leverage LLMs' comprehensive music knowledge to generate contextually rich descriptions. Extensive experiments demonstrate that the proposed framework achieves significant performance improvements over existing benchmarks through objective metrics, subjective evaluations, and real-world A/B testing on the Huawei Music streaming platform. Music similarity retrieval plays an important role in many music information retrieval (MIR) tasks, such as music recommendation [1], personalized playlist generation [2] and background music replacement in video editing [3], [4]. As digital music collections rapidly expand within streaming platforms, accurately identifying similarities between musical pieces has become critical for managing and exploring relevant content from such large collections efficiently.


xAI, Elon Musk's AI company, just purchased X, Elon Musk's social media company

Engadget

Elon Musk's AI company, xAI, has purchased X, according to a post shared by Musk. Besides their similar names and owner, the companies are already connected through xAI's chatbot Grok, which is integrated into X. X was acquired by xAI through an all-stock transaction. "The combination values xAI at $80 billion and X at $33 billion ($45B less $12B debt)," Musk writes. "xAI and X's futures are intertwined."


Hayao Miyazaki Would Hate You Losers and Your A.I. Slop

Slate

Since OpenAI released an update earlier this week that improved ChatGPT's ability to generate images based on detailed requests, a dark evil has infected the internet, responsible for the shriveling of souls and the wanton destruction of life and nature itself: Studio Ghibli A.I. slop. Social media has been flooded with images of the most random shit imaginable rendered in the signature style of Hayao Miyazaki, the legendary animator and co-founder of the Japanese company Studio Ghibli, renowned for hand-drawn animated films such as Princess Mononoke, Spirited Away, and My Neighbor Totoro. X in particular, Elon Musk's land of the rising bot, is rife with viral posts extolling the virtues of an innovation that steals human-made creations, chews them into paste, and spits out the reassembled remains, stripped of any of the originality, spirit, and labor that makes art art. It's been 24 hours since OpenAI unexpectedly shook the AI image world with 4o image generation.


The Legend of Zelda movie hits theaters on March 26, 2027

Engadget

Nintendo just announced the official release date of the live-action Legend of Zelda movie. It hits theaters on March 26, 2027, which is just about two years from now. The film was first announced back in 2023. The company dropped this bombshell on the official Nintendo Today! app that was surprise-released during a recent Direct livestream. The stream promised that the app would be a constant source of news and information. It looks like that promise was not hyperbole.


Copyright questions loom as ChatGPT's Ghibli-style images go viral

The Japan Times

The release of the latest image generator on OpenAI's ChatGPT has triggered a flood of online memes featuring images done in the style of Studio Ghibli, the Japanese studio behind classic animated films like "My Neighbor Totoro" and "Princess Mononoke." Since the release on Wednesday, AI-generated images depicting Studio Ghibli versions of Elon Musk with U.S. President Donald Trump, "The Lord of the Rings," and even a recreation of the Sept. 11 attacks have gone viral across online platforms.


Make Some Noise: Towards LLM audio reasoning and generation using sound tokens

arXiv.org Artificial Intelligence

Integrating audio comprehension and generation into large language models (LLMs) remains challenging due to the continuous nature of audio and the resulting high sampling rates. Here, we introduce a novel approach that combines Variational Quantization with Conditional Flow Matching to convert audio into ultra-low bitrate discrete tokens of 0.23 kbps, allowing for seamless integration with text tokens in LLMs. We fine-tuned a pretrained text-based LLM using Low-Rank Adaptation (LoRA) to assess its effectiveness in achieving true multimodal capabilities, i.e., audio comprehension and generation. Our tokenizer outperforms a traditional VQ-VAE across various datasets with diverse acoustic events. Despite the substantial loss of fine-grained details through audio tokenization, our multimodal LLM trained with discrete tokens achieves results competitive with state-of-the-art methods in audio comprehension, though audio generation is poor. Our results highlight the need for larger, more diverse datasets and improved evaluation metrics to advance multimodal LLM performance.
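
To give a feel for how low the quoted bitrate is, the sketch below converts between token rate and bitrate for a single-codebook tokenizer, taking the abstract's figure to mean 0.23 kilobits per second (230 bits per second). The codebook size of 1024 and the function names are hypothetical, chosen only to make the arithmetic concrete; the abstract does not state the tokenizer's actual codebook size or frame rate.

```python
import math


def token_bitrate_bps(tokens_per_second: float, codebook_size: int) -> float:
    """Bits per second for a stream of tokens drawn from one codebook.

    Each token carries log2(codebook_size) bits.
    """
    return tokens_per_second * math.log2(codebook_size)


def tokens_per_second_at(bitrate_bps: float, codebook_size: int) -> float:
    """Token rate that a given bitrate allows for one codebook."""
    return bitrate_bps / math.log2(codebook_size)


# 0.23 kbps from the abstract, expressed in bits per second.
REPORTED_BPS = 230.0

# ASSUMPTION: hypothetical codebook size, not taken from the paper.
HYPOTHETICAL_CODEBOOK = 1024  # 10 bits per token

rate = tokens_per_second_at(REPORTED_BPS, HYPOTHETICAL_CODEBOOK)
print(f"{rate} tokens/s at {REPORTED_BPS} bits/s")
```

Under that assumed codebook, the budget works out to 23 tokens per second of audio, which is in the same range as the token rate of ordinary text and suggests why such a tokenizer is attractive for mixing with text tokens.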


A Framework for Cryptographic Verifiability of End-to-End AI Pipelines

arXiv.org Artificial Intelligence

The increasing integration of Artificial Intelligence across multiple industry sectors necessitates robust mechanisms for ensuring transparency, trust, and auditability of its development and deployment. This topic is particularly important in light of recent calls in various jurisdictions to introduce regulation and legislation on AI safety. In this paper, we propose a framework for complete verifiable AI pipelines, identifying key components and analyzing existing cryptographic approaches that contribute to verifiability across different stages of the AI lifecycle, from data sourcing to training, inference, and unlearning. This framework could be used to combat misinformation by providing cryptographic proofs alongside AI-generated assets to allow downstream verification of their provenance and correctness. Our findings underscore the importance of ongoing research to develop cryptographic tools that are not only efficient for isolated AI processes, but that are efficiently 'linkable' across different processes within the AI pipeline, to support the development of end-to-end verifiable AI technologies.
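
As a rough intuition for what linking pipeline stages can mean, the sketch below chains SHA-256 digests of stage artifacts so that altering any earlier stage invalidates every later link. This is a plain hash chain used purely for illustration. It is not the framework proposed in the paper, and it offers none of the stronger guarantees (such as proofs of correct computation) that the surveyed cryptographic approaches aim for. The stage names, record layout, and function names are all hypothetical.

```python
import hashlib
import json


def stage_record(stage: str, payload_digest: str, prev_link: str) -> dict:
    """Build one record whose link commits to this stage and the previous link."""
    body = {"stage": stage, "payload": payload_digest, "prev": prev_link}
    encoded = json.dumps(body, sort_keys=True).encode("utf-8")
    return {**body, "link": hashlib.sha256(encoded).hexdigest()}


def build_chain(stages: list) -> list:
    """Chain (name, artifact_bytes) pairs in pipeline order."""
    chain, prev = [], ""
    for name, artifact in stages:
        digest = hashlib.sha256(artifact).hexdigest()
        record = stage_record(name, digest, prev)
        chain.append(record)
        prev = record["link"]
    return chain


def verify_chain(chain: list) -> bool:
    """Recompute every link and check it matches the stored one."""
    prev = ""
    for record in chain:
        expected = stage_record(record["stage"], record["payload"], prev)
        if record["prev"] != prev or expected["link"] != record["link"]:
            return False
        prev = record["link"]
    return True


# Hypothetical artifacts standing in for real pipeline outputs.
pipeline = [
    ("data_sourcing", b"dataset-v1"),
    ("training", b"model-weights"),
    ("inference", b"generated-output"),
]
chain = build_chain(pipeline)
print(verify_chain(chain))
```

A verifier holding only the final link can detect tampering anywhere upstream, but it still has to trust that each artifact was produced honestly, which is the gap that verifiable-computation techniques are meant to close.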