Media
Physicists can't explain mysterious radio wave emissions in Antarctica
For nearly two decades, balloons carrying highly sensitive atmospheric instruments have drifted more than 25 miles above one of the world's most remote regions. The floating array is the Antarctic Impulsive Transient Antenna (ANITA) experiment, a project overseen by an international group of researchers tasked with measuring some of the universe's oldest and hardest-to-detect cosmic rays. Specifically, the team is hunting for neutrinos--particles with no charge that also possess the smallest known subatomic mass. But according to their recent report, ANITA has repeatedly picked up some truly weird signals that defy explanation.
Even robots get stage fright! Watch the horrifying moment a robot dog COLLAPSES on stage during America's Got Talent audition
Five of the 35 kg robots showed off their moves to Queen's Don't Stop Me Now in an incredible feat of engineering. But it turns out that even robots can get stage fright, as one of the dancing bots collapsed just minutes into the performance. On social media, commenters joked that the robot must have been 'tired of all the rehearsals'. Even after one of their members collapsed, the robotic performers continued to strut and sway across the stage without missing a beat for their entire 90-second routine. And the slip-up didn't hold this unique dance troupe back as the judges swiftly awarded four 'yes' votes, sending them through to the next round of the competition. Even the usually surly Simon Cowell couldn't hold back a smile as he said: 'Can I be honest with you?
Configurable Preference Tuning with Rubric-Guided Synthetic Data
Models of human feedback for AI alignment, such as those underpinning Direct Preference Optimization (DPO), often bake in a singular, static set of preferences, limiting adaptability. This paper challenges the assumption of monolithic preferences by introducing Configurable Preference Tuning (CPT), a novel framework for endowing language models with the ability to dynamically adjust their behavior based on explicit, human-interpretable directives. CPT leverages synthetically generated preference data, conditioned on system prompts derived from structured, fine-grained rubrics that define desired attributes like writing style. By fine-tuning with these rubric-guided preferences, the LLM learns to modulate its outputs at inference time in response to the system prompt, without retraining. This approach not only offers fine-grained control but also provides a mechanism for modeling more nuanced and context-dependent human feedback. Several experimental artifacts, such as training code, generated datasets, and fine-tuned models, are released at https://github.com/vicgalle/configurable-preference-tuning
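As a rough illustration of what "preference data conditioned on system prompts" could look like, the sketch below builds two preference pairs from the same responses, with the preferred response flipping according to the system prompt. It is not taken from the released repository: the `PreferencePair` record, the `build_pairs` function, the `"high"`/`"low"` keys, and the idea that scores come from a rubric-based judge are all assumptions made here for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PreferencePair:
    """One DPO-style training record (hypothetical schema)."""
    system_prompt: str
    prompt: str
    chosen: str
    rejected: str


def build_pairs(prompt, responses, system_prompts):
    """Build system-prompt-conditioned preference pairs.

    responses: list of (text, rubric_score) tuples; the scores are assumed
        to come from some rubric-based judge.
    system_prompts: dict with hypothetical keys "high" and "low", each a
        system prompt asking for a high- or low-scoring style under the rubric.
    """
    ranked = sorted(responses, key=lambda item: item[1])
    low_text, high_text = ranked[0][0], ranked[-1][0]
    return [
        # Under the "high" directive, the high-scoring response is preferred.
        PreferencePair(system_prompts["high"], prompt, high_text, low_text),
        # Under the "low" directive, the preference is reversed.
        PreferencePair(system_prompts["low"], prompt, low_text, high_text),
    ]
```

Records of this shape could then be passed to any preference-optimization trainer that accepts a system prompt alongside the chosen and rejected responses.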
RoE-FND: A Case-Based Reasoning Approach with Dual Verification for Fake News Detection via LLMs
Yang, Yuzhou, Zhou, Yangming, Zhu, Zhiying, Qian, Zhenxing, Zhang, Xinpeng, Li, Sheng
The proliferation of deceptive content online necessitates robust Fake News Detection (FND) systems. While evidence-based approaches leverage external knowledge to verify claims, existing methods face critical limitations: noisy evidence selection, generalization bottlenecks, and unclear decision-making processes. Recent efforts to harness Large Language Models (LLMs) for FND introduce new challenges, including hallucinated rationales and conclusion bias. To address these issues, we propose RoE-FND (Reason on Experiences FND), a framework that reframes evidence-based FND as a logical deduction task by synergizing LLMs with experiential learning. RoE-FND encompasses two stages: (1) self-reflective knowledge building, where a knowledge base is curated by analyzing past reasoning errors, namely the exploration stage, and (2) dynamic criterion retrieval, which synthesizes task-specific reasoning guidelines from historical cases as experiences during deployment. It further cross-checks rationales against internal experience through a devised dual-channel procedure. Key contributions include: a case-based reasoning framework for FND that addresses multiple existing challenges, a training-free approach enabling adaptation to evolving situations, and empirical validation of the framework's superior generalization and effectiveness over state-of-the-art methods across three datasets.
GraphRAG-Causal: A novel graph-augmented framework for causal reasoning and annotation in news
Haque, Abdul, Hani, Umm e, Din, Ahmad, Babar, Muhammad, Abbas, Ali, Ullah, Insaf
GraphRAG-Causal introduces an innovative framework that combines graph-based retrieval with large language models to enhance causal reasoning in news analysis. Traditional NLP approaches often struggle with identifying complex, implicit causal links, especially in low-data scenarios. Our approach addresses these challenges by transforming annotated news headlines into structured causal knowledge graphs. It then employs a hybrid retrieval system that merges semantic embeddings with graph-based structural cues leveraging Neo4j to accurately match and retrieve relevant events. The framework is built on a three-stage pipeline: First, during Data Preparation, news sentences are meticulously annotated and converted into causal graphs capturing cause, effect, and trigger relationships. Next, the Graph Retrieval stage stores these graphs along with their embeddings in a Neo4j database and utilizes hybrid Cypher queries to efficiently identify events that share both semantic and structural similarities with a given query. Finally, the LLM Inference stage utilizes these retrieved causal graphs in a few-shot learning setup with XML-based prompting, enabling robust classification and tagging of causal relationships. Experimental evaluations demonstrate that GraphRAG-Causal achieves an impressive F1-score of 82.1% on causal classification using just 20 few-shot examples. This approach significantly boosts accuracy and consistency, making it highly suitable for real-time applications in news reliability assessment, misinformation detection, and policy analysis.
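The hybrid retrieval idea in the abstract above, ranking stored events by both semantic and structural similarity, can be sketched in plain Python. This is only an illustration under stated assumptions: the abstract describes Neo4j with hybrid Cypher queries, whereas this sketch swaps in an in-memory list, cosine similarity over embeddings, and Jaccard overlap of edge labels as a stand-in for structural cues. The function names, the `alpha` weight, and the event dictionary layout are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def jaccard(a, b):
    """Overlap of two label sets, used here as a crude structural score."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def hybrid_rank(query, events, alpha=0.7, k=2):
    """Return the ids of the top-k events by a weighted hybrid score.

    query and each event are dicts with an 'embedding' (list of floats) and
    'roles' (edge labels such as cause/effect/trigger); events also carry 'id'.
    alpha weights semantic against structural similarity (assumed value).
    """
    scored = [
        (
            alpha * cosine(query["embedding"], event["embedding"])
            + (1 - alpha) * jaccard(query["roles"], event["roles"]),
            event["id"],
        )
        for event in events
    ]
    scored.sort(reverse=True)
    return [event_id for _, event_id in scored[:k]]
```

The retrieved events would then serve as few-shot examples in the prompt; how the two scores are actually combined in the paper's Cypher queries is not specified in the abstract.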
LiLAC: A Lightweight Latent ControlNet for Musical Audio Generation
Text-to-audio diffusion models produce high-quality and diverse music but many, if not most, of the SOTA models lack the fine-grained, time-varying controls essential for music production. ControlNet enables attaching external controls to a pre-trained generative model by cloning and fine-tuning its encoder on new conditionings. However, this approach incurs a large memory footprint and restricts users to a fixed set of controls. We propose a lightweight, modular architecture that considerably reduces parameter count while matching ControlNet in audio quality and condition adherence. Our method offers greater flexibility and significantly lower memory usage, enabling more efficient training and deployment of independent controls. We conduct extensive objective and subjective evaluations and provide numerous audio examples on the accompanying website at https://lightlatentcontrol.github.io
Evolutionary Perspectives on the Evaluation of LLM-Based AI Agents: A Comprehensive Survey
Zhu, Jiachen, Zhu, Menghui, Rui, Renting, Shan, Rong, Zheng, Congmin, Chen, Bo, Xi, Yunjia, Lin, Jianghao, Liu, Weiwen, Tang, Ruiming, Yu, Yong, Zhang, Weinan
The advent of large language models (LLMs), such as GPT, Gemini, and DeepSeek, has significantly advanced natural language processing, giving rise to sophisticated chatbots capable of diverse language-related tasks. The transition from these traditional LLM chatbots to more advanced AI agents represents a pivotal evolutionary step. However, existing evaluation frameworks often blur the distinctions between LLM chatbots and AI agents, leading to confusion among researchers selecting appropriate benchmarks. To bridge this gap, this paper introduces a systematic analysis of current evaluation approaches, grounded in an evolutionary perspective. We provide a detailed analytical framework that clearly differentiates AI agents from LLM chatbots along five key aspects: complex environment, multi-source instructor, dynamic feedback, multi-modal perception, and advanced capability. Further, we categorize existing evaluation benchmarks based on the driving forces of external environments and the resulting advanced internal capabilities. For each category, we delineate relevant evaluation attributes, presented comprehensively in practical reference tables. Finally, we synthesize current trends and outline future evaluation methodologies through four critical lenses: environment, agent, evaluator, and metrics. Our findings offer actionable guidance for researchers, facilitating the informed selection and application of benchmarks in AI agent evaluation, thus fostering continued advancement in this rapidly evolving research domain.
Agent Semantics, Semantic Spacetime, and Graphical Reasoning
Semantic Spacetime (SST) is a discrete, graph-theoretic 'agent' representation of configurations and process phenomena, used for modelling scenarios that include knowledge representations, in the form of labelled directed graphs [1-4]. It enables both qualitative and quantitative interpretations of processes by combining physical and virtual concepts (from physics and information science) into a Promise Theoretic agent model [5]. Promise Theory principles emphasize the autonomy or locality of causal behaviour, so there are clear motivations for modelling phenomena in this way. As a graph-theoretic structure, a Semantic Spacetime is a collection of nodes (agents) joined by links (channels for process information), both of which may have annotations and numerical values associated with them. A key application for Semantic Spacetime in artificial systems is to represent 'knowledge' (in its simplified sense) and process structures, such as those normally associated with indexing methods or Semantic Webs, like the triple store approaches of the Resource Description Framework (RDF) [6].
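The structure described above, nodes joined by labelled directed links where both can carry annotations and numerical values, maps onto a small data structure. The sketch below is a generic labelled directed graph, not the SST reference implementation: the `Link` and `Graph` classes, their fields, and the example link labels are hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Link:
    """A directed, labelled link with a numerical value (hypothetical schema)."""
    source: str
    target: str
    label: str
    weight: float = 1.0


@dataclass
class Graph:
    nodes: dict = field(default_factory=dict)      # node name -> annotation dict
    out_links: dict = field(default_factory=dict)  # node name -> list of Link

    def add_node(self, name, **annotations):
        """Create the node if needed and merge in any annotations."""
        self.nodes.setdefault(name, {}).update(annotations)

    def add_link(self, source, target, label, weight=1.0):
        """Add a directed link, creating either endpoint if it is missing."""
        self.add_node(source)
        self.add_node(target)
        self.out_links.setdefault(source, []).append(
            Link(source, target, label, weight)
        )

    def neighbours(self, source, label=None):
        """Targets reachable from source, optionally filtered by link label."""
        return [
            link.target
            for link in self.out_links.get(source, [])
            if label is None or link.label == label
        ]
```

For example, `g.add_link("kettle", "water", "contains", 0.9)` followed by `g.neighbours("kettle", label="contains")` returns `["water"]`. Unlike an RDF triple, each link here also carries a weight, which is one way to hold the numerical values the text mentions.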
RSCF: Relation-Semantics Consistent Filter for Entity Embedding of Knowledge Graph
Kim, Junsik, Park, Jinwook, Kim, Kangil
In knowledge graph embedding, leveraging relation-specific entity transformation has markedly enhanced performance. However, the consistency of embedding differences before and after transformation remains unaddressed, risking the loss of valuable inductive bias inherent in the embeddings. This inconsistency stems from two problems. First, transformation representations are specified for relations in a disconnected manner, allowing dissimilar transformations and corresponding entity embeddings for similar relations. Second, a generalized plug-in approach such as SFBR (Semantic Filter Based on Relations) disrupts this consistency through excessive concentration of entity embeddings under entity-based regularization, generating indistinguishable score distributions among relations. In this paper, we introduce a plug-in KGE method, Relation-Semantics Consistent Filter (RSCF). Its entity transformation has three features for enhancing semantic consistency: 1) shared affine transformation of relation embeddings across all relations, 2) rooted entity transformation that adds an entity embedding to its change represented by the transformed vector, and 3) normalization of the change to prevent scale reduction. To amplify the advantages of consistency that preserve semantics on embeddings, RSCF adds relation transformation and prediction modules for enhancing the semantics. In knowledge graph completion tasks with distance-based and tensor decomposition models, RSCF significantly outperforms state-of-the-art KGE methods, showing robustness across all relations and their frequencies.
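One possible reading of the three features listed above is sketched below in plain Python. It is an interpretation of the abstract, not the authors' implementation: the element-wise product between the normalized filter and the entity embedding, the use of L2 normalization, and the function names are all assumptions. A single `matrix` and `bias` are shared by every relation (feature 1), the change is added back onto the original entity embedding (feature 2), and the relation-derived vector is normalized before it is applied (feature 3).

```python
import math


def affine(matrix, bias, vector):
    """Shared affine map applied to a relation embedding."""
    return [
        sum(w * x for w, x in zip(row, vector)) + b
        for row, b in zip(matrix, bias)
    ]


def l2_normalize(vector):
    """Scale a vector to unit L2 norm (returned unchanged if it is zero)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm else list(vector)


def rooted_transform(entity, relation, matrix, bias):
    """Return entity + change, with the change driven by the relation.

    The element-wise product below is an assumed form of the filter.
    """
    filt = l2_normalize(affine(matrix, bias, relation))
    change = [f * e for f, e in zip(filt, entity)]
    return [e + c for e, c in zip(entity, change)]
```

Because the entity embedding itself is kept as the root of the sum, a filter of zero leaves the embedding unchanged, which is one way to see why the rooted form preserves the original embedding geometry.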
Is Your LLM-Based Multi-Agent a Reliable Real-World Planner? Exploring Fraud Detection in Travel Planning
Yao, Junchi, Xu, Jianhua, Xin, Tianyu, Wang, Ziyi, Zhu, Shenzhe, Yang, Shu, Wang, Di
The rise of Large Language Model-based Multi-Agent Planning has leveraged advanced frameworks to enable autonomous and collaborative task execution. Some systems rely on platforms like review sites and social media, which are prone to fraudulent information, such as fake reviews or misleading descriptions. This reliance poses risks, potentially causing financial losses and harming user experiences. To evaluate the risk of planning systems in real-world applications, we introduce WandaPlan, an evaluation environment mirroring real-world data and injected with deceptive content. We assess system performance across three fraud cases: Misinformation Fraud, Team-Coordinated Multi-Person Fraud, and Level-Escalating Multi-Round Fraud. We reveal significant weaknesses in existing frameworks that prioritize task efficiency over data authenticity. At the same time, we validate WandaPlan's generalizability, showing that it can assess the risks of real-world open-source planning frameworks. To mitigate the risk of fraud, we propose integrating an anti-fraud agent, providing a solution for reliable planning.