Nintendo set to unveil Switch 2, follow-up to megahit game console

Al Jazeera

Nintendo is set to reveal key details of its follow-up to the Switch in a presentation that has been eagerly anticipated by gaming fans and investors alike. The Japanese gaming giant will kick off the hour-long presentation for the Switch 2 at 13:00 GMT on Wednesday, with the hybrid handheld and home console's price and release date among the details expected to be revealed. The Nintendo Direct event comes after the Kyoto-based company confirmed the existence of the next-generation system in a brief teaser video released in mid-January. The teaser showed a console similar in appearance to the original Switch, albeit with a larger screen, and with the same hybrid functionality. The video also provided a short preview of the upcoming instalment of Nintendo's popular Mario Kart franchise.


Drone footage shows scale of Greek island flooding

BBC News

Heavy rainfall caused flash flooding on Greece's Paros island on Monday, 31 March. Drone footage captures the scale of the destruction in the capital, Naousa, with damaged vehicles and authorities working to clear mud from the streets. Schools were closed on Monday and authorities urged residents to avoid travel, according to local media. Further heavy rainfall is expected this week.


RobuNFR: Evaluating the Robustness of Large Language Models on Non-Functional Requirements Aware Code Generation

arXiv.org Artificial Intelligence

When using LLMs to address Non-Functional Requirements (NFRs), developers may behave differently (e.g., expressing the same NFR in different words). Robust LLMs should output consistent results across these variations; however, this aspect remains underexplored. We propose RobuNFR for evaluating the robustness of LLMs in NFR-aware code generation across four NFR dimensions (design, readability, reliability, and performance) using three methodologies: prompt variation, regression testing, and diverse workflows. Our experiments show that RobuNFR reveals robustness issues in the tested LLMs when considering NFRs in code generation. Specifically, under prompt variation, including NFRs leads to a decrease in Pass@1 of up to 39 percent and an increase in the standard deviation from 0.48 to 2.48 compared to the baseline without NFRs (i.e., Function-Only). While incorporating NFRs generally improves overall NFR metrics, it also results in higher prompt sensitivity. In regression settings, some LLMs exhibit differences across versions, with improvements in one aspect (e.g., reduced code smells) often accompanied by regressions in another (e.g., decreased correctness), revealing inconsistencies that challenge their robustness. When varying workflows, the tested LLMs show significantly different NFR-aware code generation capabilities between two workflows: (1) integrating NFRs and functional requirements into the initial prompt and (2) enhancing Function-Only-generated code with the same NFRs.
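
As a rough illustration of the prompt-variation methodology, the sketch below scores several paraphrases of one NFR and reports the spread in Pass@1. It is a minimal sketch: generate_code and passes_tests are toy stand-ins, not the paper's models or test harnesses, and the prompt sensitivity is simulated.

import random
import statistics

# Toy stand-ins for an LLM call and a unit-test harness; both are
# hypothetical placeholders, not the paper's actual pipeline.
def generate_code(prompt: str) -> str:
    return f"# completion for: {prompt}"

def passes_tests(code: str) -> bool:
    # Simulate prompt sensitivity: the success rate depends on the prompt text.
    rate = 0.4 + (sum(map(ord, code)) % 40) / 100
    return random.random() < rate

# Paraphrases of the same performance NFR, appended to one functional task.
task = "Write a function that returns the n-th Fibonacci number."
nfr_variants = [
    "It must run in linear time.",
    "Avoid exponential recursion; keep the runtime O(n).",
    "Make it fast enough for n up to one million.",
]

def pass_at_1(prompt: str, n_samples: int = 100) -> float:
    hits = sum(passes_tests(generate_code(prompt)) for _ in range(n_samples))
    return hits / n_samples

scores = [pass_at_1(f"{task} {v}") for v in nfr_variants]
print("Pass@1 per variant:", [round(s, 2) for s in scores])
print("std across variants:", round(statistics.stdev(scores), 3))  # robust models keep this small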


KD$^{2}$M: A unifying framework for feature knowledge distillation

arXiv.org Machine Learning

Knowledge Distillation (KD) seeks to transfer the knowledge of a teacher network to a student network. This process is often done by matching the networks' predictions (i.e., their outputs), but recently several works have proposed to match the distributions of the networks' activations (i.e., their features), a process known as \emph{distribution matching}. In this paper, we propose a unifying framework, Knowledge Distillation through Distribution Matching (KD$^{2}$M), which formalizes this strategy. Our contributions are threefold: we i) provide an overview of distribution metrics used in distribution matching, ii) benchmark on computer vision datasets, and iii) derive new theoretical results for KD.
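
As one concrete instance of distribution matching, the sketch below aligns the first two moments of a teacher and a student feature batch. It is illustrative rather than the paper's exact formulation, and assumes PyTorch and equal feature dimensions (in practice the student is often projected to the teacher's dimension first).

import torch

def moment_matching_loss(f_student: torch.Tensor, f_teacher: torch.Tensor) -> torch.Tensor:
    # Both inputs are feature batches of shape (N, D). Matching mean and
    # covariance is one simple distribution metric; MMD or Wasserstein
    # distances are common alternatives.
    mu_s, mu_t = f_student.mean(dim=0), f_teacher.mean(dim=0)
    cov_s = torch.cov(f_student.T)  # torch.cov expects variables in rows
    cov_t = torch.cov(f_teacher.T)
    return (mu_s - mu_t).pow(2).sum() + (cov_s - cov_t).pow(2).sum()

# Usage: penalize the student for drifting from the teacher's feature distribution.
f_teacher = torch.randn(128, 64)                      # frozen teacher activations
f_student = torch.randn(128, 64, requires_grad=True)  # student activations
loss = moment_matching_loss(f_student, f_teacher.detach())
loss.backward()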


An Investigation into the Causal Mechanism of Political Opinion Dynamics: A Model of Hierarchical Coarse-Graining with Community-Bounded Social Influence

arXiv.org Artificial Intelligence

The increasing polarization in democratic societies is an emergent outcome of political opinion dynamics. Yet, the fundamental mechanisms behind the formation of political opinions, from individual beliefs to collective consensus, remain unknown. Recognizing that a causal mechanism must account for both bottom-up and top-down influences, we conceptualize political opinion dynamics as hierarchical coarse-graining, where micro-scale opinions integrate into a macro-scale state variable. Using the CODA (Continuous Opinions, Discrete Actions) model, we simulate Bayesian opinion updating, social identity-based information integration, and migration between social identity groups to represent higher-level connectivity. This results in coarse-graining across micro, meso, and macro levels. Our findings show that higher-level connectivity shapes information integration, yielding three regimes: independent (disconnected, local convergence), parallel (fast, global convergence), and iterative (slow, stepwise convergence). In the iterative regime, low connectivity fosters transient diversity, indicating an informed consensus. In all regimes, time-scale separation leads to downward causation, where agents converge on the aggregate majority choice, driving consensus. Critically, any degree of coherent higher-level information integration can overcome misalignment via global downward causation. The results highlight how emergent properties of the causal mechanism, such as downward causation, are essential for consensus and may inform more precise investigations into polarized political discourse.
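
For reference, the underlying CODA update (without the paper's social-identity groups or migration between them) fits in a few lines: each agent holds a continuous log-odds opinion but reveals only a binary action, and observing a neighbour's action shifts the opinion by a fixed Bayesian step. The likelihood ALPHA and the ring topology below are illustrative assumptions.

import math
import random

# Minimal CODA (Continuous Opinions, Discrete Actions) dynamics: agent i holds
# nu[i] = ln(p / (1 - p)), its log-odds that "option A is better", but only
# the sign of nu[i] (the discrete action) is visible to others.
ALPHA = 0.7                           # assumed P(neighbour acts A | A is better)
STEP = math.log(ALPHA / (1 - ALPHA))  # the Bayesian update is additive in log-odds

def act(nu_i: float) -> int:
    return 1 if nu_i > 0 else -1

random.seed(0)
N = 50
nu = [random.uniform(-1.0, 1.0) for _ in range(N)]  # diverse initial opinions
for _ in range(20_000):
    i = random.randrange(N)
    j = (i + random.choice((-1, 1))) % N  # observe a neighbour on a ring
    nu[i] += act(nu[j]) * STEP
print("fraction choosing A:", sum(v > 0 for v in nu) / N)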


AutoEval: Autonomous Evaluation of Generalist Robot Manipulation Policies in the Real World

arXiv.org Artificial Intelligence

Scalable and reproducible policy evaluation has been a long-standing challenge in robot learning. Evaluations are critical to assess progress and build better policies, but evaluation in the real world, especially at a scale that would provide statistically reliable results, is costly in human time and hard to obtain. Evaluation of increasingly generalist robot policies requires an increasingly diverse repertoire of evaluation environments, making the evaluation bottleneck even more pronounced. To make real-world evaluation of robotic policies more practical, we propose AutoEval, a system that autonomously evaluates generalist robot policies around the clock with minimal human intervention. Users interact with AutoEval by submitting evaluation jobs to the AutoEval queue, much as software jobs are submitted to a cluster scheduling system, and AutoEval schedules the policies for evaluation within a framework supplying automatic success detection and automatic scene resets. We show that AutoEval can nearly fully eliminate human involvement in the evaluation process, permitting around-the-clock evaluations, and that the evaluation results correspond closely to ground-truth evaluations conducted by hand. To facilitate the evaluation of generalist policies in the robotics community, we provide public access to multiple AutoEval scenes in the popular BridgeData robot setup with WidowX robot arms. In the future, we hope that AutoEval scenes can be set up across institutions to form a diverse and distributed evaluation network.
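
The submit-and-schedule pattern might look roughly like the sketch below. All names, the job fields, and the success detector are hypothetical placeholders; AutoEval's actual interface is not reproduced here.

import queue
import random
from dataclasses import dataclass

@dataclass
class EvalJob:            # hypothetical job descriptor
    policy_name: str
    scene: str
    n_episodes: int = 50

def run_episode(job: EvalJob) -> bool:
    # Placeholder rollout: the real system would execute the policy, apply
    # automatic success detection, then trigger an automatic scene reset.
    return random.random() < 0.6

jobs: "queue.Queue[EvalJob]" = queue.Queue()
jobs.put(EvalJob("generalist_policy_v3", "widowx_sink_scene"))

while not jobs.empty():   # the scheduler works through the queue around the clock
    job = jobs.get()
    successes = sum(run_episode(job) for _ in range(job.n_episodes))
    print(f"{job.policy_name} on {job.scene}: {successes}/{job.n_episodes} successes")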


Towards Interpretable Soft Prompts

arXiv.org Machine Learning

Soft prompts have been popularized as a cheap and easy way to improve task-specific LLM performance beyond few-shot prompts. Despite their origin as an automated prompting method, soft prompts and other trainable prompts remain a black-box method with no immediately interpretable connections to prompting. We create a novel theoretical framework for evaluating the interpretability of trainable prompts based on two desiderata: faithfulness and scrutability. We find that existing methods do not naturally satisfy our proposed interpretability criterion. Instead, our framework inspires a new direction of trainable prompting methods that explicitly optimize for interpretability. To this end, we formulate and test new interpretability-oriented objective functions for two state-of-the-art prompt tuners: Hard Prompts Made Easy (PEZ) and RLPrompt. Our experiments with GPT-2 demonstrate a fundamental trade-off between interpretability and the task performance of the trainable prompt, explicating the hardness of the soft prompt interpretability problem and revealing odd behavior that arises when one optimizes for an interpretability proxy.
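
One way to optimize explicitly for interpretability, sketched below under assumptions, is to penalize each soft-prompt vector's distance to its nearest token embedding, so the learned prompt nearly decodes to real tokens (in the spirit of PEZ's projection step). The penalty form and the weight lam are illustrative, not the paper's exact objective functions.

import torch

def interpretability_penalty(soft_prompt: torch.Tensor,
                             embed_matrix: torch.Tensor) -> torch.Tensor:
    # Mean distance from each prompt vector (P, D) to its nearest row of the
    # token-embedding table (V, D); small values mean a near-readable prompt.
    d = torch.cdist(soft_prompt, embed_matrix)  # pairwise distances, (P, V)
    return d.min(dim=1).values.mean()

# Usage: trade task loss against interpretability via the weight lam.
embeds = torch.randn(50_000, 768)                 # stand-in vocabulary embeddings
prompt = torch.randn(8, 768, requires_grad=True)  # 8 trainable prompt vectors
task_loss = torch.tensor(1.3)                     # stand-in for the LM task loss
lam = 0.1
loss = task_loss + lam * interpretability_penalty(prompt, embeds)
loss.backward()  # raising lam typically costs task performance (the trade-off above)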


Exploring Data Scaling Trends and Effects in Reinforcement Learning from Human Feedback

arXiv.org Artificial Intelligence

Reinforcement Learning from Human Feedback (RLHF) is crucial for aligning large language models with human preferences. While recent research has focused on algorithmic improvements, the importance of prompt-data construction has been overlooked. This paper addresses this gap by exploring data-driven bottlenecks in RLHF performance scaling, particularly reward hacking and decreasing response diversity. We introduce a hybrid reward system combining reasoning task verifiers (RTV) and a generative reward model (GenRM) to mitigate reward hacking. We also propose a novel prompt-selection method, Pre-PPO, to maintain response diversity and enhance learning effectiveness. Additionally, we find that prioritizing mathematical and coding tasks early in RLHF training significantly improves performance. Experiments across two model sizes validate our methods' effectiveness and scalability. Results show that RTV is most resistant to reward hacking, followed by GenRM with ground truth, and then GenRM with SFT Best-of-N responses. Our strategies enable rapid capture of subtle task-specific distinctions, leading to substantial improvements in overall RLHF performance. This work highlights the importance of careful data construction and provides practical methods to overcome performance barriers in RLHF.
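
Schematically, the hybrid reward routing might be written as below; the task-type gate, the binary RTV score, and both callables are assumptions for illustration, not the paper's implementation.

# Hybrid reward: verifiable reasoning tasks are scored by a programmatic
# verifier (RTV), everything else by a generative reward model (GenRM).
def hybrid_reward(prompt: str, response: str, task_type: str,
                  verifier, gen_rm) -> float:
    if task_type in {"math", "code"}:   # tasks with checkable ground truth
        return 1.0 if verifier(prompt, response) else 0.0
    return gen_rm(prompt, response)     # scalar preference score

# Toy usage with stub scorers.
reward = hybrid_reward(
    "What is 2 + 2?", "4", task_type="math",
    verifier=lambda p, r: r.strip() == "4",
    gen_rm=lambda p, r: 0.5,
)
print(reward)  # -> 1.0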


TeleAntiFraud-28k: An Audio-Text Slow-Thinking Dataset for Telecom Fraud Detection

arXiv.org Artificial Intelligence

The detection of telecom fraud faces significant challenges due to the lack of high-quality multimodal training data that integrates audio signals with reasoning-oriented textual analysis. To address this gap, we present TeleAntiFraud-28k, the first open-source audio-text slow-thinking dataset specifically designed for automated telecom fraud analysis. Our dataset is constructed through three strategies: (1) Privacy-preserved text-truth sample generation using automatic speech recognition (ASR)-transcribed call recordings (with anonymized original audio), ensuring real-world consistency through text-to-speech (TTS) model regeneration; (2) Semantic enhancement via large language model (LLM)-based self-instruction sampling on authentic ASR outputs to expand scenario coverage; (3) Multi-agent adversarial synthesis that simulates emerging fraud tactics through predefined communication scenarios and fraud typologies. The generated dataset contains 28,511 rigorously processed speech-text pairs, complete with detailed annotations for fraud reasoning. The dataset is divided into three tasks: scenario classification, fraud detection, and fraud type classification. Furthermore, we construct TeleAntiFraud-Bench, a standardized evaluation benchmark comprising proportionally sampled instances from the dataset, to facilitate systematic testing of model performance on telecom fraud detection tasks. We also contribute a production-optimized supervised fine-tuning (SFT) model trained on hybrid real/synthetic data, while open-sourcing the data processing framework to enable community-driven dataset expansion. This work establishes a foundational framework for multimodal anti-fraud research while addressing critical challenges in data privacy and scenario diversity. The project will be released at https://github.com/JimmyMa99/TeleAntiFraud.
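
Strategy (1) reduces to "transcribe, scrub identifiers, re-synthesize". A toy version under assumptions is sketched below; the regex and the ASR/TTS callables are illustrative placeholders, not the released framework.

import re

PHONE = re.compile(r"\b\d{3}[- ]?\d{3,4}[- ]?\d{4}\b")  # toy identifier pattern

def anonymize(transcript: str) -> str:
    # Scrub personally identifying numbers from the ASR transcript.
    return PHONE.sub("[PHONE]", transcript)

def rebuild_sample(transcript: str, tts) -> dict:
    # Regenerate audio from the scrubbed text so no original voice is released.
    text = anonymize(transcript)
    return {"text": text, "audio": tts(text)}  # one speech-text pair

sample = rebuild_sample(
    "Call me back at 555-123-4567 to unfreeze your account.",
    tts=lambda text: b"<synthesized wav bytes>",  # placeholder TTS
)
print(sample["text"])  # -> Call me back at [PHONE] to unfreeze your account.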


Comparative Analysis of Deepfake Detection Models: New Approaches and Perspectives

arXiv.org Machine Learning

The growing threat posed by deepfake videos, capable of manipulating realities and disseminating misinformation, drives the urgent need for effective detection methods. This work investigates and compares different approaches for identifying deepfakes, focusing on the GenConViT model and its performance relative to other architectures present in the DeepfakeBenchmark. To contextualize the research, the social and legal impacts of deepfakes are addressed, as well as the technical fundamentals of their creation and detection, including digital image processing, machine learning, and artificial neural networks, with emphasis on Convolutional Neural Networks (CNNs), Generative Adversarial Networks (GANs), and Transformers. The performance evaluation of the models was conducted using relevant metrics and new datasets established in the literature, such as WildDeepfake and DeepSpeak, aiming to identify the most effective tools in the battle against misinformation and media manipulation. The results indicate that GenConViT, after fine-tuning, exhibited superior accuracy (93.82%) and generalization capacity, surpassing other architectures in the DeepfakeBenchmark on the DeepSpeak dataset. This study contributes to the advancement of deepfake detection techniques, supporting the development of more robust and effective solutions against the dissemination of false information.
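
The comparison protocol amounts to scoring every model on every labeled video and reporting accuracy; a minimal sketch with placeholder detectors (not the benchmark's real loaders or the GenConViT implementation) is:

# Run each detector over a labeled set and report accuracy (label 1 = deepfake).
def accuracy(detector, dataset) -> float:
    correct = sum(detector(video) == label for video, label in dataset)
    return correct / len(dataset)

toy_dataset = [("clip_a.mp4", 1), ("clip_b.mp4", 0), ("clip_c.mp4", 1)]
detectors = {
    "always_fake": lambda video: 1,
    "always_real": lambda video: 0,
}
for name, predict in detectors.items():
    print(f"{name}: {accuracy(predict, toy_dataset):.2%}")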