AITopics

Diffusion models have demonstrated powerful performance in generating high-quality images. A typical example is text-to-image generator like Stable Diffusion. However, their widespread use also poses potential privacy risks. A key concern is membership inference attacks, which attempt to determine whether a particular data sample was used in the model training process. We propose an efficient membership inference attack method against diffusion models. This method is based on the injection of slight noise and the evaluation of the aggregation degree of the noise distribution. The intuition is that the noise prediction patterns of diffusion models for training set samples and non-training set samples exhibit distinguishable differences.Specifically, we suppose that member images exhibit higher aggregation of predicted noise around a certain time step of the diffusion process. In contrast, the predicted noises of non-member images exhibit a more discrete characteristic around the certain time step. Compared with other existing methods, our proposed method requires fewer visits to the target diffusion model. We inject slight noise into the image under test and then determine its membership by analyzing the aggregation degree of the noise distribution predicted by the model. Empirical findings indicate that our method achieves superior performance across multiple datasets. At the same time, our method can also show better attack effects in ASR and AUC when facing large-scale text-to-image diffusion models, proving the scalability of our method.

artificial intelligence, diffusion model, machine learning, (16 more...)

2510.21783

Country:

Asia (0.28)
North America > United States (0.14)

Genre: Research Report (0.64)

Industry:

Information Technology > Security & Privacy (1.00)
Law (0.94)
Government (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Yuan, Liangqi, Deng, Chuhao, Han, Dong-Jun, Hwang, Inseok, Brunswicker, Sabine, Brinton, Christopher G.

Next-Generation LLM for UAV: From Natural Language to Autonomous Flight

Abstract--With the rapid advancement of Large Language Models (LLMs), their capabilities in various automation domains, particularly Unmanned Aerial V ehicle (UA V) operations, have garnered increasing attention. Current research remains predominantly constrained to small-scale UA V applications, with most studies focusing on isolated components such as path planning for toy drones, while lacking comprehensive investigation of medium-and long-range UA V systems in real-world operational contexts. Larger UA V platforms introduce distinct challenges, including stringent requirements for airport-based take-off and landing procedures, adherence to complex regulatory frameworks, and specialized operational capabilities with elevated mission expectations. LV system processes natural language instructions to orchestrate short-, medium-, and long-range UA V missions through five key technical components: (i) LLM-as-Parser for instruction interpretation, (ii) Route Planner for Points of Interest (POI) determination, (iii) Path Planner for waypoint generation, (iv) Control Platform for executable trajectory implementation, and (v) UA V monitoring. We demonstrate the system's feasibility through three representative use cases spanning different operational scales: multi-UA V patrol, multi-POI delivery, and multi-hop relocation. Beyond the current implementation, we establish a five-level automation taxonomy that charts the evolution from current LLM-as-Parser capabilities (Level 1) to fully autonomous LLMas-Autopilot systems (Level 5), identifying technical prerequisites and research challenges at each stage. The rise of Large Language Models (LLMs) has transformed numerous domains, such as mobile services, vehicles, and robotics [1]-[3]. These fields have become increasingly intelligent and user-friendly through LLM integration, enabling command and control through natural language. Equal contribution L. Y uan and C. G. Brinton are with the School of Electrical and Computer Engineering, Purdue University, West Lafayette, IN 47907, USA. C. Deng and I. Hwang are with the School of Aeronautics and Astronautics, Purdue University, West Lafayette, IN 47907, USA. Han is with the Department of Computer Science and Engineering, Y onsei University, Seoul, South Korea. E-mail: djh@yonsei.ac.kr S. Brunswicker is with the Polytechnic Institute, Purdue University, West Lafayette, IN 47907, USA. LLMs fulfill diverse roles within these systems. LLM-as-Router can orchestrate task allocation and model selection for human pilots, LLM-as-Agent can execute actions on behalf of humans, and LLM-as-Judge can conduct evaluations in place of human judgment.

information, large language model, machine learning, (20 more...)

2510.21739

Country:

North America > United States > Indiana > Tippecanoe County > West Lafayette (0.64)
North America > United States > Indiana > Tippecanoe County > Lafayette (0.64)

Genre: Research Report (0.64)

Industry:

Transportation > Air (1.00)
Information Technology > Robotics & Automation (1.00)
Government (1.00)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles > Drones (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

AquaVLM: Improving Underwater Situation Awareness with Mobile Vision Language Models

Tian, Beitong, Zhao, Lingzhi, Chen, Bo, Zheng, Haozhen, Yang, Jingcheng, Wu, Mingyuan, Vasisht, Deepak, Nahrstedt, Klara

Underwater activities like scuba diving enable millions annually to explore marine environments for recreation and scientific research. Maintaining situational awareness and effective communication are essential for diver safety. Traditional underwater communication systems are often bulky and expensive, limiting their accessibility to divers of all levels. While recent systems leverage lightweight smartphones and support text messaging, the messages are predefined and thus restrict context-specific communication. In this paper, we present AquaVLM, a tap-and-send underwater communication system that automatically generates context-aware messages and transmits them using ubiquitous smartphones. Our system features a mobile vision-language model (VLM) fine-tuned on an auto-generated underwater conversation dataset and employs a hierarchical message generation pipeline. We co-design the VLM and transmission, incorporating error-resilient fine-tuning to improve the system's robustness to transmission errors. We develop a VR simulator to enable users to experience AquaVLM in a realistic underwater environment and create a fully functional prototype on the iOS platform for real-world experiments. Both subjective and objective evaluations validate the effectiveness of AquaVLM and highlight its potential for personal underwater communication as well as broader mobile VLM applications.

large language model, machine learning, natural language, (21 more...)

2510.21722

Country: North America > United States (0.67)

Genre: Research Report (1.00)

Industry:

Government > Military (1.00)
Information Technology (0.93)

Technology:

Information Technology > Human Computer Interaction > Interfaces > Virtual Reality (1.00)
Information Technology > Communications > Mobile (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
(3 more...)

Capponi, Agostino, Gliozzo, Alfio, Han, Chunghyun, Lee, Junkyu

DAO-AI: Evaluating Collective Decision-Making through Agentic AI in Decentralized Governance

This paper presents a first empirical study of agentic AI as autonomous decision-makers in decentralized governance. Using more than 3K proposals from major protocols, we build an agentic AI voter that interprets proposal contexts, retrieves historical deliberation data, and independently determines its voting position. The agent operates within a realistic financial simulation environment grounded in verifiable blockchain data, implemented through a modular composable program (MCP) workflow that defines data flow and tool usage via Agentics framework. We evaluate how closely the agent's decisions align with the human and token-weighted outcomes, uncovering strong alignments measured by carefully designed evaluation metrics. Our findings demonstrate that agentic AI can augment collective decision-making by producing interpretable, auditable, and empirically grounded signals in realistic DAO governance settings. The study contributes to the design of explainable and economically rigorous AI agents for decentralized financial systems.

artificial intelligence, natural language, proposal, (16 more...)

2510.21117

Genre: Research Report > New Finding (1.00)

Industry:

Banking & Finance > Trading (1.00)
Government > Voting & Elections (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)

Operationalising Extended Cognition: Formal Metrics for Corporate Knowledge and Legal Accountability

Perrier, Elija

Corporate responsibility turns on notions of corporate \textit{mens rea}, traditionally imputed from human agents. Yet these assumptions are under challenge as generative AI increasingly mediates enterprise decision-making. Building on the theory of extended cognition, we argue that in response corporate knowledge may be redefined as a dynamic capability, measurable by the efficiency of its information-access procedures and the validated reliability of their outputs. We develop a formal model that captures epistemic states of corporations deploying sophisticated AI or information systems, introducing a continuous organisational knowledge metric $S_S(φ)$ which integrates a pipeline's computational cost and its statistically validated error rate. We derive a thresholded knowledge predicate $\mathsf{K}_S$ to impute knowledge and a firm-wide epistemic capacity index $\mathcal{K}_{S,t}$ to measure overall capability. We then operationally map these quantitative metrics onto the legal standards of actual knowledge, constructive knowledge, wilful blindness, and recklessness. Our work provides a pathway towards creating measurable and justiciable audit artefacts, that render the corporate mind tractable and accountable in the algorithmic age.

data mining, large language model, machine learning, (19 more...)

2510.16193

Country: North America > United States > California (0.30)

Genre: Research Report > New Finding (0.46)

Industry:

Law > Business Law (1.00)
Health & Medicine (1.00)
Government > Regional Government > North America Government > United States Government (1.00)
(3 more...)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.94)
(3 more...)

Kim, Jinwoong, Park, Sangjin

IKNet: Interpretable Stock Price Prediction via Keyword-Guided Integration of News and Technical Indicators

The increasing influence of unstructured external information, such as news articles, on stock prices has attracted growing attention in financial markets. Despite recent advances, most existing newsbased forecasting models represent all articles using sentiment scores or average embeddings that capture the general tone but fail to provide quantitative, context-aware explanations of the impacts of public sentiment on predictions. To address this limitation, we propose an interpretable keyword-guided network (IKNet), which is an explainable forecasting framework that models the semantic association between individual news keywords and stock price movements. The IKNet identifies salient keywords via FinBERTbased contextual analysis, processes each embedding through a separate nonlinear projection layer, and integrates their representations with the time-series data of technical indicators to forecast next-day closing prices. By applying Shapley Additive Explanations the model generates quantifiable and interpretable attributions for the contribution of each keyword to predictions. Empirical evaluations of S&P 500 data from 2015 to 2024 demonstrate that IKNet outperforms baselines, including recurrent neural networks and transformer models, reducing RMSE by up to 32.9% and improving cumulative returns by 18.5%. Moreover, IKNet enhances transparency by offering contextualized explanations of volatility events driven by public sentiment.

data mining, machine learning, natural language, (17 more...)

2510.07661

Country: North America > United States (1.00)

Genre: Research Report > New Finding (0.93)

Industry:

Banking & Finance > Trading (1.00)
Government > Regional Government > North America Government > United States Government (0.68)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(2 more...)

Jung, Yujin J., Tamaki, Eduardo Ryô, Chatterley, Julia, Mitchell, Grant, Dzebo, Semir, Sandoval, Cristóbal, Littvay, Levente, Hawkins, Kirk A.

Populism Meets AI: Advancing Populism Research with LLMs

Measuring the ideational content of populism remains a challenge. Traditional strategies based on textual analysis have been critical for building the field's foundations and providing a valid, objective indicator of populist framing. Yet these approaches are costly, time consuming, and difficult to scale across languages, contexts, and large corpora. Here we present the results from a rubric and anchor guided chain of thought (CoT) prompting approach that mirrors human coder training. By leveraging the Global Populism Database (GPD), a comprehensive dataset of global leaders' speeches annotated for degrees of populism, we replicate the process used to train human coders by prompting the LLM with an adapted version of the same documentation to guide the model's reasoning. We then test multiple proprietary and open weight models by replicating scores in the GPD. Our findings reveal that this domain specific prompting strategy enables the LLM to achieve classification accuracy on par with expert human coders, demonstrating its ability to navigate the nuanced, context sensitive aspects of populism.

large language model, machine learning, natural language, (17 more...)

2510.07458

Country:

North America > United States > California (0.46)
Asia > Middle East > Republic of Türkiye (0.28)
Europe > United Kingdom > England (0.28)

Genre: Research Report > New Finding (1.00)

Industry: Government > Regional Government > North America Government > United States Government (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Pfisterer, Samuel, Grötschla, Florian, Lanzendörfer, Luca A., Yan, Florian, Wattenhofer, Roger

EuroSpeech: A Multilingual Speech Corpus

Recent progress in speech processing has highlighted that high-quality performance across languages requires substantial training data for each individual language. While existing multilingual datasets cover many languages, they often contain insufficient data for most languages. Thus, trained models perform poorly on the majority of the supported languages. Our work addresses this challenge by introducing a scalable pipeline for constructing speech datasets from parliamentary recordings. The proposed pipeline includes robust components for media retrieval and a two-stage alignment algorithm designed to handle non-verbatim transcripts and long-form audio. Applying this pipeline to recordings from 22 European parliaments, we extract over 61k hours of aligned speech segments, achieving substantial per-language coverage with 19 languages exceeding 1k hours and 22 languages exceeding 500 hours of high-quality speech data. We obtain an average 41.8\% reduction in word error rates over baselines when finetuning an existing ASR model on our dataset, demonstrating the usefulness of our approach.

artificial intelligence, machine learning, natural language, (18 more...)

2510.00514

Country: Europe > Ukraine (0.28)

Genre:

Research Report (0.51)
Workflow (0.46)

Industry: Government > Regional Government > Europe Government (1.00)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.35)

Bhatnagar, Shubhang, Xu, Andy, Tan, Kar-Han, Ahuja, Narendra

LUQ: Layerwise Ultra-Low Bit Quantization for Multimodal Large Language Models

Large Language Models (LLMs) with multimodal capabilities have revolutionized vision-language tasks, but their deployment often requires huge memory and computational resources. While post-training quantization (PTQ) has successfully compressed language models to as low as 1-bit precision without significant performance loss, its effectiveness for multimodal LLMs (MLLMs) remains relatively unexplored. In this paper, we present the first study on ultra-low bit (<4-bit) quantization for multimodal LLMs. Our analysis reveals that multimodal tokens and intermediate layer activations produced by them exhibit significantly higher statistical variance and entropy compared to text tokens, making them less tolerant to ultra-low bit quantization. However, the activation distributions of multimodal tokens varies significantly over different layers, with some layers having lower entropy activation distributions. We empirically show that such layers in these models can better tolerate ultra-low bit quantization. Building on these insights, we propose a novel strategy for MLLM quantization, LUQ: Layerwise Ultra-Low Bit Quantization, which selectively applies ultra-low bit quantization to layers that are more resilient to it. Additionally, we also show that using a mix of multimodal tokens (image and text) for PTQ boosts VQA performance in the ultra-low bit regime. We evaluate our method on LLaVA-1.5 and Qwen-2.5-VL across 9 popular VQA benchmarks. The resulting LUQ models use 40% and 31% less memory than their 4-bit counterparts, respectively, while exhibiting a performance degradation of less than 10% on the MME benchmark.

large language model, machine learning, quantization, (14 more...)

2509.23729

Country: North America > United States > California (0.28)

Genre: Research Report > New Finding (0.68)

Industry:

Government (0.46)
Food & Agriculture (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Kuramoto Orientation Diffusion Models

Song, Yue, Keller, T. Anderson, Brodjian, Sevan, Miyato, Takeru, Yue, Yisong, Perona, Pietro, Welling, Max

Orientation-rich images, such as fingerprints and textures, often exhibit coherent angular directional patterns that are challenging to model using standard generative approaches based on isotropic Euclidean diffusion. Motivated by the role of phase synchronization in biological systems, we propose a score-based generative model built on periodic domains by leveraging stochastic Kuramoto dynamics in the diffusion process. In neural and physical systems, Kuramoto models capture synchronization phenomena across coupled oscillators -- a behavior that we re-purpose here as an inductive bias for structured image generation. In our framework, the forward process performs \textit{synchronization} among phase variables through globally or locally coupled oscillator interactions and attraction to a global reference phase, gradually collapsing the data into a low-entropy von Mises distribution. The reverse process then performs \textit{desynchronization}, generating diverse patterns by reversing the dynamics with a learned score function. This approach enables structured destruction during forward diffusion and a hierarchical generation process that progressively refines global coherence into fine-scale details. We implement wrapped Gaussian transition kernels and periodicity-aware networks to account for the circular geometry. Our method achieves competitive results on general image benchmarks and significantly improves generation quality on orientation-dense datasets like fingerprints and textures. Ultimately, this work demonstrates the promise of biologically inspired synchronization dynamics as structured priors in generative modeling.

artificial intelligence, diffusion model, machine learning, (13 more...)

2509.15328

Country: North America > United States (0.68)

Genre: Research Report > New Finding (0.67)

Industry:

Information Technology (0.93)
Government > Regional Government > North America Government > United States Government (0.46)
Health & Medicine > Therapeutic Area > Neurology (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)