identification accuracy
DEAL: Diffusion Evolution Adversarial Learning for Sim-to-Real Transfer
Training Reinforcement Learning (RL) controllers in simulation offers cost-efficiency and safety advantages. However, the resultant policies often suffer significant performance degradation during real-world deployment due to the reality gap. Previous works like System Identification (Sys-Id) have attempted to bridge this discrepancy by improving simulator fidelity, but encounter challenges including the collapse of high-dimensional parameter identification, low identification accuracy, and unstable convergence dynamics. To address these challenges, we propose a novel Sys-Id framework that combines Diffusion Evolution with Adversarial Learning (DEAL) to iteratively infer physical parameters with limited real-world data, which makes the state transitions between simulation and reality as similar as possible. Specifically, our method iteratively refines physical parameters through a dual mechanism: a discriminator network evaluates the similarity of state transitions between parameterized simulations and target environment as fitness guidance, while diffusion evolution adaptively modulates noise prediction and denoising processes to optimize parameter distributions.
Sparse VideoGen2: Accelerate Video Generation with Sparse Attention via Semantic-Aware Permutation
Diffusion Transformers (DiTs) are essential for video generation but suffer from significant latency due to the quadratic complexity of attention. By computing only critical tokens, sparse attention reduces computational costs and offers a promising acceleration approach. However, we identify that existing methods fail to approach optimal generation quality under the same computation budget for two reasons: (1) Inaccurate critical token identification: current methods cluster tokens based on position rather than semantics, leading to imprecise aggregated representations.
FLAWS: A Benchmark for Error Identification and Localization in Scientific Papers
Xi, Sarina, Rao, Vishisht, Payan, Justin, Shah, Nihar B.
The identification and localization of errors is a core task in peer review, yet the exponential growth of scientific output has made it increasingly difficult for human reviewers to reliably detect errors given the limited pool of experts. Recent advances in Large Language Models (LLMs) have sparked interest in their potential to support such evaluation tasks, from academic peer review to automated scientific assessment. However, despite the growing use of LLMs in review systems, their capabilities to pinpoint errors remain underexplored. In this work, we introduce Fault Localization Across Writing in Science (FLAWS), an automated benchmark consisting of 713 paper-error pairs designed to evaluate how effectively LLMs detect errors that undermine key claims in research papers. We construct the benchmark by systematically inserting claim-invalidating errors into peer-reviewed papers using LLMs, paired with an automated evaluation metric that measures whether models can identify and localize these errors. Developing such a benchmark presents unique challenges that we overcome: ensuring that the inserted errors are well-defined, challenging, and relevant to the content of the paper, avoiding artifacts that would make identification trivial, and designing a scalable, automated evaluation metric. On the resulting benchmark, we evaluate five frontier LLMs: Claude Sonnet 4.5, DeepSeek Reasoner v3.1, Gemini 2.5 Pro, GPT 5, and Grok 4. Among these, GPT 5 is the top-performing model, achieving 39.1% identification accuracy when k=10, where k is the number of top-ranked error text candidates generated by the LLM.
Tactile Data Recording System for Clothing with Motion-Controlled Robotic Sliding
Eguchi, Michikuni, Kitagishi, Takekazu, Hiroi, Yuichi, Hiraki, Takefumi
The tactile sensation of clothing is critical to wearer comfort. To reveal physical properties that make clothing comfortable, systematic collection of tactile data during sliding motion is required. We propose a robotic arm-based system for collecting tactile data from intact garments. The system performs stroking measurements with a simulated fingertip while precisely controlling speed and direction, enabling creation of motion-labeled, multimodal tactile databases. Machine learning evaluation showed that including motion-related parameters improved identification accuracy for audio and acceleration data, demonstrating the efficacy of motion-related labels for characterizing clothing tactile sensation. This system provides a scalable, non-destructive method for capturing tactile data of clothing, contributing to future studies on fabric perception and reproduction.
Challenger-Based Combinatorial Bandits for Subcarrier Selection in OFDM Systems
Amiri, Mohsen, Venktesh, V, Magnússon, Sindri
This paper investigates the identification of the top-m user-scheduling sets in multi-user MIMO downlink, which is cast as a combinatorial pure-exploration problem in stochastic linear bandits. Because the action space grows exponentially, exhaustive search is infeasible. We therefore adopt a linear utility model to enable efficient exploration and reliable selection of promising user subsets. We introduce a gap-index framework that maintains a shortlist of current estimates of champion arms (top-m sets) and a rotating shortlist of challenger arms that pose the greatest threat to the champions. This design focuses on measurements that yield the most informative gap-index-based comparisons, resulting in significant reductions in runtime and computation compared to state-of-the-art linear bandit methods, with high identification accuracy. The method also exposes a tunable trade-off between speed and accuracy. Simulations on a realistic OFDM downlink show that shortlist-driven pure exploration makes online, measurement-efficient subcarrier selection practical for AI-enabled communication systems.
Text-Independent Speaker Identification Using Audio Looping With Margin Based Loss Functions
Garcia, Elliot Q C, Vilela, Nicéias Silva, Sacramento, Kátia Pires Nascimento do, Ferreira, Tiago A. E.
Speaker identification has become a crucial component in various applications, including security systems, virtual assistants, and personalized user experiences. In this paper, we investigate the effectiveness of CosFace Loss and ArcFace Loss for text-independent speaker identification using a Convolutional Neural Network architecture based on the VGG16 model, modified to accommodate mel spectrogram inputs of variable sizes generated from the Voxceleb1 dataset. Our approach involves implementing both loss functions to analyze their effects on model accuracy and robustness, where the Softmax loss function was employed as a comparative baseline. Additionally, we examine how the sizes of mel spectrograms and their varying time lengths influence model performance. The experimental results demonstrate superior identification accuracy compared to traditional Softmax loss methods. Furthermore, we discuss the implications of these findings for future research.
A triple-branch network for latent fingerprint enhancement guided by orientation fields and minutiae
Wang, Yurun, Qi, Zerong, Fu, Shujun, Hu, Mingzheng
Latent fingerprint enhancement is a critical step in the process of latent fingerprint identification. Existing deep learning-based enhancement methods still fall short of practical application requirements, particularly in restoring low-quality fingerprint regions. Recognizing that different regions of latent fingerprints require distinct enhancement strategies, we propose a Triple Branch Spatial Fusion Network (TBSFNet), which simultaneously enhances different regions of the image using tailored strategies. Furthermore, to improve the generalization capability of the network, we integrate orientation field and minutiae-related modules into TBSFNet and introduce a Multi-Level Feature Guidance Network (MLFGNet). Experimental results on the MOLF and MUST datasets demonstrate that MLFGNet outperforms existing enhancement algorithms.
When Tom Eats Kimchi: Evaluating Cultural Bias of Multimodal Large Language Models in Cultural Mixture Contexts
Kim, Jun Seong, Thu, Kyaw Ye, Ismayilzada, Javad, Park, Junyeong, Kim, Eunsu, Ahmad, Huzama, An, Na Min, Thorne, James, Oh, Alice
In a highly globalized world, it is important for multi-modal large language models (MLLMs) to recognize and respond correctly to mixed-cultural inputs. For example, a model should correctly identify kimchi (Korean food) in an image both when an Asian woman is eating it, as well as an African man is eating it. However, current MLLMs show an over-reliance on the visual features of the person, leading to misclassification of the entities. To examine the robustness of MLLMs to different ethnicity, we introduce MixCuBe, a cross-cultural bias benchmark, and study elements from five countries and four ethnicities. Our findings reveal that MLLMs achieve both higher accuracy and lower sensitivity to such perturbation for high-resource cultures, but not for low-resource cultures. GPT-4o, the best-performing model overall, shows up to 58% difference in accuracy between the original and perturbed cultural settings in low-resource cultures. Our dataset is publicly available at: https://huggingface.co/datasets/kyawyethu/MixCuBe.
Virtual Reflections on a Dynamic 2D Eye Model Improve Spatial Reference Identification
Krüger, Matti, Oshima, Yutaka, Fang, Yu
The visible orientation of human eyes creates some transparency about people's spatial attention and other mental states. This leads to a dual role for the eyes as a means of sensing and communication. Accordingly, artificial eye models are being explored as communication media in human-machine interaction scenarios. One challenge in the use of eye models for communication consists of resolving spatial reference ambiguities, especially for screen-based models. Here, we introduce an approach for overcoming this challenge through the introduction of reflection-like features that are contingent on artificial eye movements. We conducted a user study with 30 participants in which participants had to use spatial references provided by dynamic eye models to advance in a fast-paced group interaction task. Compared to a non-reflective eye model and a pure reflection mode, their combination in the new approach resulted in a higher identification accuracy and user experience, suggesting a synergistic benefit.