On the de-duplication of the Lakh MIDI dataset

Choi, Eunjin, Kim, Hyerin, Ryu, Jiwoo, Nam, Juhan, Jeong, Dasaem

arXiv.org Artificial Intelligence

A large-scale dataset is essential for training a well-generalized deep-learning model. Most such datasets are collected by scraping various internet sources, inevitably introducing duplicated data. In the symbolic music domain, these duplicates often arise from multiple user arrangements and from metadata changes after simple editing. Despite critical consequences, such as unreliable training evaluation caused by data leakage during random splitting, dataset duplication has not been extensively addressed in the MIR community. This study investigates dataset duplication in the Lakh MIDI Dataset (LMD), one of the largest publicly available sources in the symbolic music domain. To find and evaluate the best retrieval method for duplicated data, we employed the Clean MIDI subset of the LMD as a benchmark test set, in which different versions of the same songs are grouped together. We first evaluated rule-based approaches and previous symbolic music retrieval models for de-duplication, and then investigated a contrastive-learning-based BERT model with various augmentations for finding duplicate files. As a result, we propose three filtered versions of the LMD, which filter out at least 38,134 of the 178,561 files even in the most conservative setting.
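The abstract does not spell out the retrieval method, but the duplicate-grouping step it describes can be illustrated with a minimal sketch: fingerprint each file by its pitch n-grams and greedily merge files whose fingerprints overlap above a similarity threshold. All names, the Jaccard threshold, and the toy data below are hypothetical illustrations, not the paper's actual method:

```python
def note_ngrams(pitches, n=4):
    """Set of pitch n-grams used as a crude fingerprint of a piece."""
    return {tuple(pitches[i:i + n]) for i in range(len(pitches) - n + 1)}

def jaccard(a, b):
    """Jaccard similarity between two fingerprint sets."""
    return len(a & b) / len(a | b) if a | b else 0.0

def group_duplicates(pieces, threshold=0.8):
    """Greedy grouping: pieces whose fingerprints overlap above the
    threshold are treated as versions of the same song."""
    groups = []
    for name, pitches in pieces.items():
        fp = note_ngrams(pitches)
        for group in groups:
            if jaccard(fp, group["fp"]) >= threshold:
                group["members"].append(name)
                group["fp"] |= fp  # extend the group's fingerprint
                break
        else:
            groups.append({"fp": fp, "members": [name]})
    return [g["members"] for g in groups]

# Toy MIDI pitch sequences: v2 is a lightly edited copy of v1.
pieces = {
    "song_a_v1": [60, 62, 64, 65, 67, 69, 71, 72],
    "song_a_v2": [60, 62, 64, 65, 67, 69, 71, 72, 74],
    "song_b":    [55, 57, 59, 60, 62, 64, 66, 67],
}
print(group_duplicates(pieces))  # → [['song_a_v1', 'song_a_v2'], ['song_b']]
```

A real pipeline would fingerprint quantized, transposition-normalized note events rather than raw pitch lists, but the grouping logic stays the same.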


Computing-In-Memory Dataflow for Minimal Buffer Traffic

Song, Choongseok, Jeong, Doo Seok

arXiv.org Artificial Intelligence

Computing-In-Memory (CIM) offers a potential solution to the memory-wall issue and can achieve high energy efficiency by minimizing data movement, making it a promising architecture for edge AI devices. Lightweight models such as MobileNet and EfficientNet, which rely on depthwise convolution for feature extraction, have been developed for these devices. However, CIM macros often struggle to accelerate depthwise convolution, suffering from underutilization of CIM memory and heavy buffer traffic. The latter in particular has been overlooked despite its significant impact on latency and energy consumption. To address this, we introduce a novel CIM dataflow that significantly reduces buffer traffic by maximizing data reuse and improving memory utilization during depthwise convolution. The proposed dataflow is grounded in solid theoretical principles, fully demonstrated in this paper. When applied to MobileNet and EfficientNet models, our dataflow reduces buffer traffic by 77.4-87.0%. Convolutional neural networks (CNNs) have achieved remarkable success in computer vision, excelling in spatial feature extraction [1].
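To see why buffer traffic dominates, a back-of-the-envelope count of input-buffer reads for a stride-1 depthwise convolution is instructive: a naive dataflow re-fetches the full k x k window for every output pixel, while an idealized reuse dataflow fetches each input pixel once. This arithmetic is illustrative only and is not the paper's dataflow:

```python
def buffer_reads(h, w, k, reuse=False):
    """Input-buffer reads for a k x k depthwise convolution over an
    h x w feature map (stride 1, no padding).
    reuse=False: naive dataflow, re-fetching the k x k window per output.
    reuse=True:  idealized full-reuse dataflow, one fetch per input pixel.
    """
    if reuse:
        return h * w
    out_h, out_w = h - k + 1, w - k + 1
    return out_h * out_w * k * k

# A 112x112 map with a 3x3 depthwise kernel (typical early MobileNet layer):
naive = buffer_reads(112, 112, 3)
ideal = buffer_reads(112, 112, 3, reuse=True)
print(f"naive: {naive}, ideal: {ideal}, saved: {1 - ideal / naive:.1%}")
```

The gap between the two counts (roughly an order of magnitude for a 3x3 kernel) is the headroom that reuse-maximizing dataflows like the one proposed here exploit.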


[Submission 1194: "DISK"] We thank all reviewers for their insightful comments and address their concerns

Neural Information Processing Systems

R1: DISK is based on previous work (U-Net, SuperPoint) and offers only moderate innovation. We will clarify this in the paper. We tuned the inference parameters (NMS window & RANSAC settings) by search, as described in L194-197. R1, R3, R5: What is the contribution of individual components of the pipeline? Experimentally, we observe that 19.9% of features come from grid selection. This has three potential downsides.


Video Forgery Detection for Surveillance Cameras: A Review

Tayfor, Noor B., Rashid, Tarik A., Qader, Shko M., Hassan, Bryar A., Abdalla, Mohammed H., Majidpour, Jafar, Ahmed, Aram M., Ali, Hussein M., Aladdin, Aso M., Abdullah, Abdulhady A., Shamsaldin, Ahmed S., Sidqi, Haval M., Salih, Abdulrahman, Yaseen, Zaher M., Ameen, Azad A., Nayak, Janmenjoy, Hamza, Mahmood Yashar

arXiv.org Artificial Intelligence

The widespread availability of video recording through smartphones and digital devices has made video-based evidence more accessible than ever. Surveillance footage plays a crucial role in security, law enforcement, and judicial processes. However, with the rise of advanced video editing tools, tampering with digital recordings has become increasingly easy, raising concerns about their authenticity. Ensuring the integrity of surveillance videos is essential, as manipulated footage can lead to misinformation and undermine judicial decisions. This paper provides a comprehensive review of existing forensic techniques used to detect video forgery, focusing on their effectiveness in verifying the authenticity of surveillance recordings. Various methods, including compression-based analysis, frame duplication detection, and machine learning-based approaches, are explored. The findings highlight the growing necessity for more robust forensic techniques to counteract evolving forgery methods. Strengthening video forensic capabilities will ensure that surveillance recordings remain credible and admissible as legal evidence.
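As a minimal illustration of one technique surveyed above, exact frame duplication can be flagged by hashing each frame's raw bytes and reporting any frame whose digest was already seen earlier in the video. This is a deliberately simple sketch; real forensic systems need perceptual hashing or learned features to survive re-compression, and the function names below are illustrative:

```python
import hashlib

def frame_hash(frame_bytes):
    """Digest of raw frame data; byte-identical frames collide by design."""
    return hashlib.sha256(frame_bytes).hexdigest()

def find_duplicated_frames(frames):
    """Return (copy_index, original_index) pairs for every frame whose
    content already appeared earlier -- a simple signal for copy-move
    (frame duplication) forgery in surveillance footage."""
    seen = {}
    duplicates = []
    for i, frame in enumerate(frames):
        h = frame_hash(frame)
        if h in seen:
            duplicates.append((i, seen[h]))
        else:
            seen[h] = i
    return duplicates

# Toy "video": frames 3 and 4 are re-inserted copies of frames 1 and 2.
frames = [b"frameA", b"frameB", b"frameC", b"frameB", b"frameC"]
print(find_duplicated_frames(frames))  # → [(3, 1), (4, 2)]
```

Compression-based analysis and machine-learning detectors, also covered in the review, target the harder case where duplicated content is re-encoded and no longer byte-identical.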


One-Pass to Reason: Token Duplication and Block-Sparse Mask for Efficient Fine-Tuning on Multi-Turn Reasoning

Goru, Ritesh, Mehta, Shanay, Jain, Prateek

arXiv.org Artificial Intelligence

Fine-tuning Large Language Models (LLMs) on multi-turn reasoning datasets requires N separate forward passes per conversation (one per turn) due to reasoning-token visibility constraints: reasoning tokens for a turn are discarded in subsequent turns. We propose duplicating response tokens along with a custom attention mask to enable single-pass processing of entire conversations. We prove our method produces losses identical to the N-pass approach while reducing time complexity from $O(N^{3})$ to $O(N^{2})$ and maintaining the same memory complexity for a transformer-based model. Our approach achieves significant training speedup while preserving accuracy. Our implementation is available online (https://github.com/devrev/One-Pass-to-Reason).
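A rough sketch of the masking idea, under the simplifying assumption that each turn contributes user, reasoning, response, and duplicated-response segments: the mask stays causal, reasoning (and the original response) of a turn is hidden from all later turns, and later turns instead attend to the reasoning-free duplicated copy. The segment names and rules below are illustrative, not the repository's actual implementation:

```python
def build_mask(tokens):
    """tokens: list of (turn, kind) pairs with kind in
    {'user', 'reason', 'resp', 'resp_dup'}.
    Returns mask[q][k] == True when query q may attend to key k, under a
    simplified version of the block-sparse visibility rules:
      - causal: k <= q,
      - 'reason'/'resp' of turn t are invisible to later turns,
      - 'resp_dup' of turn t is used only by later turns, not its own."""
    n = len(tokens)
    mask = [[False] * n for _ in range(n)]
    for q in range(n):
        q_turn, _ = tokens[q]
        for k in range(q + 1):  # causal upper bound
            k_turn, k_kind = tokens[k]
            if k_kind in ("reason", "resp") and k_turn < q_turn:
                continue  # hidden from later turns
            if k_kind == "resp_dup" and k_turn == q_turn:
                continue  # same turn attends to the original response
            mask[q][k] = True
    return mask

# Turn 1: user, reason, resp, resp_dup; turn 2: user, reason, resp.
toks = [(1, "user"), (1, "reason"), (1, "resp"), (1, "resp_dup"),
        (2, "user"), (2, "reason"), (2, "resp")]
m = build_mask(toks)
# Turn-2 response sees turn-1 user and resp_dup, but not turn-1 reasoning:
print([m[6][k] for k in range(7)])
```

Building one such mask lets the whole conversation run in a single forward pass while each turn's loss still matches what the per-turn passes would compute.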