AITopics | Sensing and Signal Processing

Collaborating Authors

Sensing and Signal Processing

News Overviews Instructional Materials AI-Alerts Classics

Learning Loss for Test-Time Augmentation

Neural Information Processing SystemsMay-16-2025, 03:22:17 GMT

Data augmentation has been actively studied for robust neural networks. Most of the recent data augmentation methods focus on augmenting datasets during the training phase. At the testing phase, simple transformations are still widely used for test-time augmentation. This paper proposes a novel instance-level testtime augmentation that efficiently selects suitable transformations for a test input. Our proposed method involves an auxiliary module to predict the loss of each possible transformation given the input. Then, the transformations having lower predicted losses are applied to the input.

artificial intelligence, deep learning, machine learning, (16 more...)

Neural Information Processing Systems

Country: North America > Canada (0.14)

Genre: Research Report (0.68)

Industry: Health & Medicine (0.68)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

1d774c112926348c3e25ea47d87c835b-Paper-Conference.pdf

Neural Information Processing SystemsMay-16-2025, 01:53:57 GMT

data mining, machine learning, natural language, (18 more...)

Neural Information Processing Systems

Country: Asia > China (0.14)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.93)
Information Technology > Sensing and Signal Processing (0.93)

Add feedback

0b68d474baf8dff30f3280c199a32089-Paper-Conference.pdf

Neural Information Processing SystemsMay-16-2025, 01:09:03 GMT

artificial intelligence, input image, machine learning, (14 more...)

Neural Information Processing Systems

Country:

Europe > Netherlands (0.14)
Asia > Middle East > Israel (0.14)

Genre: Research Report > Promising Solution (0.34)

Industry: Information Technology (0.68)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.93)
Information Technology > Sensing and Signal Processing > Image Processing (0.74)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)

Add feedback

Google AI Mode rolls out to more testers with new image search feature

EngadgetApr-7-2025, 16:00:54 GMT

Google is bringing AI Mode to more people in the US. The company announced on Monday it would make the new search tool, first launched at the start of last month, to millions of more Labs users across the country. For uninitiated, AI Mode is a new dedicated tab within Search. It allows you to ask more complicated questions of Google, with a custom version of Gemini 2.0 doing the legwork to deliver a nuanced AI-generated response. Labs, meanwhile, is a beta program you can enroll your Google account in to gain access to new Search features before the company rolls them out to the public.

ai mode, machine learning, pattern recognition, (7 more...)

Engadget

Country: North America > United States (0.28)

Technology:

Information Technology > Information Management > Search (0.64)
Information Technology > Sensing and Signal Processing > Image Processing (0.40)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition > Image Matching (0.40)

Add feedback

Detection Limits and Statistical Separability of Tree Ring Watermarks in Rectified Flow-based Text-to-Image Generation Models

Umrajkar, Ved, Singh, Aakash Kumar

arXiv.org Machine LearningApr-4-2025

Tree-Ring Watermarking is a significant technique for authenticating AI-generated images. However, its effectiveness in rectified flow-based models remains unexplored, particularly given the inherent challenges of these models with noise latent inversion. Through extensive experimentation, we evaluated and compared the detection and separability of watermarks between SD 2.1 and FLUX.1-dev models. By analyzing various text guidance configurations and augmentation attacks, we demonstrate how inversion limitations affect both watermark recovery and the statistical separation between watermarked and unwatermarked images. Our findings provide valuable insights into the current limitations of Tree-Ring Watermarking in the current SOTA models and highlight the critical need for improved inversion methods to achieve reliable watermark detection and separability. The official implementation, dataset release and all experimental results are available at this \href{https://github.com/dsgiitr/flux-watermarking}{\textbf{link}}.

artificial intelligence, inversion, machine learning, (15 more...)

arXiv.org Machine Learning

2504.0385

Genre: Research Report > New Finding (0.48)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.72)

Add feedback

Quantum Generative Models for Image Generation: Insights from MNIST and MedMNIST

Chen, Chi-Sheng, Hou, Wei An, Hu, Hsiang-Wei, Cai, Zhen-Sheng

arXiv.org Artificial IntelligenceApr-3-2025

Quantum generative models offer a promising new direction in machine learning by leveraging quantum circuits to enhance data generation capabilities. In this study, we propose a hybrid quantum-classical image generation framework that integrates variational quantum circuits into a diffusion-based model. To improve training dynamics and generation quality, we introduce two novel noise strategies: intrinsic quantum-generated noise and a tailored noise scheduling mechanism. Our method is built upon a lightweight U-Net architecture, with the quantum layer embedded in the bottleneck module to isolate its effect. We evaluate our model on MNIST and MedMNIST datasets to examine its feasibility and performance. Notably, our results reveal that under limited data conditions (fewer than 100 training images), the quantum-enhanced model generates images with higher perceptual quality and distributional similarity than its classical counterpart using the same architecture. While the quantum model shows advantages on grayscale data such as MNIST, its performance is more nuanced on complex, color-rich datasets like PathMNIST. These findings highlight both the potential and current limitations of quantum generative models and lay the groundwork for future developments in low-resource and biomedical image generation.

artificial intelligence, diffusion model, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2504.00034

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine > Diagnostic Medicine (0.69)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

VinaBench: Benchmark for Faithful and Consistent Visual Narratives

Gao, Silin, Mathew, Sheryl, Mi, Li, Mamooler, Sepideh, Zhao, Mengjie, Wakaki, Hiromi, Mitsufuji, Yuki, Montariol, Syrielle, Bosselut, Antoine

arXiv.org Artificial IntelligenceApr-3-2025

Visual narrative generation transforms textual narratives into sequences of images illustrating the content of the text. However, generating visual narratives that are faithful to the input text and self-consistent across generated images remains an open challenge, due to the lack of knowledge constraints used for planning the stories. In this work, we propose a new benchmark, VinaBench, to address this challenge. Our benchmark annotates the underlying commonsense and discourse constraints in visual narrative samples, offering systematic scaffolds for learning the implicit strategies of visual storytelling. Based on the incorporated narrative constraints, we further propose novel metrics to closely evaluate the consistency of generated narrative images and the alignment of generations with the input textual narrative. Our results across three generative vision models demonstrate that learning with VinaBench's knowledge constraints effectively improves the faithfulness and cohesion of generated visual narratives.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2503.20871

Genre: Research Report > New Finding (0.48)

Industry: Media (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.52)

Add feedback

IMPACT: A Generic Semantic Loss for Multimodal Medical Image Registration

Boussot, Valentin, Hémon, Cédric, Nunes, Jean-Claude, Downling, Jason, Rouzé, Simon, Lafond, Caroline, Barateau, Anaïs, Dillenseger, Jean-Louis

arXiv.org Artificial IntelligenceApr-3-2025

Image registration is fundamental in medical imaging, enabling precise alignment of anatomical structures for diagnosis, treatment planning, image-guided interventions, and longitudinal monitoring. This work introduces IMPACT (Image Metric with Pretrained model-Agnostic Comparison for Transmodality registration), a novel similarity metric designed for robust multimodal image registration. Rather than relying on raw intensities, handcrafted descriptors, or task-specific training, IMPACT defines a semantic similarity measure based on the comparison of deep features extracted from large-scale pretrained segmentation models. By leveraging representations from models such as TotalSegmentator, Segment Anything (SAM), and other foundation networks, IMPACT provides a task-agnostic, training-free solution that generalizes across imaging modalities. These features, originally trained for segmentation, offer strong spatial correspondence and semantic alignment capabilities, making them naturally suited for registration. The method integrates seamlessly into both algorithmic (Elastix) and learning-based (VoxelMorph) frameworks, leveraging the strengths of each. IMPACT was evaluated on five challenging 3D registration tasks involving thoracic CT/CBCT and pelvic MR/CT datasets. Quantitative metrics, including Target Registration Error and Dice Similarity Coefficient, demonstrated consistent improvements in anatomical alignment over baseline methods. Qualitative analyses further highlighted the robustness of the proposed metric in the presence of noise, artifacts, and modality variations. With its versatility, efficiency, and strong performance across diverse tasks, IMPACT offers a powerful solution for advancing multimodal image registration in both clinical and research settings.

machine learning, pattern recognition, registration, (22 more...)

arXiv.org Artificial Intelligence

2503.24121

Country: Europe (0.28)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)
Health & Medicine > Health Care Technology (0.88)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
(3 more...)

Add feedback

Towards Mobile Sensing with Event Cameras on High-agility Resource-constrained Devices: A Survey

Wang, Haoyang, Guo, Ruishan, Ma, Pengtao, Ruan, Ciyu, Luo, Xinyu, Ding, Wenhua, Zhong, Tianyang, Xu, Jingao, Liu, Yunhao, Chen, Xinlei

arXiv.org Artificial IntelligenceApr-3-2025

With the increasing complexity of mobile device applications, these devices are evolving toward high agility. This shift imposes new demands on mobile sensing, particularly in terms of achieving high accuracy and low latency. Event-based vision has emerged as a disruptive paradigm, offering high temporal resolution, low latency, and energy efficiency, making it well-suited for high-accuracy and low-latency sensing tasks on high-agility platforms. However, the presence of substantial noisy events, the lack of inherent semantic information, and the large data volume pose significant challenges for event-based data processing on resource-constrained mobile devices. This paper surveys the literature over the period 2014-2024, provides a comprehensive overview of event-based mobile sensing systems, covering fundamental principles, event abstraction methods, algorithmic advancements, hardware and software acceleration strategies. We also discuss key applications of event cameras in mobile sensing, including visual odometry, object tracking, optical flow estimation, and 3D reconstruction, while highlighting the challenges associated with event data processing, sensor fusion, and real-time deployment. Furthermore, we outline future research directions, such as improving event camera hardware with advanced optics, leveraging neuromorphic computing for efficient processing, and integrating bio-inspired algorithms to enhance perception. To support ongoing research, we provide an open-source \textit{Online Sheet} with curated resources and recent developments. We hope this survey serves as a valuable reference, facilitating the adoption of event-based vision across diverse applications.

event camera, machine learning, real time system, (24 more...)

arXiv.org Artificial Intelligence

2503.22943

Country:

Europe (0.92)
Asia > China (0.28)

Genre: Overview (1.00)

Industry:

Semiconductors & Electronics (1.00)
Information Technology > Software (0.54)
Transportation > Ground (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Hardware (1.00)
Information Technology > Communications > Networks (1.00)
(6 more...)

Add feedback

Context-Aware Weakly Supervised Image Manipulation Localization with SAM Refinement

Wang, Xinghao, Gong, Tao, Chu, Qi, Liu, Bin, Yu, Nenghai

arXiv.org Artificial IntelligenceMar-31-2025

Malicious image manipulation poses societal risks, increasing the importance of effective image manipulation detection methods. Recent approaches in image manipulation detection have largely been driven by fully supervised approaches, which require labor-intensive pixel-level annotations. Thus, it is essential to explore weakly supervised image manipulation localization methods that only require image-level binary labels for training. However, existing weakly supervised image manipulation methods overlook the importance of edge information for accurate localization, leading to suboptimal localization performance. To address this, we propose a Context-Aware Boundary Localization (CABL) module to aggregate boundary features and learn context-inconsistency for localizing manipulated areas. Furthermore, by leveraging Class Activation Mapping (CAM) and Segment Anything Model (SAM), we introduce the CAM-Guided SAM Refinement (CGSR) module to generate more accurate manipulation localization maps. By integrating two modules, we present a novel weakly supervised framework based on a dual-branch Transformer-CNN architecture. Our method achieves outstanding localization performance across multiple datasets.

artificial intelligence, localization, machine learning, (12 more...)

arXiv.org Artificial Intelligence

2503.20294

Genre: Research Report (0.50)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback