AITopics | Sensing and Signal Processing

Neuro-Vision to Language: Enhancing Brain Recording-based Visual Reconstruction and Language Interaction

Neural Information Processing Systems, Jun-2-2025, 16:04:09 GMT

Decoding non-invasive brain recordings is pivotal for advancing our understanding of human cognition, but it is challenging because of individual differences and complex neural signal representations. Traditional methods often require customized models and extensive trials, and they lack interpretability in visual reconstruction tasks.

large language model, machine learning, natural language

Neural Information Processing Systems

Country: Europe > Switzerland > Zürich > Zürich (0.14)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)
Health & Medicine > Health Care Technology (1.00)
Transportation (0.93)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)

TextCtrl: Diffusion-based Scene Text Editing with Prior Guidance Control
Zhenhang Li, Dongbao Yang

Neural Information Processing Systems, Jun-2-2025, 15:49:28 GMT

Centred on content modification and style preservation, Scene Text Editing (STE) remains a challenging task despite considerable recent progress in text-to-image synthesis and text-driven image manipulation. GAN-based STE methods commonly struggle with model generalization, while diffusion-based STE methods suffer from undesired style deviations. To address these problems, we propose TextCtrl, a diffusion-based method that edits text with prior guidance control. Our method consists of two key components: (i) by constructing fine-grained text style disentanglement and a robust text glyph structure representation, TextCtrl explicitly incorporates Style-Structure guidance into model design and network training, significantly improving text style consistency and rendering accuracy.

artificial intelligence, machine learning, natural language

Neural Information Processing Systems

Country: North America > United States (0.15)

Genre: Research Report > Experimental Study (0.93)

Industry: Information Technology (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)

EnsIR: An Ensemble Algorithm for Image Restoration via Gaussian Mixture Models
Shangquan Sun, Hyunhee Park

Neural Information Processing Systems, Jun-2-2025, 15:38:04 GMT

Image restoration has advanced significantly with the development of deep learning. Nevertheless, restoration is an ill-posed problem, which leads to deviations between single-model predictions and the ground truth. Ensemble learning, a powerful machine learning technique, aims to reduce these deviations by combining the predictions of multiple base models. Most existing works apply ensemble learning while designing restoration models, and only limited research focuses on inference-stage ensembles of pre-trained restoration models. Because regression-based methods do not allow efficient inference, researchers in academia and industry prefer averaging for post-training ensembles.
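To make the two post-training baselines mentioned above concrete, here is a minimal pure-Python sketch operating on flattened pixel lists: uniform averaging, and a regression ensemble whose per-model weights are fitted by least squares against a reference image with known ground truth. The function names are hypothetical, and this is not the Gaussian-mixture method EnsIR proposes; it only illustrates what "averaging" and "regression-based" ensembling mean at inference time.

```python
from typing import List, Sequence


def average_ensemble(preds: Sequence[Sequence[float]]) -> List[float]:
    """Uniform per-pixel averaging of K model outputs."""
    k = len(preds)
    return [sum(p[i] for p in preds) / k for i in range(len(preds[0]))]


def fit_regression_weights(preds: Sequence[Sequence[float]],
                           target: Sequence[float]) -> List[float]:
    """Least-squares weights w minimising sum_i (sum_k w[k] * preds[k][i] - target[i])**2.

    Solves the K x K normal equations (P^T P) w = P^T y by Gaussian elimination
    with partial pivoting; assumes the K prediction vectors are linearly independent.
    """
    k, n = len(preds), len(target)
    a = [[sum(preds[r][i] * preds[c][i] for i in range(n)) for c in range(k)] for r in range(k)]
    b = [sum(preds[r][i] * target[i] for i in range(n)) for r in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, k):
            factor = a[r][col] / a[col][col]
            for c in range(col, k):
                a[r][c] -= factor * a[col][c]
            b[r] -= factor * b[col]
    w = [0.0] * k
    for r in range(k - 1, -1, -1):  # back substitution
        w[r] = (b[r] - sum(a[r][c] * w[c] for c in range(r + 1, k))) / a[r][r]
    return w


def weighted_ensemble(preds: Sequence[Sequence[float]],
                      weights: Sequence[float]) -> List[float]:
    """Apply one fixed weight per model to every pixel."""
    return [sum(wk * p[i] for wk, p in zip(weights, preds)) for i in range(len(preds[0]))]
```

Averaging needs no fitting step at all; the regression variant needs ground truth to fit its weights, and in this toy it learns only one global weight per model rather than anything adaptive to image content.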

artificial intelligence, ensemble, machine learning

Neural Information Processing Systems

Country: Asia > China (0.28)

Genre:

Research Report > Experimental Study (0.93)
Instructional Material (0.67)

Industry: Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Leveraging Sparsity for Efficient Submodular Data Summarization

Neural Information Processing Systems, Jun-2-2025, 15:37:43 GMT

The facility location problem is widely used to summarize large datasets and also has applications in sensor placement, image retrieval, and clustering. One difficulty is that submodular optimization algorithms require calculating the pairwise benefits for all items in the dataset. This is infeasible for large problems, so recent work proposed calculating only nearest-neighbor benefits. One limitation of that work is that several strong assumptions were needed to obtain provable approximation guarantees. In this paper, we establish that these extra assumptions are not necessary: solving the sparsified problem is almost optimal under the standard assumptions of the problem.
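The facility-location objective and its nearest-neighbour sparsification can be sketched as follows, assuming a nonnegative benefit matrix `sim[i][j]` (how well item `j` represents item `i`) and the standard greedy algorithm for monotone submodular maximisation. `sparsify` keeps only each item's `t` largest benefits, mimicking the approximation described above; all names are illustrative, not the paper's code.

```python
import heapq
from typing import Dict, List, Sequence

Benefits = List[Dict[int, float]]  # benefits[i][j]; a missing j means benefit 0


def dense(sim: Sequence[Sequence[float]]) -> Benefits:
    """Keep every pairwise benefit (the expensive O(n^2) setting)."""
    return [{j: v for j, v in enumerate(row)} for row in sim]


def sparsify(sim: Sequence[Sequence[float]], t: int) -> Benefits:
    """Keep only the t largest benefits of each item, i.e. its t nearest neighbours."""
    out = []
    for row in sim:
        top = heapq.nlargest(t, range(len(row)), key=lambda j: row[j])
        out.append({j: row[j] for j in top})
    return out


def facility_location(benefits: Benefits, selected: Sequence[int]) -> float:
    """f(S) = sum over items i of max_{j in S} benefit(i, j)."""
    return sum(max((b.get(j, 0.0) for j in selected), default=0.0) for b in benefits)


def greedy(benefits: Benefits, n_items: int, k: int) -> List[int]:
    """Repeatedly add the candidate with the largest marginal gain in f."""
    selected: List[int] = []
    cover = [0.0] * len(benefits)  # current max benefit of each item under S
    for _ in range(k):
        def gain(j: int) -> float:
            return sum(max(b.get(j, 0.0) - c, 0.0) for b, c in zip(benefits, cover))
        best = max((j for j in range(n_items) if j not in selected), key=gain)
        selected.append(best)
        cover = [max(c, b.get(best, 0.0)) for b, c in zip(benefits, cover)]
    return selected
```

With `t` much smaller than the number of items, the stored benefits drop from quadratic to roughly `n * t`; the toy only shows the mechanics and does not reproduce the paper's near-optimality guarantee.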

artificial intelligence, efficient submodular data summarization, leveraging sparsity

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.71)
Information Technology > Sensing and Signal Processing (0.65)

RAW: A Robust and Agile Plug-and-Play Watermark Framework for AI-Generated Images with Provable Guarantees

Neural Information Processing Systems, Jun-2-2025, 15:36:10 GMT

Safeguarding intellectual property and preventing potential misuse of AI-generated images are of paramount importance. This paper introduces a robust and agile plug-and-play watermark detection framework, referred to as RAW. Unlike existing encoder-decoder methods, which embed fixed binary codes as watermarks within latent representations, our approach introduces learnable watermarks directly into the original image data. We then employ a classifier, trained jointly with the watermark, to detect the presence of the watermark. The proposed framework is compatible with various generative architectures and supports on-the-fly watermark injection after training. By incorporating state-of-the-art smoothing techniques, we show that the framework also provides provable guarantees regarding the false positive rate for misclassifying a watermarked image, even in the presence of adversarial attacks targeting watermark removal. Experiments on a diverse range of images generated by state-of-the-art diffusion models demonstrate substantially improved watermark encoding speed and watermark detection performance under adversarial attacks, while maintaining image quality. Our code is publicly available.
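The general idea of injecting a watermark directly into pixel data and controlling a detector's false positive rate can be illustrated with a much simpler scheme than RAW: a fixed random ±`strength` pattern added to the pixels, a correlation score as the detector, and a threshold chosen from scores on clean images so that at most a fraction `alpha` of them are flagged. Everything here is a hypothetical stand-in: RAW learns its watermark jointly with a classifier and obtains its guarantee through smoothing, whereas this threshold only bounds the empirical false positive rate on the calibration set.

```python
import random
from typing import List, Sequence


def make_pattern(n: int, strength: float, seed: int = 0) -> List[float]:
    """Hypothetical fixed watermark: a random +/- strength value per pixel."""
    rng = random.Random(seed)
    return [strength * rng.choice((-1.0, 1.0)) for _ in range(n)]


def embed(image: Sequence[float], pattern: Sequence[float]) -> List[float]:
    """Add the pattern directly to the pixel values, clipping to [0, 1]."""
    return [min(1.0, max(0.0, x + p)) for x, p in zip(image, pattern)]


def score(image: Sequence[float], pattern: Sequence[float]) -> float:
    """Detector: correlation between the mean-centred image and the pattern."""
    mean = sum(image) / len(image)
    return sum((x - mean) * p for x, p in zip(image, pattern)) / len(image)


def calibrate_threshold(clean_scores: Sequence[float], alpha: float) -> float:
    """Threshold such that at most a fraction alpha of clean scores lie strictly above it."""
    ranked = sorted(clean_scores)
    idx = min(len(ranked) - 1, int((1.0 - alpha) * len(ranked)))
    return ranked[idx]


def is_watermarked(image: Sequence[float], pattern: Sequence[float],
                   threshold: float) -> bool:
    return score(image, pattern) > threshold
```

Embedding shifts the score of an image by roughly `strength**2`, so detection works when that shift is large compared with the spread of clean scores; an adversary who perturbs pixels can erode it, which is the situation RAW's smoothing-based guarantee is designed to address.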

artificial intelligence, machine learning, watermark

Neural Information Processing Systems

Country: Europe > Switzerland > Zürich > Zürich (0.14)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.93)

Industry:

Law (1.00)
Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

Combining Fully Convolutional and Recurrent Neural Networks for 3D Biomedical Image Segmentation

Neural Information Processing Systems, Jun-2-2025, 14:56:25 GMT

Segmentation of 3D images is a fundamental problem in biomedical image analysis. Deep learning (DL) approaches have achieved state-of-the-art segmentation performance. To exploit 3D context with neural networks, known DL segmentation methods, including 3D convolution, 2D convolution on planes orthogonal to the 2D slices, and LSTMs in multiple directions, are all poorly suited to the highly anisotropic dimensions common in 3D biomedical images. In this paper, we propose a new DL framework for 3D image segmentation that combines a fully convolutional network (FCN) and a recurrent neural network (RNN), which exploit the intra-slice and inter-slice contexts, respectively. To the best of our knowledge, this is the first DL framework for 3D image segmentation that explicitly leverages 3D image anisotropism.
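The division of labour described above (a 2D network for context within each slice, a recurrent network for context across slices) can be sketched in pure Python with deliberately tiny stand-ins: a 3x3 mean filter plays the role of the FCN, and a scalar Elman-style recurrence run forwards and backwards along the slice axis plays the role of the RNN. The weights `w_x` and `w_h`, the threshold, and all function names are hypothetical; the sketch only shows how the slice axis is handled by a different mechanism than the two in-plane axes.

```python
import math
from typing import List

Slice = List[List[float]]


def intra_slice_features(s: Slice) -> Slice:
    """Stand-in for the 2D FCN: a 3x3 mean filter applied within a single slice."""
    h, w = len(s), len(s[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            # Average over the in-bounds part of the 3x3 neighbourhood.
            vals = [s[a][b]
                    for a in range(max(0, i - 1), min(h, i + 2))
                    for b in range(max(0, j - 1), min(w, j + 2))]
            out[i][j] = sum(vals) / len(vals)
    return out


def inter_slice_rnn(feats: List[Slice], w_x: float = 1.0, w_h: float = 0.5) -> List[Slice]:
    """Stand-in for the RNN: per pixel, h_z = tanh(w_x * f_z + w_h * h_{z-1}),
    run forwards and backwards along the slice axis; the two passes are averaged."""
    d, h, w = len(feats), len(feats[0]), len(feats[0][0])
    out = [[[0.0] * w for _ in range(h)] for _ in range(d)]
    for i in range(h):
        for j in range(w):
            fwd, state = [0.0] * d, 0.0
            for z in range(d):
                state = math.tanh(w_x * feats[z][i][j] + w_h * state)
                fwd[z] = state
            state = 0.0
            for z in reversed(range(d)):
                state = math.tanh(w_x * feats[z][i][j] + w_h * state)
                out[z][i][j] = 0.5 * (fwd[z] + state)
    return out


def segment(volume: List[Slice], threshold: float = 0.5) -> List[List[List[int]]]:
    """Intra-slice features first, inter-slice context second, then a per-voxel threshold."""
    feats = [intra_slice_features(s) for s in volume]
    context = inter_slice_rnn(feats)
    return [[[1 if v > threshold else 0 for v in row] for row in sl] for sl in context]
```

In a real model both stand-ins would be trained networks; the structural point is that no kernel ever mixes voxels across the coarse slice axis as if it had the same spacing as the in-plane axes.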

artificial intelligence, convolutional and recurrent neural network, machine learning

Neural Information Processing Systems

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Image2Struct: Benchmarking Structure Extraction for Vision-Language Models
Tony Lee

Neural Information Processing Systems, Jun-2-2025, 14:43:37 GMT

We introduce Image2Struct, a benchmark to evaluate vision-language models (VLMs) on extracting structure from images. Our benchmark 1) captures real-world use cases, 2) is fully automatic and does not require human judgment, and 3) is based on a renewable stream of fresh data. In Image2Struct, VLMs are prompted to generate the underlying structure (e.g., LaTeX code or HTML) from an input image (e.g., a webpage screenshot). The structure is then rendered to produce an output image (e.g., a rendered webpage), which is compared against the input image to produce a similarity score. This round-trip evaluation allows us to quantitatively evaluate VLMs on tasks with multiple valid structures. We create a pipeline that downloads fresh data from active online communities upon execution and evaluates the VLMs without human intervention. We introduce three domains (Webpages, LaTeX, and Musical Scores) and use five image metrics (pixel similarity, cosine similarity between the Inception vectors, learned perceptual image patch similarity, structural similarity index measure, and earth mover similarity) that allow efficient and automatic comparison between pairs of images. We evaluate Image2Struct on 14 prominent VLMs and find that scores vary widely, indicating that Image2Struct can differentiate between the performances of different VLMs. Additionally, the best score varies considerably across domains (e.g., 0.402 on sheet music vs. 0.830 on LaTeX equations), indicating that Image2Struct contains tasks of varying difficulty.
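The round-trip protocol can be illustrated with a toy domain in which a "structure" is a list of filled rectangles and "rendering" rasterises them onto a binary canvas. The pixel-similarity definition used here (fraction of identical pixels) is an assumption for illustration, not necessarily the benchmark's exact metric, and the function names are hypothetical.

```python
from typing import List, Tuple

Rect = Tuple[int, int, int, int]  # (row0, col0, row1, col1), half-open bounds
Image = List[List[int]]


def render(structure: List[Rect], height: int, width: int) -> Image:
    """Toy renderer: rasterise filled rectangles onto a binary canvas."""
    canvas = [[0] * width for _ in range(height)]
    for r0, c0, r1, c1 in structure:
        for r in range(max(0, r0), min(height, r1)):
            for c in range(max(0, c0), min(width, c1)):
                canvas[r][c] = 1
    return canvas


def pixel_similarity(a: Image, b: Image) -> float:
    """Fraction of pixel positions where two equally sized images agree (assumed metric)."""
    total = sum(len(row) for row in a)
    same = sum(x == y for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    return same / total


def round_trip_score(predicted: List[Rect], input_image: Image) -> float:
    """Render the predicted structure and compare it with the input image."""
    h, w = len(input_image), len(input_image[0])
    return pixel_similarity(render(predicted, h, w), input_image)
```

Because only rendered outputs are compared, two different structures that render identically (one wide rectangle versus two adjacent halves) receive the same score, which is what lets round-trip evaluation handle tasks with multiple valid structures.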

large language model, machine learning, natural language

Neural Information Processing Systems

Country: North America (0.14)

Genre: Research Report (1.00)

Industry:

Information Technology (0.67)
Government (0.67)
Law > Intellectual Property & Technology Law (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)

image modalities proposed by Reviewer 1 is an interesting idea, which we will consider for future work

Neural Information Processing Systems, Jun-2-2025, 14:27:07 GMT

We would like to thank all reviewers for their time and effort in writing these valuable reviews. Reviewer 3 mentioned that a performance comparison with other recent methods would be beneficial. The code for this paper will be released with the camera-ready version. In the following, we focus on the questions raised by Reviewer 2. The presented network does not contain fewer parameters compared to the classical B-spline method for optimization. Furthermore, it is straightforward to extend to the 3D case.

artificial intelligence, machine learning, registration

Neural Information Processing Systems

Industry: Health & Medicine > Therapeutic Area (0.50)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (0.77)
Information Technology > Artificial Intelligence > Machine Learning (0.74)

Visual Pinwheel Centers Act as Geometric Saliency Detectors
Mingyi Huang

Neural Information Processing Systems, Jun-2-2025, 14:17:05 GMT

During natural evolution, the primary visual cortex (V1) of lower mammals typically forms salt-and-pepper organizations, while higher mammals and primates develop pinwheel structures with distinct topological properties. Despite the general belief that V1 neurons primarily serve as edge detectors, the functional advantages of pinwheel structures over salt-and-pepper organizations are not well understood. To this end, we propose a two-dimensional self-evolving spiking neural network that integrates Hebbian-like plasticity and empirical morphological data.
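The abstract does not spell out its plasticity rule. As a generic example of a Hebbian-like update, the sketch below uses Oja's rule, a textbook normalised Hebbian rule swapped in here and not taken from the paper: a single linear unit strengthens its weights in proportion to pre- and postsynaptic co-activity, and a decay term keeps the weight vector bounded, so the weights align with the dominant direction of the inputs. Function names, the learning rate `eta`, and the toy oriented input data are all illustrative assumptions.

```python
import math
import random
from typing import List, Sequence


def oja_train(samples: Sequence[Sequence[float]], eta: float = 0.01,
              epochs: int = 20, seed: int = 0) -> List[float]:
    """Train one linear neuron with Oja's rule: dw = eta * y * (x - y * w)."""
    rng = random.Random(seed)
    w = [rng.uniform(-0.1, 0.1) for _ in range(len(samples[0]))]
    for _ in range(epochs):
        for x in samples:
            y = sum(wi * xi for wi, xi in zip(w, x))  # postsynaptic activity
            # Hebbian term y*x plus the decay -y^2*w that keeps ||w|| near 1
            w = [wi + eta * y * (xi - y * wi) for wi, xi in zip(w, x)]
    return w


def make_oriented_inputs(n: int, angle: float, seed: int = 1) -> List[List[float]]:
    """Toy 2D inputs: variance 1 along `angle`, variance 0.01 orthogonal to it."""
    rng = random.Random(seed)
    u = (math.cos(angle), math.sin(angle))
    v = (-math.sin(angle), math.cos(angle))
    out = []
    for _ in range(n):
        a, b = rng.gauss(0.0, 1.0), rng.gauss(0.0, 0.1)
        out.append([a * u[0] + b * v[0], a * u[1] + b * v[1]])
    return out
```

Trained on `make_oriented_inputs(500, math.pi / 4)`, the weight vector should end up close to unit length and nearly parallel (up to sign) to the 45-degree direction, i.e. the unit becomes selective for the dominant orientation in its input.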

artificial intelligence, machine learning, neuron

Neural Information Processing Systems

Country:

North America > United States > Texas (0.14)
North America > Canada > Ontario > Toronto (0.14)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Sensing and Signal Processing > Image Processing (0.92)

Robustly overfitting latents for flexible neural image compression

Neural Information Processing Systems, Jun-2-2025, 14:12:33 GMT

Neural image compression has made a great deal of progress. State-of-the-art models are based on variational autoencoders and outperform classical models. Neural compression models learn to encode an image into a quantized latent representation that can be efficiently sent to the decoder, which decodes the quantized latent into a reconstructed image. While these models have proven successful in practice, they yield sub-optimal results because of imperfect optimization and limited encoder and decoder capacity. Recent work shows how to use stochastic Gumbel annealing (SGA) to refine the latents of pre-trained neural image compression models.
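The encode, quantise, decode pipeline and the idea of refining the latents of a fixed, pre-trained model for a single image can be sketched with a toy linear codec. As a stand-in for stochastic Gumbel annealing (which relaxes rounding and optimises continuous latents by gradient descent), the sketch uses a plain coordinate search over integer latents that accepts only moves lowering a rate-distortion loss. The encoder and decoder matrices, the L1 rate proxy, the weight `lam`, and all function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def matvec(m: Matrix, v: Sequence[float]) -> List[float]:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def encode(x: Sequence[float], encoder: Matrix) -> List[int]:
    """Amortised encoder followed by rounding to an integer latent."""
    return [round(v) for v in matvec(encoder, x)]


def rd_loss(x: Sequence[float], y_hat: Sequence[int], decoder: Matrix, lam: float) -> float:
    """Squared-error distortion plus lam times an L1 proxy for the bit rate."""
    recon = matvec(decoder, y_hat)
    distortion = sum((a - b) ** 2 for a, b in zip(x, recon))
    rate_proxy = sum(abs(v) for v in y_hat)  # hypothetical stand-in for an entropy model
    return distortion + lam * rate_proxy


def refine_latents(x: Sequence[float], y_hat: Sequence[int], decoder: Matrix,
                   lam: float, max_sweeps: int = 50) -> Tuple[List[int], float]:
    """Per-image refinement with the decoder frozen: try +/-1 on each latent,
    keep a move only if it lowers the loss, stop when a full sweep finds none."""
    y = list(y_hat)
    best = rd_loss(x, y, decoder, lam)
    for _ in range(max_sweeps):
        improved = False
        for k in range(len(y)):
            for step in (-1, 1):
                cand = y[:]
                cand[k] += step
                loss = rd_loss(x, cand, decoder, lam)
                if loss < best:
                    y, best, improved = cand, loss, True
        if not improved:
            break
    return y, best
```

Because the search only accepts improvements, the refined latents are never worse than the encoder's output under this loss; any gain comes from correcting the amortised encoder's mistakes for this one image, with the decoder left untouched.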

artificial intelligence, iteration, machine learning

Neural Information Processing Systems

Country: