Supplementary Material: Cross Aggregation Transformer for Image Restoration
These settings are consistent with CAT-R and CAT-A. For CAT-R-2, we apply regular-Rwin and set [sw, sh] to [4, 16] (same as CAT-R). We set the MLP expansion ratio to 2, consistent with SwinIR [13]. For CAT-A-2, we apply axial-Rwin and set sl to 4 for all CATBs in each RG. The MLP expansion ratio is set to 4. Best and second-best results are highlighted in red and blue, respectively.
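As a hedged illustration only, the variant settings above could be collected into configuration dictionaries like the following; the key names (rwin_type, sw, sh, sl, mlp_ratio) are assumed for this sketch and are not taken from the authors' code.

```python
# Illustrative configuration sketch for the two CAT variants described above.
# Key names are assumptions, not the authors' actual configuration schema.
cat_r2_cfg = {
    "rwin_type": "regular",  # regular-Rwin, as in CAT-R
    "sw": 4,                 # rectangle-window width
    "sh": 16,                # rectangle-window height
    "mlp_ratio": 2,          # MLP expansion ratio, consistent with SwinIR
}

cat_a2_cfg = {
    "rwin_type": "axial",    # axial-Rwin for every CATB in each RG
    "sl": 4,                 # axial window size
    "mlp_ratio": 4,          # MLP expansion ratio
}
```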
On the Benefits of Public Representations for Private Transfer Learning under Distribution Shift
Public pretraining is a promising approach to improve differentially private model training. However, recent work has noted that many positive research results studying this paradigm only consider in-distribution tasks, and may not apply to settings where there is distribution shift between the pretraining and finetuning data, a scenario that is likely when finetuning on private tasks, given the sensitive nature of the data. In this work, we show empirically across three tasks that even in settings with large distribution shift, where both zero-shot performance from public data and training from scratch with private data give unusably weak results, public features can in fact improve private training accuracy by up to 67% over private training from scratch. We provide a theoretical explanation for this phenomenon, showing that if the public and private data share a low-dimensional representation, public representations can improve the sample complexity of private training even if it is impossible to learn the private task from the public data alone. Altogether, our results provide evidence that public data can indeed make private training practical in realistic settings of extreme distribution shift.
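To make the setting concrete, below is a minimal, self-contained sketch of the paradigm studied above: a linear head is trained privately (DP-SGD-style clipped, noised gradients) on top of a frozen public representation. The synthetic data, the random-projection feature map, and the clipping and noise constants are all assumptions for illustration, not the paper's experimental setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "private" task: 200 examples, 64 raw features, binary labels.
X = rng.standard_normal((200, 64))
y = (X[:, 0] + 0.1 * rng.standard_normal(200) > 0).astype(float)

# Stand-in for a frozen public representation: a fixed projection to d = 16.
W_pub = rng.standard_normal((64, 16)) / np.sqrt(64)
Z = X @ W_pub  # public features of the private data

# DP-SGD-style training of a linear head: per-example gradient clipping
# plus Gaussian noise. The clip norm and noise scale are illustrative only.
w = np.zeros(16)
clip, sigma, lr = 1.0, 0.5, 0.1
for _ in range(100):
    p = 1.0 / (1.0 + np.exp(-(Z @ w)))           # logistic predictions
    per_ex_grads = (p - y)[:, None] * Z          # per-example loss gradients
    norms = np.linalg.norm(per_ex_grads, axis=1, keepdims=True)
    clipped = per_ex_grads / np.maximum(1.0, norms / clip)
    noise = sigma * clip * rng.standard_normal(16)
    w -= lr * (clipped.sum(axis=0) + noise) / len(y)

acc = ((Z @ w > 0) == (y > 0.5)).mean()
print(f"private linear-head accuracy on public features: {acc:.2f}")
```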
Panchromatic and Multispectral Image Fusion via Alternating Reverse Filtering Network (Supplementary Materials)
The best results are highlighted in bold. It can be clearly seen that our alternating reverse filtering network performs best on all indexes compared with other state-of-the-art methods, indicating the superiority of our proposed method. Images in the last row are the MSE residues between the fused results and the ground truth. Compared with other competing methods, our model exhibits minor spatial and spectral distortions, as can be easily seen from inspection of the MSE maps.
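For reference, the MSE residue maps mentioned above can be computed as a per-pixel squared error between the fused result and the ground truth, averaged over spectral bands. The sketch below uses random stand-in arrays and assumed shapes rather than actual fusion outputs.

```python
import numpy as np

def mse_residue_map(fused, gt):
    """fused, gt: (H, W, C) arrays in the same dynamic range."""
    diff = fused.astype(np.float64) - gt.astype(np.float64)
    return (diff ** 2).mean(axis=-1)  # (H, W) per-pixel residue

# Example with random stand-in data; real use would load fused/GT images.
fused = np.random.rand(64, 64, 8)
gt = np.random.rand(64, 64, 8)
residue = mse_residue_map(fused, gt)  # darker values = smaller error
```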
ICNet: Intra-saliency Correlation Network for Co-Saliency Detection
Model-based methods produce coarse Co-SOD results due to hand-crafted intra- and inter-saliency features. Current data-driven models exploit inter-saliency cues, but undervalue the potential power of intra-saliency cues. In this paper, we propose an Intra-saliency Correlation Network (ICNet) to extract intra-saliency cues from the single image saliency maps (SISMs) predicted by any off-the-shelf SOD method, and obtain inter-saliency cues by correlation techniques. Specifically, we adopt normalized masked average pooling (NMAP) to extract latent intra-saliency categories from the SISMs and semantic features as intra cues. Then we employ a correlation fusion module (CFM) to obtain inter cues by exploiting correlations between the intra cues and single-image features. To improve Co-SOD performance, we propose a category-independent rearranged self-correlation feature (RSCF) strategy. Experiments on three benchmarks show that our ICNet outperforms previous state-of-the-art methods on Co-SOD.
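The following is a minimal sketch, under assumed tensor shapes, of masked average pooling in the spirit of the NMAP step described above: semantic features are L2-normalized along channels and pooled under the (resized) SISM to yield one intra-saliency vector per image. It is illustrative, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def nmap(features, sism):
    """
    features: (B, C, H, W) semantic features
    sism:     (B, 1, h, w) single-image saliency maps in [0, 1]
    returns:  (B, C) intra-saliency vectors
    """
    # Resize the saliency mask to the feature resolution.
    mask = F.interpolate(sism, size=features.shape[-2:],
                         mode="bilinear", align_corners=False)
    feats = F.normalize(features, dim=1)  # channel-wise L2 normalization
    # Masked average pooling over spatial positions.
    pooled = (feats * mask).sum(dim=(2, 3)) / (mask.sum(dim=(2, 3)) + 1e-6)
    return pooled
```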
The Image Local Autoregressive Transformer
Recently, transformer-based AutoRegressive (AR) models for whole-image generation have achieved performance comparable to or even better than Generative Adversarial Networks (GANs). Unfortunately, directly applying such AR models to edit/change local image regions may suffer from the problems of missing global information, slow inference speed, and information leakage of local guidance. To address these limitations, we propose a novel model, the image Local Autoregressive Transformer (iLAT), to better facilitate locally guided image synthesis. Our iLAT learns novel local discrete representations via the newly proposed local autoregressive (LA) transformer with its attention mask and convolution mechanism. Thus, iLAT can efficiently synthesize local image regions guided by key information. Our iLAT is evaluated on various locally guided image synthesis tasks, such as pose-guided person image synthesis and face editing. Both quantitative and qualitative results show the efficacy of our model.
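As an illustrative construction only (not the authors' exact masking scheme), a local autoregressive attention mask can let tokens outside the edited region serve as fully visible context, while tokens inside the region attend causally to the context and to previously generated local tokens, as sketched below.

```python
import torch

def local_ar_mask(is_local):
    """
    is_local: (N,) bool tensor, True for tokens in the edited local region.
    returns:  (N, N) bool mask, True where attention is allowed.
    """
    N = is_local.numel()
    allow = torch.zeros(N, N, dtype=torch.bool)
    allow[:, ~is_local] = True                 # everyone sees global context
    local_idx = torch.nonzero(is_local).flatten()
    for k, i in enumerate(local_idx):          # causal order inside the region
        allow[i, local_idx[: k + 1]] = True
    return allow
```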
Few-shot Image Generation with Elastic Weight Consolidation Supplementary Material
In this supplementary material, we present more few-shot generation results evaluated extensively on different artistic domains where only a few examples are available in practice. The goal is to illustrate the effectiveness of the proposed method in generating diverse high-quality results without overfitting to the few given examples. Figure 1 shows generations from the source and target domains obtained by feeding the same latent code into the source and adapted models. It clearly shows that while the adaptation renders the new appearance of the target domain, other attributes, such as pose, glasses, and hairstyle, are well inherited and preserved from the source domain. For each target domain, we only use 10 examples for the adaptation and present 100 new results.
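For context, the Elastic Weight Consolidation penalty at the core of this kind of few-shot adaptation anchors important source-model weights (as measured by a Fisher estimate) to their original values while the generator is fine-tuned on the few target examples. The sketch below is a generic EWC term with assumed names (fisher, lambda_ewc), not the paper's exact objective.

```python
import torch

def ewc_penalty(model, source_params, fisher, lambda_ewc=1e3):
    """Quadratic penalty anchoring current weights to the source model.

    source_params, fisher: dicts keyed by parameter name, holding the
    source-model weights and their estimated Fisher importance.
    """
    loss = 0.0
    for name, p in model.named_parameters():
        loss = loss + (fisher[name] * (p - source_params[name]) ** 2).sum()
    return lambda_ewc * loss

# The total adaptation loss would then combine the usual generative loss
# with this penalty, e.g.: total = gen_loss + ewc_penalty(model, src, fisher)
```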
SkinCon: A skin disease dataset densely annotated by domain experts for fine-grained model debugging and analysis
However, there are only a few datasets that include concept-level meta-labels, and most of these meta-labels are relevant to natural images, which do not require domain expertise. Previous densely annotated datasets in medicine focused on meta-labels relevant to a single disease, such as osteoarthritis or melanoma. In dermatology, skin disease is described using an established clinical lexicon that allows clinicians to describe physical exam findings to one another. To provide a medical dataset densely annotated by domain experts with annotations useful across multiple disease processes, we developed SkinCon: a skin disease dataset densely annotated by dermatologists. SkinCon includes 3230 images from the Fitzpatrick 17k skin disease dataset, densely annotated with 48 clinical concepts, 22 of which have at least 50 images representing the concept. The concepts were chosen by two dermatologists based on the clinical descriptor terms used to describe skin lesions.
A novel constraint optimization method to encode generic knowledge into a BN without requiring any training data
Our proposed approach can be applied to other AUs as well. In Tab. 6, LP-SM also considers apex frames on CK+, and the comparison to LP-SM is consistent. In Tab. 8, we apply FMPN-FER and DeepEmotion to our pre-processed data. We will consider a pre-trained VGGFace model in future work. R2 2.1 The novelty compared to prior work: a facial expression can be represented as a group of AUs.
Understanding and Improving Robustness of Vision Transformers through Patch-based Negative Augmentation
We investigate the robustness of vision transformers (ViTs) through the lens of their special patch-based architectural structure, i.e., they process an image as a sequence of image patches. We find that ViTs are surprisingly insensitive to patch-based transformations, even when the transformation largely destroys the original semantics and makes the image unrecognizable by humans. This indicates that ViTs heavily use features that survive such transformations but are generally not indicative of the semantic class to humans. Further investigations show that these features are useful but non-robust, as ViTs trained on them can achieve high in-distribution accuracy, but break down under distribution shifts. From this understanding, we ask: can training the model to rely less on these features improve ViT robustness and out-of-distribution performance? We use the images transformed with our patch-based operations as negatively augmented views and introduce losses to regularize the training away from using non-robust features. This is a complementary view to existing research that mostly focuses on augmenting inputs with semantic-preserving transformations to enforce models' invariance. We show that patch-based negative augmentation consistently improves robustness of ViTs on ImageNet-based robustness benchmarks across 20+ different experimental settings. Furthermore, we find our patch-based negative augmentation is complementary to traditional (positive) data augmentation techniques and batch-based negative examples in contrastive learning.
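As one concrete example of a patch-based transformation that can serve as a negative augmentation in the sense described above, the sketch below randomly permutes non-overlapping patches, which destroys global semantics while preserving patch-level statistics. The patch size and this particular choice of transformation are assumptions for illustration.

```python
import torch

def patch_shuffle(img, patch=16):
    """img: (C, H, W) tensor with H and W divisible by `patch`."""
    C, H, W = img.shape
    gh, gw = H // patch, W // patch
    # Split the image into a grid of (gh * gw) patches.
    patches = img.reshape(C, gh, patch, gw, patch).permute(1, 3, 0, 2, 4)
    patches = patches.reshape(gh * gw, C, patch, patch)
    # Randomly permute the patches, then stitch the image back together.
    patches = patches[torch.randperm(gh * gw)]
    patches = patches.reshape(gh, gw, C, patch, patch)
    return patches.permute(2, 0, 3, 1, 4).reshape(C, H, W)
```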