A Data-Centric Perspective on the Influence of Image Data Quality in Machine Learning Models
In machine learning, research has traditionally focused on model development, with relatively less attention paid to training data. As model architectures have matured and marginal gains from further refinements diminish, data quality has emerged as a critical factor. However, systematic studies on evaluating and ensuring dataset quality in the image domain remain limited. This study investigates methods for systematically assessing image dataset quality and examines how various image quality factors influence model performance. Using the publicly available and relatively clean CIFAKE dataset, we identify common quality issues and quantify their impact on training. Building on these findings, we develop a pipeline that integrates two community-developed tools, CleanVision and Fastdup. We analyze their underlying mechanisms and introduce several enhancements, including automatic threshold selection to detect problematic images without manual tuning. Experimental results demonstrate that not all quality issues exert the same level of impact. While convolutional neural networks show resilience to certain distortions, they are particularly vulnerable to degradations that obscure critical visual features, such as blurring and severe downscaling. To assess the performance of existing tools and the effectiveness of our proposed enhancements, we formulate the detection of low-quality images as a binary classification task and use the F1 score as the evaluation metric. Our automatic thresholding method improves the F1 score from 0.6794 to 0.9468 under single perturbations and from 0.7447 to 0.8557 under dual perturbations. For near-duplicate detection, our deduplication strategy increases the F1 score from 0.4576 to 0.7928. These results underscore the effectiveness of our workflow and provide a foundation for advancing data quality assessment in image-based machine learning.
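The abstract frames low-quality-image detection as a binary classification task scored with F1. As a minimal illustration of that metric (the labels and predictions below are made up, not taken from the CIFAKE experiments), F1 combines precision and recall over the flagged images:

```python
# Toy example: 1 = image flagged as low-quality, 0 = clean.
y_true = [1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))  # true positives
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))  # false positives
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))  # false negatives

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 4))  # 0.75 for this toy data
```

Because F1 ignores true negatives, it is well suited to this setting, where clean images vastly outnumber problematic ones.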
Estimating Physical Information Consistency of Channel Data Augmentation for Remote Sensing Images
The application of data augmentation for deep learning (DL) methods plays an important role in achieving state-of-the-art results in supervised, semi-supervised, and self-supervised image classification. In particular, channel transformations (e.g., solarize, grayscale, brightness adjustments) are integrated into data augmentation pipelines for remote sensing (RS) image classification tasks. However, contradicting beliefs exist about their proper application to RS images. A common point of critique is that channel augmentation techniques may produce physically inconsistent spectral data (i.e., pixel signatures). To shed light on the open debate, we propose an approach to estimate whether a channel augmentation technique affects the physical information of RS images. To this end, the proposed approach estimates a score that measures the alignment of a pixel signature within a time series that can be naturally subject to deviations caused by factors such as acquisition conditions or phenological states of vegetation. We compare the scores associated with original and augmented pixel signatures to evaluate the physical consistency. Experimental results on a multi-label image classification task show that channel augmentations yielding a score that exceeds the expected deviation of original pixel signatures cannot improve the performance of a baseline model trained without augmentation.
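The abstract does not spell out the scoring function, but the idea can be sketched as follows: compare how far an augmented pixel signature drifts from a reference acquisition against the natural deviation observed across the time series. The sketch below is an assumption-laden illustration (cosine similarity as the alignment measure, and all band values are invented), not the paper's actual method:

```python
import numpy as np

def cosine(a, b):
    """Alignment between two pixel signatures (spectral vectors)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy time series: 4 acquisitions x 3 spectral bands for one pixel.
series = np.array([[0.20, 0.40, 0.60],
                   [0.25, 0.38, 0.62],
                   [0.22, 0.41, 0.58],
                   [0.21, 0.39, 0.61]])
reference = series[0]

# Expected natural deviation: the lowest alignment of later
# acquisitions with the reference signature.
natural = min(cosine(reference, s) for s in series[1:])

# A grayscale-like channel augmentation collapses all bands to their mean,
# distorting the spectral shape of the signature.
augmented = np.full(3, reference.mean())
score = cosine(reference, augmented)

# If the augmented signature deviates more than natural variation does,
# the augmentation is flagged as physically inconsistent.
print(score < natural)
```

Under this sketch, grayscale conversion falls outside the natural spread of the time series, matching the intuition that it destroys spectral (pixel-signature) information.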
Images that Sound: Composing Images and Sounds on a Single Canvas
Chen, Ziyang, Geng, Daniel, Owens, Andrew
Spectrograms are 2D representations of sound that look very different from the images found in our visual world. And natural images, when played as spectrograms, make unnatural sounds. In this paper, we show that it is possible to synthesize spectrograms that simultaneously look like natural images and sound like natural audio. We call these spectrograms images that sound. Our approach is simple and zero-shot, and it leverages pre-trained text-to-image and text-to-spectrogram diffusion models that operate in a shared latent space. During the reverse process, we denoise noisy latents with both the audio and image diffusion models in parallel, resulting in a sample that is likely under both models. Through quantitative evaluations and perceptual studies, we find that our method successfully generates spectrograms that align with a desired audio prompt while also taking on the visual appearance of a desired image prompt. Please see our project page for video results: https://ificl.github.io/images-that-sound/
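The key mechanism here is denoising one shared latent with two diffusion models in parallel. A minimal sketch of that composition step is below; the two "models" are stand-in linear functions rather than real text-to-image or text-to-spectrogram networks, and the step size is arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for the pretrained models' noise predictions on a shared latent.
# Real diffusion networks are assumed here, not implemented.
def image_model_eps(latent):
    return 0.9 * latent

def audio_model_eps(latent):
    return 1.1 * latent

latent = rng.standard_normal((4, 8))  # toy latent "canvas"

# One reverse step: average the two noise estimates so the update pushes
# the sample toward regions likely under both models, then denoise slightly.
eps = 0.5 * (image_model_eps(latent) + audio_model_eps(latent))
latent = latent - 0.1 * eps
print(latent.shape)
```

Averaging the two score estimates is one simple way to sample from a composition of models sharing a latent space; the actual guidance weighting in the paper may differ.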
DUET: 2D Structured and Approximately Equivariant Representations
Suau, Xavier, Danieli, Federico, Keller, T. Anderson, Blaas, Arno, Huang, Chen, Ramapuram, Jason, Busbridge, Dan, Zappella, Luca
Multiview Self-Supervised Learning (MSSL) is based on learning invariances with respect to a set of input transformations. However, invariance partially or totally removes transformation-related information from the representations, which might harm performance for specific downstream tasks that require such information. We propose 2D strUctured and EquivarianT representations (coined DUET), which are 2D representations organized in a matrix structure, and equivariant with respect to transformations acting on the input data. DUET representations maintain information about an input transformation while remaining semantically expressive. Compared to SimCLR (Chen et al., 2020) (unstructured and invariant) and ESSL (Dangovski et al., 2022) (unstructured and equivariant), the structured and equivariant nature of DUET representations enables controlled generation with lower reconstruction error, while controllability is not possible with SimCLR or ESSL. DUET also achieves higher accuracy on several discriminative tasks and improves transfer learning.
TensorFlow for Computer Vision -- How to Train Image Classifier with Artificial Neural Networks
The dataset should be big enough to train a decent image classifier, even though ANNs aren't the ideal tool for image data. The only problem is that it's not structured properly for deep learning out of the box. Let's get the library imports out of the way. We'll need quite a few of them, so make sure to have NumPy, Pandas, TensorFlow, PIL, and Scikit-Learn installed. Keep in mind that you can't pass an image directly to a Dense layer: a single image is 3-dimensional (height, width, color channels), while a Dense layer expects a 1-dimensional input.
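The flattening step described above can be sketched with NumPy alone (the image here is random placeholder data; in a Keras model, `tf.keras.layers.Flatten()` performs the same reshape for you):

```python
import numpy as np

# Hypothetical 32x32 RGB image -- shape (height, width, channels).
image = np.random.rand(32, 32, 3)

# A Dense layer expects a 1-D feature vector, so the image must be
# flattened first: 32 * 32 * 3 = 3072 features.
flat = image.reshape(-1)
print(flat.shape)  # (3072,)
```

This is why ANN-based image classifiers scale poorly: the input dimensionality grows with the square of the image side, and all spatial structure is discarded in the flattening.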
Raven's Progressive Matrices Completion with Latent Gaussian Process Priors
Shi, Fan, Li, Bin, Xue, Xiangyang
Abstract reasoning ability is fundamental to human intelligence. It enables humans to uncover relations among abstract concepts and further deduce implicit rules from the relations. As a well-known abstract visual reasoning task, Raven's Progressive Matrices (RPM) are widely used in human IQ tests. Although extensive research has been conducted on RPM solvers with machine intelligence, few studies have considered advancing the standard answer-selection (classification) problem to the more challenging answer-painting (generation) problem, which can verify whether the model has indeed understood the implicit rules. In this paper we aim to solve the latter by proposing a deep latent variable model, in which multiple Gaussian processes are employed as priors of latent variables to separately learn underlying abstract concepts from RPMs; thus the proposed model is interpretable in terms of concept-specific latent variables. The latent Gaussian process also provides an effective way of extrapolation for answer painting based on the learned concept-changing rules. We evaluate the proposed model on RPM-like datasets with multiple continuously-changing visual concepts. Experimental results demonstrate that our model requires only a few training samples to paint high-quality answers, generate novel RPM panels, and achieve interpretability through concept-specific latent variables.
Coding Blackness: A History of Black Video Game Characters
Black history permeates all facets of our lives--and video games are no exception. From the 8-bit days to the 4K ray-tracing present, Black video game characters have occupied various positions; from the precarious period of early sports games in the '70s, which included titles like Heavyweight Champ and its nameless grayscale sprites, to Spider-Man: Miles Morales as the poster child for a new gaming generation today, Black representation has come a long way. As in other mediums, such as film, music, and literature, Black culture has been, and is, integral to grappling with our collective understanding of video game history. People of color have often been portrayed in popular media through stereotypes and tropes that speak to an underlying structure of racism, patriarchy, heteronormativity, and other forms of systemic oppression. As a Black queer gaymer, the only time I ever saw myself on the screen was through character creation, but that's just cheating in the context of this story.
Introduction to Computer Vision with OpenCV
For a very long time, computer scientists and engineers have been working to make computers perform tasks that humans can. Artificial intelligence comes closest to achieving this, and among its branches, computer vision is one of the most advanced and has had one of the greatest impacts for good. So what exactly is computer vision? Computer vision is a branch of computer science that deals with making computers see and perceive the world the way the human eye does.