A Data-Centric Perspective on the Influence of Image Data Quality in Machine Learning Models
In machine learning, research has traditionally focused on model development, with relatively less attention paid to training data. As model architectures have matured and marginal gains from further refinements diminish, data quality has emerged as a critical factor. However, systematic studies on evaluating and ensuring dataset quality in the image domain remain limited. This study investigates methods for systematically assessing image dataset quality and examines how various image quality factors influence model performance. Using the publicly available and relatively clean CIFAKE dataset, we identify common quality issues and quantify their impact on training. Building on these findings, we develop a pipeline that integrates two community-developed tools, CleanVision and Fastdup. We analyze their underlying mechanisms and introduce several enhancements, including automatic threshold selection to detect problematic images without manual tuning. Experimental results demonstrate that not all quality issues exert the same level of impact. While convolutional neural networks show resilience to certain distortions, they are particularly vulnerable to degradations that obscure critical visual features, such as blurring and severe downscaling. To assess the performance of existing tools and the effectiveness of our proposed enhancements, we formulate the detection of low-quality images as a binary classification task and use the F1 score as the evaluation metric. Our automatic thresholding method improves the F1 score from 0.6794 to 0.9468 under single perturbations and from 0.7447 to 0.8557 under dual perturbations. For near-duplicate detection, our deduplication strategy increases the F1 score from 0.4576 to 0.7928. These results underscore the effectiveness of our workflow and provide a foundation for advancing data quality assessment in image-based machine learning.
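The abstract frames low-quality-image detection as a binary classification task scored with F1. As a minimal illustration of that metric (the labels and predictions below are made up, not taken from the CIFAKE experiments), F1 combines precision and recall over the flagged images:

```python
# Toy example: 1 = image flagged as low-quality, 0 = clean.
y_true = [1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))  # true positives
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))  # false positives
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))  # false negatives

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 4))  # 0.75 for this toy data
```

Because F1 ignores true negatives, it is well suited to this setting, where clean images vastly outnumber problematic ones.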
Estimating Physical Information Consistency of Channel Data Augmentation for Remote Sensing Images
The application of data augmentation for deep learning (DL) methods plays an important role in achieving state-of-the-art results in supervised, semi-supervised, and self-supervised image classification. In particular, channel transformations (e.g., solarize, grayscale, brightness adjustments) are integrated into data augmentation pipelines for remote sensing (RS) image classification tasks. However, contradicting beliefs exist about their proper application to RS images. A common point of critique is that channel augmentation techniques may produce physically inconsistent spectral data (i.e., pixel signatures). To shed light on the open debate, we propose an approach to estimate whether a channel augmentation technique affects the physical information of RS images. To this end, the proposed approach estimates a score that measures the alignment of a pixel signature within a time series that can be naturally subject to deviations caused by factors such as acquisition conditions or phenological states of vegetation. We compare the scores associated with original and augmented pixel signatures to evaluate the physical consistency. Experimental results on a multi-label image classification task show that channel augmentations yielding a score that exceeds the expected deviation of original pixel signatures cannot improve the performance of a baseline model trained without augmentation.
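The abstract does not spell out the scoring function, but the idea can be sketched as follows: compare how far an augmented pixel signature drifts from a reference acquisition against the natural deviation observed across the time series. The sketch below is an assumption-laden illustration (cosine similarity as the alignment measure, and all band values are invented), not the paper's actual method:

```python
import numpy as np

def cosine(a, b):
    """Alignment between two pixel signatures (spectral vectors)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy time series: 4 acquisitions x 3 spectral bands for one pixel.
series = np.array([[0.20, 0.40, 0.60],
                   [0.25, 0.38, 0.62],
                   [0.22, 0.41, 0.58],
                   [0.21, 0.39, 0.61]])
reference = series[0]

# Expected natural deviation: the lowest alignment of later
# acquisitions with the reference signature.
natural = min(cosine(reference, s) for s in series[1:])

# A grayscale-like channel augmentation collapses all bands to their mean,
# distorting the spectral shape of the signature.
augmented = np.full(3, reference.mean())
score = cosine(reference, augmented)

# If the augmented signature deviates more than natural variation does,
# the augmentation is flagged as physically inconsistent.
print(score < natural)
```

Under this sketch, grayscale conversion falls outside the natural spread of the time series, matching the intuition that it destroys spectral (pixel-signature) information.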
Images that Sound: Composing Images and Sounds on a Single Canvas
Chen, Ziyang, Geng, Daniel, Owens, Andrew
Spectrograms are 2D representations of sound that look very different from the images found in our visual world. And natural images, when played as spectrograms, make unnatural sounds. In this paper, we show that it is possible to synthesize spectrograms that simultaneously look like natural images and sound like natural audio. We call these spectrograms images that sound. Our approach is simple and zero-shot, and it leverages pre-trained text-to-image and text-to-spectrogram diffusion models that operate in a shared latent space. During the reverse process, we denoise noisy latents with both the audio and image diffusion models in parallel, resulting in a sample that is likely under both models. Through quantitative evaluations and perceptual studies, we find that our method successfully generates spectrograms that align with a desired audio prompt while also taking on the visual appearance of a desired image prompt. Please see our project page for video results: https://ificl.github.io/images-that-sound/
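The key mechanism here is denoising one shared latent with two diffusion models in parallel. A minimal sketch of that composition step is below; the two "models" are stand-in linear functions rather than real text-to-image or text-to-spectrogram networks, and the step size is arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for the pretrained models' noise predictions on a shared latent.
# Real diffusion networks are assumed here, not implemented.
def image_model_eps(latent):
    return 0.9 * latent

def audio_model_eps(latent):
    return 1.1 * latent

latent = rng.standard_normal((4, 8))  # toy latent "canvas"

# One reverse step: average the two noise estimates so the update pushes
# the sample toward regions likely under both models, then denoise slightly.
eps = 0.5 * (image_model_eps(latent) + audio_model_eps(latent))
latent = latent - 0.1 * eps
print(latent.shape)
```

Averaging the two score estimates is one simple way to sample from a composition of models sharing a latent space; the actual guidance weighting in the paper may differ.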
DUET: 2D Structured and Approximately Equivariant Representations
Suau, Xavier, Danieli, Federico, Keller, T. Anderson, Blaas, Arno, Huang, Chen, Ramapuram, Jason, Busbridge, Dan, Zappella, Luca
Multiview Self-Supervised Learning (MSSL) is based on learning invariances with respect to a set of input transformations. However, invariance partially or totally removes transformation-related information from the representations, which might harm performance for specific downstream tasks that require such information. We propose 2D strUctured and EquivarianT representations (coined DUET), which are 2D representations organized in a matrix structure, and equivariant with respect to transformations acting on the input data. DUET representations maintain information about an input transformation while remaining semantically expressive. Compared to SimCLR (Chen et al., 2020) (unstructured and invariant) and ESSL (Dangovski et al., 2022) (unstructured and equivariant), the structured and equivariant nature of DUET representations enables controlled generation with lower reconstruction error, while controllability is not possible with SimCLR or ESSL. DUET also achieves higher accuracy on several discriminative tasks and improves transfer learning.
TensorFlow for Computer Vision -- How to Train Image Classifier with Artificial Neural Networks
The dataset should be big enough to train a decent image classifier, even though ANNs aren't the ideal tool for image data. The only problem is that it's not structured properly for deep learning out of the box. Let's get the library imports out of the way. We'll need quite a few of them, so make sure to have NumPy, Pandas, TensorFlow, PIL, and Scikit-Learn installed. Keep in mind that you can't pass an image directly to a Dense layer: a single image is 3-dimensional (height, width, color channels), while a Dense layer expects a 1-dimensional input.
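The flattening step described above can be sketched with NumPy alone (the image here is random placeholder data; in a Keras model, `tf.keras.layers.Flatten()` performs the same reshape for you):

```python
import numpy as np

# Hypothetical 32x32 RGB image -- shape (height, width, channels).
image = np.random.rand(32, 32, 3)

# A Dense layer expects a 1-D feature vector, so the image must be
# flattened first: 32 * 32 * 3 = 3072 features.
flat = image.reshape(-1)
print(flat.shape)  # (3072,)
```

This is why ANN-based image classifiers scale poorly: the input dimensionality grows with the square of the image side, and all spatial structure is discarded in the flattening.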
Raven's Progressive Matrices Completion with Latent Gaussian Process Priors
Shi, Fan, Li, Bin, Xue, Xiangyang
Abstract reasoning ability is fundamental to human intelligence. It enables humans to uncover relations among abstract concepts and further deduce implicit rules from the relations. As a well-known abstract visual reasoning task, Raven's Progressive Matrices (RPM) are widely used in human IQ tests. Although extensive research has been conducted on RPM solvers with machine intelligence, few studies have considered advancing the standard answer-selection (classification) problem to the more challenging answer-painting (generation) problem, which can verify whether the model has indeed understood the implicit rules. In this paper we aim to solve the latter by proposing a deep latent variable model, in which multiple Gaussian processes are employed as priors of latent variables to separately learn underlying abstract concepts from RPMs; thus the proposed model is interpretable in terms of concept-specific latent variables. The latent Gaussian process also provides an effective way of extrapolation for answer painting based on the learned concept-changing rules. We evaluate the proposed model on RPM-like datasets with multiple continuously-changing visual concepts. Experimental results demonstrate that our model requires only a few training samples to paint high-quality answers, generate novel RPM panels, and achieve interpretability through concept-specific latent variables.
Coding Blackness: A History of Black Video Game Characters
Black history permeates all facets of our lives--and video games are no exception. From the 8-bit days to the 4K ray-tracing present, Black video game characters have occupied various positions; from the precarious period of early sports games in the '70s, which included titles like Heavyweight Champ and its nameless grayscale sprites, to Spider-Man: Miles Morales as the poster child for a new gaming generation today, Black representation has come a long way. As in other mediums, such as film, music, and literature, Black culture has been, and is, integral to grappling with our collective understanding of video game history. People of color have often been portrayed in popular media through stereotypes and tropes that speak to an underlying structure of racism, patriarchy, heteronormativity, and other forms of systemic oppression. As a Black queer gaymer, the only time I ever saw myself on the screen was through character creation, but that's just cheating in the context of this story.
Introduction to Computer Vision with OpenCV
For a very long time, computer scientists and engineers have been working to make computers perform tasks that humans can. Artificial intelligence comes closest to achieving this, and among its branches, computer vision is one of the most advanced and has had one of the greatest impacts for good. So what exactly is computer vision? Computer vision is a branch of computer science that deals with making computers see and perceive the world the way the human eye does.