image recognition task
UNIT: Unifying Image and Text Recognition in One Vision Encoder
Currently, vision encoder models like Vision Transformers (ViTs) typically excel at image recognition tasks but cannot simultaneously support text recognition the way human vision does. To address this limitation, we propose UNIT, a novel training framework aimed at UNifying Image and Text recognition within a single model. Starting from a vision encoder pre-trained on image recognition tasks, UNIT introduces a lightweight language decoder for predicting text outputs and a lightweight vision decoder to prevent catastrophic forgetting of the original image encoding capabilities. The training process comprises two stages: intra-scale pretraining and inter-scale finetuning. During intra-scale pretraining, UNIT learns unified representations from multi-scale inputs, where images and documents appear at their commonly used resolutions, to establish fundamental recognition capability. In the inter-scale finetuning stage, the model is trained on scale-exchanged data, featuring images and documents at resolutions different from the most commonly used ones, to enhance its scale robustness. Notably, UNIT retains the original vision encoder architecture, incurring no extra cost at inference or deployment time. Experiments across multiple benchmarks confirm that our method significantly outperforms existing methods on document-related tasks (e.g., OCR and DocQA) while maintaining performance on natural images, demonstrating its ability to substantially enhance text recognition without compromising core image recognition capabilities.
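To make the described layout concrete, here is a minimal PyTorch sketch of a pre-trained encoder feeding a lightweight language decoder (text prediction) and a lightweight vision decoder (feature reconstruction to curb forgetting). All class names, dimensions, and head choices are our own assumptions, not the authors' implementation.

```python
import torch.nn as nn

class UNITSketch(nn.Module):
    def __init__(self, encoder: nn.Module, dim: int = 768, vocab: int = 32000):
        super().__init__()
        self.encoder = encoder                        # pre-trained ViT, architecture unchanged
        self.lang_decoder = nn.TransformerDecoderLayer(dim, nhead=8, batch_first=True)
        self.token_head = nn.Linear(dim, vocab)       # lightweight text-token logits
        self.vision_decoder = nn.Linear(dim, dim)     # lightweight feature-reconstruction head

    def forward(self, pixels, text_embeds):
        feats = self.encoder(pixels)                  # (B, N, dim) patch features
        text = self.lang_decoder(text_embeds, feats)  # text queries cross-attend to features
        recon = self.vision_decoder(feats)            # regressed toward frozen-teacher features
        return self.token_head(text), recon
```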
Fast AutoAugment
Data augmentation is an essential technique for improving the generalization ability of deep learning models. Recently, AutoAugment \cite{cubuk2018autoaugment} has been proposed as an algorithm to automatically search for augmentation policies from a dataset, and it has significantly improved performance on many image recognition tasks. However, its search method requires thousands of GPU hours even for a relatively small dataset. In this paper, we propose an algorithm called Fast AutoAugment that finds effective augmentation policies via a more efficient search strategy based on density matching. Compared to AutoAugment, the proposed algorithm speeds up the search by orders of magnitude while achieving comparable performance on image recognition tasks with various models and datasets, including CIFAR-10, CIFAR-100, SVHN, and ImageNet.
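The density-matching idea can be illustrated in a few lines: a candidate policy is good if a model trained without augmentation still classifies the augmented validation split well. The sketch below scores candidates exhaustively for clarity; the actual paper explores the policy space with Bayesian optimization, and all names here are hypothetical.

```python
import torch

@torch.no_grad()
def score_policy(model, policy, val_loader):
    """Accuracy of a model trained WITHOUT augmentation on augmented val data."""
    model.eval()
    correct = total = 0
    for x, y in val_loader:
        pred = model(policy(x)).argmax(dim=1)
        correct += (pred == y).sum().item()
        total += y.numel()
    return correct / total              # high accuracy ~ policy preserves data density

def search_policies(model, candidates, val_loader, top_k=10):
    """Keep the top-k candidate policies by density-matching score."""
    ranked = sorted(candidates, key=lambda p: score_policy(model, p, val_loader),
                    reverse=True)
    return ranked[:top_k]
```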
University Building Recognition Dataset in Thailand for the mission-oriented IoT sensor system
Taniguchi, Takara, Ueda, Yudai, Muramatsu, Atsuya, Hashimoto, Kohki, Yagi, Ryo, Ochiai, Hideya, Aswakul, Chaodit
Many industrial sectors have been using machine learning in inference mode on edge devices. Training on edge devices is also becoming promising thanks to improvements in semiconductor performance. Wireless Ad Hoc Federated Learning (WAFL) has been proposed as a promising approach for collaborative learning with device-to-device communication among edges. In particular, WAFL with Vision Transformer (WAFL-ViT) has been tested on image recognition tasks with the UTokyo Building Recognition Dataset (UTBR). Since WAFL-ViT is a mission-oriented sensor system, it is essential to construct specific datasets for each mission. In our work, we have developed the Chulalongkorn University Building Recognition Dataset (CUBR), which is specialized for Chulalongkorn University as a case study in Thailand. Our results also demonstrate that training under WAFL scenarios achieves better accuracy than self-training scenarios. The dataset is available at https://github.com/jo2lxq/wafl/.
- Banking & Finance (0.69)
- Information Technology > Security & Privacy (0.69)
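As a rough illustration of the device-to-device learning step WAFL builds on, the sketch below averages the parameters of two peer models whenever they encounter each other over the ad hoc network, with ordinary local training continuing in between. Function names and the plain averaging rule are our own simplification, not the WAFL reference code.

```python
import torch

def wafl_exchange(model_a, model_b):
    """One ad hoc encounter: peers swap parameters and average them in place."""
    with torch.no_grad():
        for pa, pb in zip(model_a.parameters(), model_b.parameters()):
            avg = (pa.data + pb.data) / 2.0
            pa.data.copy_(avg)
            pb.data.copy_(avg)

def local_step(model, batch, loss_fn, opt):
    """Ordinary local training continues between encounters."""
    x, y = batch
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()
    return loss.item()
```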
Compact and Efficient Neural Networks for Image Recognition Based on Learned 2D Separable Transform
Vashkevich, Maxim, Krivalcevich, Egor
The paper presents a learned two-dimensional separable transform (LST) that can be considered a new type of computational layer for constructing neural network (NN) architectures for image recognition tasks. The LST is based on the idea of sharing the weights of one fully-connected (FC) layer to process all rows of an image. After that, a second shared FC layer is used to process all columns of the image representation obtained from the first layer. The use of LST layers in an NN architecture significantly reduces the number of model parameters compared to models that use stacked FC layers. We show that an NN classifier based on a single LST layer followed by an FC layer achieves 98.02\% accuracy on the MNIST dataset while having only 9.5k parameters. We also implemented an LST-based classifier for handwritten digit recognition on an FPGA platform to demonstrate the efficiency of the suggested approach for designing compact, high-performance implementations of NN models. Git repository with supplementary materials: https://github.com/Mak-Sim/LST-2d
- Information Technology > Sensing and Signal Processing > Image Processing (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)
- Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition > Image Matching (0.62)
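The row/column weight sharing maps directly to two Linear layers. Below is a minimal PyTorch sketch under our own reading of the abstract (the repository linked above is authoritative). For 28x28 MNIST inputs, the two shared FC layers contribute 812 parameters each and the classifier head 7,850, about 9.5k in total, consistent with the reported figure.

```python
import torch.nn as nn

class LST2d(nn.Module):
    """One shared FC layer over all rows, then a second over all columns."""
    def __init__(self, h=28, w=28, h_out=28, w_out=28):
        super().__init__()
        self.row_fc = nn.Linear(w, w_out)       # shared across the H rows
        self.col_fc = nn.Linear(h, h_out)       # shared across the W_out columns

    def forward(self, x):                       # x: (B, H, W)
        x = self.row_fc(x)                      # rows processed:    (B, H, W_out)
        x = self.col_fc(x.transpose(1, 2))      # columns processed: (B, W_out, H_out)
        return x.transpose(1, 2)                # back to (B, H_out, W_out)

# Single LST layer followed by one FC classifier, as in the MNIST model:
model = nn.Sequential(LST2d(), nn.Flatten(), nn.ReLU(), nn.Linear(28 * 28, 10))
```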
Janus: Collaborative Vision Transformer Under Dynamic Network Environment
Jiang, Linyi, Fu, Silvery D., Zhu, Yifei, Li, Bo
Vision Transformers (ViTs) have outperformed traditional Convolutional Neural Network architectures and achieved state-of-the-art results in various computer vision tasks. Since ViTs are computationally expensive, the models either have to be pruned to run solely on resource-limited edge devices or have to be executed on remote cloud servers that receive the raw data over fluctuating networks. The resulting degraded accuracy or high latency hinders their widespread application. In this paper, we present Janus, the first framework for low-latency cloud-device collaborative Vision Transformer inference over dynamic networks. Janus overcomes the intrinsic model limitations of ViTs and executes ViT models collaboratively on both cloud and edge devices, achieving low latency, high accuracy, and low communication overhead. Specifically, Janus judiciously combines token pruning techniques with a carefully designed fine-to-coarse model splitting policy and a non-static mixed pruning policy. It attains a balance between accuracy and latency by dynamically selecting the optimal pruning level and split point. Experimental results across various tasks demonstrate that Janus enhances throughput by up to 5.15 times and reduces latency violation ratios by up to 98.7% compared with baseline approaches under various network environments.
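An illustrative sketch of the selection loop the abstract describes: pick the (split point, pruning level) pair with the lowest estimated end-to-end latency that still meets an accuracy target under the current bandwidth. The profiling tables passed in are hypothetical inputs, not Janus internals.

```python
def select_config(bandwidth_mbps, edge_ms, cloud_ms, bytes_at, acc_at, acc_min):
    """Pick the lowest-latency (split point, pruning level) meeting an accuracy floor.

    edge_ms[s] / cloud_ms[s]: profiled compute time when splitting after layer s;
    bytes_at[(s, p)]: bytes transmitted at split s with pruning level p;
    acc_at[p]: profiled accuracy at pruning level p.
    """
    best, best_lat = None, float("inf")
    for s in range(len(edge_ms)):
        for p, acc in acc_at.items():
            if acc < acc_min:
                continue                                    # pruned too aggressively
            tx_ms = bytes_at[(s, p)] * 8 / (bandwidth_mbps * 1e3)
            lat = edge_ms[s] + tx_ms + cloud_ms[s]
            if lat < best_lat:
                best, best_lat = (s, p), lat
    return best, best_lat
```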
Reviews: Assessing the Scalability of Biologically-Motivated Deep Learning Algorithms and Architectures
The authors provide a clear and succinct introduction to the problems of and approaches to biologically plausible forms of backprop in the brain. They argue for behavioural realism in addition to physiological realism, and undertake a detailed comparison of backprop versus difference target propagation and its variants (some of which they newly propose), as well as direct feedback alignment. In the end, though, they find that all of the proposed bio-plausible alternatives to backprop fall well short on complex image recognition tasks. Despite the negative results, I find such a comparison very timely for consolidating results and pushing the community to search for better and more diverse alternatives. Overall I find the work impressive. The authors claim that weight sharing is not plausible in the brain.
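For readers unfamiliar with one of the compared algorithms: direct feedback alignment replaces the transposed forward weights in the backward pass with fixed random matrices. A toy NumPy sketch of a single training step, as our own illustration rather than the paper's code:

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(784, 256)) * 0.01     # input -> hidden
W2 = rng.normal(size=(256, 10)) * 0.01      # hidden -> output
B1 = rng.normal(size=(10, 256)) * 0.01      # fixed random feedback (never learned)

def dfa_step(x, y_onehot, lr=0.1):
    """One update: the output error reaches the hidden layer via B1, not W2.T."""
    global W1, W2
    h = np.tanh(x @ W1)                     # hidden activations
    e = h @ W2 - y_onehot                   # output error (linear readout)
    dh = (e @ B1) * (1.0 - h ** 2)          # error routed through random feedback
    W2 -= lr * h.T @ e
    W1 -= lr * x.T @ dh
```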
Overconfidence is Key: Verbalized Uncertainty Evaluation in Large Language and Vision-Language Models
Groot, Tobias, Valdenegro-Toro, Matias
Language and Vision-Language Models (LLMs/VLMs) have revolutionized the field of AI through their ability to generate human-like text and understand images, but ensuring their reliability is crucial. This paper aims to evaluate the ability of LLMs (GPT4, GPT-3.5, LLaMA2, and PaLM 2) and VLMs (GPT4V and Gemini Pro Vision) to estimate their verbalized uncertainty via prompting. We propose the new Japanese Uncertain Scenes (JUS) dataset, aimed at testing VLM capabilities via difficult queries and object counting, and the Net Calibration Error (NCE) to measure the direction of miscalibration. Results show that both LLMs and VLMs have high calibration error and are overconfident most of the time, indicating poor capability for uncertainty estimation. Additionally, we develop prompts for regression tasks, and we show that VLMs have poor calibration when producing means/standard deviations and 95% confidence intervals.
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.30)
- Asia > Japan > Honshū > Kansai > Osaka Prefecture > Osaka (0.05)
- Asia > Japan > Honshū > Chūgoku > Hiroshima Prefecture > Hiroshima (0.05)
- (6 more...)
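A hedged sketch of the evaluation loop the abstract describes: prompt the model to state a confidence alongside its answer, then compute a signed calibration gap (mean verbalized confidence minus accuracy). The sign convention is our reading of a "direction of miscalibration" metric; the paper's exact NCE formula may differ, and `query_model` is a hypothetical API wrapper.

```python
import re

PROMPT = "{q}\nGive your answer, then state your confidence from 0 to 100%."

def parse_confidence(reply: str) -> float:
    """Pull the last 'NN%' the model verbalized; fall back to 0.5."""
    matches = re.findall(r"(\d{1,3})\s*%", reply)
    return min(int(matches[-1]), 100) / 100 if matches else 0.5

def signed_calibration_gap(examples, query_model, is_correct):
    """Mean verbalized confidence minus accuracy; > 0 means overconfident."""
    confs, hits = [], []
    for q, gold in examples:
        reply = query_model(PROMPT.format(q=q))
        confs.append(parse_confidence(reply))
        hits.append(float(is_correct(reply, gold)))
    return sum(confs) / len(confs) - sum(hits) / len(hits)
```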
Deep Learning for Image Recognition: An Overview of Convolutional Neural Networks
Deep Learning is a subset of Machine Learning that has proven very effective at solving complex problems such as image recognition, natural language processing, and speech recognition. In this article, we will focus on the application of deep learning to image recognition, and specifically on Convolutional Neural Networks (CNNs), a type of deep learning model that has been highly successful in image recognition tasks. A CNN is a neural network designed to work with image data. It is composed of multiple layers, with the first layer typically being a convolutional layer responsible for detecting low-level features in the image such as edges and textures. Convolutional layers are typically interleaved with pooling layers, which reduce the spatial dimensionality of the feature maps and increase the robustness of the network.
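A minimal PyTorch example of the layer roles just described: convolutional layers detect features, pooling layers shrink the feature maps, and a final fully-connected layer classifies. Channel counts and the 32x32 input size are illustrative choices, not from the article.

```python
import torch.nn as nn

cnn = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),   # detects edges/textures
    nn.ReLU(),
    nn.MaxPool2d(2),                              # halves spatial resolution
    nn.Conv2d(16, 32, kernel_size=3, padding=1),  # higher-level features
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(32 * 8 * 8, 10),                    # classifier, assumes 32x32 inputs
)
```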