Ambiguous Images With Human Judgments for Robust Visual Event Classification
Contemporary vision benchmarks predominantly consider tasks on which humans can achieve near-perfect performance. However, humans are frequently presented with visual data that they cannot classify with 100% certainty, and models trained on standard vision benchmarks achieve low performance when evaluated on this data. To address this issue, we introduce a procedure for creating datasets of ambiguous images and use it to produce SQUID-E (Squidy), a collection of noisy images extracted from videos. All images are annotated with ground truth values and a test set is annotated with human uncertainty judgments. We use this dataset to characterize human uncertainty in vision tasks and evaluate existing visual event classification models. Experimental results suggest that existing vision models are not sufficiently equipped to provide meaningful outputs for ambiguous images and that datasets of this nature can be used to assess and improve such models through model training and direct evaluation of model calibration. These findings motivate large-scale ambiguous dataset creation and further research focusing on noisy visual data.
- North America > United States > Virginia (0.04)
- North America > United States > Indiana (0.04)
- North America > Canada > Quebec > Montreal (0.04)
- Asia > Afghanistan > Parwan Province > Charikar (0.04)
- Asia > South Korea > Seoul > Seoul (0.04)
- Europe > Spain > Andalusia > Granada Province > Granada (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
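The abstract above mentions "direct evaluation of model calibration" on ambiguous images. As a hedged illustration (not the authors' code), the standard expected calibration error (ECE) metric bins predictions by confidence and compares each bin's mean confidence to its empirical accuracy; the arrays below are hypothetical:

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Standard ECE: bin predictions by confidence and accumulate the
    weighted gap between mean confidence and accuracy per bin."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(confidences[mask].mean() - correct[mask].mean())
            ece += mask.mean() * gap  # weight bin by its share of samples
    return ece

# A degenerate perfectly calibrated case: full confidence, always right.
print(expected_calibration_error([1.0, 1.0], [1, 1]))  # → 0.0
```

On an ambiguous test set, the same binning can be run against human uncertainty judgments instead of 0/1 correctness to ask whether the model's confidence tracks human confidence.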
RL-MoE: An Image-Based Privacy Preserving Approach In Intelligent Transportation System
Rezaei, Abdolazim, Sookhak, Mehdi, Haghparast, Mahboobeh
Department of Computer Science, Texas A&M University-Corpus Christi, USA

The proliferation of AI-powered cameras in Intelligent Transportation Systems (ITS) creates a severe conflict between the need for rich visual data and the right to privacy. Existing privacy-preserving methods, such as blurring or encryption, are often insufficient because they create an undesirable trade-off: either privacy is compromised by advanced reconstruction attacks, or data utility is critically degraded. To resolve this challenge, we propose RL-MoE, a novel framework that transforms sensitive visual data into privacy-preserving textual descriptions, eliminating the need for direct image transmission. RL-MoE uniquely combines a Mixture-of-Experts (MoE) architecture for nuanced, multi-aspect scene decomposition with a Reinforcement Learning (RL) agent that optimizes the generated text for the dual objectives of semantic accuracy and privacy preservation. Extensive experiments demonstrate that RL-MoE provides superior privacy protection, reducing the success rate of replay attacks to just 9.4% on the CFP-FP dataset, while simultaneously generating richer textual content than baseline methods. Our work provides a practical and scalable solution for building trustworthy AI systems in privacy-sensitive domains, paving the way for more secure smart city and autonomous vehicle networks.

The growing integration of artificial intelligence (AI) and Internet of Things (IoT) technologies in intelligent transportation systems (ITS) has significantly enhanced the capabilities of urban mobility management.
From traffic monitoring and congestion analysis to automated violation detection and smart infrastructure planning, ITS plays a pivotal role in shaping the future of transportation. A key component of these systems is the use of roadside cameras, which continuously capture visual data to enable real-time decision-making and improve road safety.
- Oceania > Australia > Queensland (0.04)
- North America > United States > Maryland (0.04)
- North America > United States > California (0.04)
- Asia > Middle East > Jordan (0.04)
- Information Technology > Security & Privacy (1.00)
- Transportation > Ground > Road (0.34)
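The RL agent above optimizes generated text for "dual objectives of semantic accuracy and privacy preservation". The paper does not give its reward function here, so the sketch below is purely illustrative: a toy scalar reward that trades off a semantic-fidelity score against a penalty for detected privacy leaks (e.g., faces or plate numbers surviving in the text); every name and the weighting scheme are assumptions:

```python
def dual_objective_reward(semantic_score, privacy_leaks, alpha=0.7):
    """Toy reward for an RL text-optimization agent (illustrative only):
    semantic_score in [0, 1] measures fidelity to the scene;
    privacy_leaks counts identifiers detected in the generated text;
    alpha weights accuracy against privacy."""
    privacy_score = 1.0 / (1.0 + privacy_leaks)  # 1.0 when nothing leaks
    return alpha * semantic_score + (1.0 - alpha) * privacy_score

# An equally faithful description that leaks two identifiers is
# rewarded less than one that leaks nothing.
assert dual_objective_reward(0.9, 0) > dual_objective_reward(0.9, 2)
```

The design point such a reward captures is that the agent is never asked to maximize privacy alone (which an empty caption would satisfy); utility and privacy are scored jointly.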
Unified Multimodal Understanding via Byte-Pair Visual Encoding
Zhang, Wanpeng, Feng, Yicheng, Luo, Hao, Li, Yijiang, Yue, Zihao, Zheng, Sipeng, Lu, Zongqing
Multimodal large language models (MLLMs) have made significant progress in vision-language understanding, yet effectively aligning different modalities remains a fundamental challenge. We present a framework that unifies multimodal understanding by applying byte-pair encoding to visual tokens. Unlike conventional approaches that rely on modality-specific encoders, our method directly incorporates structural information into visual tokens, mirroring successful tokenization strategies in text-only language models. We introduce a priority-guided encoding scheme that considers both frequency and spatial consistency, coupled with a multi-stage training procedure based on curriculum-driven data composition. These enhancements enable the transformer model to better capture cross-modal relationships and reason with visual information. Comprehensive experiments demonstrate improved performance across diverse vision-language tasks. By bridging the gap between visual and textual representations, our approach contributes to the advancement of more capable and efficient multimodal foundation models.
- North America > United States > California > San Diego County > San Diego (0.04)
- Europe (0.04)
- Asia > China (0.04)
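The abstract above applies byte-pair encoding to visual tokens with a "priority-guided encoding scheme" combining frequency and spatial consistency. As a hedged sketch (not the paper's algorithm), one BPE merge step over a sequence of discrete visual token ids looks like this, with the spatial-consistency weight left as a pluggable stub:

```python
from collections import Counter

def merge_top_pair(tokens, spatial_weight=None, next_id=1000):
    """One BPE merge step over discrete (visual) token ids.
    Priority = pair frequency x optional spatial-consistency weight;
    the weighting hook is illustrative, not the paper's exact scheme."""
    pairs = Counter(zip(tokens, tokens[1:]))
    if not pairs:
        return tokens, None
    weight = spatial_weight or (lambda pair: 1.0)
    best = max(pairs, key=lambda p: pairs[p] * weight(p))
    merged, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == best:
            merged.append(next_id)  # replace the pair with a composite token
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged, best

seq, pair = merge_top_pair([7, 3, 5, 7, 3, 9, 7, 3])
print(pair, seq)  # → (7, 3) [1000, 5, 1000, 9, 1000]
```

Iterating this step builds a vocabulary of composite visual tokens, mirroring how text-only BPE builds subwords from frequent character pairs.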
PerfCam: Digital Twinning for Production Lines Using 3D Gaussian Splatting and Vision Models
Khan, Michel Gokan, Guarese, Renan, Johnson, Fabian, Wang, Xi Vincent, Bergman, Anders, Edvinsson, Benjamin, Romero, Mario, Vachier, Jérémy, Kronqvist, Jan
We introduce PerfCam, an open source Proof-of-Concept (PoC) digital twinning framework that combines camera and sensory data with 3D Gaussian Splatting and computer vision models for digital twinning, object tracking, and Key Performance Indicators (KPIs) extraction in industrial production lines. By utilizing 3D reconstruction and Convolutional Neural Networks (CNNs), PerfCam offers a semi-automated approach to object tracking and spatial mapping, enabling digital twins that capture real-time KPIs such as availability, performance, Overall Equipment Effectiveness (OEE), and rate of conveyor belts in the production line. We validate the effectiveness of PerfCam through a practical deployment within realistic test production lines in the pharmaceutical industry and contribute an openly published dataset to support further research and development in the field. The results demonstrate PerfCam's ability to deliver actionable insights through its precise digital twin capabilities, underscoring its value as an effective tool for developing usable digital twins in smart manufacturing environments and extracting operational analytics.
- Europe > Sweden > Stockholm > Stockholm (0.04)
- Asia > Singapore (0.04)
- North America > United States > New Mexico > Bernalillo County > Albuquerque (0.04)
- (3 more...)
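PerfCam's digital twins report KPIs including availability, performance, and OEE. The standard OEE definition (Availability x Performance x Quality) can be computed as below; this is the textbook formula, not necessarily PerfCam's exact implementation, and the sample figures are hypothetical:

```python
def oee(run_time, planned_time, ideal_cycle_time, total_count, good_count):
    """Standard Overall Equipment Effectiveness:
    OEE = Availability x Performance x Quality, with times in the
    same unit and ideal_cycle_time = ideal time per unit produced."""
    availability = run_time / planned_time
    performance = (ideal_cycle_time * total_count) / run_time
    quality = good_count / total_count
    return availability * performance * quality

# 420 of 480 planned minutes running, 400 units at an ideal
# 1 min/unit, 380 of them good:
print(round(oee(420, 480, 1.0, 400, 380), 3))  # → 0.792
```

In a deployment like the one described, `run_time` and the unit counts would come from the camera-based object tracking rather than manual logs.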