image stream
- Asia > South Korea > Seoul > Seoul (0.04)
- Europe > Czechia > South Moravian Region > Brno (0.04)
- Information Technology > Sensing and Signal Processing > Image Processing (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.71)
Expressing an Image Stream with a Sequence of Natural Sentences
We propose an approach for generating a sequence of natural sentences for an image stream. Since general users usually take a series of pictures on their special moments, much online visual information exists in the form of image streams, for which it is better to consider the whole set when generating natural language descriptions. While almost all previous studies have dealt with the relation between a single image and a single natural sentence, our work extends both the input and output dimensions to a sequence of images and a sequence of sentences. To this end, we design a novel architecture called coherent recurrent convolutional network (CRCN), which consists of convolutional networks, bidirectional recurrent networks, and an entity-based local coherence model. Our approach directly learns from the vast user-generated resource of blog posts as text-image parallel training data. We demonstrate that our approach outperforms other state-of-the-art candidate methods, using both quantitative measures (e.g.
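As a rough illustration of the CRCN idea described above, here is a minimal sketch (not the authors' implementation) of a model that contextualizes per-image CNN features with a bidirectional recurrent network and projects them into a sentence-embedding space; all dimensions, names, and the scoring function are hypothetical, and the entity-based coherence term is omitted.

```python
# Minimal, hypothetical sketch of a CRCN-style model: CNN features for each
# image in a stream are contextualized by a bidirectional RNN and projected
# into a sentence-embedding space, where compatibility is scored by a dot
# product. Dimensions and the scoring scheme are illustrative assumptions.
import torch
import torch.nn as nn

class CRCNSketch(nn.Module):
    def __init__(self, img_feat_dim=4096, hidden_dim=512, sent_embed_dim=300):
        super().__init__()
        # Bidirectional RNN contextualizes each image within the stream.
        self.brnn = nn.LSTM(img_feat_dim, hidden_dim,
                            batch_first=True, bidirectional=True)
        # Project contextualized image states into the sentence space.
        self.proj = nn.Linear(2 * hidden_dim, sent_embed_dim)

    def forward(self, img_feats, sent_embeds):
        # img_feats:   (batch, seq_len, img_feat_dim)   precomputed CNN features
        # sent_embeds: (batch, seq_len, sent_embed_dim) candidate sentence embeddings
        ctx, _ = self.brnn(img_feats)
        img_in_sent_space = self.proj(ctx)
        # Per-position compatibility between image context and sentence.
        return (img_in_sent_space * sent_embeds).sum(dim=-1)

model = CRCNSketch()
imgs = torch.randn(2, 5, 4096)   # a stream of 5 images, batch of 2
sents = torch.randn(2, 5, 300)   # 5 candidate sentence embeddings each
print(model(imgs, sents).shape)  # torch.Size([2, 5])
```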
STDA: Spatio-Temporal Dual-Encoder Network Incorporating Driver Attention to Predict Driver Behaviors Under Safety-Critical Scenarios
Xu, Dongyang, Luo, Yiran, Lu, Tianle, Wang, Qingfan, Zhou, Qing, Nie, Bingbing
Accurate behavior prediction for vehicles is essential but challenging for autonomous driving. Most existing studies show satisfactory performance under regular scenarios but neglect safety-critical ones. In this study, a spatio-temporal dual-encoder network named STDA was developed for safety-critical scenarios. Considering the exceptional situational awareness and risk comprehension of human drivers, driver attention was incorporated into STDA to facilitate swift identification of critical regions, which is expected to improve both performance and interpretability. STDA contains four parts: the driver attention prediction module, which predicts driver attention; the fusion module, which fuses the features of driver attention and raw images; the temporal encoder module, which enhances the capability to interpret dynamic scenes; and the behavior prediction module, which predicts the behavior. The experimental data are used to train and validate the model. The results show that STDA improves the G-mean from 0.659 to 0.719 when incorporating driver attention and adopting the temporal encoder module. In addition, extensive experimentation has been conducted to validate that the proposed module exhibits robust generalization capabilities and can be seamlessly integrated into other mainstream models.
- Research Report > New Finding (0.86)
- Research Report > Experimental Study (0.66)
- Automobiles & Trucks (1.00)
- Transportation > Ground > Road (0.66)
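For reference, the G-mean cited in the STDA abstract above is the geometric mean of sensitivity and specificity, a standard metric for imbalanced binary classification; a minimal computation is sketched below, with made-up labels for illustration.

```python
# G-mean = sqrt(sensitivity * specificity); commonly used when the positive
# class (e.g., a safety-critical maneuver) is rare. Labels here are made up.
import math

def g_mean(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(sensitivity * specificity)

y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 1, 0]
print(round(g_mean(y_true, y_pred), 3))  # 0.730 for this toy example
```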
Expressing an Image Stream with a Sequence of Natural Sentences
We propose an approach for retrieving a sequence of natural sentences for an image stream. Since general users often take a series of pictures on their special moments, it is better to consider the whole image stream when producing natural language descriptions. While almost all previous studies have dealt with the relation between a single image and a single natural sentence, our work extends both the input and output dimensions to a sequence of images and a sequence of sentences. To this end, we design a multimodal architecture called coherent recurrent convolutional network (CRCN), which consists of convolutional neural networks, bidirectional recurrent neural networks, and an entity-based local coherence model. Our approach directly learns from the vast user-generated resource of blog posts as text-image parallel training data. We demonstrate that our approach outperforms other state-of-the-art candidate methods, using both quantitative measures (e.g.
- North America > United States > New York (0.05)
- Asia > South Korea > Seoul > Seoul (0.04)
- Europe > Czechia > South Moravian Region > Brno (0.04)
SCO-VIST: Social Interaction Commonsense Knowledge-based Visual Storytelling
Wang, Eileen, Han, Soyeon Caren, Poon, Josiah
Visual storytelling aims to automatically generate a coherent story from a given image sequence. Unlike tasks such as image captioning, visual stories should combine factual descriptions, worldviews, and human social commonsense to tie disjointed elements together into a coherent and engaging human-writeable story. However, most models mainly focus on applying factual information and taxonomic/lexical external knowledge when attempting to create stories. This paper introduces SCO-VIST, a framework that represents the image sequence as a graph of objects and relations enriched with human action motivation and social-interaction commonsense knowledge. SCO-VIST treats this graph as a set of plot points and creates bridges between plot points with semantic and occurrence-based edge weights. From this weighted story graph, the storyline is produced as a sequence of events using the Floyd-Warshall algorithm. Per both automatic and human evaluations, the proposed framework produces stories that are superior across multiple metrics of visual grounding, coherence, diversity, and humanness.
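Since the abstract above names the Floyd-Warshall algorithm as the storyline-extraction step, here is a minimal, self-contained sketch of Floyd-Warshall with path reconstruction on a toy weighted "plot point" graph; the nodes and edge weights are invented for illustration and are not from the paper.

```python
# Floyd-Warshall all-pairs shortest paths with path reconstruction.
# The "plot point" graph below is a toy example; SCO-VIST derives its
# edge weights from semantic and occurrence-based scores.
INF = float("inf")

def floyd_warshall(n, edges):
    dist = [[INF] * n for _ in range(n)]
    nxt = [[None] * n for _ in range(n)]
    for i in range(n):
        dist[i][i] = 0.0
    for u, v, w in edges:
        dist[u][v] = w
        nxt[u][v] = v
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
                    nxt[i][j] = nxt[i][k]
    return dist, nxt

def path(nxt, u, v):
    if nxt[u][v] is None:
        return []
    p = [u]
    while u != v:
        u = nxt[u][v]
        p.append(u)
    return p

# Toy graph: 0 = opening plot point, 4 = ending; lower weight = stronger bridge.
edges = [(0, 1, 1.0), (0, 2, 2.5), (1, 2, 0.5),
         (2, 3, 1.0), (1, 3, 3.0), (3, 4, 0.5)]
dist, nxt = floyd_warshall(5, edges)
print(path(nxt, 0, 4))  # [0, 1, 2, 3, 4] -> ordered sequence of events
```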
A Supervised Tensor Dimension Reduction-Based Prognostics Model for Applications with Incomplete Imaging Data
This paper proposes a supervised dimension reduction methodology for tensor data that has two advantages over most image-based prognostic models. First, the model does not require the tensor data to be complete, which extends its applicability to incomplete data. Second, it utilizes the time-to-failure (TTF) to supervise the extraction of low-dimensional features, which makes the extracted features more effective for subsequent prognostics. In addition, an optimization algorithm is proposed for parameter estimation, and closed-form solutions are derived under certain distributions.
- Africa > Senegal > Kolda Region > Kolda (0.04)
- North America > United States > North Carolina (0.04)
- North America > United States > New York (0.04)
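As a loose, generic analogue of the supervised tensor dimension reduction described in the abstract above (explicitly not the paper's algorithm), the sketch below unfolds incomplete tensor samples, imputes missing entries, and extracts low-dimensional features supervised by time-to-failure via partial least squares; every modeling choice here is an assumption.

```python
# Generic stand-in for TTF-supervised dimension reduction on incomplete
# tensor (image) samples: unfold each tensor to a vector, impute missing
# entries, then let PLS extract features supervised by time-to-failure.
# This is NOT the paper's method, just an illustrative baseline.
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(0)
n_units, h, w = 50, 8, 8
X = rng.normal(size=(n_units, h, w))        # degradation images, one per unit
mask = rng.random(X.shape) < 0.2            # 20% missing pixels
X[mask] = np.nan
ttf = rng.uniform(100, 1000, size=n_units)  # time-to-failure labels

X_flat = X.reshape(n_units, -1)             # unfold tensors to vectors
X_filled = SimpleImputer(strategy="mean").fit_transform(X_flat)

pls = PLSRegression(n_components=3)         # 3 supervised low-dim features
features = pls.fit_transform(X_filled, ttf)[0]
print(features.shape)                       # (50, 3)
```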
Smart Pressure e-Mat for Human Sleeping Posture and Dynamic Activity Recognition
Yuan, Liangqi, Wei, Yuan, Li, Jia
With the emphasis on healthcare, early childhood education, and fitness, non-invasive measurement and recognition methods have received increasing attention. Pressure sensing has been extensively studied due to its advantages of simple structure, easy access, easy visualization, and harmlessness. This paper introduces a smart pressure e-mat (SPeM) system based on the piezoresistive material Velostat for human monitoring applications, including recognition of sleeping postures, sports, and yoga. A subsystem scans the e-mat readings, processes the signal, and generates a pressure image stream. Deep neural networks (DNNs) are trained on the pressure image stream to recognize the corresponding human behavior. Four sleeping postures and five dynamic activities inspired by Nintendo Switch Ring Fit Adventure (RFA) serve as a preliminary validation of the proposed SPeM system. SPeM achieves high accuracy on both applications, which demonstrates the accuracy and generalization ability of the models. Compared with other pressure sensor-based systems, SPeM offers more flexible applications and broader commercial prospects, with reliable, robust, and repeatable properties.
- North America > United States > Washington > Whatcom County > Bellingham (0.04)
- North America > United States > Michigan > Oakland County > Rochester (0.04)
- Asia > China (0.04)
- Information Technology (1.00)
- Health & Medicine > Therapeutic Area (1.00)
- Education > Educational Setting (0.66)
- Leisure & Entertainment > Games > Computer Games (0.48)
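To make the DNN step in the SPeM abstract above concrete, here is a minimal, hypothetical PyTorch classifier for pressure-map frames; the mat resolution (32x64), the four posture classes, and all layer sizes are invented for illustration and do not come from the paper.

```python
# Hypothetical CNN for classifying pressure-map frames into postures.
# Input resolution (1, 32, 64) and the 4 posture classes are assumptions.
import torch
import torch.nn as nn

class PressureNet(nn.Module):
    def __init__(self, n_classes=4):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                  # 32x64 -> 16x32
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                  # 16x32 -> 8x16
        )
        self.classifier = nn.Linear(32 * 8 * 16, n_classes)

    def forward(self, x):                     # x: (batch, 1, 32, 64)
        x = self.features(x)
        return self.classifier(x.flatten(1))

frames = torch.randn(8, 1, 32, 64)            # a batch of pressure frames
logits = PressureNet()(frames)
print(logits.shape)                           # torch.Size([8, 4])
```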
Learning When to Use Adaptive Adversarial Image Perturbations against Autonomous Vehicles
Yoon, Hyung-Jin, Jafarnejadsani, Hamidreza, Voulgaris, Petros
Deep neural network (DNN) models for object detection from camera images are widely adopted in autonomous vehicles. However, DNN models are known to be susceptible to adversarial image perturbations. Existing methods for generating adversarial image perturbations treat each incoming image frame as the decision variable of an optimization, so, given a new image, the typically computationally expensive optimization must start over, with no learning carried across the independent optimizations. Very few approaches have been developed for attacking online image streams while considering the underlying physical dynamics of autonomous vehicles, their mission, and the environment. We propose a multi-level stochastic optimization framework that monitors the attacker's capability of generating adversarial perturbations. Based on this capability level, a binary attack/no-attack decision is introduced to enhance the attacker's effectiveness. We evaluate the proposed multi-level image attack framework in simulations of vision-guided autonomous vehicles and in physical tests with a small indoor drone in an office environment. The results show that our method can generate the image attack in real time while monitoring when the attacker is proficient given state estimates.
- North America > United States > Nevada > Washoe County > Reno (0.14)
- North America > United States > California (0.04)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)
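The abstract above describes a multi-level stochastic optimization with an attack/no-attack gate; the paper's actual method is more involved, but the sketch below illustrates the gating idea with a simple FGSM-style perturbation and a hypothetical capability score, all of which are assumptions for illustration.

```python
# Illustrative FGSM-style perturbation gated by a capability estimate.
# The gating threshold and capability proxy are invented; the paper's
# multi-level stochastic optimization is substantially more sophisticated.
import torch
import torch.nn.functional as F

def fgsm_perturb(model, image, target_label, eps=0.03):
    image = image.clone().requires_grad_(True)
    loss = F.cross_entropy(model(image), target_label)
    loss.backward()
    # Step *against* the gradient to push the prediction toward the target.
    return (image - eps * image.grad.sign()).clamp(0, 1).detach()

def gated_attack(model, image, target_label, capability, threshold=0.5):
    """Attack only when the monitored capability exceeds the threshold."""
    if capability < threshold:
        return image, False          # not attacking: perturbation too weak
    return fgsm_perturb(model, image, target_label), True

# Toy usage with a tiny stand-in "detector".
model = torch.nn.Sequential(torch.nn.Flatten(),
                            torch.nn.Linear(3 * 32 * 32, 10))
frame = torch.rand(1, 3, 32, 32)
target = torch.tensor([3])
adv, attacked = gated_attack(model, frame, target, capability=0.8)
print(attacked, (adv - frame).abs().max().item())
```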
veriFIRE: Verifying an Industrial, Learning-Based Wildfire Detection System
Amir, Guy, Freund, Ziv, Katz, Guy, Mandelbaum, Elad, Refaeli, Idan
In this short paper, we present our ongoing work on the veriFIRE project -- a collaboration between industry and academia aimed at using verification to increase the reliability of a real-world, safety-critical system. The system we target is an airborne platform for wildfire detection, which incorporates two deep neural networks. We describe the system and its properties of interest, and discuss our attempts to verify the system's consistency, i.e., its ability to continue to correctly classify a given input even if the wildfire it depicts increases in intensity. We regard this work as a step towards the incorporation of academic-oriented verification tools into real-world systems of interest.
- North America > United States (0.14)
- Asia > Middle East > Israel > Jerusalem District > Jerusalem (0.04)
- Aerospace & Defense > Aircraft (0.48)
- Information Technology > Robotics & Automation (0.47)
- Information Technology > Security & Privacy (0.46)
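The consistency property described above (a wildfire that grows in intensity should stay classified as a wildfire) can at least be sanity-checked by sampling, even though verification tools prove it exhaustively; the sketch below shows such a falsification-style check with a hypothetical classifier and a hypothetical intensity transform.

```python
# Sampling-based sanity check for the consistency property: if an input is
# classified as "wildfire", inputs with increased fire intensity should be
# too. The classifier and intensity transform here are hypothetical; formal
# tools verify this exhaustively rather than by sampling.
import numpy as np

def check_consistency(classify, increase_intensity, images, steps=5):
    violations = []
    for idx, img in enumerate(images):
        if classify(img) != "wildfire":
            continue                       # property only constrains positives
        x = img
        for _ in range(steps):
            x = increase_intensity(x)
            if classify(x) != "wildfire":
                violations.append(idx)     # counterexample candidate found
                break
    return violations

# Hypothetical stand-ins for demonstration only.
classify = lambda img: "wildfire" if img.mean() > 0.3 else "background"
increase_intensity = lambda img: np.clip(img * 1.1, 0.0, 1.0)
images = [np.random.default_rng(i).random((16, 16)) for i in range(10)]
print(check_consistency(classify, increase_intensity, images))  # [] = none
```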
Ton Peijnenburg, Fellow HTSC and Deputy Director VDL-ETG
AI, deep learning, and algorithms such as convolutional neural networks are outperforming other techniques at object detection in images or image streams. Consider the detection of soccer balls by our soccer robots: traditional machine vision techniques (including color segmentation and contour detection) are expensive in terms of computing power and not very robust to changes such as different illumination. The RoboCup initiative is struggling to move its soccer games outdoors, one of the reasons being poor sensing performance in outdoor daylight conditions. Driver-assist systems for cars deal with outdoor conditions much better. When we use an off-the-shelf, less traditional neural network such as YOLO in our lab, we can reliably and robustly detect all the balls on our field, independent of their color, paint pattern, distance, and illumination.
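As a concrete illustration of the off-the-shelf detection described above, here is a minimal sketch using the open-source Ultralytics YOLO package; the model file, the image path, and the assumption that the COCO "sports ball" class covers soccer balls are illustrative choices, not details from the text.

```python
# Minimal ball-detection sketch with an off-the-shelf YOLO model.
# Model weights, image path, and reliance on the COCO "sports ball"
# class are illustrative assumptions, not details from the article.
from ultralytics import YOLO

model = YOLO("yolov8n.pt")          # pretrained COCO weights
results = model("field_frame.jpg")  # run detection on one frame

for result in results:
    for box in result.boxes:
        cls = int(box.cls[0])
        if result.names[cls] == "sports ball":
            x1, y1, x2, y2 = box.xyxy[0].tolist()
            print(f"ball at ({x1:.0f}, {y1:.0f})-({x2:.0f}, {y2:.0f}), "
                  f"confidence {float(box.conf[0]):.2f}")
```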