image stream
- Asia > South Korea > Seoul > Seoul (0.04)
- Europe > Czechia > South Moravian Region > Brno (0.04)
- Information Technology > Sensing and Signal Processing > Image Processing (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.71)
Expressing an Image Stream with a Sequence of Natural Sentences
We propose an approach for generating a sequence of natural sentences for an image stream. Since general users usually take a series of pictures on their special moments, much online visual information exists in the form of image streams, for which it is better to consider the whole set when generating natural language descriptions. While almost all previous studies have dealt with the relation between a single image and a single natural sentence, our work extends both the input and output dimensions to a sequence of images and a sequence of sentences. To this end, we design a novel architecture called coherent recurrent convolutional network (CRCN), which consists of convolutional networks, bidirectional recurrent networks, and an entity-based local coherence model. Our approach directly learns from the vast user-generated resource of blog posts as text-image parallel training data. We demonstrate that our approach outperforms other state-of-the-art candidate methods, using both quantitative measures (e.g.
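As a rough illustration of the CRCN idea described above, here is a minimal sketch (not the authors' implementation) of a model that contextualizes per-image CNN features with a bidirectional recurrent network and projects them into a sentence-embedding space; all dimensions, names, and the scoring function are hypothetical, and the entity-based coherence term is omitted.

```python
# Minimal, hypothetical sketch of a CRCN-style model: CNN features for each
# image in a stream are contextualized by a bidirectional RNN and projected
# into a sentence-embedding space, where compatibility is scored by a dot
# product. Dimensions and the scoring scheme are illustrative assumptions.
import torch
import torch.nn as nn

class CRCNSketch(nn.Module):
    def __init__(self, img_feat_dim=4096, hidden_dim=512, sent_embed_dim=300):
        super().__init__()
        # Bidirectional RNN contextualizes each image within the stream.
        self.brnn = nn.LSTM(img_feat_dim, hidden_dim,
                            batch_first=True, bidirectional=True)
        # Project contextualized image states into the sentence space.
        self.proj = nn.Linear(2 * hidden_dim, sent_embed_dim)

    def forward(self, img_feats, sent_embeds):
        # img_feats:   (batch, seq_len, img_feat_dim)   precomputed CNN features
        # sent_embeds: (batch, seq_len, sent_embed_dim) candidate sentence embeddings
        ctx, _ = self.brnn(img_feats)
        img_in_sent_space = self.proj(ctx)
        # Per-position compatibility between image context and sentence.
        return (img_in_sent_space * sent_embeds).sum(dim=-1)

model = CRCNSketch()
imgs = torch.randn(2, 5, 4096)   # a stream of 5 images, batch of 2
sents = torch.randn(2, 5, 300)   # 5 candidate sentence embeddings each
print(model(imgs, sents).shape)  # torch.Size([2, 5])
```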
STDA: Spatio-Temporal Dual-Encoder Network Incorporating Driver Attention to Predict Driver Behaviors Under Safety-Critical Scenarios
Xu, Dongyang, Luo, Yiran, Lu, Tianle, Wang, Qingfan, Zhou, Qing, Nie, Bingbing
Accurate behavior prediction for vehicles is essential but challenging for autonomous driving. Most existing studies show satisfactory performance under regular scenarios but neglect safety-critical ones. In this study, a spatio-temporal dual-encoder network named STDA was developed for safety-critical scenarios. Considering the exceptional situational awareness and risk comprehension of human drivers, driver attention was incorporated into STDA to facilitate swift identification of critical regions, which is expected to improve both performance and interpretability. STDA contains four parts: the driver attention prediction module, which predicts driver attention; the fusion module, which fuses the features of driver attention and raw images; the temporal encoder module, which enhances the capability to interpret dynamic scenes; and the behavior prediction module, which predicts the behavior. The experimental data are used to train and validate the model. The results show that STDA improves the G-mean from 0.659 to 0.719 when incorporating driver attention and adopting the temporal encoder module. In addition, extensive experimentation has been conducted to validate that the proposed module exhibits robust generalization capabilities and can be seamlessly integrated into other mainstream models.
- Research Report > New Finding (0.86)
- Research Report > Experimental Study (0.66)
- Automobiles & Trucks (1.00)
- Transportation > Ground > Road (0.66)
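For reference, the G-mean cited in the STDA abstract above is the geometric mean of sensitivity and specificity, a standard metric for imbalanced binary classification; a minimal computation is sketched below, with made-up labels for illustration.

```python
# G-mean = sqrt(sensitivity * specificity); commonly used when the positive
# class (e.g., a safety-critical maneuver) is rare. Labels here are made up.
import math

def g_mean(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(sensitivity * specificity)

y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 1, 0]
print(round(g_mean(y_true, y_pred), 3))  # 0.730 for this toy example
```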
Expressing an Image Stream with a Sequence of Natural Sentences
We propose an approach for retrieving a sequence of natural sentences for an image stream. Since general users often take a series of pictures on their special moments, it is better to consider the whole image stream when producing natural language descriptions. While almost all previous studies have dealt with the relation between a single image and a single natural sentence, our work extends both the input and output dimensions to a sequence of images and a sequence of sentences. To this end, we design a multimodal architecture called coherent recurrent convolutional network (CRCN), which consists of convolutional neural networks, bidirectional recurrent neural networks, and an entity-based local coherence model. Our approach directly learns from the vast user-generated resource of blog posts as text-image parallel training data. We demonstrate that our approach outperforms other state-of-the-art candidate methods, using both quantitative measures (e.g.
- North America > United States > New York (0.05)
- Asia > South Korea > Seoul > Seoul (0.04)
- Europe > Czechia > South Moravian Region > Brno (0.04)
SCO-VIST: Social Interaction Commonsense Knowledge-based Visual Storytelling
Wang, Eileen, Han, Soyeon Caren, Poon, Josiah
Visual storytelling aims to automatically generate a coherent story from a given image sequence. Unlike tasks such as image captioning, visual stories should combine factual descriptions, worldviews, and human social commonsense to tie disjointed elements together into a coherent and engaging human-writeable story. However, most models mainly focus on applying factual information and taxonomic/lexical external knowledge when attempting to create stories. This paper introduces SCO-VIST, a framework that represents the image sequence as a graph of objects and relations enriched with human action motivation and social-interaction commonsense knowledge. SCO-VIST treats this graph as a set of plot points and creates bridges between plot points with semantic and occurrence-based edge weights. From this weighted story graph, the storyline is produced as a sequence of events using the Floyd-Warshall algorithm. Per both automatic and human evaluations, the proposed framework produces stories that are superior across multiple metrics of visual grounding, coherence, diversity, and humanness.
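Since the abstract above names the Floyd-Warshall algorithm as the storyline-extraction step, here is a minimal, self-contained sketch of Floyd-Warshall with path reconstruction on a toy weighted "plot point" graph; the nodes and edge weights are invented for illustration and are not from the paper.

```python
# Floyd-Warshall all-pairs shortest paths with path reconstruction.
# The "plot point" graph below is a toy example; SCO-VIST derives its
# edge weights from semantic and occurrence-based scores.
INF = float("inf")

def floyd_warshall(n, edges):
    dist = [[INF] * n for _ in range(n)]
    nxt = [[None] * n for _ in range(n)]
    for i in range(n):
        dist[i][i] = 0.0
    for u, v, w in edges:
        dist[u][v] = w
        nxt[u][v] = v
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
                    nxt[i][j] = nxt[i][k]
    return dist, nxt

def path(nxt, u, v):
    if nxt[u][v] is None:
        return []
    p = [u]
    while u != v:
        u = nxt[u][v]
        p.append(u)
    return p

# Toy graph: 0 = opening plot point, 4 = ending; lower weight = stronger bridge.
edges = [(0, 1, 1.0), (0, 2, 2.5), (1, 2, 0.5),
         (2, 3, 1.0), (1, 3, 3.0), (3, 4, 0.5)]
dist, nxt = floyd_warshall(5, edges)
print(path(nxt, 0, 4))  # [0, 1, 2, 3, 4] -> ordered sequence of events
```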
A Supervised Tensor Dimension Reduction-Based Prognostics Model for Applications with Incomplete Imaging Data
This paper proposes a supervised dimension reduction methodology for tensor data that has two advantages over most image-based prognostic models. First, the model does not require the tensor data to be complete, which extends its applicability to incomplete data. Second, it utilizes the time-to-failure (TTF) to supervise the extraction of low-dimensional features, which makes the extracted features more effective for subsequent prognostics. In addition, an optimization algorithm is proposed for parameter estimation, and closed-form solutions are derived under certain distributions.
- Africa > Senegal > Kolda Region > Kolda (0.04)
- North America > United States > North Carolina (0.04)
- North America > United States > New York (0.04)
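As a loose, generic analogue of the supervised tensor dimension reduction described in the abstract above (explicitly not the paper's algorithm), the sketch below unfolds incomplete tensor samples, imputes missing entries, and extracts low-dimensional features supervised by time-to-failure via partial least squares; every modeling choice here is an assumption.

```python
# Generic stand-in for TTF-supervised dimension reduction on incomplete
# tensor (image) samples: unfold each tensor to a vector, impute missing
# entries, then let PLS extract features supervised by time-to-failure.
# This is NOT the paper's method, just an illustrative baseline.
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(0)
n_units, h, w = 50, 8, 8
X = rng.normal(size=(n_units, h, w))        # degradation images, one per unit
mask = rng.random(X.shape) < 0.2            # 20% missing pixels
X[mask] = np.nan
ttf = rng.uniform(100, 1000, size=n_units)  # time-to-failure labels

X_flat = X.reshape(n_units, -1)             # unfold tensors to vectors
X_filled = SimpleImputer(strategy="mean").fit_transform(X_flat)

pls = PLSRegression(n_components=3)         # 3 supervised low-dim features
features = pls.fit_transform(X_filled, ttf)[0]
print(features.shape)                       # (50, 3)
```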
Smart Pressure e-Mat for Human Sleeping Posture and Dynamic Activity Recognition
Yuan, Liangqi, Wei, Yuan, Li, Jia
With the emphasis on healthcare, early childhood education, and fitness, non-invasive measurement and recognition methods have received increasing attention. Pressure sensing has been extensively studied due to its advantages of simple structure, easy access, easy visualization, and harmlessness. This paper introduces a smart pressure e-mat (SPeM) system based on the piezoresistive material Velostat for human monitoring applications, including recognition of sleeping postures, sports, and yoga. A subsystem scans the e-mat readings, processes the signal, and generates a pressure image stream. Deep neural networks (DNNs) are trained on the pressure image stream to recognize the corresponding human behavior. Four sleeping postures and five dynamic activities inspired by Nintendo Switch Ring Fit Adventure (RFA) serve as a preliminary validation of the proposed SPeM system. SPeM achieves high accuracy on both applications, which demonstrates the accuracy and generalization ability of the models. Compared with other pressure sensor-based systems, SPeM offers more flexible applications and broader commercial prospects, with reliable, robust, and repeatable properties.
- North America > United States > Washington > Whatcom County > Bellingham (0.04)
- North America > United States > Michigan > Oakland County > Rochester (0.04)
- Asia > China (0.04)
- Information Technology (1.00)
- Health & Medicine > Therapeutic Area (1.00)
- Education > Educational Setting (0.66)
- Leisure & Entertainment > Games > Computer Games (0.48)
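To make the DNN step in the SPeM abstract above concrete, here is a minimal, hypothetical PyTorch classifier for pressure-map frames; the mat resolution (32x64), the four posture classes, and all layer sizes are invented for illustration and do not come from the paper.

```python
# Hypothetical CNN for classifying pressure-map frames into postures.
# Input resolution (1, 32, 64) and the 4 posture classes are assumptions.
import torch
import torch.nn as nn

class PressureNet(nn.Module):
    def __init__(self, n_classes=4):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                  # 32x64 -> 16x32
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                  # 16x32 -> 8x16
        )
        self.classifier = nn.Linear(32 * 8 * 16, n_classes)

    def forward(self, x):                     # x: (batch, 1, 32, 64)
        x = self.features(x)
        return self.classifier(x.flatten(1))

frames = torch.randn(8, 1, 32, 64)            # a batch of pressure frames
logits = PressureNet()(frames)
print(logits.shape)                           # torch.Size([8, 4])
```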
Learning When to Use Adaptive Adversarial Image Perturbations against Autonomous Vehicles
Yoon, Hyung-Jin, Jafarnejadsani, Hamidreza, Voulgaris, Petros
Deep neural network (DNN) models for object detection from camera images are widely adopted in autonomous vehicles. However, DNN models are known to be susceptible to adversarial image perturbations. Existing methods for generating adversarial image perturbations treat each incoming image frame as the decision variable of an optimization, so, given a new image, the typically computationally expensive optimization must start over, with no learning carried across the independent optimizations. Very few approaches have been developed for attacking online image streams while considering the underlying physical dynamics of autonomous vehicles, their mission, and the environment. We propose a multi-level stochastic optimization framework that monitors the attacker's capability of generating adversarial perturbations. Based on this capability level, a binary attack/no-attack decision is introduced to enhance the attacker's effectiveness. We evaluate the proposed multi-level image attack framework in simulations of vision-guided autonomous vehicles and in physical tests with a small indoor drone in an office environment. The results show that our method can generate the image attack in real time while monitoring when the attacker is proficient given state estimates.
- North America > United States > Nevada > Washoe County > Reno (0.14)
- North America > United States > California (0.04)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)
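The abstract above describes a multi-level stochastic optimization with an attack/no-attack gate; the paper's actual method is more involved, but the sketch below illustrates the gating idea with a simple FGSM-style perturbation and a hypothetical capability score, all of which are assumptions for illustration.

```python
# Illustrative FGSM-style perturbation gated by a capability estimate.
# The gating threshold and capability proxy are invented; the paper's
# multi-level stochastic optimization is substantially more sophisticated.
import torch
import torch.nn.functional as F

def fgsm_perturb(model, image, target_label, eps=0.03):
    image = image.clone().requires_grad_(True)
    loss = F.cross_entropy(model(image), target_label)
    loss.backward()
    # Step *against* the gradient to push the prediction toward the target.
    return (image - eps * image.grad.sign()).clamp(0, 1).detach()

def gated_attack(model, image, target_label, capability, threshold=0.5):
    """Attack only when the monitored capability exceeds the threshold."""
    if capability < threshold:
        return image, False          # not attacking: perturbation too weak
    return fgsm_perturb(model, image, target_label), True

# Toy usage with a tiny stand-in "detector".
model = torch.nn.Sequential(torch.nn.Flatten(),
                            torch.nn.Linear(3 * 32 * 32, 10))
frame = torch.rand(1, 3, 32, 32)
target = torch.tensor([3])
adv, attacked = gated_attack(model, frame, target, capability=0.8)
print(attacked, (adv - frame).abs().max().item())
```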
veriFIRE: Verifying an Industrial, Learning-Based Wildfire Detection System
Amir, Guy, Freund, Ziv, Katz, Guy, Mandelbaum, Elad, Refaeli, Idan
In this short paper, we present our ongoing work on the veriFIRE project -- a collaboration between industry and academia aimed at using verification to increase the reliability of a real-world, safety-critical system. The system we target is an airborne platform for wildfire detection, which incorporates two deep neural networks. We describe the system and its properties of interest, and discuss our attempts to verify the system's consistency, i.e., its ability to continue to correctly classify a given input even if the wildfire it depicts increases in intensity. We regard this work as a step towards the incorporation of academic-oriented verification tools into real-world systems of interest.
- North America > United States (0.14)
- Asia > Middle East > Israel > Jerusalem District > Jerusalem (0.04)
- Aerospace & Defense > Aircraft (0.48)
- Information Technology > Robotics & Automation (0.47)
- Information Technology > Security & Privacy (0.46)
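The consistency property described above (a wildfire that grows in intensity should stay classified as a wildfire) can at least be sanity-checked by sampling, even though verification tools prove it exhaustively; the sketch below shows such a falsification-style check with a hypothetical classifier and a hypothetical intensity transform.

```python
# Sampling-based sanity check for the consistency property: if an input is
# classified as "wildfire", inputs with increased fire intensity should be
# too. The classifier and intensity transform here are hypothetical; formal
# tools verify this exhaustively rather than by sampling.
import numpy as np

def check_consistency(classify, increase_intensity, images, steps=5):
    violations = []
    for idx, img in enumerate(images):
        if classify(img) != "wildfire":
            continue                       # property only constrains positives
        x = img
        for _ in range(steps):
            x = increase_intensity(x)
            if classify(x) != "wildfire":
                violations.append(idx)     # counterexample candidate found
                break
    return violations

# Hypothetical stand-ins for demonstration only.
classify = lambda img: "wildfire" if img.mean() > 0.3 else "background"
increase_intensity = lambda img: np.clip(img * 1.1, 0.0, 1.0)
images = [np.random.default_rng(i).random((16, 16)) for i in range(10)]
print(check_consistency(classify, increase_intensity, images))  # [] = none
```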
Ton Peijnenburg, Fellow HTSC and Deputy Director VDL-ETG
AI, deep learning, and algorithms such as convolutional neural networks are outperforming other techniques at object detection in images or image streams. Consider the detection of soccer balls by our soccer robots: traditional machine vision techniques (including color segmentation and contour detection) are expensive in terms of computing power and not very robust to changes such as different illumination. The RoboCup initiative is struggling to move its soccer games outdoors, one of the reasons being poor sensing performance in outdoor daylight conditions. Driver-assist systems for cars deal with outdoor conditions much better. When we use an off-the-shelf, less traditional neural network such as YOLO in our lab, we can reliably and robustly detect all the balls on our field, independent of their color, paint pattern, distance, and illumination.
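As a concrete illustration of the off-the-shelf detection described above, here is a minimal sketch using the open-source Ultralytics YOLO package; the model file, the image path, and the assumption that the COCO "sports ball" class covers soccer balls are illustrative choices, not details from the text.

```python
# Minimal ball-detection sketch with an off-the-shelf YOLO model.
# Model weights, image path, and reliance on the COCO "sports ball"
# class are illustrative assumptions, not details from the article.
from ultralytics import YOLO

model = YOLO("yolov8n.pt")          # pretrained COCO weights
results = model("field_frame.jpg")  # run detection on one frame

for result in results:
    for box in result.boxes:
        cls = int(box.cls[0])
        if result.names[cls] == "sports ball":
            x1, y1, x2, y2 = box.xyxy[0].tolist()
            print(f"ball at ({x1:.0f}, {y1:.0f})-({x2:.0f}, {y2:.0f}), "
                  f"confidence {float(box.conf[0]):.2f}")
```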