Two-Stream Network for Sign Language Recognition and Translation
Sign languages are visual languages using manual articulations and non-manual elements to convey information. For sign language recognition and translation, the majority of existing approaches directly encode RGB videos into hidden representations. RGB videos, however, are raw signals with substantial visual redundancy, leading the encoder to overlook the key information for sign language understanding. To mitigate this problem and better incorporate domain knowledge, such as handshape and body movement, we introduce a dual visual encoder containing two separate streams to model both the raw videos and the keypoint sequences generated by an off-the-shelf keypoint estimator. To make the two streams interact with each other, we explore a variety of techniques, including bidirectional lateral connection, sign pyramid network with auxiliary supervision, and frame-level self-distillation. The resulting model is called TwoStream-SLR, which is competent for sign language recognition (SLR). TwoStream-SLR is extended to a sign language translation (SLT) model, TwoStream-SLT, by simply attaching an extra translation network. Experimentally, our TwoStream-SLR and TwoStream-SLT achieve state-of-the-art performance on SLR and SLT tasks across a series of datasets including Phoenix-2014, Phoenix-2014T, and CSL-Daily.
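The dual visual encoder described above can be illustrated schematically. The snippet below is a minimal NumPy sketch, not the authors' implementation: `encode` stands in for each stream's video or keypoint encoder, and `bidirectional_lateral` shows one way projected features from each stream can be added to the other before late fusion; all layer shapes, weights, and the final concatenation head are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(x, w):
    """One 'stream' layer: linear projection + ReLU (stand-in for a video/keypoint encoder)."""
    return np.maximum(x @ w, 0.0)

def bidirectional_lateral(rgb, kpt, w_k2r, w_r2k):
    """Bidirectional lateral connection: each stream receives a projected
    copy of the other stream's features before the next layer."""
    rgb_out = rgb + kpt @ w_k2r
    kpt_out = kpt + rgb @ w_r2k
    return rgb_out, kpt_out

T, D = 16, 64                       # frames, feature dim (illustrative sizes)
rgb = rng.normal(size=(T, D))       # per-frame features of the RGB stream
kpt = rng.normal(size=(T, D))       # per-frame features of the keypoint stream

w1, w2 = rng.normal(size=(D, D)) * 0.1, rng.normal(size=(D, D)) * 0.1
w_k2r, w_r2k = rng.normal(size=(D, D)) * 0.1, rng.normal(size=(D, D)) * 0.1

rgb_h, kpt_h = encode(rgb, w1), encode(kpt, w2)
rgb_h, kpt_h = bidirectional_lateral(rgb_h, kpt_h, w_k2r, w_r2k)

# Late fusion of the two streams into per-frame features
# (the recognition/translation heads of the paper are omitted)
fused = np.concatenate([rgb_h, kpt_h], axis=-1)
print(fused.shape)                  # (16, 128)
```

In a real model the lateral connections would be learned jointly with both encoders; the sketch only shows where in the forward pass the cross-stream exchange happens.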
Two-Stream Networks for Lane-Change Prediction of Surrounding Vehicles
Fernández-Llorca, David, Biparva, Mahdi, Izquierdo-Gonzalo, Rubén, Tsotsos, John K.
Abstract-- In highway scenarios, an alert human driver will typically anticipate early cut-in and cut-out maneuvers of surrounding vehicles using only visual cues. Different sizes of the regions around the vehicles are analyzed, evaluating the importance of the interaction between vehicles and of the context information in the performance. I. INTRODUCTION One of the closest and most plausible scenarios in the adoption of autonomous vehicles is autonomous navigation at SAE L3 (chauffeur) or L4 (autopilot) on highways. The most advanced automation systems to date are the Highway Chauffeur (HC) and the Highway Autopilot (HA); the HC is mostly considered as L3 and the HA as L4 [1]. To deal with lane-change prediction of surrounding vehicles, in this paper we pose the problem as an action recognition problem using visual information from cameras. The idea behind our proposal is to use the same source of information (visual cues) and the same type of approach (action recognition) that drivers use to anticipate these maneuvers.
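Posing lane-change prediction as action recognition starts from image regions cropped around each observed vehicle; the "different sizes of the regions" analyzed above correspond to enlarging the crop to take in more surrounding context. A minimal sketch under that reading (the `crop_with_context` helper and the bounding-box coordinates are hypothetical, not taken from the paper):

```python
import numpy as np

def crop_with_context(frame, box, scale):
    """Crop a region around a vehicle's bounding box, enlarged by `scale`
    to include surrounding context (nearby vehicles, lane markings)."""
    h, w = frame.shape[:2]
    x0, y0, x1, y1 = box
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
    half_w, half_h = (x1 - x0) * scale / 2, (y1 - y0) * scale / 2
    xa, xb = int(max(0, cx - half_w)), int(min(w, cx + half_w))
    ya, yb = int(max(0, cy - half_h)), int(min(h, cy + half_h))
    return frame[ya:yb, xa:xb]

frame = np.zeros((480, 640, 3), dtype=np.uint8)   # one camera frame
box = (300, 200, 360, 240)                        # hypothetical detection (x0, y0, x1, y1)

tight = crop_with_context(frame, box, 1.0)        # vehicle only
context = crop_with_context(frame, box, 2.0)      # vehicle plus context
print(tight.shape, context.shape)                 # (40, 60, 3) (80, 120, 3)
```

A sequence of such crops over consecutive frames would then be fed to a two-stream action recognition network, with the crop scale as the hyperparameter whose effect the paper evaluates.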
Deep Stereo Matching With Explicit Cost Aggregation Sub-Architecture
Yu, Lidong (Beijing Institute of Technology) | Wang, Yucheng (Kandao Australia Research Center) | Wu, Yuwei (Beijing Institute of Technology) | Jia, Yunde (Beijing Institute of Technology)
Deep neural networks have shown excellent performance for stereo matching. Many efforts focus on the feature extraction and similarity measurement of the matching cost computation step, while less attention is paid to cost aggregation, which is crucial for stereo matching. In this paper, we present a learning-based cost aggregation method for stereo matching by a novel sub-architecture in the end-to-end trainable pipeline. We reformulate cost aggregation as a learning process of the generation and selection of cost aggregation proposals, which indicate the possible cost aggregation results. The cost aggregation sub-architecture is realized by a two-stream network: one stream for the generation of cost aggregation proposals, the other for the selection of the proposals. The criterion for the selection is determined by the low-level structure information obtained from a light convolutional network. The two-stream network offers global-view guidance for the cost aggregation to rectify the mismatching values stemming from the limited view of the matching cost computation. Comprehensive experiments on challenging datasets such as KITTI and Scene Flow show that our method outperforms the state-of-the-art methods.
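The generation/selection scheme can be illustrated on a toy cost volume. This is a hedged sketch, not the paper's network: hand-crafted smoothing filters stand in for the learned generation stream, and random features stand in for the light guidance CNN; only the structure — several aggregation proposals combined by per-pixel selection weights — follows the description above.

```python
import numpy as np

rng = np.random.default_rng(0)
H, W, D, K = 8, 8, 4, 3              # image size, disparity levels, number of proposals

cost = rng.normal(size=(H, W, D))    # raw matching cost volume

# Generation stream: K candidate aggregations of the raw cost
# (simple shifted averages standing in for learned aggregation filters)
proposals = np.stack([
    cost,                                        # identity proposal
    (cost + np.roll(cost, 1, axis=0)) / 2,       # vertical smoothing
    (cost + np.roll(cost, 1, axis=1)) / 2,       # horizontal smoothing
])                                               # (K, H, W, D)

# Selection stream: per-pixel proposal weights driven by low-level
# structure features (random here, a light CNN in the paper)
guidance = rng.normal(size=(H, W, K))
weights = np.exp(guidance) / np.exp(guidance).sum(-1, keepdims=True)

# Aggregate: per-pixel weighted sum of proposals, then winner-take-all disparity
aggregated = np.einsum('khwd,hwk->hwd', proposals, weights)
disparity = aggregated.argmin(axis=-1)           # (H, W) disparity map
print(aggregated.shape, disparity.shape)
```

Because the selection weights vary per pixel, edges can favor the identity proposal while smooth regions favor the averaged ones, which is the kind of structure-guided rectification the sub-architecture is designed to learn.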