AITopics

Building dialogue systems that can converse naturally with humans is a challenging yet intriguing problem of artificial intelligence. In open-domain human-computer conversation, where the conversational agent is expected to respond to human utterances in an interesting and engaging way, commonsense knowledge has to be integrated into the model effectively. In this paper, we investigate the impact of providing commonsense knowledge about the concepts covered in the dialogue. Our model represents the first attempt to integrating a large commonsense knowledge base into end-to-end conversational models. In the retrieval-based scenario, we propose a model to jointly take into account message content and related commonsense for selecting an appropriate response. Our experiments suggest that the knowledge-augmented models are superior to their knowledge-free counterparts.

assertion, deep learning, neural network, (19 more...)

Thirty-Second AAAI Conference on Artificial Intelligence

Country:

North America > United States (0.14)
Asia > China (0.14)

Genre: Research Report (0.48)

Industry: Health & Medicine > Therapeutic Area (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Commonsense Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.81)

Tracking Occluded Objects and Recovering Incomplete Trajectories by Reasoning About Containment Relations and Human Actions

Liang, Wei (Beijing Institute of Technology) | Zhu, Yixin (Center for Vision, Cognition, Learning, and Autonomy, University of California, Los Angeles) | Zhu, Song-Chun (Center for Vision, Cognition, Learning, and Autonomy, University of California, Los Angeles)

This paper studies a challenging problem of tracking severely occluded objects in long video sequences. The proposed method reasons about the containment relations and human actions, thus infers and recovers occluded objects identities while contained or blocked by others. There are two conditions that lead to incomplete trajectories: i) Contained. The occlusion is caused by a containment relation formed between two objects, e.g., an unobserved laptop inside a backpack forms containment relation between the laptop and the backpack. ii) Blocked. The occlusion is caused by other objects blocking the view from certain locations, during which the containment relation does not change. By explicitly distinguishing these two causes of occlusions, the proposed algorithm formulates tracking problem as a network flow representation encoding containment relations and their changes. By assuming all the occlusions are not spontaneously happened but only triggered by human actions, an MAP inference is applied to jointly interpret the trajectory of an object by detection in space and human actions in time. To quantitatively evaluate our algorithm, we collect a new occluded object dataset captured by Kinect sensor, including a set of RGB-D videos and human skeletons with multiple actors, various objects, and different changes of containment relations. In the experiments, we show that the proposed method demonstrates better performance on tracking occluded objects compared with baseline methods.

containment relation, neural network, spatial reasoning, (20 more...)

Thirty-Second AAAI Conference on Artificial Intelligence

Country: North America > United States > California (0.14)

Genre: Overview (0.34)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (0.46)
Information Technology > Artificial Intelligence > Representation & Reasoning > Object-Oriented Architecture (0.34)

Supervised Deep Hashing for Hierarchical Labeled Data

Wang, Dan (Beijing Institute of Technology) | Huang, Heyan (Beijing Institute of Technology) | Lu, Chi (Beijing Institute of Technology) | Feng, Bo-Si (Beijing Institute of Technology) | Wen, Guihua (South China University of Technology) | Nie, Liqiang (Shandong University) | Mao, Xian-Ling (Beijing Institute of Technology)

Recently, hashing methods have been widely used in large-scale image retrieval. However, most existing supervised hashing methods do not consider the hierarchical relation of labels,which means that they ignored the rich semantic information stored in the hierarchy. Moreover, most of previous works treat each bit in a hash code equally, which does not meet the scenario of hierarchical labeled data. To tackle the aforementioned problems, in this paper, we propose a novel deep hashing method, called supervised hierarchical deep hashing (SHDH), to perform hash code learning for hierarchical labeled data. Speciﬁcally, we deﬁne a novel similarity formula for hierarchical labeled data by weighting each level, and design a deep neural network to obtain a hash code for each data point. Extensive experiments on two real-world public datasets show that the proposed method outperforms the state-of-the-art baselines in the image retrieval task.

deep learning, labeled data, neural network, (20 more...)

Thirty-Second AAAI Conference on Artificial Intelligence

Country:

Europe (0.68)
North America > United States > Arizona (0.14)
North America > United States > Colorado (0.14)
(2 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Unsupervised Deep Learning of Mid-Level Video Representation for Action Recognition

Hou, Jingyi (Beijing Institute of Technology) | Wu, Xinxiao (Beijing Institute of Technology) | Chen, Jin (Beijing Institute of Technology ) | Luo, Jiebo (University of Rochester) | Jia, Yunde (Beijing Institute of Technology)

Current deep learning methods for action recognition rely heavily on large scale labeled video datasets. Manually annotating video datasets is laborious and may introduce unexpected bias to train complex deep models for learning video representation. In this paper, we propose an unsupervised deep learning method which employs unlabeled local spatial-temporal volumes extracted from action videos to learn midlevel video representation for action recognition. Specifically, our method simultaneously discovers mid-level semantic concepts by discriminative clustering and optimizes local spatial-temporal features by two relatively small and simple deep neural networks. The clustering generates semantic visual concepts that guide the training of the deep networks, and the networks in turn guarantee the robustness of the semantic concepts. Experiments on the HMDB51 and the UCF101 datasets demonstrate the superiority of the proposed method, even over several supervised learning methods.

deep learning, neural network, representation, (19 more...)

Thirty-Second AAAI Conference on Artificial Intelligence

Country:

Asia > China (0.14)
North America > United States (0.14)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Learning Pose Grammar to Encode Human Body Configuration for 3D Pose Estimation

Fang, Hao-Shu (Shanghai Jiao Tong University) | Xu, Yuanlu (University of California, Los Angeles) | Wang, Wenguan (Beijing Institute of Technology) | Liu, Xiaobai (San Diego State University) | Zhu, Song-Chun (University of California, Los Angeles)

In this paper, we propose a pose grammar to tackle the problem of 3D human pose estimation. Our model directly takes 2D pose as input and learns a generalized 2D-3D mapping function. The proposed model consists of a base network which efficiently captures pose-aligned features and a hierarchy of Bi-directional RNNs (BRNN) on the top to explicitly incorporate a set of knowledge regarding human body configuration (i.e., kinematics, symmetry, motor coordination). The proposed model thus enforces high-level constraints over human poses. In learning, we develop a pose sample simulator to augment training samples in virtual camera views, which further improves our model generalizability. We validate our method on public 3D human pose benchmarks and propose a new evaluation protocol working on cross-view setting to verify the generalization capability of different methods. We empirically observe that most state-of-the-art methods encounter difficulty under such setting while our method can well handle such challenges.

deep learning, pose estimation, video understanding, (21 more...)

Thirty-Second AAAI Conference on Artificial Intelligence

Country:

North America > United States > California (0.28)
Asia (0.28)

Genre: Research Report (0.48)

Industry: Health & Medicine (0.62)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
Information Technology > Artificial Intelligence > Vision > Video Understanding (0.74)

Semantic Structure-Based Word Embedding by Incorporating Concept Convergence and Word Divergence

Liu, Qian (Beijing Institute of Technology) | Huang, Heyan (Beijing Institute of Technology) | Zhang, Guangquan (University of Technology Sydney) | Gao, Yang (Beijing Institute of Technology) | Xuan, Junyu (University of Technology Sydney) | Lu, Jie (University of Technology Sydney)

Representing the semantics of words is a fundamental task in text processing. Several research studies have shown that text and knowledge bases (KBs) are complementary sources for word embedding learning. Most existing methods only consider relationships within word-pairs in the usage of KBs. We argue that the structural information of well-organized words within the KBs is able to convey more effective and stable knowledge in capturing semantics of words. In this paper, we propose a semantic structure-based word embedding method, and introduce concept convergence and word divergence to reveal semantic structures in the word embedding learning process. To assess the effectiveness of our method, we use WordNet for training and conduct extensive experiments on word similarity, word analogy, text classification and query expansion. The experimental results show that our method outperforms state-of-the-art methods, including the methods trained solely on the corpus, and others trained on the corpus and the KBs.

artificial intelligence, semantic structure, text processing, (17 more...)

Thirty-Second AAAI Conference on Artificial Intelligence

Country:

Asia > China (0.15)
Oceania > Australia (0.14)
North America > United States (0.14)

Genre: Research Report > Promising Solution (0.34)

Industry: Health & Medicine (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Examining CNN Representations With Respect to Dataset Bias

Zhang, Quanshi (University of California, Los Angeles) | Wang, Wenguan (Beijing Institute of Technology) | Zhu, Song-Chun (University of California, Los Angeles)

Given a pre-trained CNN without any testing samples, this paper proposes a simple yet effective method to diagnose feature representations of the CNN. We aim to discover representation flaws caused by potential dataset bias. More specifically, when the CNN is trained to estimate image attributes, we mine latent relationships between representations of different attributes inside the CNN. Then, we compare the mined attribute relationships with ground-truth attribute relationships to discover the CNN's blind spots and failure modes due to dataset bias. In fact, representation flaws caused by dataset bias cannot be examined by conventional evaluation strategies based on testing images, because testing images may also have a similar bias. Experiments have demonstrated the effectiveness of our method.

deep learning, neural network, representation, (20 more...)

Thirty-Second AAAI Conference on Artificial Intelligence

Country: North America > United States > California (0.14)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Deep Stereo Matching With Explicit Cost Aggregation Sub-Architecture

Yu, Lidong (Beijing Institute of Technology) | Wang, Yucheng (Kandao Australia Research Center) | Wu, Yuwei (Beijing Institute of Technology) | Jia, Yunde (Beijing Institute of Technology)

Deep neural networks have shown excellent performance for stereo matching. Many efforts focus on the feature extraction and similarity measurement of the matching cost computation step while less attention is paid on cost aggregation which is crucial for stereo matching. In this paper, we present a learning-based cost aggregation method for stereo matching by a novel sub-architecture in the end-to-end trainable pipeline. We reformulate the cost aggregation as a learning process of the generation and selection of cost aggregation proposals which indicate the possible cost aggregation results. The cost aggregation sub-architecture is realized by a two-stream network: one for the generation of cost aggregation proposals, the other for the selection of the proposals. The criterion for the selection is determined by the low-level structure information obtained from a light convolutional network. The two-stream network offers a global view guidance for the cost aggregation to rectify the mismatching value stemming from the limited view of the matching cost computation. The comprehensive experiments on challenge datasets such as KITTI and Scene Flow show that our method outperforms the state-of-the-art methods.

cost aggregation, deep learning, neural network, (18 more...)

Thirty-Second AAAI Conference on Artificial Intelligence

Country:

Asia > China (0.14)
Oceania > Australia (0.14)

Genre: Research Report (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

AAAI ConferencesFeb-14-2017

Deep Manifold Learning of Symmetric Positive Definite Matrices with Application to Face Recognition

Dong, Zhen (Beijing Institute of Technology) | Jia, Su (State University of New York at Stony Brook) | Zhang, Chi (Beijing Institute of Technology) | Pei, Mingtao (Beijing Institute of Technology) | Wu, Yuwei (Beijing Institute of Technology)

In this paper, we aim to construct a deep neural network which embeds high dimensional symmetric positive definite (SPD) matrices into a more discriminative low dimensional SPD manifold. To this end, we develop two types of basic layers: a 2D fully connected layer which reduces the dimensionality of the SPD matrices, and a symmetrically clean layer which achieves non-linear mapping. Specifically, we extend the classical fully connected layer such that it is suitable for SPD matrices, and we further show that SPD matrices with symmetric pair elements setting zero operations are still symmetric positive definite. Finally, we complete the construction of the deep neural network for SPD manifold learning by stacking the two layers. Experiments on several face datasets demonstrate the effectiveness of the proposed method.

deep learning, matrix, neural network, (19 more...)

Thirty-First AAAI Conference on Artificial Intelligence

Country: North America > United States (0.28)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

AAAI ConferencesFeb-14-2017

Efficient Object Instance Search Using Fuzzy Objects Matching

Yu, Tan (Nanyang Technological University) | Wu, Yuwei (Beijing Institute of Technology) | Bhattacharjee, Sreyasee (Nanyang Technological University) | Yuan, Junsong (Nanyang Technological University)

Recently, global features aggregated from local convolutional features of the convolutional neural network have shown to be much more effective in comparison with hand-crafted features for image retrieval. However, the global feature might not effectively capture the relevance between the query object and reference images in the object instance search task, especially when the query object is relatively small and there exist multiple types of objects in reference images. Moreover, the object instance search requires to localize the object in the reference image, which may not be achieved through global representations. In this paper, we propose a Fuzzy Objects Matching (FOM) framework to effectively and efficiently capture the relevance between the query object and reference images in the dataset. In the proposed FOM scheme, object proposals are utilized to detect the potential regions of the query object in reference images. To achieve high search efficiency, we factorize the feature matrix of all the object proposals from one reference image into the product of a set of fuzzy objects and sparse codes. In addition, we refine the feature of the generated fuzzy objects according to its neighborhood in the feature space to generate more robust representation. The experimental results demonstrate that the proposed FOM framework significantly outperforms the state-of-the-art methods in precision with less memory and computational cost on three public datasets.

artificial intelligence, neural network, proposal, (20 more...)

Thirty-First AAAI Conference on Artificial Intelligence

Country: Asia > Singapore (0.14)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (0.47)