AITopics | Microsoft Research, Beijing

Self-View Grounding Given a Narrated 360° Video

Chou, Shih-Han (National Tsing Hua University) | Chen, Yi-Chun (National Tsing Hua University) | Zeng, Kuo-Hao (National Tsing Hua University) | Hu, Hou-Ning (National Tsing Hua University) | Fu, Jianlong (Microsoft Research, Beijing) | Sun, Min (National Tsing Hua University)

AAAI Conferences, Feb-8-2018

Narrated 360° videos are typically provided in many touring scenarios to mimic real-world experience. However, previous work has shown that smart assistance (i.e., providing visual guidance) can significantly help users follow the Normal Field of View (NFoV) corresponding to the narrative. In this project, we aim to automatically ground the NFoVs of a 360° video given subtitles of the narrative (referred to as "NFoV-grounding"). We propose a novel Visual Grounding Model (VGM) to implicitly and efficiently predict the NFoVs given the video content and subtitles. Specifically, at each frame, we efficiently encode the panorama into a feature map of candidate NFoVs using a Convolutional Neural Network (CNN) and encode the subtitles into the same hidden space using an RNN with Gated Recurrent Units (GRU). Then, we apply soft-attention on the candidate NFoVs to trigger a sentence decoder that aims to minimize the reconstruction loss between the generated and given sentences. Finally, we obtain the NFoV as the candidate NFoV with the maximum attention, without any human supervision. To train VGM more robustly, we also generate a reverse sentence conditioned on one minus the soft-attention, such that the attention focuses on candidate NFoVs less relevant to the given sentence. The negative log reconstruction loss of the reverse sentence (referred to as "irrelevant loss") is jointly minimized to encourage the reverse sentence to be different from the given sentence. To evaluate our method, we collect the first narrated 360° videos dataset and achieve state-of-the-art NFoV-grounding performance.
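The attention step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the dot-product scoring, the function names (`soft_attention`, `attend`, `ground_nfov`), and the plain-list feature vectors are all assumptions made for illustration, and the CNN encoder, GRU, and sentence decoder are omitted. It only shows how a soft-attention distribution over candidate NFoVs yields a grounded NFoV (the maximum-attention candidate) and how "one minus the soft-attention" gives the weights used for the reverse sentence.

```python
import math


def soft_attention(scores):
    """Softmax over per-candidate relevance scores (numerically stabilized)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(features, weights):
    """Weighted sum of candidate feature vectors."""
    dim = len(features[0])
    return [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]


def ground_nfov(candidate_features, sentence_embedding):
    # Assumption: relevance is a dot product between each candidate NFoV
    # feature and the sentence embedding; the abstract does not specify this.
    scores = [
        sum(c * s for c, s in zip(feat, sentence_embedding))
        for feat in candidate_features
    ]
    attention = soft_attention(scores)
    # "One minus the soft-attention": emphasizes the less relevant candidates.
    # These weights are left unnormalized here, which is also an assumption.
    reverse_attention = [1.0 - a for a in attention]
    context = attend(candidate_features, attention)
    reverse_context = attend(candidate_features, reverse_attention)
    # The grounded NFoV is the candidate with the maximum attention.
    best = max(range(len(attention)), key=attention.__getitem__)
    return best, attention, context, reverse_context


if __name__ == "__main__":
    candidates = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    sentence = [0.0, 2.0]
    best, attention, _, _ = ground_nfov(candidates, sentence)
    print(best, [round(a, 3) for a in attention])
```

In a full model, `context` would condition the decoder that reconstructs the given sentence, and `reverse_context` the decoder that produces the reverse sentence.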

deep learning, neural network, subtitle, (20 more...)

Thirty-Second AAAI Conference on Artificial Intelligence

Country:

Asia (0.46)
North America > United States (0.28)

Industry: Leisure & Entertainment (0.47)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.89)

How to Train a Compact Binary Neural Network with High Accuracy?

Tang, Wei (Institute of Automation, Chinese Academy of Sciences) | Hua, Gang (Microsoft Research, Beijing) | Wang, Liang (Institute of Automation, Chinese Academy of Sciences)

AAAI Conferences, Feb-14-2017

How can one train a binary neural network (BinaryNet) with both a high compression rate and high accuracy on a large-scale dataset? We answer this question through a careful analysis of previous work on BinaryNets, in terms of training strategies, regularization, and activation approximation. Our findings first reveal that a low learning rate is highly preferred to avoid frequent sign changes of the weights, which often make the learning of BinaryNets unstable. Secondly, we propose to use PReLU instead of ReLU in a BinaryNet to conveniently absorb the scale factor for weights into the activation function, which enjoys high computational efficiency for binarized layers while maintaining high approximation accuracy. Thirdly, instead of imposing L2 regularization, which drives all weights to zero and thus contradicts the setting of BinaryNets, we introduce a regularization term that encourages the weights to be bipolar. Fourthly, we discover that the failure of binarizing the last layer, which is essential for a high compression rate, is due to the improper output range. We propose to use a scale layer to bring it back to a normal range. Last but not least, we propose multiple binarizations to improve the approximation of the activations. The composition of all these enables us to train BinaryNets with both a high compression rate and high accuracy, which is strongly supported by our extensive empirical study.
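To make the contrast between L2 and a bipolar regularizer concrete, here is a small sketch. The abstract does not give the exact form of the authors' regularization term, so the penalty `(1 - |w|)^2` used in `bipolar_penalty` is only an assumed illustrative choice, as are all function names. The sketch also shows sign binarization and a PReLU activation, whose learnable negative slope is the kind of parameter the abstract says can absorb a scale factor.

```python
def binarize(weights):
    """Map each real-valued weight to +1 or -1 by its sign (0 maps to +1)."""
    return [1.0 if w >= 0 else -1.0 for w in weights]


def prelu(x, slope):
    """PReLU: identity for positive inputs, learnable slope for negative ones."""
    return x if x > 0 else slope * x


def l2_penalty(weights):
    """Standard L2 penalty: minimized when every weight is 0."""
    return sum(w * w for w in weights)


def bipolar_penalty(weights):
    # Assumed illustrative form, not taken from the paper:
    # zero when every weight is exactly +1 or -1, and growing as a weight
    # moves away from either pole (for example, toward 0).
    return sum((1.0 - abs(w)) ** 2 for w in weights)


if __name__ == "__main__":
    latent = [0.3, -0.7, 0.0]
    print(binarize(latent))
    # L2 prefers all-zero weights, which carry no sign information;
    # the bipolar penalty prefers weights that are already near +1 or -1.
    print(l2_penalty([0.0, 0.0]), bipolar_penalty([0.0, 0.0]))
    print(l2_penalty([1.0, -1.0]), bipolar_penalty([1.0, -1.0]))
```

Under this assumed form, weights at zero incur the largest bipolar penalty within the interval [-1, 1], which is the opposite of what L2 regularization rewards.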

accuracy, artificial intelligence, neural network, (17 more...)

Thirty-First AAAI Conference on Artificial Intelligence

Country: Asia > China (0.14)

Genre: Research Report > New Finding (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Sketch Recognition with Natural Correction and Editing

Wu, Jie (Shanghai Jiao Tong University) | Wang, Changhu (Microsoft Research, Beijing) | Zhang, Liqing (Shanghai Jiao Tong University) | Rui, Yong (Microsoft Research, Beijing)

AAAI Conferences, Jul-14-2014

In this paper, we target the problem of sketch recognition. We systematically study how to incorporate users' correction and editing into isolated and full sketch recognition. This is a natural and necessary interaction in real systems such as Visio, where very similar shapes exist. First, a novel algorithm is proposed to mine the prior shape knowledge for three editing modes. Second, to differentiate visually similar shapes, a novel symbol recognition algorithm is introduced that leverages the learnt shape knowledge. Then, a novel editing detection algorithm is proposed to facilitate symbol recognition. Furthermore, both the symbol recognizer and the editing detector are systematically incorporated into full sketch recognition. Finally, based on the proposed algorithms, a real-time sketch recognition system is built to recognize hand-drawn flowcharts and diagrams with flexible interactions. Extensive experiments show the effectiveness of the proposed algorithms.

algorithm, artificial intelligence, sketch understanding, (17 more...)

Twenty-Eighth AAAI Conference on Artificial Intelligence

Country:

Asia > China (0.28)
North America > United States (0.28)

Technology:

Information Technology > Human Computer Interaction > Interfaces (1.00)
Information Technology > Artificial Intelligence > Vision > Sketch Understanding (1.00)