He, Xiaodong
SegCLIP: Patch Aggregation with Learnable Centers for Open-Vocabulary Semantic Segmentation
Luo, Huaishao, Bao, Junwei, Wu, Youzheng, He, Xiaodong, Li, Tianrui
Recently, contrastive language-image pre-training, e.g., CLIP, has demonstrated promising results on various downstream tasks. The pre-trained model can capture rich visual concepts for images by learning from large-scale text-image data. However, transferring the learned visual knowledge to open-vocabulary semantic segmentation is still under-explored. In this paper, we propose a CLIP-based model named SegCLIP for open-vocabulary segmentation in an annotation-free manner. SegCLIP builds on ViT, and its main idea is to gather patches into semantic regions through learnable centers, trained on text-image pairs. The gathering operation dynamically captures semantic groups, which are used to generate the final segmentation results. We further propose a reconstruction loss on masked patches and a superpixel-based KL loss with pseudo-labels to enhance the visual representation. Experimental results show that our model achieves comparable or superior segmentation accuracy on PASCAL VOC 2012 (+0.3% mIoU), PASCAL Context (+2.3% mIoU), and COCO (+2.2% mIoU) compared with baselines. We release the code at https://github.com/ArrowLuo/SegCLIP.
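To make the gathering operation concrete, here is a minimal PyTorch sketch of aggregating ViT patch tokens into semantic groups via learnable centers; the class name, dimensions, and single cross-attention layer are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch: patch aggregation with learnable centers (not SegCLIP's code).
import torch
import torch.nn as nn

class PatchAggregation(nn.Module):
    def __init__(self, dim=512, num_centers=8, num_heads=8):
        super().__init__()
        # Learnable center embeddings, one per candidate semantic region.
        self.centers = nn.Parameter(torch.randn(num_centers, dim))
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, patch_tokens):              # (B, N_patches, dim)
        b = patch_tokens.size(0)
        queries = self.centers.unsqueeze(0).expand(b, -1, -1)
        # Each center attends to all patches; the attention map softly
        # assigns patches to centers and can serve as a segmentation mask.
        grouped, assign = self.attn(queries, patch_tokens, patch_tokens)
        return grouped, assign                    # (B, K, dim), (B, K, N)

feats = torch.randn(2, 196, 512)                  # a 14x14 ViT patch grid
groups, masks = PatchAggregation()(feats)
```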
MoNET: Tackle State Momentum via Noise-Enhanced Training for Dialogue State Tracking
Zhang, Haoning, Bao, Junwei, Sun, Haipeng, Wu, Youzheng, Li, Wenye, Cui, Shuguang, He, Xiaodong
Dialogue state tracking (DST) aims to convert the dialogue history into dialogue states, which consist of slot-value pairs. As condensed structured information that memorizes all history information, the dialogue state of the last turn is typically adopted as the input for predicting the current state in DST models. However, these models tend to keep the predicted slot values unchanged, which we define as state momentum in this paper. Specifically, the models struggle to update slot values that need to be changed and to correct slot values wrongly predicted in the last turn. To this end, we propose MoNET to tackle state momentum via noise-enhanced training. First, the previous state of each turn in the training data is noised by replacing some of its slot values. Then, the noised previous state is used as the input when learning to predict the current state, improving the model's ability to update and correct slot values. Furthermore, a contrastive context matching framework is designed to narrow the representation distance between a state and its corresponding noised variant, which reduces the impact of the noised state and helps the model better understand the dialogue history. Experimental results on MultiWOZ datasets show that MoNET outperforms previous DST methods. Ablations and analysis verify the effectiveness of MoNET in alleviating state momentum and improving robustness to noise.
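As a hedged illustration of the noising step, the following sketch replaces a random subset of slot values in a previous dialogue state; the slot names, candidate-value vocabulary, and noise rate are hypothetical, not MoNET's exact configuration.

```python
# Toy sketch of state noising for noise-enhanced DST training.
import random

def noise_state(prev_state, value_vocab, noise_rate=0.3, seed=None):
    """prev_state: dict slot -> value; value_vocab: dict slot -> candidate values."""
    rng = random.Random(seed)
    noised = dict(prev_state)
    for slot, value in prev_state.items():
        if rng.random() < noise_rate:
            # Swap in a different plausible value so the model must learn
            # to update/correct it when predicting the current state.
            candidates = [v for v in value_vocab.get(slot, []) if v != value]
            if candidates:
                noised[slot] = rng.choice(candidates)
    return noised

state = {"hotel-area": "north", "hotel-stars": "4"}
vocab = {"hotel-area": ["north", "south", "east"], "hotel-stars": ["3", "4", "5"]}
print(noise_state(state, vocab, seed=0))
```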
AUGUST: an Automatic Generation Understudy for Synthesizing Conversational Recommendation Datasets
Lu, Yu, Bao, Junwei, Ma, Zichen, Han, Xiaoguang, Wu, Youzheng, Cui, Shuguang, He, Xiaodong
High-quality data is essential for conversational recommendation systems and serves as the cornerstone of network architecture development and training strategy design. Existing works devote heavy human effort to manually labeling dialogues or to designing and extending recommender dialogue templates. However, they suffer from two drawbacks: (i) the limited number of human annotators means that datasets can hardly capture the rich and large-scale cases found in the real world, and (ii) the limited experience and knowledge of annotators leads to uninformative corpora and inappropriate recommendations. In this paper, we propose a novel automatic dataset synthesis approach that can generate both large-scale and high-quality recommendation dialogues through a data2text generation process, where unstructured recommendation conversations are generated from structured graphs built on real-world user-item information. In doing so, we comprehensively exploit: (i) rich personalized user profiles from traditional recommendation datasets, (ii) rich external knowledge from knowledge graphs, and (iii) the conversation ability contained in human-to-human conversational recommendation datasets. Extensive experiments validate the benefit of the automatically synthesized data under low-resource scenarios and demonstrate its promising potential to facilitate the development of more effective conversational recommendation systems.
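The sketch below illustrates the data2text idea at toy scale: serializing a small structured (user, item, attributes) record into a seed recommendation dialogue. The schema and templates are hypothetical stand-ins; AUGUST itself uses learned generation over graphs rather than fixed templates.

```python
# Toy data2text sketch: structured record -> seed recommendation dialogue.
def graph_to_dialogue(user, item, attrs):
    liked = ", ".join(user["liked_genres"])
    facts = "; ".join(f"{k}: {v}" for k, v in attrs.items())
    return [
        ("user", f"Can you recommend a movie? I usually like {liked}."),
        ("system", f"You might enjoy {item}. {facts}."),
    ]

turns = graph_to_dialogue(
    {"liked_genres": ["sci-fi", "thriller"]},
    "Inception",
    {"director": "Christopher Nolan", "year": 2010},
)
for speaker, text in turns:
    print(f"{speaker}: {text}")
```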
A Novel Vector-Field-Based Motion Planning Algorithm for 3D Nonholonomic Robots
He, Xiaodong, Yao, Weijia, Sun, Zhiyong, Li, Zhongkui
This paper focuses on motion planning for mobile robots in 3D, which are modelled as 6-DOF rigid-body systems with nonholonomic kinematic constraints. We not only specify the target position but also require a desired heading direction at the terminal time, which gives rise to a new and more challenging 3D motion planning problem. The proposed planning algorithm involves a novel velocity vector field (VF) over the workspace; by following the VF, the robot is navigated to the destination with the specified heading direction. To circumvent potential collisions with obstacles and other robots, a composite VF is designed by composing the navigation VF with an additional VF tangential to the boundary of the dangerous area. Moreover, we propose a priority-based algorithm to handle the motion coupling among multiple robots. Finally, numerical simulations are conducted to verify the theoretical results.
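One way to write down the composition idea (our notation, a sketch rather than the paper's exact construction) is as a smooth blend of the navigation field and the tangential field:

```latex
% Hedged sketch of a composite vector field: blend the navigation field with
% a field tangential to the obstacle boundary near the dangerous area.
\[
  \mathbf{v}(p) \;=\; \bigl(1-\sigma(p)\bigr)\,\mathbf{v}_{\mathrm{nav}}(p)
                \;+\; \sigma(p)\,\mathbf{v}_{\mathrm{tan}}(p),
\]
% Here $\sigma(p)\in[0,1]$ rises from 0 to 1 as $p$ approaches the boundary
% of the dangerous area, $\mathbf{v}_{\mathrm{nav}}$ steers the robot toward
% the goal pose, and $\mathbf{v}_{\mathrm{tan}}$ is tangential to the
% boundary, so the blended field never points into the obstacle.
```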
MNER-QG: An End-to-End MRC framework for Multimodal Named Entity Recognition with Query Grounding
Jia, Meihuizi, Shen, Lei, Shen, Xin, Liao, Lejian, Chen, Meng, He, Xiaodong, Chen, Zhendong, Li, Jiaqi
Multimodal named entity recognition (MNER) is a critical step in information extraction, which aims to detect entity spans and classify them into the corresponding entity types given a sentence-image pair. Existing methods either (1) obtain named entities with coarse-grained visual clues from attention mechanisms, or (2) first detect fine-grained visual regions with toolkits and then recognize named entities. However, they suffer from improper alignment between entity types and visual regions, or from error propagation in the two-stage manner, which ultimately introduces irrelevant visual information into the text. In this paper, we propose a novel end-to-end framework named MNER-QG that simultaneously performs machine reading comprehension (MRC)-based multimodal named entity recognition and query grounding. Specifically, with the assistance of queries, MNER-QG provides prior knowledge of entity types and visual regions, and further enhances the representations of both texts and images. To conduct the query grounding task, we provide manual annotations and weak supervision obtained by training a highly flexible visual grounding model with transfer learning. We conduct extensive experiments on two public MNER datasets, Twitter2015 and Twitter2017. Experimental results show that MNER-QG outperforms the current state-of-the-art models on the MNER task and also improves query grounding performance.
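As a hedged sketch of the MRC formulation, the snippet below builds one natural-language query per entity type and pairs it with the sentence; the exact query wording and type set used by MNER-QG may differ.

```python
# Illustrative MRC-style query construction for MNER (wording is hypothetical).
QUERIES = {
    "PER": "Which words refer to a person?",
    "LOC": "Which words refer to a location?",
    "ORG": "Which words refer to an organization?",
    "MISC": "Which words refer to other named entities?",
}

def build_mrc_inputs(sentence):
    # One (query, context) pair per entity type; the same query can also
    # ground a visual region of that type in the paired image.
    return [(etype, f"{q} [SEP] {sentence}") for etype, q in QUERIES.items()]

for etype, inp in build_mrc_inputs("Messi joined Inter Miami"):
    print(etype, "->", inp)
```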
SGG: Learning to Select, Guide, and Generate for Keyphrase Generation
Zhao, Jing, Bao, Junwei, Wang, Yifan, Wu, Youzheng, He, Xiaodong, Zhou, Bowen
Keyphrases, which concisely summarize the high-level topics discussed in a document, can be categorized into present keyphrases, which explicitly appear in the source text, and absent keyphrases, which do not match any contiguous subsequence of the source but are highly semantically related to it. Most existing keyphrase generation approaches generate present and absent keyphrases synchronously without explicitly distinguishing the two categories. In this paper, a Select-Guide-Generate (SGG) approach is proposed to handle present and absent keyphrase generation separately with different mechanisms. Specifically, SGG is a hierarchical neural network consisting of a pointing-based selector at the low layer concentrated on present keyphrase generation, a selection-guided generator at the high layer dedicated to absent keyphrase generation, and a guider in the middle that transfers information from the selector to the generator. Experimental results on four keyphrase generation benchmarks demonstrate the effectiveness of our model, which significantly outperforms strong baselines on both present and absent keyphrase generation. Furthermore, we extend SGG to a title generation task, which indicates its extensibility to other natural language generation tasks.
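Below is a minimal PyTorch sketch of the pointing-based selector idea: score each source token for membership in a present keyphrase. Layer sizes and the simple sigmoid scorer are illustrative assumptions; the full SGG model adds the guider and generator on top.

```python
# Minimal sketch of a pointing-based selector (illustrative, not SGG's code).
import torch
import torch.nn as nn

class Selector(nn.Module):
    def __init__(self, dim=256):
        super().__init__()
        self.scorer = nn.Linear(dim, 1)

    def forward(self, enc):                 # (B, T, dim) encoder states
        # Per-token selection probability; high-scoring tokens are stitched
        # into present-keyphrase spans, and the selected states would be
        # passed onward to guide absent-keyphrase generation.
        return torch.sigmoid(self.scorer(enc)).squeeze(-1)   # (B, T)

probs = Selector()(torch.randn(2, 20, 256))
```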
Aligning Visual Regions and Textual Concepts for Semantic-Grounded Image Representations
Liu, Fenglin, Liu, Yuanxin, Ren, Xuancheng, He, Xiaodong, Sun, Xu
In vision-and-language grounding problems, fine-grained representations of the image are considered to be of paramount importance. Most current systems incorporate visual features and textual concepts as a sketch of an image. However, plainly inferred representations are usually undesirable because they are composed of separate components whose relations are elusive. In this work, we aim to represent an image with a set of integrated visual regions and corresponding textual concepts that reflect certain semantics. To this end, we build the Mutual Iterative Attention (MIA) module, which integrates correlated visual features and textual concepts by iteratively aligning the two modalities.
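A hedged sketch of the mutual iteration: alternately refine each modality by attending to the other. Dimensions, the residual updates, and the iteration count are illustrative assumptions, not the paper's exact configuration.

```python
# Illustrative mutual iterative attention between regions and concepts.
import torch
import torch.nn as nn

class MIA(nn.Module):
    def __init__(self, dim=512, heads=8, iters=2):
        super().__init__()
        self.v2t = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.t2v = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.iters = iters

    def forward(self, vis, txt):            # (B, Nv, d), (B, Nt, d)
        for _ in range(self.iters):
            # Visual regions gather the textual concepts they align with,
            # then concepts gather the regions they describe.
            vis = vis + self.v2t(vis, txt, txt)[0]
            txt = txt + self.t2v(txt, vis, vis)[0]
        return vis, txt

v, t = MIA()(torch.randn(2, 36, 512), torch.randn(2, 10, 512))
```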
Relation Module for Non-answerable Prediction on Question Answering
Huang, Kevin, Tang, Yun, Huang, Jing, He, Xiaodong, Zhou, Bowen
Machine reading comprehension (MRC) has attracted significant research attention recently, due to an increase in challenging reading comprehension datasets. In this paper, we aim to improve an MRC model's ability to determine whether a question has an answer in a given context (e.g., the recently proposed SQuAD 2.0 task). Our solution is a relation module that is adaptable to any MRC model. The relation module combines semantic extraction and relational information. We first extract high-level semantics as objects from both the question and the context with multi-head self-attentive pooling. These semantic objects are then passed to a relation network, which generates relationship scores for each object pair in a sentence. These scores are used to determine whether a question is non-answerable. We test the relation module on the SQuAD 2.0 dataset using both BiDAF and BERT models as baseline readers. We obtain a 1.8% gain in F1 on top of the BiDAF reader and 1.0% on top of the BERT base model. These results show the effectiveness of our relation module on MRC.
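To illustrate the relation step, here is a hedged PyTorch sketch that scores every pair of pooled semantic objects with a shared MLP and aggregates the scores into a single answerability logit; the shapes and mean aggregation are illustrative assumptions.

```python
# Illustrative relation network over pooled semantic objects.
import torch
import torch.nn as nn

class RelationModule(nn.Module):
    def __init__(self, dim=256):
        super().__init__()
        self.pair_mlp = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(),
                                      nn.Linear(dim, 1))

    def forward(self, objects):             # (B, K, dim) pooled objects
        b, k, d = objects.shape
        left = objects.unsqueeze(2).expand(b, k, k, d)
        right = objects.unsqueeze(1).expand(b, k, k, d)
        pair_scores = self.pair_mlp(torch.cat([left, right], -1))  # (B,K,K,1)
        # Mean over all object pairs -> one non-answerability logit per example.
        return pair_scores.mean(dim=(1, 2)).squeeze(-1)            # (B,)

logit = RelationModule()(torch.randn(4, 6, 256))
```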
Multiple instance learning with graph neural networks
Tu, Ming, Huang, Jing, He, Xiaodong, Zhou, Bowen
Multiple instance learning (MIL) aims to learn the mapping between a bag of instances and the bag-level label. In this paper, we propose a new end-to-end graph neural network (GNN) based algorithm for MIL: we treat each bag as a graph and use a GNN to learn the bag embedding, in order to exploit the useful structural information among instances in a bag. The final graph representation is fed into a classifier for label prediction. Our algorithm is the first attempt to use GNNs for MIL. We empirically show that the proposed algorithm achieves state-of-the-art performance on several popular MIL datasets without losing model interpretability.
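A minimal sketch of the bag-as-graph idea in plain PyTorch (no GNN library): one round of neighbor averaging over a thresholded distance graph, then a mean readout to a bag embedding and a classifier. The adjacency construction and layer sizes are purely illustrative.

```python
# Toy bag-as-graph MIL classifier (illustrative, not the paper's model).
import torch
import torch.nn as nn

class GNNBagClassifier(nn.Module):
    def __init__(self, in_dim=64, hid=64, n_classes=2):
        super().__init__()
        self.msg = nn.Linear(in_dim, hid)
        self.cls = nn.Linear(hid, n_classes)

    def forward(self, x, adj):              # x: (N, in_dim), adj: (N, N)
        deg = adj.sum(1, keepdim=True).clamp(min=1)
        h = torch.relu(self.msg(adj @ x / deg))  # average neighbor features
        bag = h.mean(0)                          # graph readout -> bag embedding
        return self.cls(bag)                     # bag-level prediction

x = torch.randn(12, 64)                     # 12 instances in one bag
adj = (torch.cdist(x, x) < 12.0).float()    # thresholded distance graph
logits = GNNBagClassifier()(x, adj)
```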
Mappa Mundi: An Interactive Artistic Mind Map Generator with Artificial Imagination
Liu, Ruixue, Chen, Baoyang, Chen, Meng, Wu, Youzheng, Qiu, Zhijie, He, Xiaodong
We present a novel real-time, collaborative, and interactive AI painting system, Mappa Mundi, for artistic Mind Map creation. The system consists of a voice-based input interface, an automatic topic expansion module, and an image projection module. The key innovation is to inject Artificial Imagination into painting creation by considering lexical and phonological similarities of language, learning and inheriting artist's original painting style, and applying the principles of Dadaism and impossibility of improvisation. Our system indicates that AI and artist can collaborate seamlessly to create imaginative artistic painting and Mappa Mundi has been applied in art exhibition in UCCA, Beijing