AITopics | Niu, Yulei

Niu, Yulei

Weakly-Supervised Temporal Article Grounding

Chen, Long, Niu, Yulei, Chen, Brian, Lin, Xudong, Han, Guangxing, Thomas, Christopher, Ayyubi, Hammad, Ji, Heng, Chang, Shih-Fu

arXiv.org Artificial Intelligence, Feb-23-2023

Given a long untrimmed video and natural language queries, video grounding (VG) aims to temporally localize the semantically aligned video segments. Almost all existing VG work holds two simple but unrealistic assumptions: 1) All query sentences can be grounded in the corresponding video. 2) All query sentences for the same video are always at the same semantic scale. Unfortunately, both assumptions make today's VG models fail to work in practice. For example, in real-world multimodal assets (e.g., news articles), most of the sentences in the article cannot be grounded in their affiliated videos, and they typically have rich hierarchical relations (i.e., they sit at different semantic scales). To this end, we propose a new challenging grounding task: Weakly-Supervised temporal Article Grounding (WSAG). Specifically, given an article and a relevant video, WSAG aims to localize all "groundable" sentences to the video, and these sentences are possibly at different semantic scales. Accordingly, we collect the first WSAG dataset to facilitate this task: YouwikiHow, which borrows the inherent multi-scale descriptions in wikiHow articles and plentiful YouTube videos. In addition, we propose a simple but effective method, DualMIL, for WSAG, which consists of a two-level MIL loss and a single-/cross-sentence constraint loss. These training objectives are carefully designed for these relaxed assumptions. Extensive ablations have verified the effectiveness of DualMIL.

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2210.12444

Country: North America > United States (0.46)

Genre:

Research Report (0.50)
Instructional Material (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Introspective Distillation for Robust Question Answering

Niu, Yulei, Zhang, Hanwang

arXiv.org Artificial Intelligence, Nov-1-2021

Question answering (QA) models are well known to exploit data bias, e.g., the language prior in visual QA and the position bias in reading comprehension. Recent debiasing methods achieve good out-of-distribution (OOD) generalizability with a considerable sacrifice of the in-distribution (ID) performance. Therefore, they are only applicable in domains where the test distribution is known in advance. In this paper, we present a novel debiasing method called Introspective Distillation (IntroD) to make the best of both worlds for QA. Our key technical contribution is to blend the inductive bias of OOD and ID by introspecting whether a training sample fits in the factual ID world or the counterfactual OOD one. Experiments on the visual QA datasets VQA v2 and VQA-CP and the reading comprehension dataset SQuAD demonstrate that our proposed IntroD maintains competitive OOD performance compared to other debiasing methods, while sacrificing little ID performance, or even achieving better ID performance, compared to the non-debiasing ones.

machine learning, natural language, question answering, (19 more...)

arXiv.org Artificial Intelligence

2111.01026

Genre: Research Report (0.64)

Industry: Education (0.70)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.62)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Counterfactual Samples Synthesizing and Training for Robust Visual Question Answering

Chen, Long, Zheng, Yuhang, Niu, Yulei, Zhang, Hanwang, Xiao, Jun

arXiv.org Artificial Intelligence, Oct-3-2021

Today's VQA models still tend to capture superficial linguistic correlations in the training set and fail to generalize to the test set with different QA distributions. To reduce these language biases, recent VQA works introduce an auxiliary question-only model to regularize the training of the targeted VQA model, and achieve dominating performance on diagnostic benchmarks for out-of-distribution testing. However, due to complex model design, these ensemble-based methods are unable to equip themselves with two indispensable characteristics of an ideal VQA model: 1) Visual-explainable: The model should rely on the right visual regions when making decisions. 2) Question-sensitive: The model should be sensitive to the linguistic variations in questions. To this end, we propose a novel model-agnostic Counterfactual Samples Synthesizing and Training (CSST) strategy. After training with CSST, VQA models are forced to focus on all critical objects and words, which significantly improves both visual-explainable and question-sensitive abilities. Specifically, CSST is composed of two parts: Counterfactual Samples Synthesizing (CSS) and Counterfactual Samples Training (CST). CSS generates counterfactual samples by carefully masking critical objects in images or words in questions and assigning pseudo ground-truth answers. CST not only trains the VQA models with both complementary samples to predict their respective ground-truth answers, but also urges the VQA models to further distinguish the original samples from superficially similar counterfactual ones. To facilitate CST training, we propose two variants of supervised contrastive loss for VQA, and design an effective positive and negative sample selection mechanism based on CSS. Extensive experiments have shown the effectiveness of CSST. In particular, by building on top of the LMH+SAR model, we achieve record-breaking performance on all OOD benchmarks.

machine learning, natural language, question answering, (17 more...)

arXiv.org Artificial Intelligence

2110.01013

Country:

North America > United States (0.14)
Asia > China (0.14)

Genre: Research Report > Promising Solution (0.34)

Industry: Education (0.67)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.66)

FeaBoost: Joint Feature and Label Refinement for Semantic Segmentation

Niu, Yulei (Renmin University of China) | Lu, Zhiwu (Renmin University of China) | Huang, Songfang (IBM China Research Lab) | Gao, Xin (King Abdullah University of Science and Technology) | Wen, Ji-Rong (Renmin University of China)

AAAI Conferences, Feb-14-2017

We propose a novel approach, called FeaBoost, to image semantic segmentation with only image-level labels taken as weakly-supervised constraints. Our approach is motivated by two observations: 1) each superpixel can be represented as a linear combination of basic components (e.g., predefined classes); 2) visually similar superpixels have a high probability of sharing the same set of labels, i.e., they tend to have a common combination of predefined classes. By taking these two observations into consideration, semantic segmentation is formulated as joint feature and label refinement over superpixels. Furthermore, we develop an efficient FeaBoost algorithm to solve this optimization problem. Extensive experiments on the MSRC and LabelMe datasets demonstrate the superior performance of our FeaBoost approach in comparison with state-of-the-art methods, especially when noisy labels are provided for semantic segmentation.

artificial intelligence, optimization problem, semantic segmentation, (17 more...)

AAAI Conferences

Thirty-First AAAI Conference on Artificial Intelligence

Country:

Asia > China (0.16)
Asia > Middle East > Saudi Arabia (0.14)

Genre: Research Report > Promising Solution (0.54)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.48)
