Chu, Wei
Dual-Modal Attention-Enhanced Text-Video Retrieval with Triplet Partial Margin Contrastive Learning
Jiang, Chen, Liu, Hong, Yu, Xuzheng, Wang, Qing, Cheng, Yuan, Xu, Jia, Liu, Zhongyi, Guo, Qingpei, Chu, Wei, Yang, Ming, Qi, Yuan
In recent years, the explosion of web videos has made text-video retrieval increasingly essential and popular for video filtering, recommendation, and search. Text-video retrieval aims to rank relevant text/video higher than irrelevant ones. The core of this task is to precisely measure the cross-modal similarity between texts and videos. Recently, contrastive learning methods have shown promising results for text-video retrieval, most of which focus on the construction of positive and negative pairs to learn text and video representations. Nevertheless, they do not pay enough attention to hard negative pairs and lack the ability to model different levels of semantic similarity. To address these two issues, this paper improves contrastive learning using two novel techniques. First, to exploit hard examples for robust discriminative power, we propose a novel Dual-Modal Attention-Enhanced Module (DMAE) to mine hard negative pairs from textual and visual clues. By further introducing a Negative-aware InfoNCE (NegNCE) loss, we are able to adaptively identify all these hard negatives and explicitly highlight their impact in the training loss. Second, our work argues that triplet samples can better model fine-grained semantic similarity than pairwise samples. We therefore present a new Triplet Partial Margin Contrastive Learning (TPM-CL) module to construct partial-order triplet samples by automatically generating fine-grained hard negatives for matched text-video pairs. The proposed TPM-CL designs an adaptive token masking strategy with cross-modal interaction to model subtle semantic differences. Extensive experiments demonstrate that the proposed approach outperforms existing methods on four widely used text-video retrieval datasets, including MSR-VTT, MSVD, DiDeMo and ActivityNet.
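As a rough illustration of the negative-aware idea described above, the sketch below adds an explicit hinge penalty on in-batch hard negatives to a standard symmetric InfoNCE loss; the margin, the weight beta, and the hinge form are illustrative assumptions, not the paper's exact NegNCE formulation.

import torch
import torch.nn.functional as F

def neg_aware_infonce(sim, tau=0.05, margin=0.1, beta=0.5):
    """InfoNCE over a text-video similarity matrix, plus an explicit
    penalty on hard negatives (illustrative, not the paper's exact loss).
    sim: [B, B] cosine similarities; sim[i, i] is the matched pair.
    """
    B = sim.size(0)
    labels = torch.arange(B, device=sim.device)
    # Standard symmetric InfoNCE (text->video and video->text).
    info_nce = 0.5 * (F.cross_entropy(sim / tau, labels) +
                      F.cross_entropy(sim.t() / tau, labels))
    # Hard negatives: off-diagonal pairs within `margin` of the positive.
    pos = sim.diag().unsqueeze(1)                      # [B, 1]
    mask = ~torch.eye(B, dtype=torch.bool, device=sim.device)
    hard = F.relu(sim - pos + margin)[mask]            # hinge on hard pairs
    return info_nce + beta * hard.mean()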
LogicMP: A Neuro-symbolic Approach for Encoding First-order Logic Constraints
Xu, Weidi, Wang, Jingwei, Xie, Lele, He, Jianshan, Zhou, Hongting, Wang, Taifeng, Wan, Xiaopei, Chen, Jingdong, Qu, Chao, Chu, Wei
Integrating first-order logic constraints (FOLCs) with neural networks is a crucial but challenging problem, since it involves modeling intricate correlations to satisfy the constraints. This paper proposes a novel neural layer, LogicMP, which performs mean-field variational inference over a Markov logic network (MLN). It can be plugged into any off-the-shelf neural network to encode FOLCs while retaining modularity and efficiency. By exploiting the structure and symmetries in MLNs, we theoretically demonstrate that our well-designed, efficient mean-field iterations effectively mitigate the difficulty of MLN inference, reducing it from sequential calculation to a series of parallel tensor operations. Empirical results on three kinds of tasks over graphs, images, and text show that LogicMP outperforms advanced competitors in both performance and efficiency.
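A toy sketch of the key computational claim, that mean-field updates over all groundings of a rule can be batched into parallel tensor operations, shown for the single rule Smokes(x) AND Friends(x, y) -> Smokes(y); the simplified message form and names are illustrative, not LogicMP's exact update.

import torch

def mean_field_step(q_smokes, friends, w=1.0, logits0=None, steps=1):
    """One simplified mean-field sweep for the rule
       Smokes(x) AND Friends(x, y) -> Smokes(y), weight w (sketch).
    q_smokes: [N] current marginals q(Smokes(i) = 1)
    friends:  [N, N] observed 0/1 Friends matrix
    logits0:  [N] unary logits from a neural network (evidence)
    """
    if logits0 is None:
        logits0 = torch.zeros_like(q_smokes)
    for _ in range(steps):
        # Message to Smokes(y): aggregate satisfaction pressure from all
        # groundings at once, as a single matrix-vector product.
        msg = w * (friends.t() @ q_smokes)             # [N], fully parallel
        q_smokes = torch.sigmoid(logits0 + msg)
    return q_smokes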
Switch-BERT: Learning to Model Multimodal Interactions by Switching Attention and Input
Guo, Qingpei, Yao, Kaisheng, Chu, Wei
The ability to model intra-modal and inter-modal interactions is fundamental in multimodal machine learning. Current state-of-the-art models usually adopt deep learning architectures with fixed structures. They can achieve exceptional performance on specific tasks, but face a particularly challenging modality-mismatch problem because of the diversity of input modalities and their fixed structures. In this paper, we present Switch-BERT for joint vision and language representation learning to address this problem. Switch-BERT extends the BERT architecture by introducing learnable layer-wise and cross-layer interactions. It learns to optimize attention from a set of attention modes representing these interactions. One specific property of the model is that it learns to attend to outputs from various depths, thereby mitigating the modality-mismatch problem. We present extensive experiments on visual question answering, image-text retrieval and referring expression comprehension. The results confirm that, whereas alternative architectures such as ViLBERT and UNITER may excel in particular tasks, Switch-BERT consistently achieves performance better than or comparable to the current state-of-the-art models on these tasks. Ablation studies indicate that the proposed model owes its superior performance to its ability to learn task-specific multimodal interactions.
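A rough sketch of the mode-switching idea, where a layer softly selects between an intra-modal and a cross-modal attention pattern via a learned gate; the two modes and the pooled gating signal are illustrative simplifications, not Switch-BERT's full search space over layer-wise and cross-layer interactions.

import torch
import torch.nn as nn

class SwitchAttention(nn.Module):
    """Softly mixes candidate attention modes (illustrative sketch)."""
    def __init__(self, dim, heads=8):
        super().__init__()
        self.intra = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.inter = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.gate = nn.Linear(dim, 2)          # scores the two modes

    def forward(self, vis, txt):
        # Mode 1: vision attends to itself; Mode 2: vision attends to text.
        intra_out, _ = self.intra(vis, vis, vis)
        inter_out, _ = self.inter(vis, txt, txt)
        # Learned soft switch per example, from pooled visual features.
        w = torch.softmax(self.gate(vis.mean(dim=1)), dim=-1)   # [B, 2]
        return (w[:, 0, None, None] * intra_out +
                w[:, 1, None, None] * inter_out)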
A CTC Alignment-based Non-autoregressive Transformer for End-to-end Automatic Speech Recognition
Fan, Ruchao, Chu, Wei, Chang, Peng, Alwan, Abeer
Recently, end-to-end models have been widely used in automatic speech recognition (ASR) systems. Two of the most representative approaches are connectionist temporal classification (CTC) and attention-based encoder-decoder (AED) models. Autoregressive transformers, variants of AED, adopt an autoregressive mechanism for token generation and are thus relatively slow during inference. In this paper, we present a comprehensive study of a CTC Alignment-based Single-Step Non-Autoregressive Transformer (CASS-NAT) for end-to-end ASR. In CASS-NAT, word embeddings in the autoregressive transformer (AT) are substituted with token-level acoustic embeddings (TAE) that are extracted from encoder outputs using the acoustic boundary information provided by the CTC alignment. TAE can be obtained in parallel, resulting in parallel generation of output tokens. During training, the Viterbi alignment is used for TAE generation, and multiple training strategies are further explored to improve the word error rate (WER). During inference, an error-based alignment sampling method is investigated in depth to reduce the alignment mismatch between training and testing. Experimental results show that CASS-NAT has a WER close to that of the AT on various ASR tasks, while providing a ~24x inference speedup. With and without self-supervised learning, we achieve new state-of-the-art results for non-autoregressive models on several datasets. We also analyze the behavior of the CASS-NAT decoder to explain why it performs similarly to the AT. We find that TAEs have functionality similar to word embeddings for grammatical structures, which might indicate the possibility of learning some semantic information from TAEs without a language model.
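A minimal sketch of how token-level acoustic embeddings might be pooled from encoder outputs given CTC alignment boundaries; the mean-pooling and the (start, end) boundary format are assumptions for illustration, not the paper's exact extraction.

import torch

def token_acoustic_embeddings(enc_out, boundaries):
    """Pool encoder frames into one embedding per output token (sketch).
    enc_out:    [T, D] encoder outputs for one utterance
    boundaries: list of (start, end) frame indices per token, taken
                from a CTC (Viterbi) alignment
    Returns [U, D]; every token's pooling is independent, so all tokens
    can be computed in parallel in practice.
    """
    return torch.stack([enc_out[s:e].mean(dim=0) for s, e in boundaries])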
DRGCN: Dynamic Evolving Initial Residual for Deep Graph Convolutional Networks
Zhang, Lei, Yan, Xiaodong, He, Jianshan, Li, Ruopeng, Chu, Wei
Graph convolutional networks (GCNs) have proved very effective for handling various graph-related tasks. Deep GCNs have attracted considerable research interest due to their potentially superior performance compared with shallow ones. However, simply increasing network depth will, on the contrary, hurt performance due to the over-smoothing problem. Although adding residual connections has proved effective for learning deep convolutional neural networks (deep CNNs), it is not trivial to apply them to deep GCNs. Recent works proposed an initial residual mechanism that does alleviate the over-smoothing problem in deep GCNs. However, according to our study, their algorithms are quite sensitive to different datasets, because their setting ignores the personalization (dynamic) and correlation (evolving) of how the residual is applied. To this end, we propose a novel model called Dynamic evolving initial Residual Graph Convolutional Network (DRGCN). First, we use a dynamic block for each node to adaptively fetch information from the initial representation. Second, we use an evolving block to model the residual evolving pattern between layers. Our experimental results show that our model effectively relieves over-smoothing in deep GCNs and outperforms state-of-the-art (SOTA) methods on various benchmark datasets. Moreover, we develop a mini-batch version of DRGCN that can be applied to large-scale data. Coupled with several fair training techniques, our model reaches new SOTA results on the large-scale ogbn-arxiv dataset of the Open Graph Benchmark (OGB). Our reproducible code is available on GitHub.
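A compact sketch of the per-node initial residual idea: a gate learned from the current and initial representations decides how much of the initial representation H0 each node fetches back at a layer. The gating form is an assumption, and DRGCN's evolving block and exact parameterization are omitted.

import torch
import torch.nn as nn

class DynamicInitialResidualLayer(nn.Module):
    """GCN layer with a learned per-node initial-residual gate (sketch)."""
    def __init__(self, dim):
        super().__init__()
        self.weight = nn.Linear(dim, dim, bias=False)
        self.gate = nn.Linear(2 * dim, 1)

    def forward(self, A_hat, H, H0):
        # Standard propagation: normalized adjacency times features.
        prop = A_hat @ self.weight(H)
        # Per-node gate alpha_i in (0, 1) from current + initial features.
        alpha = torch.sigmoid(self.gate(torch.cat([H, H0], dim=-1)))
        return torch.relu((1 - alpha) * prop + alpha * H0)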
Causal Effect Estimation: Recent Advances, Challenges, and Opportunities
Chu, Zhixuan, Huang, Jianmin, Li, Ruopeng, Chu, Wei, Li, Sheng
Causal inference has numerous real-world applications in many domains, such as health care, marketing, political science, and online advertising. Treatment effect estimation, a fundamental problem in causal inference, has been extensively studied in statistics for decades. However, traditional treatment effect estimation methods may not handle large-scale, high-dimensional heterogeneous data well. In recent years, an emerging research direction has attracted increasing attention in the broad artificial intelligence field, combining the advantages of traditional treatment effect estimation approaches (e.g., propensity score, matching, and reweighting) with advanced machine learning approaches (e.g., representation learning, adversarial learning, and graph neural networks). Although these advanced machine learning approaches have shown extraordinary performance in treatment effect estimation, they also bring many new topics and research questions. In view of the latest research efforts in the causal inference field, we provide a comprehensive discussion of challenges and opportunities for the three core components of the treatment effect estimation task, i.e., the treatment, the covariates, and the outcome. In addition, we showcase promising research directions for this topic from multiple perspectives.
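As a concrete instance of the traditional, propensity-based side of this toolbox, the sketch below computes a minimal inverse-propensity-weighting (IPW) estimate of the average treatment effect; the logistic propensity model is one common choice for illustration, not a method recommended by this survey.

import numpy as np
from sklearn.linear_model import LogisticRegression

def ipw_ate(X, t, y):
    """Inverse-propensity-weighted average treatment effect (sketch).
    X: [n, d] covariates; t: [n] binary treatment; y: [n] outcome.
    """
    # Estimate propensity scores e(x) = P(t = 1 | x) with a simple model.
    e = LogisticRegression(max_iter=1000).fit(X, t).predict_proba(X)[:, 1]
    e = np.clip(e, 1e-3, 1 - 1e-3)           # avoid extreme weights
    return np.mean(t * y / e - (1 - t) * y / (1 - e))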
An Improved Single Step Non-autoregressive Transformer for Automatic Speech Recognition
Fan, Ruchao, Chu, Wei, Chang, Peng, Xiao, Jing, Alwan, Abeer
Non-autoregressive mechanisms can significantly decrease inference time for speech transformers, especially when the single step variant is applied. Previous work on the CTC alignment-based single step non-autoregressive transformer (CASS-NAT) has shown a large real time factor (RTF) improvement over autoregressive transformers (AT). In this work, we propose several methods to improve the accuracy of the end-to-end CASS-NAT, followed by performance analyses. First, convolution augmented self-attention blocks are applied to both the encoder and decoder modules. Second, we propose to expand the trigger mask (acoustic boundary) for each token to increase the robustness of CTC alignments.
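As a concrete reading of the trigger-mask expansion, the sketch below widens each token's attention window over encoder frames by a few frames on each side of its CTC boundary; the window size and boundary format are illustrative assumptions, not the paper's exact settings.

import torch

def expanded_trigger_mask(boundaries, T, expand=2):
    """Build a [U, T] boolean mask letting token u attend to its CTC
    segment widened by `expand` frames on each side (sketch).
    boundaries: list of (start, end) frame indices per token.
    T:          number of encoder frames.
    """
    mask = torch.zeros(len(boundaries), T, dtype=torch.bool)
    for u, (s, e) in enumerate(boundaries):
        mask[u, max(0, s - expand):min(T, e + expand)] = True
    return mask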
CMUA-Watermark: A Cross-Model Universal Adversarial Watermark for Combating Deepfakes
Huang, Hao, Wang, Yongtao, Chen, Zhaoyu, Li, Yuheng, Tang, Zhi, Chu, Wei, Chen, Jingdong, Lin, Weisi, Ma, Kai-Kuang
Malicious applications of deepfakes (i.e., technologies that can generate target faces or face attributes) have posed a huge threat to our society. The fake multimedia content generated by deepfake models can harm the reputation, and even threaten the property, of the person who has been impersonated. Fortunately, adversarial watermarks can be used to combat deepfake models, leading them to generate distorted images. Existing methods require an individual training process for every facial image to generate an adversarial watermark against a specific deepfake model, which is extremely inefficient. To address this problem, we propose a universal adversarial attack method on deepfake models that generates a Cross-Model Universal Adversarial Watermark (CMUA-Watermark) capable of protecting thousands of facial images from multiple deepfake models. Specifically, we first propose a cross-model universal attack pipeline that attacks multiple deepfake models iteratively and combines their gradients. Then, we introduce a batch-based method to alleviate the conflicts among adversarial watermarks generated from different facial images. Finally, we design a more reasonable and comprehensive evaluation method for assessing the effectiveness of the adversarial watermark. Experimental results demonstrate that the proposed CMUA-Watermark can effectively distort the fake facial images generated by deepfake models and successfully protect facial images from deepfakes in real scenes.
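A schematic sketch of the cross-model pipeline: one universal perturbation is updated with signed gradients accumulated across several deepfake models and image batches. The distortion loss and PGD-style update are simplified placeholders, not the paper's exact attack.

import torch

def cmua_style_watermark(models, loader, eps=8/255, alpha=1/255, epochs=5):
    """Craft a single universal watermark against multiple models (sketch).
    models: list of differentiable deepfake generators
    loader: yields batches of face images in [0, 1], shape [B, 3, 256, 256]
    """
    delta = torch.zeros(1, 3, 256, 256, requires_grad=True)
    for _ in range(epochs):
        for x in loader:
            loss = 0.0
            for m in models:
                # Push each model's output on the watermarked input away
                # from its output on the clean input (distortion loss).
                loss = loss + ((m(x + delta) - m(x)) ** 2).mean()
            loss.backward()
            with torch.no_grad():
                delta += alpha * delta.grad.sign()   # PGD-style ascent step
                delta.clamp_(-eps, eps)              # keep it imperceptible
                delta.grad.zero_()
    return delta.detach()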
PairRE: Knowledge Graph Embeddings via Paired Relation Vectors
Chao, Linlin, He, Jianshan, Wang, Taifeng, Chu, Wei
Distance-based knowledge graph embedding methods show promising results on the link prediction task, where two topics have been widely studied: the ability to handle complex relations, such as N-to-1, 1-to-N and N-to-N, and the ability to encode various relation patterns, such as symmetry/antisymmetry. However, existing methods fail to solve these two problems at the same time, which leads to unsatisfactory results. To mitigate this, we propose PairRE, a model with improved expressiveness and low computational requirements. PairRE represents each relation with paired vectors that project the two connected entities to relation-specific locations. Beyond its ability to solve the aforementioned two problems, PairRE is advantageous for representing subrelations, as it can effectively capture both the similarities and differences between them. Given simple constraints on relation representations, PairRE can be the first model capable of encoding symmetry/antisymmetry, inverse, composition and subrelation relations. Experiments on link prediction benchmarks show that PairRE achieves either state-of-the-art or highly competitive performance. In addition, PairRE shows encouraging results for encoding subrelations.
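A minimal sketch of a PairRE-style score under the usual distance-based convention, where each relation carries a head-projection and a tail-projection vector applied elementwise to unit-norm entity embeddings; the choice of L1 distance is an assumption for illustration.

import torch
import torch.nn.functional as F

def pairre_score(h, t, r_head, r_tail):
    """Score a triple (h, r, t): higher means more plausible (sketch).
    h, t:            [B, D] head and tail entity embeddings
    r_head, r_tail:  [B, D] paired relation vectors
    """
    h = F.normalize(h, dim=-1)   # entities constrained to unit norm
    t = F.normalize(t, dim=-1)
    # Project both entities to relation-specific locations and compare.
    return -torch.norm(h * r_head - t * r_tail, p=1, dim=-1)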
Question Directed Graph Attention Network for Numerical Reasoning over Text
Chen, Kunlong, Xu, Weidi, Cheng, Xingyi, Zou, Xiaochuan, Zhang, Yuyu, Song, Le, Wang, Taifeng, Qi, Yuan, Chu, Wei
Numerical reasoning over texts, such as addition, subtraction, sorting and counting, is a challenging machine reading comprehension task, since it requires both natural language understanding and arithmetic computation. To address this challenge, we propose a heterogeneous graph representation for the context of the passage and question needed for such reasoning, and design a question directed graph attention network to drive multi-step numerical reasoning over this context graph.
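A bare-bones sketch of question-directed attention over a context graph, where neighbor attention scores are conditioned on a question vector; the scoring form and the single homogeneous graph are illustrative simplifications of the paper's heterogeneous design.

import torch
import torch.nn as nn

class QuestionDirectedGAT(nn.Module):
    """One graph-attention step steered by a question vector (sketch)."""
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(3 * dim, 1)

    def forward(self, H, adj, q):
        # H: [N, D] node states; adj: [N, N] 0/1 adjacency; q: [D] question.
        N, D = H.shape
        qx = q.expand(N, N, D)
        pair = torch.cat([H.unsqueeze(1).expand(N, N, D),
                          H.unsqueeze(0).expand(N, N, D), qx], dim=-1)
        # Question-conditioned attention over neighbors only.
        e = self.score(pair).squeeze(-1).masked_fill(adj == 0, -1e9)
        attn = torch.softmax(e, dim=-1)
        return torch.relu(attn @ H)              # updated node states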