AITopics | Cao, Meng

Collaborating Authors

Cao, Meng

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Successor Features for Efficient Multisubject Controlled Text Generation

Cao, Meng, Fatemi, Mehdi, Cheung, Jackie Chi Kit, Shabanian, Samira

arXiv.org Artificial IntelligenceNov-2-2023

While large language models (LLMs) have achieved impressive performance in generating fluent and realistic text, controlling the generated text so that it exhibits properties such as safety, factuality, and non-toxicity remains challenging. % such as DExperts, GeDi, and rectification Existing decoding-based methods are static in terms of the dimension of control; if the target subject is changed, they require new training. Moreover, it can quickly become prohibitive to concurrently control multiple subjects. In this work, we introduce SF-GEN, which is grounded in two primary concepts: successor features (SFs) to decouple the LLM's dynamics from task-specific rewards, and language model rectification to proportionally adjust the probability of selecting a token based on the likelihood that the finished text becomes undesired. SF-GEN seamlessly integrates the two to enable dynamic steering of text generation with no need to alter the LLM's parameters. Thanks to the decoupling effect induced by successor features, our method proves to be memory-wise and computationally efficient for training as well as decoding, especially when dealing with multiple target subjects. To the best of our knowledge, our research represents the first application of successor features in text generation. In addition to its computational efficiency, the resultant language produced by our method is comparable to the SOTA (and outperforms baselines) in both control measures as well as language quality, which we demonstrate through a series of experiments in various controllable text generation tasks.

efficient multisubject controlled text generation, large language model, natural language, (2 more...)

arXiv.org Artificial Intelligence

2311.04921

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.93)

Add feedback

Video Referring Expression Comprehension via Transformer with Content-conditioned Query

Jiang, Ji, Cao, Meng, Song, Tengtao, Chen, Long, Wang, Yi, Zou, Yuexian

arXiv.org Artificial IntelligenceOct-25-2023

Video Referring Expression Comprehension (REC) aims to localize a target object in videos based on the queried natural language. Recent improvements in video REC have been made using Transformer-based methods with learnable queries. However, we contend that this naive query design is not ideal given the open-world nature of video REC brought by text supervision. With numerous potential semantic categories, relying on only a few slow-updated queries is insufficient to characterize them. Our solution to this problem is to create dynamic queries that are conditioned on both the input video and language to model the diverse objects referred to. Specifically, we place a fixed number of learnable bounding boxes throughout the frame and use corresponding region features to provide prior information. Also, we noticed that current query features overlook the importance of cross-modal alignment. To address this, we align specific phrases in the sentence with semantically relevant visual areas, annotating them in existing video datasets (VID-Sentence and VidSTG). By incorporating these two designs, our proposed model (called ConFormer) outperforms other models on widely benchmarked datasets. For example, in the testing split of VID-Sentence dataset, ConFormer achieves 8.75% absolute improvement on Accu.@0.6 compared to the previous state-of-the-art model.

artificial intelligence, content-conditioned query, natural language, (3 more...)

arXiv.org Artificial Intelligence

2310.16402

Genre: Research Report (0.69)

Technology: Information Technology > Artificial Intelligence > Natural Language (1.00)

Add feedback

From Scarcity to Efficiency: Improving CLIP Training via Visual-enriched Captions

Lai, Zhengfeng, Zhang, Haotian, Wu, Wentao, Bai, Haoping, Timofeev, Aleksei, Du, Xianzhi, Gan, Zhe, Shan, Jiulong, Chuah, Chen-Nee, Yang, Yinfei, Cao, Meng

arXiv.org Artificial IntelligenceOct-11-2023

Web-crawled datasets are pivotal to the success of pre-training vision-language models, exemplified by CLIP. However, web-crawled AltTexts can be noisy and potentially irrelevant to images, thereby undermining the crucial image-text alignment. Existing methods for rewriting captions using large language models (LLMs) have shown promise on small, curated datasets like CC3M and CC12M. Nevertheless, their efficacy on massive web-captured captions is constrained by the inherent noise and randomness in such data. In this study, we address this limitation by focusing on two key aspects: data quality and data variety. Unlike recent LLM rewriting techniques, we emphasize exploiting visual concepts and their integration into the captions to improve data quality. For data variety, we propose a novel mixed training scheme that optimally leverages AltTexts alongside newly generated Visual-enriched Captions (VeC). We use CLIP as one example and adapt the method for CLIP training on large-scale web-crawled datasets, named VeCLIP. We conduct a comprehensive evaluation of VeCLIP across small, medium, and large scales of raw data. Our results show significant advantages in image-text alignment and overall model performance, underscoring the effectiveness of VeCLIP in improving CLIP training. For example, VeCLIP achieves a remarkable over 20% improvement in COCO and Flickr30k retrieval tasks under the 12M setting. For data efficiency, we also achieve a notable over 3% improvement while using only 14% of the data employed in the vanilla CLIP and 11% in ALIGN.

large language model, natural language, visual-enriched caption, (4 more...)

arXiv.org Artificial Intelligence

2310.07699

Genre: Research Report > New Finding (0.87)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.73)

Add feedback

Improving Retrieval-Augmented Large Language Models via Data Importance Learning

Lyu, Xiaozhong, Grafberger, Stefan, Biegel, Samantha, Wei, Shaopeng, Cao, Meng, Schelter, Sebastian, Zhang, Ce

arXiv.org Artificial IntelligenceJul-6-2023

Retrieval augmentation enables large language models to take advantage of external knowledge, for example on tasks like question answering and data imputation. However, the performance of such retrieval-augmented models is limited by the data quality of their underlying retrieval corpus. In this paper, we propose an algorithm based on multilinear extension for evaluating the data importance of retrieved data points. There are exponentially many terms in the multilinear extension, and one key contribution of this paper is a polynomial time algorithm that computes exactly, given a retrieval-augmented model with an additive utility function and a validation set, the data importance of data points in the retrieval corpus using the multilinear extension of the model's utility function. We further proposed an even more efficient ({\epsilon}, {\delta})-approximation algorithm. Our experimental results illustrate that we can enhance the performance of large language models by only pruning or reweighting the retrieval corpus, without requiring further training. For some tasks, this even allows a small model (e.g., GPT-JT), augmented with a search engine API, to outperform GPT-3.5 (without retrieval augmentation). Moreover, we show that weights based on multilinear extension can be computed efficiently in practice (e.g., in less than ten minutes for a corpus with 100 million elements).

machine learning, natural language, retrieval corpus, (17 more...)

arXiv.org Artificial Intelligence

2307.03027

Country: Europe (0.28)

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.89)

Add feedback

VISION Datasets: A Benchmark for Vision-based InduStrial InspectiON

Bai, Haoping, Mou, Shancong, Likhomanenko, Tatiana, Cinbis, Ramazan Gokberk, Tuzel, Oncel, Huang, Ping, Shan, Jiulong, Shi, Jianjun, Cao, Meng

arXiv.org Artificial IntelligenceJun-17-2023

Despite progress in vision-based inspection algorithms, real-world industrial challenges -- specifically in data availability, quality, and complex production requirements -- often remain under-addressed. We introduce the VISION Datasets, a diverse collection of 14 industrial inspection datasets, uniquely poised to meet these challenges. Unlike previous datasets, VISION brings versatility to defect detection, offering annotation masks across all splits and catering to various detection methodologies. Our datasets also feature instance-segmentation annotation, enabling precise defect identification. With a total of 18k images encompassing 44 defect types, VISION strives to mirror a wide range of real-world production scenarios. By supporting two ongoing challenge competitions on the VISION Datasets, we hope to foster further advancements in vision-based industrial inspection.

artificial intelligence, dataset, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2306.0789

Genre: Research Report (0.50)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Vision (0.70)
Information Technology > Sensing and Signal Processing > Image Processing (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)

Add feedback

Tune-Mode ConvBN Blocks For Efficient Transfer Learning

You, Kaichao, Bao, Anchang, Qin, Guo, Cao, Meng, Huang, Ping, Shan, Jiulong, Long, Mingsheng

arXiv.org Artificial IntelligenceMay-19-2023

Convolution-BatchNorm (ConvBN) blocks are integral components in various computer vision tasks and other domains. A ConvBN block can operate in three modes: Train, Eval, and Deploy. While the Train mode is indispensable for training models from scratch, the Eval mode is suitable for transfer learning and model validation, and the Deploy mode is designed for the deployment of models. This paper focuses on the trade-off between stability and efficiency in ConvBN blocks: Deploy mode is efficient but suffers from training instability; Eval mode is widely used in transfer learning but lacks efficiency. To solve the dilemma, we theoretically reveal the reason behind the diminished training stability observed in the Deploy mode. Subsequently, we propose a novel Tune mode to bridge the gap between Eval mode and Deploy mode. The proposed Tune mode is as stable as Eval mode for transfer learning, and its computational efficiency closely matches that of the Deploy mode. Through extensive experiments in both object detection and classification tasks, carried out across various datasets and model architectures, we demonstrate that the proposed Tune mode does not hurt the original performance while significantly reducing GPU memory footprint and training time, thereby contributing an efficient solution to transfer learning with convolutional networks.

artificial intelligence, eval mode, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2305.11624

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Transfer Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

Add feedback

Systematic Rectification of Language Models via Dead-end Analysis

Cao, Meng, Fatemi, Mehdi, Cheung, Jackie Chi Kit, Shabanian, Samira

arXiv.org Artificial IntelligenceFeb-27-2023

With adversarial or otherwise normal prompts, existing large language models (LLM) can be pushed to generate toxic discourses. One way to reduce the risk of LLMs generating undesired discourses is to alter the training of the LLM. This can be very restrictive due to demanding computation requirements. Other methods rely on rule-based or prompt-based token elimination, which are limited as they dismiss future tokens and the overall meaning of the complete discourse. Here, we center detoxification on the probability that the finished discourse is ultimately considered toxic. That is, at each point, we advise against token selections proportional to how likely a finished text from this point will be toxic. To this end, we formally extend the dead-end theory from the recent reinforcement learning (RL) literature to also cover uncertain outcomes. Our approach, called rectification, utilizes a separate but significantly smaller model for detoxification, which can be applied to diverse LLMs as long as they share the same vocabulary. Importantly, our method does not require access to the internal representations of the LLM, but only the token probability distribution at each decoding step. This is crucial as many LLMs today are hosted in servers and only accessible through APIs. When applied to various LLMs, including GPT-3, our approach significantly improves the generated discourse compared to the base LLMs and other techniques in terms of both the overall language and detoxification performance.

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2302.14003

Country: North America > United States > Minnesota (0.28)

Genre: Research Report (0.64)

Industry: Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

RGI: robust GAN-inversion for mask-free image inpainting and unsupervised pixel-wise anomaly detection

Mou, Shancong, Gu, Xiaoyi, Cao, Meng, Bai, Haoping, Huang, Ping, Shan, Jiulong, Shi, Jianjun

arXiv.org Machine LearningFeb-24-2023

Generative adversarial networks (GANs), trained on a large-scale image dataset, can be a good approximator of the natural image manifold. GAN-inversion, using a pre-trained generator as a deep generative prior, is a promising tool for image restoration under corruptions. However, the performance of GAN-inversion can be limited by a lack of robustness to unknown gross corruptions, i.e., the restored image might easily deviate from the ground truth. In this paper, we propose a Robust GAN-inversion (RGI) method with a provable robustness guarantee to achieve image restoration under unknown \textit{gross} corruptions, where a small fraction of pixels are completely corrupted. Under mild assumptions, we show that the restored image and the identified corrupted region mask converge asymptotically to the ground truth. Moreover, we extend RGI to Relaxed-RGI (R-RGI) for generator fine-tuning to mitigate the gap between the GAN learned manifold and the true image manifold while avoiding trivial overfitting to the corrupted input image, which further improves the image restoration and corrupted region mask identification performance. The proposed RGI/R-RGI method unifies two important applications with state-of-the-art (SOTA) performance: (i) mask-free semantic inpainting, where the corruptions are unknown missing regions, the restored background can be used to restore the missing content; (ii) unsupervised pixel-wise anomaly detection, where the corruptions are unknown anomalous regions, the retrieved mask can be used as the anomalous region's segmentation mask.

artificial intelligence, data mining, machine learning, (17 more...)

arXiv.org Machine Learning

2302.12464

Genre: Research Report (0.50)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Learning with Rejection for Abstractive Text Summarization

Cao, Meng, Dong, Yue, He, Jingyi, Cheung, Jackie Chi Kit

arXiv.org Artificial IntelligenceFeb-16-2023

State-of-the-art abstractive summarization systems frequently hallucinate content that is not supported by the source document, mainly due to noise in the training dataset. Existing methods opt to drop the noisy samples or tokens from the training set entirely, reducing the effective training set size and creating an artificial propensity to copy words from the source. In this work, we propose a training objective for abstractive summarization based on rejection learning, in which the model learns whether or not to reject potentially noisy tokens. We further propose a regularized decoding objective that penalizes non-factual candidate summaries during inference by using the rejection probability learned during training. We show that our method considerably improves the factuality of generated summaries in automatic and human evaluations when compared to five baseline models and that it does so while increasing the abstractiveness of the generated summaries.

computational linguistic, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2302.08531

Country:

Europe (1.00)
North America > United States (0.67)

Genre:

Research Report (0.50)
Instructional Material (0.35)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Law > Criminal Law (0.94)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.69)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Adversarial Inter-Group Link Injection Degrades the Fairness of Graph Neural Networks

Hussain, Hussain, Cao, Meng, Sikdar, Sandipan, Helic, Denis, Lex, Elisabeth, Strohmaier, Markus, Kern, Roman

arXiv.org Artificial IntelligenceDec-16-2022

We present evidence for the existence and effectiveness of adversarial attacks on graph neural networks (GNNs) that aim to degrade fairness. These attacks can disadvantage a particular subgroup of nodes in GNN-based node classification, where nodes of the underlying network have sensitive attributes, such as race or gender. We conduct qualitative and experimental analyses explaining how adversarial link injection impairs the fairness of GNN predictions. For example, an attacker can compromise the fairness of GNN-based node classification by injecting adversarial links between nodes belonging to opposite subgroups and opposite class labels. Our experiments on empirical datasets demonstrate that adversarial fairness attacks can significantly degrade the fairness of GNN predictions (attacks are effective) with a low perturbation rate (attacks are efficient) and without a significant drop in accuracy (attacks are deceptive). This work demonstrates the vulnerability of GNN models to adversarial fairness attacks. We hope our findings raise awareness about this issue in our community and lay a foundation for the future development of GNN models that are more robust to such attacks.

artificial intelligence, data mining, machine learning, (17 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/ICDM54844.2022.00117

2209.05957

Genre: Research Report > New Finding (1.00)

Industry: Information Technology > Security & Privacy (0.91)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback