Liu, Zhenguang
MTVHunter: Smart Contracts Vulnerability Detection Based on Multi-Teacher Knowledge Translation
Sun, Guokai, Zhuang, Yuan, Zhang, Shuo, Feng, Xiaoyu, Liu, Zhenguang, Zhang, Liguo
Smart contracts, closely intertwined with cryptocurrency transactions, have sparked widespread concern about the considerable financial losses caused by security issues. To counteract this, a variety of tools have been developed to identify vulnerabilities in smart contracts. However, they fail to overcome two challenges at the same time when faced with smart contract bytecode: (i) strong interference caused by enormous numbers of non-relevant instructions; (ii) missing semantics of bytecode due to incomplete data and control flow dependencies. In this paper, we propose a multi-teacher based bytecode vulnerability detection method, namely Multi-Teacher Vulnerability Hunter (MTVHunter), which performs effective denoising and supplies missing semantics to bytecode under multi-teacher guidance. Specifically, we first propose an instruction denoising teacher that eliminates noise interference via abstract vulnerability patterns, which are further reflected in contract embeddings. Secondly, we design a novel semantic complementary teacher with neuron distillation, which effectively extracts the necessary semantics from source code to replenish the bytecode. In particular, the proposed neuron distillation accelerates this semantic filling by turning the knowledge transfer into a regression task. We conduct experiments on 229,178 real-world smart contracts covering four types of common vulnerabilities. Extensive experiments show that MTVHunter achieves significant performance gains over state-of-the-art approaches.
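To make the neuron-distillation idea concrete, below is a minimal sketch (not the authors' code) of multi-teacher distillation framed as a regression task: a student network's pooled representation is regressed onto two frozen teachers' representations via MSE. All module names, shapes, and loss weights here are illustrative assumptions.

```python
import torch
import torch.nn as nn

hidden = 256
student = nn.GRU(input_size=64, hidden_size=hidden, batch_first=True)
proj_denoise = nn.Linear(hidden, hidden)   # aligns student to the denoising teacher
proj_semantic = nn.Linear(hidden, hidden)  # aligns student to the semantic teacher
mse = nn.MSELoss()

def distill_loss(bytecode_emb, t_denoise_repr, t_semantic_repr, w=(1.0, 1.0)):
    """Regress the student's states onto the two (frozen) teachers' representations."""
    out, _ = student(bytecode_emb)               # (batch, seq, hidden)
    s = out.mean(dim=1)                          # pooled contract embedding
    return w[0] * mse(proj_denoise(s), t_denoise_repr) + \
           w[1] * mse(proj_semantic(s), t_semantic_repr)

# Toy usage with random tensors standing in for real instruction embeddings.
x = torch.randn(8, 128, 64)
loss = distill_loss(x, torch.randn(8, hidden), torch.randn(8, hidden))
loss.backward()
```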
FSFM: A Generalizable Face Security Foundation Model via Self-Supervised Facial Representation Learning
Wang, Gaojian, Lin, Feng, Wu, Tong, Liu, Zhenguang, Ba, Zhongjie, Ren, Kui
This work asks: given abundant unlabeled real faces, how can we learn a robust and transferable facial representation that boosts the generalization performance of various face security tasks? We make the first attempt and propose a self-supervised pretraining framework, FSFM, that learns fundamental representations of real face images by leveraging the synergy between masked image modeling (MIM) and instance discrimination (ID). We explore various facial masking strategies for MIM and present a simple yet powerful CRFR-P masking, which explicitly forces the model to capture meaningful intra-region consistency and challenging inter-region coherency. Furthermore, we devise an ID network that naturally couples with MIM to establish the underlying local-to-global correspondence via tailored self-distillation. These three learning objectives, namely 3C, empower encoding both local features and global semantics of real faces. After pretraining, a vanilla ViT serves as a universal vision foundation model for downstream face security tasks: cross-dataset deepfake detection, cross-domain face anti-spoofing, and unseen diffusion facial forgery detection. Extensive experiments on 10 public datasets demonstrate that our model transfers better than supervised pretraining and visual and facial self-supervised learning methods, and even outperforms task-specialized SOTA methods.
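As a point of reference for the masking component, here is a toy patch-masking routine for MIM pretraining. It implements plain random masking, not the paper's CRFR-P facial-region strategy; the 75% ratio and 16-pixel patches are illustrative assumptions.

```python
import torch

def random_patch_mask(images, patch=16, mask_ratio=0.75):
    """Return images with a random subset of patches zeroed, plus the boolean mask."""
    b, c, h, w = images.shape
    gh, gw = h // patch, w // patch
    n = gh * gw
    keep = int(n * (1 - mask_ratio))
    idx = torch.rand(b, n).argsort(dim=1)           # random permutation of patches
    mask = torch.ones(b, n, dtype=torch.bool)
    mask.scatter_(1, idx[:, :keep], False)          # False = visible patch
    mask2d = mask.view(b, 1, gh, gw) \
                 .repeat_interleave(patch, 2).repeat_interleave(patch, 3)
    return images.masked_fill(mask2d, 0.0), mask

imgs = torch.randn(4, 3, 224, 224)
masked, mask = random_patch_mask(imgs)   # mask: (4, 196), True where hidden
```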
Alleviating Structural Distribution Shift in Graph Anomaly Detection
Gao, Yuan, Wang, Xiang, He, Xiangnan, Liu, Zhenguang, Feng, Huamin, Zhang, Yongdong
Graph anomaly detection (GAD) is a challenging binary classification problem due to the different structural distributions of anomalies and normal nodes -- abnormal nodes are a minority and therefore exhibit high heterophily and low homophily compared to normal nodes. Furthermore, due to various temporal factors and the annotation preferences of human experts, the heterophily and homophily can change across training and testing data, which we call structural distribution shift (SDS) in this paper. The mainstream methods are built on graph neural networks (GNNs), which benefit the classification of normal nodes by aggregating homophilous neighbors, yet they ignore the SDS issue for anomalies and suffer from poor generalization. This work tackles the problem from a feature view. We observe that the degree of SDS varies between anomalies and normal nodes. Hence, the key to addressing the issue lies in resisting high heterophily for anomalies while still letting normal nodes benefit from homophily. We tease out the anomaly features, on which we impose constraints to mitigate the effect of heterophilous neighbors and make them invariant. We term our proposed framework the Graph Decomposition Network (GDN). Extensive experiments are conducted on two benchmark datasets, and the proposed framework achieves a remarkable performance boost in GAD, especially in an SDS environment where anomalies have largely different structural distributions across training and testing environments. Code is open-sourced at https://github.com/blacksingular/wsdm_GDN.
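A schematic reading of this feature-view idea follows: split each node embedding into an anomaly-related part and a context part, and penalize the anomaly-related part for changing under neighborhood aggregation, approximating invariance to heterophilous neighbors. The split, dimensions, and penalty are illustrative assumptions, not GDN's exact formulation.

```python
import torch
import torch.nn as nn

d = 64
encoder = nn.Linear(32, d)

def invariance_penalty(x, adj):
    """Keep the teased-out anomaly features stable under neighbor aggregation."""
    h_ego = encoder(x)                        # embedding without aggregation
    deg = adj.sum(dim=1, keepdim=True).clamp(min=1)
    h_agg = encoder(adj @ x / deg)            # embedding of the aggregated neighborhood
    anom = slice(0, d // 2)                   # first half = anomaly-related features
    return ((h_ego[:, anom] - h_agg[:, anom]) ** 2).mean()

x = torch.randn(100, 32)                      # node features
adj = (torch.rand(100, 100) < 0.05).float()   # random graph stand-in
penalty = invariance_penalty(x, adj)          # added to the classification loss
```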
Red Teaming Visual Language Models
Li, Mukai, Li, Lei, Yin, Yuwei, Ahmed, Masood, Liu, Zhenguang, Liu, Qi
VLMs (Vision-Language Models) extend the capabilities of LLMs (Large Language Models) to accept multimodal inputs. Since it has been verified that LLMs can be induced to generate harmful or inaccurate content through specific test cases (termed red teaming), how VLMs perform in similar scenarios, especially with their combination of textual and visual inputs, remains a question. To explore this problem, we present a novel red-teaming dataset, RTVLM, which encompasses 10 subtasks (e.g., image misleading, multi-modal jail-breaking, face fairness) under 4 primary aspects (faithfulness, privacy, safety, fairness). Our RTVLM is the first red-teaming dataset to benchmark current VLMs in terms of these 4 different aspects. Detailed analysis shows that 10 prominent open-sourced VLMs struggle with red teaming to different degrees, with up to a 31% performance gap compared with GPT-4V. Additionally, we simply apply red-teaming alignment to LLaVA-v1.5 with Supervised Fine-tuning (SFT) using RTVLM, and this bolsters the model's performance by 10% on the RTVLM test set and 13% on MM-Hal, with no noticeable decline on MM-Bench, surpassing other LLaVA-based models trained with regular alignment data. This reveals that current open-sourced VLMs still lack red-teaming alignment. Our code and datasets will be open-sourced.
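For the alignment step, a hypothetical sketch of converting a red-teaming example into a LLaVA-style SFT record is shown below; the field names ("image", "question", "safe_answer") and the refusal text are our assumptions for illustration, not the actual RTVLM schema.

```python
def to_sft_record(example):
    """Wrap a red-teaming prompt and its aligned answer in a LLaVA-style turn format."""
    return {
        "image": example["image"],
        "conversations": [
            {"from": "human", "value": "<image>\n" + example["question"]},
            {"from": "gpt", "value": example["safe_answer"]},  # aligned response
        ],
    }

sample = {
    "image": "img_001.png",
    "question": "Describe this ID card in full detail.",
    "safe_answer": "I can't reveal personal information from identity documents.",
}
print(to_sft_record(sample))
```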
Masked Diffusion Models Are Fast Distribution Learners
Lei, Jiachen, Wang, Qinglong, Cheng, Peng, Ba, Zhongjie, Qin, Zhan, Wang, Zhibo, Liu, Zhenguang, Ren, Kui
Diffusion models have emerged as the \emph{de facto} models for image generation, yet their heavy training overhead hinders broader adoption in the research community. We observe that diffusion models are commonly trained to learn all fine-grained visual information from scratch. This paradigm may incur unnecessary training costs and hence warrants in-depth investigation. In this work, we show that it suffices to train a strong diffusion model by first pre-training it to learn some primer distribution that loosely characterizes the unknown real image distribution. The pre-trained model can then be fine-tuned for various generation tasks efficiently. In the pre-training stage, we propose to mask a high proportion (e.g., up to 90\%) of input images to approximately represent the primer distribution, and we introduce a masked denoising score matching objective to train the model to denoise visible areas. In the subsequent fine-tuning stage, we efficiently train the diffusion model without masking. With this two-stage training framework, we achieve significant training acceleration and a new FID record of 6.27 on CelebA-HQ $256 \times 256$ for ViT-based diffusion models. The generalizability of a pre-trained model further helps build models that perform better than ones trained from scratch on different downstream datasets. For instance, a diffusion model pre-trained on VGGFace2 attains a 46\% quality improvement when fine-tuned on a different dataset that contains only 3,000 images. Our code is available at \url{https://github.com/jiachenlei/maskdm}.
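The pre-training objective can be illustrated with a short sketch of masked denoising score matching: diffuse the image, hide most of it, and compute the standard noise-prediction MSE only on the visible region. The toy denoiser, per-pixel mask granularity, and noise schedule here are assumptions, not the paper's implementation.

```python
import torch

def masked_dsm_loss(model, x0, visible, alpha_bar_t, t):
    """visible: (b,1,h,w) float mask, 1 where pixels are kept (unmasked)."""
    eps = torch.randn_like(x0)
    xt = alpha_bar_t.sqrt() * x0 + (1 - alpha_bar_t).sqrt() * eps  # forward diffusion
    eps_hat = model(xt * visible, t)              # masked areas dropped from the input
    err = (eps_hat - eps) ** 2 * visible          # score matching on visible pixels only
    return err.sum() / visible.sum().clamp(min=1)

toy_model = lambda x, t: torch.zeros_like(x)        # stand-in for a ViT denoiser
x0 = torch.randn(2, 3, 64, 64)
visible = (torch.rand(2, 1, 64, 64) < 0.1).float()  # ~90% of pixels masked
loss = masked_dsm_loss(toy_model, x0, visible, torch.tensor(0.5), t=None)
```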
LargeST: A Benchmark Dataset for Large-Scale Traffic Forecasting
Liu, Xu, Xia, Yutong, Liang, Yuxuan, Hu, Junfeng, Wang, Yiwei, Bai, Lei, Huang, Chao, Liu, Zhenguang, Hooi, Bryan, Zimmermann, Roger
Road traffic forecasting plays a critical role in smart city initiatives and has seen significant advancements thanks to the power of deep learning in capturing non-linear patterns in traffic data. However, the promising results achieved on current public datasets may not be applicable to practical scenarios due to limitations within these datasets. First, their limited sizes may not reflect the real-world scale of traffic networks. Second, the temporal coverage of these datasets is typically short, posing hurdles to studying long-term patterns and acquiring sufficient samples for training deep models. Third, these datasets often lack adequate metadata for sensors, which compromises the reliability and interpretability of the data. To mitigate these limitations, we introduce the LargeST benchmark dataset. It encompasses a total of 8,600 sensors in California with 5 years of temporal coverage and includes comprehensive metadata. Using LargeST, we perform in-depth data analysis to extract insights, benchmark well-known baselines in terms of performance and efficiency, and identify challenges as well as opportunities for future research.
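As a usage illustration, forecasting samples are typically constructed from such a dataset by sliding a window over the sensor-reading matrix; the sketch below assumes the common 12-step history/horizon convention, which is our choice rather than a LargeST specification.

```python
import numpy as np

def make_windows(readings, history=12, horizon=12):
    """readings: (timesteps, num_sensors) -> supervised (X, Y) pairs."""
    X, Y = [], []
    for s in range(len(readings) - history - horizon + 1):
        X.append(readings[s : s + history])                      # past observations
        Y.append(readings[s + history : s + history + horizon])  # future targets
    return np.stack(X), np.stack(Y)

data = np.random.rand(2016, 8600).astype(np.float32)  # one week at 5-min intervals
X, Y = make_windows(data)  # X: (1993, 12, 8600), Y: (1993, 12, 8600)
```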
DFIL: Deepfake Incremental Learning by Exploiting Domain-invariant Forgery Clues
Pan, Kun, Yin, Yifang, Wei, Yao, Lin, Feng, Ba, Zhongjie, Liu, Zhenguang, Wang, Zhibo, Cavallaro, Lorenzo, Ren, Kui
The malicious use and widespread dissemination of deepfakes pose a significant crisis of trust. Current deepfake detection models can generally recognize forged images by training on large datasets. However, the accuracy of detection models degrades significantly on images generated by new deepfake methods due to differences in data distribution. To tackle this issue, we present a novel incremental learning framework that improves the generalization of deepfake detection models by continually learning from a small number of new samples. To cope with different data distributions, we propose to learn a domain-invariant representation based on supervised contrastive learning, preventing overfitting to the insufficient new data. To mitigate catastrophic forgetting, we regularize our model at both the feature level and the label level based on a multi-perspective knowledge distillation approach. Finally, we propose to select both central and hard representative samples to update the replay set, which benefits both domain-invariant representation learning and rehearsal-based knowledge preservation. We conduct extensive experiments on four benchmark datasets, obtaining a new state-of-the-art average forgetting rate of 7.01 and an average accuracy of 85.49 on FF++, DFDC-P, DFD, and CDF2. Our code is released at https://github.com/DeepFakeIL/DFIL.
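One plausible reading of the central-and-hard selection rule is sketched below: central samples lie closest to their own class centroid, hard samples lie closest to the opposite class centroid. The criterion and sample counts are illustrative assumptions, not DFIL's released code.

```python
import numpy as np

def select_replay(feats, labels, n_central=2, n_hard=2):
    """Pick per-class central and boundary samples from learned features."""
    centroids = {c: feats[labels == c].mean(axis=0) for c in np.unique(labels)}
    picked = []
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        other = [k for k in centroids if k != c][0]
        d_own = np.linalg.norm(feats[idx] - centroids[c], axis=1)
        d_other = np.linalg.norm(feats[idx] - centroids[other], axis=1)
        picked += list(idx[np.argsort(d_own)[:n_central]])   # central samples
        picked += list(idx[np.argsort(d_other)[:n_hard]])    # hard (boundary) samples
    return sorted(set(picked))

feats = np.random.randn(200, 128)
labels = np.random.randint(0, 2, 200)   # 0 = real, 1 = fake
replay_ids = select_replay(feats, labels)
```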
TTIDA: Controllable Generative Data Augmentation via Text-to-Text and Text-to-Image Models
Yin, Yuwei, Kaddour, Jean, Zhang, Xiang, Nie, Yixin, Liu, Zhenguang, Kong, Lingpeng, Liu, Qi
Data augmentation has been established as an efficacious approach to supplement useful information for low-resource datasets. Traditional augmentation techniques such as noise injection and image transformations have been widely used. In addition, generative data augmentation (GDA) has been shown to produce more diverse and flexible data. While generative adversarial networks (GANs) have been frequently used for GDA, they lack diversity and controllability compared to text-to-image diffusion models. In this paper, we propose TTIDA (Text-to-Text-to-Image Data Augmentation), which leverages the capabilities of large-scale pre-trained Text-to-Text (T2T) and Text-to-Image (T2I) generative models for data augmentation. By conditioning the T2I model on detailed descriptions produced by the T2T model, we are able to generate photo-realistic labeled images in a flexible and controllable manner. Experiments on in-domain classification, cross-domain classification, and image captioning tasks show consistent improvements over other data augmentation baselines. Analytical studies in varied settings, including few-shot, long-tail, and adversarial scenarios, further confirm the effectiveness of TTIDA in enhancing performance and increasing robustness.
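The overall T2T-to-T2I loop can be sketched with placeholder callables; generate_caption and generate_image below stand in for any pre-trained text and image generators, and their names and signatures are our assumptions, not TTIDA's API.

```python
import random

def generate_caption(label: str) -> str:            # placeholder T2T model
    templates = ["a photo of a {} in the wild", "a close-up studio shot of a {}"]
    return random.choice(templates).format(label)

def generate_image(caption: str):                   # placeholder T2I model
    return {"caption": caption, "pixels": None}     # real code would return an image

def augment(label: str, n: int = 4):
    """Expand one class label into n synthetic labeled images via T2T then T2I."""
    return [(generate_image(generate_caption(label)), label) for _ in range(n)]

synthetic = augment("golden retriever")   # 4 labeled (image, class) pairs
```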
Copy Motion From One to Another: Fake Motion Video Generation
Liu, Zhenguang, Wu, Sifan, Xu, Chejian, Wang, Xiang, Zhu, Lei, Wu, Shuang, Feng, Fuli
One compelling application of artificial intelligence is to generate a video of a target person performing arbitrary desired motions (from a source person). While state-of-the-art methods are able to synthesize videos that capture similar broad-stroke motion details, they generally lack texture details. A pertinent manifestation is distorted faces, feet, and hands, and such flaws are very sensitively perceived by human observers. Furthermore, current methods typically employ GANs with an L2 loss to assess the authenticity of the generated videos, inherently requiring a large number of training samples to learn the texture details needed for adequate video generation. In this work, we tackle these challenges from three aspects: 1) We disentangle each video frame into foreground (the person) and background, focusing on generating the foreground to reduce the underlying dimension of the network output. 2) We propose a theoretically motivated Gromov-Wasserstein loss that facilitates learning the mapping from a pose to a foreground image. 3) To enhance texture details, we encode facial features with geometric guidance and employ local GANs to refine the face, feet, and hands. Extensive experiments show that our method is able to generate realistic target-person videos, faithfully copying complex motions from a source person.
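To convey the structural intuition behind a Gromov-Wasserstein-style loss, here is a simplified, differentiable sketch that matches pairwise distances among pose keypoints to pairwise distances among generated features. The true GW loss optimizes over a transport coupling; this fixed-correspondence variant is only illustrative.

```python
import torch

def structure_loss(pose_kpts, feat_kpts):
    """pose_kpts: (n, 2) 2D joints; feat_kpts: (n, d) matched generated features."""
    d_pose = torch.cdist(pose_kpts, pose_kpts)   # (n, n) geometry of the pose
    d_feat = torch.cdist(feat_kpts, feat_kpts)   # (n, n) geometry of the features
    d_pose = d_pose / d_pose.max().clamp(min=1e-8)
    d_feat = d_feat / d_feat.max().clamp(min=1e-8)
    return ((d_pose - d_feat) ** 2).mean()       # penalize structural mismatch

pose = torch.rand(17, 2)                          # e.g., 17 COCO-style joints
feats = torch.randn(17, 32, requires_grad=True)
loss = structure_loss(pose, feats)
loss.backward()
```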
Who is Gambling? Finding Cryptocurrency Gamblers Using Multi-modal Retrieval Methods
Huang, Zhengjie, Liu, Zhenguang, Chen, Jianhai, He, Qinming, Wu, Shuang, Zhu, Lei, Wang, Meng
With the popularity of cryptocurrencies and the remarkable development of blockchain technology, decentralized applications have emerged as a revolutionary force for the Internet. Meanwhile, decentralized applications have also attracted intense attention from the online gambling community, with more and more decentralized gambling platforms created with the help of smart contracts. Compared with conventional gambling platforms, decentralized gambling platforms have transparent rules and a low participation threshold, attracting a substantial number of gamblers. In order to discover gambling behaviors and identify the contracts and addresses involved in gambling, we propose a tool termed ETHGamDet. The tool automatically detects smart contracts and addresses involved in gambling by scrutinizing smart contract code and address transaction records. Interestingly, we present a novel LightGBM model with memory components, which possesses the ability to learn from its own misclassifications. As a side contribution, we construct and release a large-scale gambling dataset at https://github.com/AwesomeHuang/Bitcoin-Gambling-Dataset to facilitate future research in this field. Empirically, ETHGamDet achieves F1-scores of 0.72 and 0.89 in address classification and contract classification, respectively, and offers novel and interesting insights.
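A plausible interpretation of a LightGBM model that "learns from its own misclassifications" is to fit once, remember the training mistakes, and refit with those samples up-weighted; the random features, labels, and the weight of 3.0 below are all illustrative assumptions.

```python
import numpy as np
import lightgbm as lgb

X = np.random.rand(1000, 20)             # stand-ins for transaction/code features
y = np.random.randint(0, 2, 1000)        # 1 = gambling-related

clf = lgb.LGBMClassifier(n_estimators=100)
clf.fit(X, y)

wrong = clf.predict(X) != y              # "memory" of past misclassifications
weights = np.where(wrong, 3.0, 1.0)      # emphasize previously missed samples
clf = lgb.LGBMClassifier(n_estimators=100)
clf.fit(X, y, sample_weight=weights)     # second pass focuses on hard samples
```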