AITopics | Chen, Jingdong

Collaborating Authors

Chen, Jingdong


Large Multimodal Model Compression via Efficient Pruning and Distillation at AntGroup

Wang, Maolin, Zhao, Yao, Liu, Jiajia, Chen, Jingdong, Zhuang, Chenyi, Gu, Jinjie, Guo, Ruocheng, Zhao, Xiangyu

arXiv.org Artificial Intelligence, Dec-10-2023

The deployment of Large Multimodal Models (LMMs) within AntGroup has significantly advanced multimodal tasks in payment, security, and advertising, notably enhancing advertisement audition tasks in Alipay. However, the deployment of such sizable models introduces challenges, particularly in increased latency and carbon emissions, which are antithetical to the ideals of Green AI. This paper introduces a novel multi-stage compression strategy for our proprietary LLM, AntGMM. Our methodology pivots on three main aspects: employing small training sample sizes, addressing multi-level redundancy through multi-stage pruning, and introducing an advanced distillation loss design. In our research, we constructed a dataset, the Multimodal Advertisement Audition Dataset (MAAD), from real-world scenarios within Alipay, and conducted experiments to validate the reliability of our proposed strategy. Furthermore, the effectiveness of our strategy is evident in its operational success in Alipay's real-world multimodal advertisement audition for three months from September 2023. Notably, our approach achieved a substantial reduction in latency, decreasing it from 700ms to 90ms, while maintaining online performance with only a slight performance decrease. Moreover, our compressed model is estimated to reduce electricity consumption by approximately 75 million kWh annually compared to the direct deployment of AntGMM, demonstrating our commitment to Green AI initiatives. We will publicly release our code and the MAAD dataset after some reviews (https://github.com/MorinW/AntGMM_Pruning).
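The abstract does not spell out its distillation loss design. As a point of reference only, the classic temperature-scaled distillation loss (a KL divergence between softened teacher and student outputs) can be sketched as below. This is an assumption for illustration, not the AntGMM loss; the function names and the default temperature are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over logits divided by a temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions.

    The T^2 factor is the usual rescaling that keeps gradient magnitudes
    comparable across temperatures. Generic sketch, not the paper's loss.
    """
    teacher = softmax(teacher_logits, temperature)
    student = softmax(student_logits, temperature)
    kl = sum(p * math.log(p / q) for p, q in zip(teacher, student) if p > 0)
    return temperature ** 2 * kl
```

The loss is zero when the student reproduces the teacher's logits and grows as the two softened distributions diverge.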

large language model, machine learning, pruning, (11 more...)

arXiv.org Artificial Intelligence

2312.05795

Country: Asia > China (0.29)

Genre: Research Report > New Finding (1.00)

Industry:

Marketing (0.90)
Energy (0.74)
Information Technology (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

LogicMP: A Neuro-symbolic Approach for Encoding First-order Logic Constraints

Xu, Weidi, Wang, Jingwei, Xie, Lele, He, Jianshan, Zhou, Hongting, Wang, Taifeng, Wan, Xiaopei, Chen, Jingdong, Qu, Chao, Chu, Wei

arXiv.org Artificial Intelligence, Sep-29-2023

Integrating first-order logic constraints (FOLCs) with neural networks is a crucial but challenging problem since it involves modeling intricate correlations to satisfy the constraints. This paper proposes a novel neural layer, LogicMP, whose layers perform mean-field variational inference over an MLN. It can be plugged into any off-the-shelf neural network to encode FOLCs while retaining modularity and efficiency. By exploiting the structure and symmetries in MLNs, we theoretically demonstrate that our well-designed, efficient mean-field iterations effectively mitigate the difficulty of MLN inference, reducing the inference from sequential calculation to a series of parallel tensor operations. Empirical results in three kinds of tasks over graphs, images, and text show that LogicMP outperforms advanced competitors in both performance and efficiency.

logic & formal reasoning, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2309.15458

Country:

Europe (1.00)
North America > Canada > Quebec (0.28)
North America > United States > Massachusetts (0.28)
North America > United States > California (0.28)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.68)

CMUA-Watermark: A Cross-Model Universal Adversarial Watermark for Combating Deepfakes

Huang, Hao, Wang, Yongtao, Chen, Zhaoyu, Li, Yuheng, Tang, Zhi, Chu, Wei, Chen, Jingdong, Lin, Weisi, Ma, Kai-Kuang

arXiv.org Artificial Intelligence, May-23-2021

Malicious application of deepfakes (i.e., technologies that can generate target faces or face attributes) has posed a huge threat to our society. The fake multimedia content generated by deepfake models can harm the reputation and even threaten the property of the person who has been impersonated. Fortunately, adversarial watermarks can be used to combat deepfake models, leading them to generate distorted images. Existing methods require an individual training process for every facial image to generate the adversarial watermark against a specific deepfake model, which is extremely inefficient. To address this problem, we propose a universal adversarial attack method on deepfake models, to generate a Cross-Model Universal Adversarial Watermark (CMUA-Watermark) that can protect thousands of facial images from multiple deepfake models. Specifically, we first propose a cross-model universal attack pipeline by attacking multiple deepfake models and combining gradients from these models iteratively. Then we introduce a batch-based method to alleviate the conflict of adversarial watermarks generated by different facial images. Finally, we design a more reasonable and comprehensive evaluation method for evaluating the effectiveness of the adversarial watermark. Experimental results demonstrate that the proposed CMUA-Watermark can effectively distort the fake facial images generated by deepfake models and successfully protect facial images from deepfakes in real scenes.
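The idea of iteratively combining gradients from several models into one shared perturbation can be illustrated with a toy sketch. Everything here is an assumption for illustration: the sign-gradient update, the clipping bound `epsilon`, and the constant-gradient "models" are hypothetical stand-ins, not the CMUA-Watermark pipeline.

```python
def sign(x):
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def universal_watermark(grad_fns, dim, epsilon=0.05, step=0.01, iterations=20):
    """Build one perturbation by cycling through several models' gradients.

    Each grad_fn maps the current watermark to the gradient of that model's
    distortion objective. After every signed step the watermark is clipped
    to the box [-epsilon, epsilon] so that it stays small.
    """
    watermark = [0.0] * dim
    for _ in range(iterations):
        for grad_fn in grad_fns:
            grad = grad_fn(watermark)
            watermark = [
                max(-epsilon, min(epsilon, w + step * sign(g)))
                for w, g in zip(watermark, grad)
            ]
    return watermark


# Two toy "models" whose distortion gradients are fixed vectors.
toy_models = [
    lambda w: [1.0, -1.0, 0.5],
    lambda w: [0.5, -2.0, 1.0],
]
watermark = universal_watermark(toy_models, dim=3)
```

Because both toy gradients agree in sign on every coordinate, the watermark saturates at the clipping bound. With real models the gradients can conflict, which is the problem the abstract's batch-based method is said to alleviate.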

deep learning, neural network, watermark, (17 more...)

arXiv.org Artificial Intelligence

2105.10872

Country:

Asia > China (0.28)
North America > United States > Massachusetts (0.14)

Genre: Research Report > New Finding (0.34)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

End-to-End Model for Speech Enhancement by Consistent Spectrogram Masking

Du, Xingjian, Zhu, Mengyao, Shi, Xuan, Zhang, Xinpeng, Zhang, Wen, Chen, Jingdong

arXiv.org Artificial Intelligence, Jan-2-2019

Recently, phase processing has been attracting increasing interest in the speech enhancement community. Some researchers integrate a phase estimation module into speech enhancement models by using complex-valued short-time Fourier transform (STFT) spectrogram based training targets, e.g. Complex Ratio Mask (cRM) [1]. However, masking on a spectrogram would violate its consistency constraints. In this work, we prove that the inconsistency problem enlarges the solution space of the speech enhancement model and causes unintended artifacts. Consistency Spectrogram Masking (CSM) is proposed to estimate the complex spectrogram of a signal with the consistency constraint in a simple but not trivial way. The experiments comparing our CSM based end-to-end model with other methods are conducted to confirm that the CSM accelerates the model training and yields significant improvements in speech quality. From our experimental results, we assured that our method could enha
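The consistency constraint referred to above is commonly stated as follows; the notation is an assumption of this note and does not come from the abstract. A complex array $X$ is a consistent spectrogram if it is the STFT of some time-domain signal, which is equivalent to $X$ being unchanged by an inverse STFT followed by an STFT:

```latex
\mathcal{C}(X) \;=\; X - \operatorname{STFT}\bigl(\operatorname{iSTFT}(X)\bigr),
\qquad
X \text{ is consistent} \iff \mathcal{C}(X) = 0 .
```

A masked spectrogram $M \odot X$ generally has $\mathcal{C}(M \odot X) \neq 0$, because overlapping analysis frames constrain neighbouring time-frequency bins and an arbitrary element-wise mask does not respect those constraints.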

artificial intelligence, neural network, spectrogram, (14 more...)

arXiv.org Artificial Intelligence

1901.00295

Country:

Oceania > Australia > Queensland (0.14)
North America > United States > Utah (0.14)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Speech (0.69)

Blind channel identification for speech dereverberation using l1-norm sparse learning

Lin, Yuanqing, Chen, Jingdong, Kim, Youngmoo, Lee, Daniel D.

Neural Information Processing Systems, Dec-31-2008

Speech dereverberation remains an open problem after more than three decades of research. The most challenging step in speech dereverberation is blind channel identification (BCI). Although many BCI approaches have been developed, their performance is still far from satisfactory for practical applications. The main difficulty in BCI lies in finding an appropriate acoustic model, which not only can effectively resolve solution degeneracies due to the lack of knowledge of the source, but also robustly models real acoustic environments. This paper proposes a sparse acoustic room impulse response (RIR) model for BCI, that is, an acoustic RIR can be modeled by a sparse FIR filter.
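To make the link between an l1-norm penalty and a sparse FIR filter concrete, the sketch below shows soft-thresholding, the proximal operator of the l1 norm, applied to a list of filter taps. It is a generic building block of l1-regularized solvers and is offered as an assumption for illustration; the abstract does not say which optimization procedure the paper uses, and the function names are hypothetical.

```python
def l1_norm(taps):
    """Sum of absolute tap values; small for sparse filters."""
    return sum(abs(h) for h in taps)


def soft_threshold(taps, threshold):
    """Shrink every tap toward zero by `threshold`, zeroing the small ones.

    This is the proximal operator of threshold * ||h||_1, so applying it
    inside an iterative solver pushes the estimate toward sparsity.
    """
    shrunk = []
    for h in taps:
        if h > threshold:
            shrunk.append(h - threshold)
        elif h < -threshold:
            shrunk.append(h + threshold)
        else:
            shrunk.append(0.0)
    return shrunk


# A toy impulse response: two strong reflections plus low-level clutter.
dense_taps = [0.9, 0.05, -0.4, 0.02]
sparse_taps = soft_threshold(dense_taps, 0.1)
```

Here the two low-level taps become exactly zero while the two dominant taps are only slightly reduced, which matches the intuition of a room impulse response dominated by a few strong paths.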

artificial intelligence, bsci approach, machine learning, (13 more...)

Neural Information Processing Systems

Country: North America > United States > Pennsylvania (0.14)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.47)

