AITopics | Deng, Weiwei

Collaborating Authors

Deng, Weiwei

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Text Diffusion with Reinforced Conditioning

Liu, Yuxuan, Yang, Tianchi, Huang, Shaohan, Zhang, Zihan, Huang, Haizhen, Wei, Furu, Deng, Weiwei, Sun, Feng, Zhang, Qi

arXiv.org Artificial IntelligenceFeb-19-2024

Diffusion models have demonstrated exceptional capability in generating high-quality images, videos, and audio. Due to their adaptiveness in iterative refinement, they provide a strong potential for achieving better non-autoregressive sequence generation. However, existing text diffusion models still fall short in their performance due to a challenge in handling the discreteness of language. This paper thoroughly analyzes text diffusion models and uncovers two significant limitations: degradation of self-conditioning during training and misalignment between training and sampling. Motivated by our findings, we propose a novel Text Diffusion model called TREC, which mitigates the degradation with Reinforced Conditioning and the misalignment by Time-Aware Variance Scaling. Our extensive experiments demonstrate the competitiveness of TREC against autoregressive, non-autoregressive, and diffusion baselines. Moreover, qualitative analysis shows its advanced ability to fully utilize the diffusion process in refining samples.

diffusion process, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2402.14843

Country: North America > United States (0.15)

Genre: Research Report > New Finding (0.48)

Industry: Government > Regional Government (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.48)

Add feedback

Improving Domain Adaptation through Extended-Text Reading Comprehension

Jiang, Ting, Huang, Shaohan, Luo, Shengyue, Zhang, Zihan, Huang, Haizhen, Wei, Furu, Deng, Weiwei, Sun, Feng, Zhang, Qi, Wang, Deqing, Zhuang, Fuzhen

arXiv.org Artificial IntelligenceJan-18-2024

To enhance the domain-specific capabilities of large language models, continued pre-training on a domain-specific corpus is a prevalent method. Recent work demonstrates that adapting models using reading comprehension data formatted by regex-based patterns can significantly improve performance on domain-specific tasks. However, regex-based patterns are incapable of parsing raw corpora using domain-specific knowledge. Furthermore, the question and answer pairs are extracted directly from the corpus in predefined formats offers limited context. To address this limitation, we improve reading comprehension via LLM and clustering. LLM focuses on leveraging domain knowledge within the corpus to refine comprehension stage, while clustering supplies relevant knowledge by extending the context to enrich reading stage. Additionally, our method incorporates parameter-efficient fine-tuning to improve the efficiency of domain adaptation. In comparison to AdaptLLM, our method achieves an improvement exceeding 5% in domain-specific tasks. Our code will available at https://github.com/microsoft/LMOps.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2401.07284

Genre: Research Report (0.50)

Industry: Education > Assessment & Standards > Student Performance (0.92)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Add feedback

Democratizing Reasoning Ability: Tailored Learning from Large Language Model

Wang, Zhaoyang, Huang, Shaohan, Liu, Yuxuan, Wang, Jiahai, Song, Minghui, Zhang, Zihan, Huang, Haizhen, Wei, Furu, Deng, Weiwei, Sun, Feng, Zhang, Qi

arXiv.org Artificial IntelligenceOct-20-2023

Large language models (LLMs) exhibit impressive emergent abilities in natural language processing, but their democratization is hindered due to huge computation requirements and closed-source nature. Recent research on advancing open-source smaller LMs by distilling knowledge from black-box LLMs has obtained promising results in the instruction-following ability. However, the reasoning ability which is more challenging to foster, is relatively rarely explored. In this paper, we propose a tailored learning approach to distill such reasoning ability to smaller LMs to facilitate the democratization of the exclusive reasoning ability. In contrast to merely employing LLM as a data annotator, we exploit the potential of LLM as a reasoning teacher by building an interactive multi-round learning paradigm. This paradigm enables the student to expose its deficiencies to the black-box teacher who then can provide customized training data in return. Further, to exploit the reasoning potential of the smaller LM, we propose self-reflection learning to motivate the student to learn from self-made mistakes. The learning from self-reflection and LLM are all tailored to the student's learning status, thanks to the seamless integration with the multi-round learning paradigm. Comprehensive experiments and analysis on mathematical and commonsense reasoning tasks demonstrate the effectiveness of our method. The code will be available at https://github.com/Raibows/Learn-to-Reason.

democratizing reasoning ability, large language model, natural language, (2 more...)

arXiv.org Artificial Intelligence

2310.13332

Genre: Research Report (0.69)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

Calibrating LLM-Based Evaluator

Liu, Yuxuan, Yang, Tianchi, Huang, Shaohan, Zhang, Zihan, Huang, Haizhen, Wei, Furu, Deng, Weiwei, Sun, Feng, Zhang, Qi

arXiv.org Artificial IntelligenceSep-23-2023

Recent advancements in large language models (LLMs) on language modeling and emergent capabilities make them a promising reference-free evaluator of natural language generation quality, and a competent alternative to human evaluation. However, hindered by the closed-source or high computational demand to host and tune, there is a lack of practice to further calibrate an off-the-shelf LLM-based evaluator towards better human alignment. In this work, we propose AutoCalibrate, a multi-stage, gradient-free approach to automatically calibrate and align an LLM-based evaluator toward human preference. Instead of explicitly modeling human preferences, we first implicitly encompass them within a set of human labels. Then, an initial set of scoring criteria is drafted by the language model itself, leveraging in-context learning on different few-shot examples. To further calibrate this set of criteria, we select the best performers and re-draft them with self-refinement. Our experiments on multiple text quality evaluation datasets illustrate a significant improvement in correlation with expert evaluation through calibration. Our comprehensive qualitative analysis conveys insightful intuitions and observations on the essence of effective scoring criteria.

criteria, large language model, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2309.13308

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

Add feedback

BERT4CTR: An Efficient Framework to Combine Pre-trained Language Model with Non-textual Features for CTR Prediction

Wang, Dong, Salamatian, Kavé, Xia, Yunqing, Deng, Weiwei, Zhiang, Qi

arXiv.org Artificial IntelligenceAug-17-2023

Although deep pre-trained language models have shown promising benefit in a large set of industrial scenarios, including Click-Through-Rate (CTR) prediction, how to integrate pre-trained language models that handle only textual signals into a prediction pipeline with non-textual features is challenging. Up to now two directions have been explored to integrate multi-modal inputs in fine-tuning of pre-trained language models. One consists of fusing the outcome of language models and non-textual features through an aggregation layer, resulting into ensemble framework, where the cross-information between textual and non-textual inputs are only learned in the aggregation layer. The second one consists of splitting non-textual features into fine-grained fragments and transforming the fragments to new tokens combined with textual ones, so that they can be fed directly to transformer layers in language models. However, this approach increases the complexity of the learning and inference because of the numerous additional tokens. To address these limitations, we propose in this work a novel framework BERT4CTR, with the Uni-Attention mechanism that can benefit from the interactions between non-textual and textual features while maintaining low time-costs in training and inference through a dimensionality reduction. Comprehensive experiments on both public and commercial data demonstrate that BERT4CTR can outperform significantly the state-of-the-art frameworks to handle multi-modal inputs and be applicable to CTR prediction.

artificial intelligence, machine learning, natural language, (15 more...)

arXiv.org Artificial Intelligence

2308.11527

Country: North America > United States (0.49)

Genre: Research Report (1.00)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Constraint-aware and Ranking-distilled Token Pruning for Efficient Transformer Inference

Li, Junyan, Zhang, Li Lyna, Xu, Jiahang, Wang, Yujing, Yan, Shaoguang, Xia, Yunqing, Yang, Yuqing, Cao, Ting, Sun, Hao, Deng, Weiwei, Zhang, Qi, Yang, Mao

arXiv.org Artificial IntelligenceJun-25-2023

Deploying pre-trained transformer models like BERT on downstream tasks in resource-constrained scenarios is challenging due to their high inference cost, which grows rapidly with input sequence length. In this work, we propose a constraint-aware and ranking-distilled token pruning method ToP, which selectively removes unnecessary tokens as input sequence passes through layers, allowing the model to improve online inference speed while preserving accuracy. ToP overcomes the limitation of inaccurate token importance ranking in the conventional self-attention mechanism through a ranking-distilled token distillation technique, which distills effective token rankings from the final layer of unpruned models to early layers of pruned models. Then, ToP introduces a coarse-to-fine pruning approach that automatically selects the optimal subset of transformer layers and optimizes token pruning decisions within these layers through improved $L_0$ regularization. Extensive experiments on GLUE benchmark and SQuAD tasks demonstrate that ToP outperforms state-of-the-art token pruning and model compression methods with improved accuracy and speedups. ToP reduces the average FLOPs of BERT by 8.1x while achieving competitive accuracy on GLUE, and provides a real latency speedup of up to 7.4x on an Intel CPU.

machine learning, natural language, pruning, (16 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3580305.3599284

2306.14393

Country:

Europe (1.00)
North America > United States (0.48)

Genre: Research Report > Promising Solution (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Add feedback

To Copy Rather Than Memorize: A Vertical Learning Paradigm for Knowledge Graph Completion

Li, Rui, Chen, Xu, Li, Chaozhuo, Shen, Yanming, Zhao, Jianan, Wang, Yujing, Han, Weihao, Sun, Hao, Deng, Weiwei, Zhang, Qi, Xie, Xing

arXiv.org Artificial IntelligenceMay-23-2023

Embedding models have shown great power in knowledge graph completion (KGC) task. By learning structural constraints for each training triple, these methods implicitly memorize intrinsic relation rules to infer missing links. However, this paper points out that the multi-hop relation rules are hard to be reliably memorized due to the inherent deficiencies of such implicit memorization strategy, making embedding models underperform in predicting links between distant entity pairs. To alleviate this problem, we present Vertical Learning Paradigm (VLP), which extends embedding models by allowing to explicitly copy target information from related factual triples for more accurate prediction. Rather than solely relying on the implicit memory, VLP directly provides additional cues to improve the generalization ability of embedding models, especially making the distant link prediction significantly easier. Moreover, we also propose a novel relative distance based negative sampling technique (ReD) for more effective optimization. Experiments demonstrate the validity and generality of our proposals on two standard benchmarks. Our code is available at https://github.com/rui9812/VLP.

artificial intelligence, machine learning, prediction, (15 more...)

arXiv.org Artificial Intelligence

2305.14126

Country: Asia (0.28)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Semantic Networks (0.72)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Add feedback

Dual-Alignment Pre-training for Cross-lingual Sentence Embedding

Li, Ziheng, Huang, Shaohan, Zhang, Zihan, Deng, Zhi-Hong, Lou, Qiang, Huang, Haizhen, Jiao, Jian, Wei, Furu, Deng, Weiwei, Zhang, Qi

arXiv.org Artificial IntelligenceMay-15-2023

Recent studies have shown that dual encoder models trained with the sentence-level translation ranking task are effective methods for cross-lingual sentence embedding. However, our research indicates that token-level alignment is also crucial in multilingual scenarios, which has not been fully explored previously. Based on our findings, we propose a dual-alignment pre-training (DAP) framework for cross-lingual sentence embedding that incorporates both sentence-level and token-level alignment. To achieve this, we introduce a novel representation translation learning (RTL) task, where the model learns to use one-side contextualized token representation to reconstruct its translation counterpart. This reconstruction objective encourages the model to embed translation information into the token representation. Compared to other token-level alignment methods such as translation language modeling, RTL is more suitable for dual encoder architectures and is computationally efficient. Extensive experiments on three sentence-level cross-lingual benchmarks demonstrate that our approach can significantly improve sentence embedding. Our code is available at https://github.com/ChillingDream/DAP.

computational linguistic, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2305.09148

Country:

Europe (0.68)
Asia > China (0.28)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.46)

Add feedback

Pre-training Language Model as a Multi-perspective Course Learner

Chen, Beiduo, Huang, Shaohan, Zhang, Zihan, Guo, Wu, Ling, Zhenhua, Huang, Haizhen, Wei, Furu, Deng, Weiwei, Zhang, Qi

arXiv.org Artificial IntelligenceMay-6-2023

ELECTRA, the generator-discriminator pre-training framework, has achieved impressive semantic construction capability among various downstream tasks. Despite the convincing performance, ELECTRA still faces the challenges of monotonous training and deficient interaction. Generator with only masked language modeling (MLM) leads to biased learning and label imbalance for discriminator, decreasing learning efficiency; no explicit feedback loop from discriminator to generator results in the chasm between these two components, underutilizing the course learning. In this study, a multi-perspective course learning (MCL) method is proposed to fetch a many degrees and visual angles for sample-efficient pre-training, and to fully leverage the relationship between generator and discriminator. Concretely, three self-supervision courses are designed to alleviate inherent flaws of MLM and balance the label in a multi-perspective way. Besides, two self-correction courses are proposed to bridge the chasm between the two encoders by creating a "correction notebook" for secondary-supervision. Moreover, a course soups trial is conducted to solve the "tug-of-war" dynamics problem of MCL, evolving a stronger pre-trained model. Experimental results show that our method significantly improves ELECTRA's average performance by 2.8% and 3.2% absolute points respectively on GLUE and SQuAD 2.0 benchmarks, and overshadows recent advanced ELECTRA-style models under the same settings. The pre-trained MCL model is available at https://huggingface.co/McmanusChen/MCL-base.

computational linguistic, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2305.03981

Country:

North America > United States (1.00)
Europe (0.93)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

HousE: Knowledge Graph Embedding with Householder Parameterization

Li, Rui, Zhao, Jianan, Li, Chaozhuo, He, Di, Wang, Yiqi, Liu, Yuming, Sun, Hao, Wang, Senzhang, Deng, Weiwei, Shen, Yanming, Xie, Xing, Zhang, Qi

arXiv.org Artificial IntelligenceFeb-16-2022

The effectiveness of knowledge graph embedding (KGE) largely depends on the ability to model intrinsic relation patterns and mapping properties. However, existing approaches can only capture some of them with insufficient modeling capacity. In this work, we propose a more powerful KGE framework named HousE, which involves a novel parameterization based on two kinds of Householder transformations: (1) Householder rotations to achieve superior capacity of modeling relation patterns; (2) Householder projections to handle sophisticated relation mapping properties. Theoretically, HousE is capable of modeling crucial relation patterns and mapping properties simultaneously. Besides, HousE is a generalization of existing rotation-based models while extending the rotations to high-dimensional spaces. Empirically, HousE achieves new state-of-the-art performance on five benchmark datasets. Our code is available at https://github.com/anrep/HousE.

householder parameterization, knowledge graph embedding

arXiv.org Artificial Intelligence

2202.07919

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Semantic Networks (0.60)

Add feedback