Not enough data to create a plot.
Try a different view from the menu above.
Wang, Hui
UltraWiki: Ultra-fine-grained Entity Set Expansion with Negative Seed Entities
Li, Yangning, Lv, Qingsong, Yu, Tianyu, Li, Yinghui, Huang, Shulin, Lu, Tingwei, Hu, Xuming, JIang, Wenhao, Zheng, Hai-Tao, Wang, Hui
Entity Set Expansion (ESE) aims to identify new entities belonging to the same semantic class as a given set of seed entities. Traditional methods primarily relied on positive seed entities to represent a target semantic class, which poses challenge for the representation of ultra-fine-grained semantic classes. Ultra-fine-grained semantic classes are defined based on fine-grained semantic classes with more specific attribute constraints. Describing it with positive seed entities alone cause two issues: (i) Ambiguity among ultra-fine-grained semantic classes. (ii) Inability to define "unwanted" semantic. Due to these inherent shortcomings, previous methods struggle to address the ultra-fine-grained ESE (Ultra-ESE). To solve this issue, we first introduce negative seed entities in the inputs, which belong to the same fine-grained semantic class as the positive seed entities but differ in certain attributes. Negative seed entities eliminate the semantic ambiguity by contrast between positive and negative attributes. Meanwhile, it provide a straightforward way to express "unwanted". To assess model performance in Ultra-ESE, we constructed UltraWiki, the first large-scale dataset tailored for Ultra-ESE. UltraWiki encompasses 236 ultra-fine-grained semantic classes, where each query of them is represented with 3-5 positive and negative seed entities. A retrieval-based framework RetExpan and a generation-based framework GenExpan are proposed to comprehensively assess the efficacy of large language models from two different paradigms in Ultra-ESE. Moreover, we devised three strategies to enhance models' comprehension of ultra-fine-grained entities semantics: contrastive learning, retrieval augmentation, and chain-of-thought reasoning. Extensive experiments confirm the effectiveness of our proposed strategies and also reveal that there remains a large space for improvement in Ultra-ESE.
COPR: Continual Human Preference Learning via Optimal Policy Regularization
Zhang, Han, Gui, Lin, Lei, Yu, Zhai, Yuanzhao, Zhang, Yehong, He, Yulan, Wang, Hui, Yu, Yue, Wong, Kam-Fai, Liang, Bin, Xu, Ruifeng
Reinforcement Learning from Human Feedback (RLHF) is commonly utilized to improve the alignment of Large Language Models (LLMs) with human preferences. Given the evolving nature of human preferences, continual alignment becomes more crucial and practical in comparison to traditional static alignment. Nevertheless, making RLHF compatible with Continual Learning (CL) is challenging due to its complex process. Meanwhile, directly learning new human preferences may lead to Catastrophic Forgetting (CF) of historical preferences, resulting in helpless or harmful outputs. To overcome these challenges, we propose the Continual Optimal Policy Regularization (COPR) method, which draws inspiration from the optimal policy theory. COPR utilizes a sampling distribution as a demonstration and regularization constraints for CL. It adopts the Lagrangian Duality (LD) method to dynamically regularize the current policy based on the historically optimal policy, which prevents CF and avoids over-emphasizing unbalanced objectives. We also provide formal proof for the learnability of COPR. The experimental results show that COPR outperforms strong CL baselines on our proposed benchmark, in terms of reward-based, GPT-4 evaluations and human assessment. Furthermore, we validate the robustness of COPR under various CL settings, including different backbones, replay memory sizes, and learning orders.
Improving Speaker Diarization using Semantic Information: Joint Pairwise Constraints Propagation
Cheng, Luyao, Zheng, Siqi, Zhang, Qinglin, Wang, Hui, Chen, Yafeng, Chen, Qian, Zhang, Shiliang
Speaker diarization has gained considerable attention within speech processing research community. Mainstream speaker diarization rely primarily on speakers' voice characteristics extracted from acoustic signals and often overlook the potential of semantic information. Considering the fact that speech signals can efficiently convey the content of a speech, it is of our interest to fully exploit these semantic cues utilizing language models. In this work we propose a novel approach to effectively leverage semantic information in clustering-based speaker diarization systems. Firstly, we introduce spoken language understanding modules to extract speaker-related semantic information and utilize these information to construct pairwise constraints. Secondly, we present a novel framework to integrate these constraints into the speaker diarization pipeline, enhancing the performance of the entire system. Extensive experiments conducted on the public dataset demonstrate the consistent superiority of our proposed approach over acoustic-only speaker diarization systems.
SecFormer: Towards Fast and Accurate Privacy-Preserving Inference for Large Language Models
Luo, Jinglong, Zhang, Yehong, Zhang, Jiaqi, Mu, Xin, Wang, Hui, Yu, Yue, Xu, Zenglin
With the growing use of large language models hosted on cloud platforms to offer inference services, privacy concerns are escalating, especially concerning sensitive data like investment plans and bank account details. Secure Multi-Party Computing (SMPC) emerges as a promising solution to protect the privacy of inference data and model parameters. However, the application of SMPC in Privacy-Preserving Inference (PPI) for large language models, particularly those based on the Transformer architecture, often leads to considerable slowdowns or declines in performance. This is largely due to the multitude of nonlinear operations in the Transformer architecture, which are not well-suited to SMPC and difficult to circumvent or optimize effectively. To address this concern, we introduce an advanced optimization framework called SecFormer, to achieve fast and accurate PPI for Transformer models. By implementing model design optimization, we successfully eliminate the high-cost exponential and maximum operations in PPI without sacrificing model performance. Additionally, we have developed a suite of efficient SMPC protocols that utilize segmented polynomials, Fourier series and Goldschmidt's method to handle other complex nonlinear functions within PPI, such as GeLU, LayerNorm, and Softmax. Our extensive experiments reveal that SecFormer outperforms MPCFormer in performance, showing improvements of $5.6\%$ and $24.2\%$ for BERT$_{\text{BASE}}$ and BERT$_{\text{LARGE}}$, respectively. In terms of efficiency, SecFormer is 3.56 and 3.58 times faster than Puma for BERT$_{\text{BASE}}$ and BERT$_{\text{LARGE}}$, demonstrating its effectiveness and speed.
Training and Serving System of Foundation Models: A Comprehensive Survey
Zhou, Jiahang, Chen, Yanyu, Hong, Zicong, Chen, Wuhui, Yu, Yue, Zhang, Tao, Wang, Hui, Zhang, Chuanfu, Zheng, Zibin
Foundation models (e.g., ChatGPT, DALL-E, PengCheng Mind, PanGu-$\Sigma$) have demonstrated extraordinary performance in key technological areas, such as natural language processing and visual recognition, and have become the mainstream trend of artificial general intelligence. This has led more and more major technology giants to dedicate significant human and financial resources to actively develop their foundation model systems, which drives continuous growth of these models' parameters. As a result, the training and serving of these models have posed significant challenges, including substantial computing power, memory consumption, bandwidth demands, etc. Therefore, employing efficient training and serving strategies becomes particularly crucial. Many researchers have actively explored and proposed effective methods. So, a comprehensive survey of them is essential for system developers and researchers. This paper extensively explores the methods employed in training and serving foundation models from various perspectives. It provides a detailed categorization of these state-of-the-art methods, including finer aspects such as network, computing, and storage. Additionally, the paper summarizes the challenges and presents a perspective on the future development direction of foundation model systems. Through comprehensive discussion and analysis, it hopes to provide a solid theoretical basis and practical guidance for future research and applications, promoting continuous innovation and development in foundation model systems.
EncryIP: A Practical Encryption-Based Framework for Model Intellectual Property Protection
Mu, Xin, Wang, Yu, Huang, Zhengan, Lai, Junzuo, Zhang, Yehong, Wang, Hui, Yu, Yue
In the rapidly growing digital economy, protecting intellectual property (IP) associated with digital products has become increasingly important. Within this context, machine learning (ML) models, being highly valuable digital assets, have gained significant attention for IP protection. This paper introduces a practical encryption-based framework called \textit{EncryIP}, which seamlessly integrates a public-key encryption scheme into the model learning process. This approach enables the protected model to generate randomized and confused labels, ensuring that only individuals with accurate secret keys, signifying authorized users, can decrypt and reveal authentic labels. Importantly, the proposed framework not only facilitates the protected model to multiple authorized users without requiring repetitive training of the original ML model with IP protection methods but also maintains the model's performance without compromising its accuracy. Compared to existing methods like watermark-based, trigger-based, and passport-based approaches, \textit{EncryIP} demonstrates superior effectiveness in both training protected models and efficiently detecting the unauthorized spread of ML models.
Multimodality of AI for Education: Towards Artificial General Intelligence
Lee, Gyeong-Geon, Shi, Lehong, Latif, Ehsan, Gao, Yizhu, Bewersdorff, Arne, Nyaaba, Matthew, Guo, Shuchen, Wu, Zihao, Liu, Zhengliang, Wang, Hui, Mai, Gengchen, Liu, Tiaming, Zhai, Xiaoming
This paper presents a comprehensive examination of how multimodal artificial intelligence (AI) approaches are paving the way towards the realization of Artificial General Intelligence (AGI) in educational contexts. It scrutinizes the evolution and integration of AI in educational systems, emphasizing the crucial role of multimodality, which encompasses auditory, visual, kinesthetic, and linguistic modes of learning. This research delves deeply into the key facets of AGI, including cognitive frameworks, advanced knowledge representation, adaptive learning mechanisms, strategic planning, sophisticated language processing, and the integration of diverse multimodal data sources. It critically assesses AGI's transformative potential in reshaping educational paradigms, focusing on enhancing teaching and learning effectiveness, filling gaps in existing methodologies, and addressing ethical considerations and responsible usage of AGI in educational settings. The paper also discusses the implications of multimodal AI's role in education, offering insights into future directions and challenges in AGI development. This exploration aims to provide a nuanced understanding of the intersection between AI, multimodality, and education, setting a foundation for future research and development in AGI.
Seeing ChatGPT Through Universities' Policies, Resources and Guidelines
Wang, Hui, Dang, Anh, Wu, Zihao, Mac, Son
The advancements in Artificial Intelligence (AI) technologies such as ChatGPT have gained popularity in recent days. The integration of ChatGPT in educational contexts has already created attractions due to a wide range of applications. However, the automatic generation of human-like texts also poses potential risks to academic integrity, especially when faced with writing-intensive language courses. Considering the ongoing debates, this study aims to investigate the academic policies and guidelines established by US universities regarding the use of ChatGPT in teaching and learning. The data sources include academic policies, statements, guidelines as well as relevant resources that were provided by the top 50 universities in the United States, according to U.S. News. Thematic analysis and qualitative analysis were employed in the analysis and showed that most top 50 universities were open but cautious towards the integration of generative AI in teaching and learning and also expressed their concerns on ethical usage, accuracy, and data privacy. Most universities also provided a variety of resources and guidelines, including syllabus templates/samples, workshops and discussions, shared articles, and one-on-one consultations, with focuses on general technical introduction, ethical concerns, pedagogical applications, preventive strategies, data privacy, limitations, and detective tools. The findings will inform future policy-making regarding the integration of ChatGPT in college-level education and influence the provision of supportive resources by universities for the appropriate application of ChatGPT in education.
COPF: Continual Learning Human Preference through Optimal Policy Fitting
Zhang, Han, Gui, Lin, Zhai, Yuanzhao, Wang, Hui, Lei, Yu, Xu, Ruifeng
In the realm of natural language processing (NLP), large language models (LLMs) are vital tools with the potential to bridge human language and machine understanding. Learning human preferences is a crucial step towards ensuring that language models not only generate responses that are useful to users but also adhere to ethical and societal norms, namely helpful and harmless responses [1]. However, they face a fundamental challenge in aligning with human preferences and values, hindering their full potential. Traditional alignment methods, namely Reinforcement Learning from Human Feedback (RLHF) [2, 3], involve supervised fine-tuning (SFT), reward model (RM) training, and policy model training. This complex pipeline lacks flexibility for continual learning (CL) of human preferences, hence existing work [1] often necessitates retraining models to adapt to dynamic preferences. Hence, there is a pressing need for research into continual alignment methods to address this limitation, enabling LLMs to better adhere to evolving human preferences and values while generating helpful responses. In this paper, we propose an innovative approach to address these challenges by enhancing the utility of the Deterministic Policy Optimization (DPO) [4] algorithm, a non-reinforcement learning, and a non-continual learning method. DPO, rooted in rigorous reinforcement learning theory, offers promising advantages but suffers from three critical limitations: 1. DPO is not supported for evolving human preferences which is common in real-world applications.
3D-Speaker: A Large-Scale Multi-Device, Multi-Distance, and Multi-Dialect Corpus for Speech Representation Disentanglement
Zheng, Siqi, Cheng, Luyao, Chen, Yafeng, Wang, Hui, Chen, Qian
Disentangling uncorrelated information in speech utterances is a crucial research topic within speech community. Different speech-related tasks focus on extracting distinct speech representations while minimizing the affects of other uncorrelated information. We present a large-scale speech corpus to facilitate the research of speech representation disentanglement. 3D-Speaker contains over 10,000 speakers, each of whom are simultaneously recorded by multiple Devices, locating at different Distances, and some speakers are speaking multiple Dialects. The controlled combinations of multi-dimensional audio data yield a matrix of a diverse blend of speech representation entanglement, thereby motivating intriguing methods to untangle them. The multi-domain nature of 3D-Speaker also makes it a suitable resource to evaluate large universal speech models and experiment methods of out-of-domain learning and self-supervised learning. https://3dspeaker.github.io/