Cross-model Control: Improving Multiple Large Language Models in One-time Training
Wu, Jiayi, Sun, Hao, Cai, Hengyi, Su, Lixin, Wang, Shuaiqiang, Yin, Dawei, Li, Xiang, Gao, Ming
The number of large language models (LLMs) with varying parameter scales and vocabularies is increasing. While they deliver powerful performance, they also face a set of common optimization needs to meet specific requirements or standards, such as instruction following or avoiding the output of sensitive real-world information. However, how to transfer the fine-tuning outcomes of one model to other models to reduce training costs remains a challenge. To bridge this gap, we introduce Cross-model Control (CMC), a method that improves multiple LLMs in one-time training with a portable tiny language model. Specifically, we have observed that the logit shift before and after fine-tuning is remarkably similar across different models. Based on this insight, we incorporate a tiny language model with a minimal number of parameters. By training alongside a frozen template LLM, the tiny model gains the capability to alter the logits output by the LLMs. To make this tiny language model applicable to models with different vocabularies, we propose a novel token mapping strategy named PM-MinED. We have conducted extensive experiments on instruction tuning and unlearning tasks, demonstrating the effectiveness of CMC. Our code is available at https://github.com/wujwyi/CMC.
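For illustration only, the sketch below shows one way a tiny model's logit shift could be applied to a frozen LLM's next-token logits at decoding time; it is not the authors' implementation, and frozen_llm, tiny_lm, and token_map are hypothetical placeholders (the actual PM-MinED mapping is more involved).

```python
# Minimal sketch, assuming `frozen_llm` and `tiny_lm` are callables that return
# next-token logits as {token_string: logit} dicts, and `token_map` maps each
# big-model token to a tiny-model token (a stand-in for PM-MinED).

def steered_next_token_logits(frozen_llm, tiny_lm, token_map, context):
    """Add the tiny model's learned logit shift onto the frozen LLM's logits."""
    base_logits = frozen_llm(context)    # logits from the large, frozen LLM
    delta_logits = tiny_lm(context)      # logit shifts learned alongside a template LLM

    steered = dict(base_logits)
    for token, logit in base_logits.items():
        tiny_token = token_map.get(token)
        if tiny_token is not None and tiny_token in delta_logits:
            steered[token] = logit + delta_logits[tiny_token]
    return steered
```

Decoding then proceeds from the steered distribution as usual, so the same tiny model can steer any LLM whose vocabulary can be mapped.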
AdaSwitch: Adaptive Switching between Small and Large Agents for Effective Cloud-Local Collaborative Learning
Sun, Hao, Wu, Jiayi, Cai, Hengyi, Wei, Xiaochi, Feng, Yue, Wang, Bo, Wang, Shuaiqiang, Zhang, Yan, Yin, Dawei
Recent advancements in large language models (LLMs) have been remarkable. Users face a choice between using cloud-based LLMs for their generation quality and deploying locally hosted LLMs for their lower computational cost. The former option is typically costly and inefficient, while the latter usually fails to deliver satisfactory performance on reasoning steps that require deliberate thought. In this work, we propose a novel LLM utilization paradigm that facilitates the collaborative operation of large cloud-based LLMs and smaller locally deployed LLMs. Our framework comprises two primary modules: the local agent, instantiated with a relatively small LLM, handles less complex reasoning steps, while the cloud agent, equipped with a larger LLM, manages more intricate reasoning steps. This collaborative processing is enabled through an adaptive mechanism in which the local agent introspectively identifies errors and proactively seeks assistance from the cloud agent, thereby effectively integrating the strengths of both locally deployed and cloud-based LLMs and resulting in significant enhancements in task completion performance and efficiency. We evaluate AdaSwitch across 7 benchmarks, ranging from mathematical reasoning to complex question answering, using various types of LLMs to instantiate the local and cloud agents. The empirical results show that AdaSwitch effectively improves the performance of the local agent and sometimes achieves competitive results compared to the cloud agent while incurring much less computational overhead.
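The following toy sketch, offered only as a reading aid, mimics the adaptive switching idea; local_agent, cloud_agent, and local_self_check are hypothetical callables, and the paper's actual introspection mechanism is richer than this.

```python
# Illustrative sketch, assuming the agents return reasoning steps as strings and
# `local_self_check` returns False when the local agent suspects its own step.

def solve_with_adaptive_switching(question, local_agent, cloud_agent,
                                  local_self_check, max_steps=8):
    steps = []
    for _ in range(max_steps):
        candidate = local_agent(question, steps)           # cheap local reasoning step
        if not local_self_check(question, steps, candidate):
            # Suspected error: escalate this step to the larger cloud model.
            candidate = cloud_agent(question, steps)
        steps.append(candidate)
        if candidate.strip().startswith("Final answer:"):  # simple stop condition
            break
    return steps
```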
From Exploration to Mastery: Enabling LLMs to Master Tools via Self-Driven Interactions
Qu, Changle, Dai, Sunhao, Wei, Xiaochi, Cai, Hengyi, Wang, Shuaiqiang, Yin, Dawei, Xu, Jun, Wen, Ji-Rong
Tool learning enables Large Language Models (LLMs) to interact with external environments by invoking tools, serving as an effective strategy to mitigate the limitations inherent in their pre-training data. In this process, tool documentation plays a crucial role by providing usage instructions for LLMs, thereby facilitating effective tool utilization. This paper concentrates on the critical challenge of bridging the comprehension gap between LLMs and external tools due to the inadequacies and inaccuracies inherent in existing human-centric tool documentation. We propose a novel framework, DRAFT, aimed at Dynamically Refining tool documentation through the Analysis of Feedback and Trails emanating from LLMs' interactions with external tools. This methodology pivots on an innovative trial-and-error approach, consisting of three distinct learning phases: experience gathering, learning from experience, and documentation rewriting, to iteratively enhance the tool documentation. This process is further optimized by implementing a diversity-promoting exploration strategy to ensure explorative diversity and a tool-adaptive termination mechanism to prevent overfitting while enhancing efficiency. Extensive experiments on multiple datasets demonstrate that DRAFT's iterative, feedback-based refinement significantly ameliorates documentation quality, fostering a deeper comprehension and more effective utilization of tools by LLMs. Notably, our analysis reveals that the tool documentation refined via our approach demonstrates robust cross-model generalization capabilities.
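Purely as an illustration of the explore/learn/rewrite loop described above (and not DRAFT's actual algorithm), the sketch below assumes hypothetical callables propose_trial, call_tool, and rewrite_doc and uses a crude change ratio as the termination signal.

```python
# Hedged sketch: iterative documentation refinement driven by tool feedback.

def refine_tool_documentation(doc, propose_trial, call_tool, rewrite_doc,
                              max_rounds=5, min_change_ratio=0.05):
    for _ in range(max_rounds):
        trial_input = propose_trial(doc)                   # experience gathering
        feedback = call_tool(trial_input)                  # observe the tool's response
        new_doc = rewrite_doc(doc, trial_input, feedback)  # learn from it and rewrite the doc

        # Tool-adaptive termination, simplified: stop when rewrites barely change the text.
        changed = sum(a != b for a, b in zip(doc, new_doc)) + abs(len(doc) - len(new_doc))
        if changed / max(len(doc), 1) < min_change_ratio:
            return new_doc
        doc = new_doc
    return doc
```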
Pre-trained Graphformer-based Ranking at Web-scale Search (Extended Abstract)
Li, Yuchen, Xiong, Haoyi, Kong, Linghe, Sun, Zeyi, Chen, Hongyang, Wang, Shuaiqiang, Yin, Dawei
Both Transformers and Graph Neural Networks (GNNs) have been employed in the domain of learning to rank (LTR). However, these approaches adhere to two distinct yet complementary problem formulations: ranking score regression based on query-webpage pairs, and link prediction within query-webpage bipartite graphs, respectively. While it is possible to pre-train GNNs or Transformers on source datasets and subsequently fine-tune them on sparsely annotated LTR datasets, the distributional shifts between the pair-based and bipartite-graph domains present significant challenges in integrating these heterogeneous models into a unified LTR framework at web scale. To address this, we introduce the novel MPGraf model, which leverages a modular and capsule-based pre-training strategy, aiming to cohesively integrate the regression capabilities of Transformers with the link prediction capabilities of GNNs.
Although Graphformer [Yang et al., 2021] has been proposed to combine advantages from GNNs and Transformers for representation learning with textual graphs, there is still a lack of joint efforts from the two domains (i.e., query-webpage pairs and graphs) in LTR. To improve the performance of over-parameterized models like Transformers or GNNs, the paradigm of pre-training and fine-tuning has been extensively employed [Liao et al., 2024; Chen et al., 2024g; Chen et al., 2022; Song et al., 2024; Lyu et al., 2023]. This involves first training the models on large-scale source datasets in an unsupervised or self-supervised manner to develop their core representation learning capabilities [Qiang et al., 2023; Xiong et al., 2024a; Xiong et al., 2024b; Lyu et al., 2020]. Subsequently, the pre-trained models can be fine-tuned using a small number of annotated samples from the target datasets [Kirichenko et al., 2022; Huang et al., 2021; Chen et al., 2023e; Chen et al., 2023d; Chen et al., 2023b]. However, such a paradigm cannot be easily followed by LTR models that leverage both query-webpage pairs and graphs together.
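As a loose illustration of combining the two formulations (not MPGraf's modular, capsule-based architecture), the sketch below blends a pairwise regression score with a bipartite link-prediction score; transformer_score and gnn_link_score are hypothetical pre-trained scorers.

```python
# Toy sketch: rank webpages by mixing pair-based regression and graph link prediction.

def hybrid_rank(query, webpages, transformer_score, gnn_link_score, alpha=0.5):
    scored = []
    for page in webpages:
        s_pair = transformer_score(query, page)   # query-webpage ranking score regression
        s_link = gnn_link_score(query, page)      # query-webpage edge likelihood from a GNN
        scored.append((alpha * s_pair + (1 - alpha) * s_link, page))
    return [page for _, page in sorted(scored, key=lambda item: item[0], reverse=True)]
```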
Generative Pre-trained Ranking Model with Over-parameterization at Web-Scale (Extended Abstract)
Li, Yuchen, Xiong, Haoyi, Kong, Linghe, Bian, Jiang, Wang, Shuaiqiang, Chen, Guihai, Yin, Dawei
Learning to rank (LTR) is widely employed in web searches to prioritize pertinent webpages from retrieved content based on input queries. However, traditional LTR models encounter two principal obstacles that lead to suboptimal performance: (1) the lack of well-annotated query-webpage pairs with ranking scores covering a diverse range of search query popularities, which hampers their ability to address queries across the popularity spectrum, and (2) inadequately trained models that fail to induce generalized representations for LTR, resulting in overfitting. To address these challenges, we propose a generative pre-trained ranking model with over-parameterization for web-scale search.
The optimization of the user experience, achieved by catering to information needs, largely depends on the effective sorting of retrieved content. In this realm, Learning to Rank (LTR) becomes instrumental, requiring a considerable amount of query-webpage pairings with relevancy scores for effective supervised LTR [Li et al., 2023b; Qin and Liu, 2013; Li et al., 2023c; Lyu et al., 2020; Peng et al., 2024; Wang et al., 2024b]. Nevertheless, the commonplace scarcity of well-described query-webpage pairings often compels semi-supervised LTR, harnessing both labeled and unlabeled samples for the process [Szummer and Yilmaz, 2011; Zhang et al., 2016; Zhu et al., 2023; Peng et al., 2023].
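For orientation only, here is a generic semi-supervised self-training sketch for LTR on labeled and unlabeled query-webpage pairs; it assumes a scikit-learn-like model interface and relevance scores in [0, 1], and it does not reproduce the paper's generative pre-training or over-parameterized architecture.

```python
# Hedged sketch: self-training on unlabeled query-webpage pairs with confident pseudo-labels.

def self_training_ltr(model, labeled, unlabeled, confidence=0.9, rounds=3):
    """labeled: list of (features, relevance in [0, 1]); unlabeled: list of features."""
    features = [x for x, _ in labeled]
    targets = [y for _, y in labeled]
    for _ in range(rounds):
        model.fit(features, targets)
        remaining = []
        for x in unlabeled:
            score = model.predict([x])[0]
            if score >= confidence or score <= 1 - confidence:
                features.append(x)            # keep only confident pseudo-labels
                targets.append(round(score))
            else:
                remaining.append(x)
        unlabeled = remaining
    return model
```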
Hyperbolic Knowledge Transfer in Cross-Domain Recommendation System
Yang, Xin, Chang, Heng, Lai, Zhijian, Yang, Jinze, Li, Xingrun, Lu, Yu, Wang, Shuaiqiang, Yin, Dawei, Min, Erxue
Cross-Domain Recommendation (CDR) seeks to utilize knowledge from different domains to alleviate the problem of data sparsity in the target recommendation domain, and it has been gaining more attention in recent years. Although there have been notable advancements in this area, most current methods represent users and items in Euclidean space, which is not ideal for handling long-tail distributed data in recommendation systems. Additionally, adding data from other domains can worsen the long-tail characteristics of the entire dataset, making it harder to train CDR models effectively. Recent studies have shown that hyperbolic methods are particularly suitable for modeling long-tail distributions, which has led us to explore hyperbolic representations for users and items in CDR scenarios. However, due to the distinct characteristics of the different domains, applying hyperbolic representation learning to CDR tasks is quite challenging. In this paper, we introduce a new framework called Hyperbolic Contrastive Learning (HCTS), designed to capture the unique features of each domain while enabling efficient knowledge transfer between domains. We achieve this by embedding users and items from each domain separately and mapping them onto distinct hyperbolic manifolds with adjustable curvatures for prediction. To improve the representations of users and items in the target domain, we develop a hyperbolic contrastive learning module for knowledge transfer. Extensive experiments on real-world datasets demonstrate that hyperbolic manifolds are a promising alternative to Euclidean space for CDR tasks.
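To make the hyperbolic machinery concrete, the snippet below applies the standard Poincaré-ball operations (exponential map at the origin, Möbius addition, geodesic distance) with an adjustable curvature parameter; this follows textbook formulas and is not the paper's HCTS module.

```python
import numpy as np

# Standard Poincare-ball operations with curvature -c (sketch, not HCTS itself).

def expmap0(v, c):
    """Map a Euclidean (tangent) vector onto the ball via the exponential map at the origin."""
    norm = np.linalg.norm(v) + 1e-15
    return np.tanh(np.sqrt(c) * norm) * v / (np.sqrt(c) * norm)

def mobius_add(x, y, c):
    """Mobius addition, the ball's analogue of vector addition."""
    xy, x2, y2 = np.dot(x, y), np.dot(x, x), np.dot(y, y)
    num = (1 + 2 * c * xy + c * y2) * x + (1 - c * x2) * y
    return num / (1 + 2 * c * xy + c ** 2 * x2 * y2)

def poincare_dist(x, y, c):
    """Geodesic distance between two points on the ball."""
    arg = np.sqrt(c) * np.linalg.norm(mobius_add(-x, y, c))
    return 2 / np.sqrt(c) * np.arctanh(np.clip(arg, 0.0, 1.0 - 1e-15))

# Example: embed a user and an item on balls of adjustable curvature and compare them.
user, item = np.array([0.1, 0.2]), np.array([0.3, -0.1])
print(poincare_dist(expmap0(user, 1.0), expmap0(item, 1.0), 1.0))
```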
When Search Engine Services meet Large Language Models: Visions and Challenges
Xiong, Haoyi, Bian, Jiang, Li, Yuchen, Li, Xuhong, Du, Mengnan, Wang, Shuaiqiang, Yin, Dawei, Helal, Sumi
Combining Large Language Models (LLMs) with search engine services marks a significant shift in the field of services computing, opening up new possibilities to enhance how we search for and retrieve information, understand content, and interact with internet services. This paper conducts an in-depth examination of how integrating LLMs with search engines can mutually benefit both technologies. We focus on two main areas: using search engines to improve LLMs (Search4LLM) and enhancing search engine functions using LLMs (LLM4Search). For Search4LLM, we investigate how search engines can provide diverse high-quality datasets for pre-training of LLMs, how they can use the most relevant documents to help LLMs learn to answer queries more accurately, how training LLMs with Learning-To-Rank (LTR) tasks can enhance their ability to respond with greater precision, and how incorporating recent search results can make LLM-generated content more accurate and current. In terms of LLM4Search, we examine how LLMs can be used to summarize content for better indexing by search engines, improve query outcomes through optimization, enhance the ranking of search results by analyzing document relevance, and help in annotating data for learning-to-rank tasks in various learning contexts. However, this promising integration comes with its challenges, which include addressing potential biases and ethical issues in training models, managing the computational and other costs of incorporating LLMs into search services, and continuously updating LLM training with the ever-changing web content. We discuss these challenges and chart out required research directions to address them. We also discuss broader implications for service computing, such as scalability, privacy concerns, and the need to adapt search engine architectures for these advanced models.
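As a minimal illustration of the "incorporating recent search results" idea only, the sketch below builds a search-augmented prompt; search and llm are hypothetical callables, not a specific engine or model API.

```python
# Hedged sketch: ground an LLM answer in retrieved search snippets.

def search_augmented_answer(question, search, llm, top_k=3):
    snippets = search(question)[:top_k]   # assumed: ranked list of text snippets
    context = "\n".join(f"[{i + 1}] {s}" for i, s in enumerate(snippets))
    prompt = (
        "Answer the question using the search results below and cite them by number.\n"
        f"Search results:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
    return llm(prompt)
```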
Tool Learning with Large Language Models: A Survey
Qu, Changle, Dai, Sunhao, Wei, Xiaochi, Cai, Hengyi, Wang, Shuaiqiang, Yin, Dawei, Xu, Jun, Wen, Ji-Rong
Recently, tool learning with large language models (LLMs) has emerged as a promising paradigm for augmenting the capabilities of LLMs to tackle highly complex problems. Despite growing attention and rapid advancements in this field, the existing literature remains fragmented and lacks systematic organization, posing barriers to entry for newcomers. This gap motivates us to conduct a comprehensive survey of existing works on tool learning with LLMs. In this survey, we focus on reviewing existing literature from two primary aspects: (1) why tool learning is beneficial and (2) how tool learning is implemented, enabling a comprehensive understanding of tool learning with LLMs. We first explore the "why" by reviewing both the benefits of tool integration and the inherent benefits of the tool learning paradigm from six specific aspects. In terms of "how", we systematically review the literature according to a taxonomy of four key stages in the tool learning workflow: task planning, tool selection, tool calling, and response generation. Additionally, we provide a detailed summary of existing benchmarks and evaluation methods, categorizing them according to their relevance to different stages. Finally, we discuss current challenges and outline potential future directions, aiming to inspire both researchers and industrial developers to further explore this emerging and promising area. We also maintain a GitHub repository to continually keep track of the relevant papers and resources in this rising area at \url{https://github.com/quchangle1/LLM-Tool-Survey}.
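The four-stage workflow named above can be pictured as a simple pipeline; the sketch below is only a schematic with hypothetical placeholder callables (plan, select_tool, call_tool, generate), since real systems interleave these stages through LLM prompting.

```python
# Schematic of the four-stage tool learning workflow (sketch only).

def tool_learning_pipeline(query, plan, select_tool, call_tool, generate):
    subtasks = plan(query)                                  # 1) task planning
    observations = []
    for subtask in subtasks:
        tool = select_tool(subtask)                         # 2) tool selection
        observations.append(call_tool(tool, subtask))       # 3) tool calling
    return generate(query, subtasks, observations)          # 4) response generation
```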
COLT: Towards Completeness-Oriented Tool Retrieval for Large Language Models
Qu, Changle, Dai, Sunhao, Wei, Xiaochi, Cai, Hengyi, Wang, Shuaiqiang, Yin, Dawei, Xu, Jun, Wen, Ji-Rong
Recently, the integration of external tools with Large Language Models (LLMs) has emerged as a promising approach to overcome the inherent constraints of their pre-training data. However, real-world applications often involve a diverse range of tools, making it infeasible to incorporate all tools directly into LLMs due to constraints on input length and response time. Therefore, to fully exploit the potential of tool-augmented LLMs, it is crucial to develop an effective tool retrieval system. Existing tool retrieval methods mainly rely on semantic matching between user queries and tool descriptions, which often results in the selection of redundant tools. As a result, these methods fail to provide a complete set of diverse tools necessary for addressing the multifaceted problems encountered by LLMs. In this paper, we propose a novel model-agnostic COllaborative Learning-based Tool Retrieval approach, COLT, which captures not only the semantic similarities between user queries and tool descriptions but also the collaborative information among tools. Specifically, we first fine-tune the PLM-based retrieval models to capture the semantic relationships between queries and tools in the semantic learning stage. Subsequently, we construct three bipartite graphs among queries, scenes, and tools and introduce a dual-view graph collaborative learning framework to capture the intricate collaborative relationships among tools during the collaborative learning stage. Extensive experiments on both the open benchmark and the newly introduced ToolLens dataset show that COLT achieves superior performance. Notably, BERT-mini (11M) equipped with our proposed framework outperforms BERT-large (340M), which has 30 times more parameters. Additionally, we plan to publicly release the ToolLens dataset to support further research in tool retrieval.
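As a simplified reading aid (not COLT's dual-view graph module), the sketch below ranks tools by blending a semantic score with a collaborative score; semantic_score and collab_score are hypothetical scorers standing in for the two learning stages.

```python
# Hedged sketch: completeness-oriented tool ranking as a blend of two scores.

def retrieve_tools(query, tools, semantic_score, collab_score, k=5, beta=0.5):
    ranked = sorted(
        tools,
        key=lambda t: beta * semantic_score(query, t) + (1 - beta) * collab_score(query, t),
        reverse=True,
    )
    return ranked[:k]
```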
G3: An Effective and Adaptive Framework for Worldwide Geolocalization Using Large Multi-Modality Models
Jia, Pengyue, Liu, Yiding, Li, Xiaopeng, Zhao, Xiangyu, Wang, Yuhao, Du, Yantong, Han, Xiao, Wei, Xuetao, Wang, Shuaiqiang, Yin, Dawei
Worldwide geolocalization aims to pinpoint the precise, coordinate-level location of photos taken anywhere on Earth. It is very challenging due to 1) the difficulty of capturing subtle location-aware visual semantics, and 2) the heterogeneous geographical distribution of image data. As a result, existing studies have clear limitations when scaled to a worldwide context. They may easily confuse distant images with similar visual contents, or cannot adapt to various locations worldwide with different amounts of relevant data. To resolve these limitations, we propose G3, a novel framework based on Retrieval-Augmented Generation (RAG). In particular, G3 consists of three steps, i.e., Geo-alignment, Geo-diversification, and Geo-verification, to optimize both the retrieval and generation phases of worldwide geolocalization. During Geo-alignment, our solution jointly learns expressive multi-modal representations for images, GPS coordinates, and textual descriptions, which allows us to capture location-aware semantics for retrieving nearby images for a given query. During Geo-diversification, we leverage a prompt ensembling method that is robust to inconsistent retrieval performance for different image queries. Finally, we combine both retrieved and generated GPS candidates in Geo-verification for location prediction. Experiments on two well-established datasets, IM2GPS3k and YFCC4k, verify the superiority of G3 compared to other state-of-the-art methods.
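To illustrate the Geo-verification step in spirit only, the sketch below picks the GPS candidate (retrieved or generated) that best agrees with the others using great-circle distance; this consensus rule is an editorial simplification, not G3's actual verification model.

```python
import math

# Hedged sketch: choose the candidate coordinate closest, on average, to all others.

def haversine_km(a, b):
    """Great-circle distance between two (lat, lon) pairs in kilometers."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def verify_location(candidates):
    """candidates: list of (lat, lon) tuples from retrieval and generation."""
    return min(candidates,
               key=lambda c: sum(haversine_km(c, other) for other in candidates))

# Example: two candidates near Paris outvote an outlier in New York.
print(verify_location([(48.85, 2.35), (48.86, 2.34), (40.71, -74.0)]))
```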