Goto

Collaborating Authors

 Cheng, Kewei


Hephaestus: Improving Fundamental Agent Capabilities of Large Language Models through Continual Pre-Training

arXiv.org Artificial Intelligence

Due to the scarcity of agent-oriented pre-training data, LLM-based autonomous agents typically rely on complex prompting or extensive fine-tuning, which often fails to introduce new capabilities while preserving strong generalizability. We introduce Hephaestus-Forge, the first large-scale pre-training corpus designed to enhance the fundamental capabilities of LLM agents in API function calling, intrinsic reasoning and planning, and adapting to environmental feedback. Hephaestus-Forge comprises 103B agent-specific data encompassing 76,537 APIs, including both tool documentation to introduce knowledge of API functions and function calling trajectories to strengthen intrinsic reasoning. To explore effective training protocols, we investigate scaling laws to identify the optimal recipe in data mixing ratios. By continual pre-training on Hephaestus-Forge, Hephaestus outperforms small- to medium-scale open-source LLMs and rivals commercial LLMs on three agent benchmarks, demonstrating the effectiveness of our pre-training corpus in enhancing fundamental agentic capabilities and generalization of LLMs to new tasks or environments.


Structure Guided Prompt: Instructing Large Language Model in Multi-Step Reasoning by Exploring Graph Structure of the Text

arXiv.org Artificial Intelligence

Although Large Language Models (LLMs) excel at addressing straightforward reasoning tasks, they frequently struggle with difficulties when confronted by more complex multi-step reasoning due to a range of factors. Firstly, natural language often encompasses complex relationships among entities, making it challenging to maintain a clear reasoning chain over longer spans. Secondly, the abundance of linguistic diversity means that the same entities and relationships can be expressed using different terminologies and structures, complicating the task of identifying and establishing connections between multiple pieces of information. Graphs provide an effective solution to represent data rich in relational information and capture long-term dependencies among entities. To harness the potential of graphs, our paper introduces Structure Guided Prompt, an innovative three-stage task-agnostic prompting framework designed to improve the multi-step reasoning capabilities of LLMs in a zero-shot setting. This framework explicitly converts unstructured text into a graph via LLMs and instructs them to navigate this graph using task-specific strategies to formulate responses. By effectively organizing information and guiding navigation, it enables LLMs to provide more accurate and context-aware responses. Our experiments show that this framework significantly enhances the reasoning capabilities of LLMs, enabling them to excel in a broader spectrum of natural language scenarios.


Inductive Meta-path Learning for Schema-complex Heterogeneous Information Networks

arXiv.org Artificial Intelligence

Heterogeneous Information Networks (HINs) are information networks with multiple types of nodes and edges. The concept of meta-path, i.e., a sequence of entity types and relation types connecting two entities, is proposed to provide the meta-level explainable semantics for various HIN tasks. Traditionally, meta-paths are primarily used for schema-simple HINs, e.g., bibliographic networks with only a few entity types, where meta-paths are often enumerated with domain knowledge. However, the adoption of meta-paths for schema-complex HINs, such as knowledge bases (KBs) with hundreds of entity and relation types, has been limited due to the computational complexity associated with meta-path enumeration. Additionally, effectively assessing meta-paths requires enumerating relevant path instances, which adds further complexity to the meta-path learning process. To address these challenges, we propose SchemaWalk, an inductive meta-path learning framework for schema-complex HINs. We represent meta-paths with schema-level representations to support the learning of the scores of meta-paths for varying relations, mitigating the need of exhaustive path instance enumeration for each relation. Further, we design a reinforcement-learning based path-finding agent, which directly navigates the network schema (i.e., schema graph) to learn policies for establishing meta-paths with high coverage and confidence for multiple relations. Extensive experiments on real data sets demonstrate the effectiveness of our proposed paradigm.


Neural Compositional Rule Learning for Knowledge Graph Reasoning

arXiv.org Artificial Intelligence

Learning logical rules is critical to improving reasoning in KGs. This is due to their ability to provide logical and interpretable explanations when used for predictions, as well as their ability to generalize to other tasks, domains, and data. While recent methods have been proposed to learn logical rules, the majority of these methods are either restricted by their computational complexity and cannot handle the large search space of large-scale KGs, or show poor generalization when exposed to data outside the training set. In this paper, we propose an endto-end neural model for learning compositional logical rules called NCRL. By recurrently merging compositions in the rule body with a recurrent attention unit, NCRL finally predicts a single rule head. Experimental results show that NCRL learns high-quality rules, as well as being generalizable. Specifically, we show that NCRL is scalable, efficient, and yields state-of-the-art results for knowledge graph completion on large-scale KGs. Moreover, we test NCRL for systematic generalization by learning to reason on small-scale observed graphs and evaluating on larger unseen ones. Knowledge Graphs (KGs) provide a structured representation of real-world facts (Ji et al., 2021), and they are remarkably useful in various applications (Graupmann et al., 2005; Lukovnikov et al., 2017; Xiong et al., 2017; Yih et al., 2015). Since KGs are usually incomplete, KG reasoning is a crucial problem in KGs, where the goal is to infer the missing knowledge using the observed facts. This paper investigates how to learn logical rules for KG reasoning.


SecureBoost: A Lossless Federated Learning Framework

arXiv.org Machine Learning

The protection of user privacy is an important concern in machine learning, as evidenced by the rolling out of the General Data Protection Regulation (GDPR) in the European Union (EU) in May 2018. The GDPR is designed to give users more control over their personal data, which motivates us to explore machine learning frameworks with data sharing without violating user privacy. To meet this goal, in this paper, we propose a novel lossless privacy-preserving tree-boosting system known as SecureBoost in the setting of federated learning. This federated-learning system allows a learning process to be jointly conducted over multiple parties with partially common user samples but different feature sets, which corresponds to a vertically partitioned virtual data set. An advantage of SecureBoost is that it provides the same level of accuracy as the non-privacy-preserving approach while at the same time, reveal no information of each private data provider. We theoretically prove that the SecureBoost framework is as accurate as other non-federated gradient tree-boosting algorithms that bring the data into one place. In addition, along with a proof of security, we discuss what would be required to make the protocols completely secure.


Unsupervised Sentiment Analysis with Signed Social Networks

AAAI Conferences

Huge volumes of opinion-rich data is user-generated in social media at an unprecedented rate, easing the analysis of individual and public sentiments. Sentiment analysis has shown to be useful in probing and understanding emotions, expressions and attitudes in the text. However, the distinct characteristics of social media data present challenges to traditional sentiment analysis. First, social media data is often noisy, incomplete and fast-evolved which necessitates the design of a sophisticated learning model. Second, sentiment labels are hard to collect which further exacerbates the problem by not being able to discriminate sentiment polarities. Meanwhile, opportunities are also unequivocally presented. Social media contains rich sources of sentiment signals in textual terms and user interactions, which could be helpful in sentiment analysis. While there are some attempts to leverage implicit sentiment signals in positive user interactions, little attention is paid on signed social networks with both positive and negative links. The availability of signed social networks motivates us to investigate if negative links also contain useful sentiment signals. In this paper, we study a novel problem of unsupervised sentiment analysis with signed social networks. In particular, we incorporate explicit sentiment signals in textual terms and implicit sentiment signals from signed social networks into a coherent model SignedSenti for unsupervised sentiment analysis. Empirical experiments on two real-world datasets corroborate its effectiveness.