AITopics | Tang, Jie

Collaborating Authors

Tang, Jie

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Understanding Emergent Abilities of Language Models from the Loss Perspective

Du, Zhengxiao, Zeng, Aohan, Dong, Yuxiao, Tang, Jie

arXiv.org Artificial IntelligenceMar-30-2024

Recent studies have put into question the belief that emergent abilities in language models are exclusive to large models. This skepticism arises from two observations: 1) smaller models can also exhibit high performance on emergent abilities and 2) there is doubt on the discontinuous metrics used to measure these abilities. In this paper, we propose to study emergent abilities in the lens of pre-training loss, instead of model size or training compute. We demonstrate that the models with the same pre-training loss, but different model and data sizes, generate the same performance on various downstream tasks. We also discover that a model exhibits emergent abilities on certain tasks -- regardless of the continuity of metrics -- when its pre-training loss falls below a specific threshold. Before reaching this threshold, its performance remains at the level of random guessing. This inspires us to redefine emergent abilities as those that manifest in models with lower pre-training losses, highlighting that these abilities cannot be predicted by merely extrapolating the performance trends of models with higher pre-training losses.

large language model, machine learning, pre-training loss, (19 more...)

arXiv.org Artificial Intelligence

2403.15796

Country:

Europe (1.00)
North America > United States > Hawaii (0.14)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > Maryland (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Add feedback

Does Negative Sampling Matter? A Review with Insights into its Theory and Applications

Yang, Zhen, Ding, Ming, Huang, Tinglin, Cen, Yukuo, Song, Junshuai, Xu, Bin, Dong, Yuxiao, Tang, Jie

arXiv.org Artificial IntelligenceFeb-27-2024

Negative sampling has swiftly risen to prominence as a focal point of research, with wide-ranging applications spanning machine learning, computer vision, natural language processing, data mining, and recommender systems. This growing interest raises several critical questions: Does negative sampling really matter? Is there a general framework that can incorporate all existing negative sampling methods? In what fields is it applied? Addressing these questions, we propose a general framework that leverages negative sampling. Delving into the history of negative sampling, we trace the development of negative sampling through five evolutionary paths. We dissect and categorize the strategies used to select negative sample candidates, detailing global, local, mini-batch, hop, and memory-based approaches. Our review categorizes current negative sampling methods into five types: static, hard, GAN-based, Auxiliary-based, and In-batch methods, providing a clear structure for understanding negative sampling. Beyond detailed categorization, we highlight the application of negative sampling in various areas, offering insights into its practical benefits. Finally, we briefly discuss open problems and future directions for negative sampling.

data mining, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2402.17238

Country:

Asia > China (0.14)
North America > United States (0.14)

Genre:

Overview (0.93)
Research Report (0.81)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(4 more...)

Add feedback

PST-Bench: Tracing and Benchmarking the Source of Publications

Zhang, Fanjin, Cao, Kun, Cen, Yukuo, Yu, Jifan, Yin, Da, Tang, Jie

arXiv.org Artificial IntelligenceFeb-25-2024

Tracing the source of research papers is a fundamental yet challenging task for researchers. The billion-scale citation relations between papers hinder researchers from understanding the evolution of science efficiently. To date, there is still a lack of an accurate and scalable dataset constructed by professional researchers to identify the direct source of their studied papers, based on which automatic algorithms can be developed to expand the evolutionary knowledge of science. In this paper, we study the problem of paper source tracing (PST) and construct a high-quality and ever-increasing dataset PST-Bench in computer science. Based on PST-Bench, we reveal several intriguing discoveries, such as the differing evolution patterns across various topics. An exploration of various methods underscores the hardness of PST-Bench, pinpointing potential directions on this topic. The dataset and codes have been available at https://github.com/THUDM/paper-source-trace.

data mining, machine learning, natural language, (22 more...)

arXiv.org Artificial Intelligence

2402.16009

Country: Asia > China (0.28)

Genre: Research Report (0.50)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
(2 more...)

Add feedback

Middleware for LLMs: Tools Are Instrumental for Language Agents in Complex Environments

Gu, Yu, Shu, Yiheng, Yu, Hao, Liu, Xiao, Dong, Yuxiao, Tang, Jie, Srinivasa, Jayanth, Latapie, Hugo, Su, Yu

arXiv.org Artificial IntelligenceFeb-22-2024

The applications of large language models (LLMs) have expanded well beyond the confines of text processing, signaling a new era where LLMs are envisioned as generalist language agents capable of operating within complex real-world environments. These environments are often highly expansive, making it impossible for the LLM to process them within its short-term memory. Motivated by recent research on extending the capabilities of LLMs with tools, this paper investigates the intriguing potential of tools to augment LLMs in handling such complexity. To this end, we design customized tools to aid in the proactive exploration within these massive environments. Such tools can serve as a middleware layer shielding the LLM from environmental complexity. In two representative complex environments -- knowledge bases (KBs) and databases -- we demonstrate the significant potential of augmenting language agents with tools in complex environments. Notably, equipped with these tools, GPT-4 achieves 2.8X the performance of the best baseline in tasks requiring access to database content and 2.2X in KB tasks. Our findings illuminate the path for advancing language agents in complex real-world applications.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2402.14672

Country:

Europe (1.00)
North America > United States > Louisiana (0.14)
North America > United States > Texas (0.14)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.50)

Add feedback

CogCoM: Train Large Vision-Language Models Diving into Details through Chain of Manipulations

Qi, Ji, Ding, Ming, Wang, Weihan, Bai, Yushi, Lv, Qingsong, Hong, Wenyi, Xu, Bin, Hou, Lei, Li, Juanzi, Dong, Yuxiao, Tang, Jie

arXiv.org Artificial IntelligenceFeb-6-2024

Vision-Language Models (VLMs) have demonstrated their widespread viability thanks to extensive training in aligning visual instructions to answers. However, this conclusive alignment leads models to ignore critical visual reasoning, and further result in failures on meticulous visual problems and unfaithful responses. In this paper, we propose Chain of Manipulations, a mechanism that enables VLMs to solve problems with a series of manipulations, where each manipulation refers to an operation on the visual input, either from intrinsic abilities (e.g., grounding) acquired through prior training or from imitating human-like behaviors (e.g., zoom in). This mechanism encourages VLMs to generate faithful responses with evidential visual reasoning, and permits users to trace error causes in the interpretable paths. We thus train CogCoM, a general 17B VLM with a memory-based compatible architecture endowed this reasoning mechanism. Experiments show that our model achieves the state-of-the-art performance across 8 benchmarks from 3 categories, and a limited number of training steps with the data swiftly gains a competitive performance. The code and data are publicly available at https://github.com/THUDM/CogCoM.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2402.04236

Country: Europe > Switzerland > Zürich > Zürich (0.14)

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.95)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.66)

Add feedback

Towards Efficient and Exact Optimization of Language Model Alignment

Ji, Haozhe, Lu, Cheng, Niu, Yilin, Ke, Pei, Wang, Hongning, Zhu, Jun, Tang, Jie, Huang, Minlie

arXiv.org Artificial IntelligenceFeb-2-2024

The alignment of language models with human preferences is vital for their application in real-world tasks. The problem is formulated as optimizing the model's policy to maximize the expected reward that reflects human preferences with minimal deviation from the initial policy. While considered as a straightforward solution, reinforcement learning (RL) suffers from high variance in policy updates, which impedes efficient policy improvement. Recently, direct preference optimization (DPO) was proposed to directly optimize the policy from preference data. Though simple to implement, DPO is derived based on the optimal policy that is not assured to be achieved in practice, which undermines its convergence to the intended solution. In this paper, we propose efficient exact optimization (EXO) of the alignment objective. We prove that EXO is guaranteed to optimize in the same direction as the RL algorithms asymptotically for arbitary parametrization of the policy, while enables efficient optimization by circumventing the complexities associated with RL algorithms. We compare our method to DPO with both theoretical and empirical analyses, and further demonstrate the advantages of our method over existing approaches on realistic human preference data.

large language model, machine learning, sft, (19 more...)

arXiv.org Artificial Intelligence

2402.00856

Country: North America > United States > Oregon (0.28)

Genre: Research Report (0.82)

Industry:

Media > Film (0.45)
Leisure & Entertainment (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.89)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Add feedback

LongAlign: A Recipe for Long Context Alignment of Large Language Models

Bai, Yushi, Lv, Xin, Zhang, Jiajie, He, Yuze, Qi, Ji, Hou, Lei, Tang, Jie, Dong, Yuxiao, Li, Juanzi

arXiv.org Artificial IntelligenceJan-31-2024

Extending large language models to effectively handle long contexts requires instruction fine-tuning on input sequences of similar length. To address this, we present LongAlign -- a recipe of the instruction data, training, and evaluation for long context alignment. First, we construct a long instruction-following dataset using Self-Instruct. To ensure the data diversity, it covers a broad range of tasks from various long context sources. Second, we adopt the packing and sorted batching strategies to speed up supervised fine-tuning on data with varied length distributions. Additionally, we develop a loss weighting method to balance the contribution to the loss across different sequences during packing training. Third, we introduce the LongBench-Chat benchmark for evaluating instruction-following capabilities on queries of 10k-100k in length. Experiments show that LongAlign outperforms existing recipes for LLMs in long context tasks by up to 30\%, while also maintaining their proficiency in handling short, generic tasks. The code, data, and long-aligned models are open-sourced at https://github.com/THUDM/LongAlign.

large language model, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2401.18058

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

RecDCL: Dual Contrastive Learning for Recommendation

Zhang, Dan, Geng, Yangliao, Gong, Wenwen, Qi, Zhongang, Chen, Zhiyu, Tang, Xing, Shan, Ying, Dong, Yuxiao, Tang, Jie

arXiv.org Artificial IntelligenceJan-28-2024

Self-supervised recommendation (SSR) has achieved great success in mining the potential interacted behaviors for collaborative filtering in recent years. As a major branch, Contrastive Learning (CL) based SSR conquers data sparsity in Web platforms by contrasting the embedding between raw data and augmented data. However, existing CL-based SSR methods mostly focus on contrasting in a batch-wise way, failing to exploit potential regularity in the feature-wise dimension, leading to redundant solutions during the representation learning process of users (items) from Websites. Furthermore, the joint benefits of utilizing both Batch-wise CL (BCL) and Feature-wise CL (FCL) for recommendations remain underexplored. To address these issues, we investigate the relationship of objectives between BCL and FCL. Our study suggests a cooperative benefit of employing both methods, as evidenced from theoretical and experimental perspectives. Based on these insights, we propose a dual CL method for recommendation, referred to as RecDCL. RecDCL first eliminates redundant solutions on user-item positive pairs in a feature-wise manner. It then optimizes the uniform distributions within users and items using a polynomial kernel from an FCL perspective. Finally, it generates contrastive embedding on output vectors in a batch-wise objective. We conduct experiments on four widely-used benchmarks and an industrial dataset. The results consistently demonstrate that the proposed RecDCL outperforms the state-of-the-art GNNs-based and SSL-based models (with up to a 5.65\% improvement in terms of Recall@20), thereby confirming the effectiveness of the joint-wise objective. All source codes used in this paper are publicly available at \url{https://github.com/THUDM/RecDCL}}.

artificial intelligence, machine learning, recdcl, (15 more...)

arXiv.org Artificial Intelligence

2401.15635

Country:

North America > United States (0.14)
Asia > China (0.14)
Asia > Taiwan (0.14)

Genre:

Research Report > Experimental Study (0.87)
Research Report > Promising Solution (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

Add feedback

SciGLM: Training Scientific Language Models with Self-Reflective Instruction Annotation and Tuning

Zhang, Dan, Hu, Ziniu, Zhoubian, Sining, Du, Zhengxiao, Yang, Kaiyu, Wang, Zihan, Yue, Yisong, Dong, Yuxiao, Tang, Jie

arXiv.org Artificial IntelligenceJan-15-2024

\label{sec:abstract} Large Language Models (LLMs) have shown promise in assisting scientific discovery. However, such applications are currently limited by LLMs' deficiencies in understanding intricate scientific concepts, deriving symbolic equations, and solving advanced numerical calculations. To bridge these gaps, we introduce SciGLM, a suite of scientific language models able to conduct college-level scientific reasoning. Central to our approach is a novel self-reflective instruction annotation framework to address the data scarcity challenge in the science domain. This framework leverages existing LLMs to generate step-by-step reasoning for unlabelled scientific questions, followed by a process of self-reflective critic-and-revise. Applying this framework, we curated SciInstruct, a diverse and high-quality dataset encompassing mathematics, physics, chemistry, and formal proofs. We fine-tuned the ChatGLM family of language models with SciInstruct, enhancing their capabilities in scientific and mathematical reasoning. Remarkably, SciGLM consistently improves both the base model (ChatGLM3-6B-Base) and larger-scale models (12B and 32B), without sacrificing the language understanding capabilities of the base model. This makes SciGLM a suitable foundational model to facilitate diverse scientific discovery tasks. For the benefit of the wider research community, we release SciInstruct, SciGLM, alongside a self-reflective framework and fine-tuning code at \url{https://github.com/THUDM/SciGLM}.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2401.0795

Genre: Research Report (0.64)

Industry: Education > Curriculum > Subject-Specific Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Add feedback

APAR: LLMs Can Do Auto-Parallel Auto-Regressive Decoding

Liu, Mingdao, Zeng, Aohan, Wang, Bowen, Zhang, Peng, Tang, Jie, Dong, Yuxiao

arXiv.org Artificial IntelligenceJan-12-2024

The massive adoption of large language models (LLMs) demands efficient deployment strategies. However, the auto-regressive decoding process, which is fundamental to how most LLMs generate text, poses challenges to achieve efficient serving. In this work, we introduce a parallel auto-regressive generation method. By instruct-tuning on general domain data that contains hierarchical structures, we enable LLMs to independently plan their generation process and perform auto-parallel auto-regressive (APAR) generation, significantly reducing the number of generation steps. APAR alone can achieve up to 2x speed-up, and when combined with speculative decoding, the speed-up can reach up to 4x. In addition, APAR reduces the key-value cache consumption and attention computation during generation. This leads to a throughput increase of 20-70% and a latency reduce of 20-35% in high-throughput scenarios, compared to state-of-the-art serving frameworks.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2401.06761

Country: North America > United States (0.31)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

Add feedback