AITopics | Luo, Junyu

Collaborating Authors

Luo, Junyu

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Large Language Model Agent: A Survey on Methodology, Applications and Challenges

Luo, Junyu, Zhang, Weizhi, Yuan, Ye, Zhao, Yusheng, Yang, Junwei, Gu, Yiyang, Wu, Bohan, Chen, Binqi, Qiao, Ziyue, Long, Qingqing, Tu, Rongcheng, Luo, Xiao, Ju, Wei, Xiao, Zhiping, Wang, Yifan, Xiao, Meng, Liu, Chenwu, Yuan, Jingyang, Zhang, Shichang, Jin, Yiqiao, Zhang, Fan, Wu, Xian, Zhao, Hanqing, Tao, Dacheng, Yu, Philip S., Zhang, Ming

arXiv.org Artificial IntelligenceMar-27-2025

The era of intelligent agents is upon us, driven by revolutionary advancements in large language models. Large Language Model (LLM) agents, with goal-driven behaviors and dynamic adaptation capabilities, potentially represent a critical pathway toward artificial general intelligence. This survey systematically deconstructs LLM agent systems through a methodology-centered taxonomy, linking architectural foundations, collaboration mechanisms, and evolutionary pathways. We unify fragmented research threads by revealing fundamental connections between agent design principles and their emergent behaviors in complex environments. Our work provides a unified architectural perspective, examining how agents are constructed, how they collaborate, and how they evolve over time, while also addressing evaluation methodologies, tool applications, practical challenges, and diverse application domains. By surveying the latest developments in this rapidly evolving field, we offer researchers a structured taxonomy for understanding LLM agents and identify promising directions for future research. The collection is available at https://github.com/luo-junyu/Awesome-Agent-Papers.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2503.2146

Country:

North America > United States (1.00)
Asia (0.93)

Genre:

Research Report (1.00)
Overview (1.00)

Industry:

Leisure & Entertainment > Games > Computer Games (1.00)
Information Technology > Security & Privacy (1.00)
Health & Medicine > Diagnostic Medicine (1.00)
(4 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.87)

Add feedback

Attention Bootstrapping for Multi-Modal Test-Time Adaptation

Zhao, Yusheng, Luo, Junyu, Luo, Xiao, Huang, Jinsheng, Yuan, Jingyang, Xiao, Zhiping, Zhang, Ming

arXiv.org Artificial IntelligenceMar-3-2025

Test-time adaptation aims to adapt a well-trained model to potential distribution shifts at test time using only unlabeled test data, without access to the original training data. While previous efforts mainly focus on a single modality, test-time distribution shift in the multi-modal setting is more complex and calls for new solutions. This paper tackles the problem of multi-modal test-time adaptation by proposing a novel method named Attention Bootstrapping with Principal Entropy Minimization (ABPEM). We observe that test-time distribution shift causes misalignment across modalities, leading to a large gap between intra-modality discrepancies (measured by self-attention) and inter-modality discrepancies (measured by cross-attention). We name this the attention gap. This attention gap widens with more severe distribution shifts, hindering effective modality fusion. To mitigate this attention gap and encourage better modality fusion, we propose attention bootstrapping that promotes cross-attention with the guidance of self-attention. Moreover, to reduce the gradient noise in the commonly-used entropy minimization, we adopt principal entropy minimization, a refinement of entropy minimization that reduces gradient noise by focusing on the principal parts of entropy, excluding less reliable gradient information. Extensive experiments on the benchmarks validate the effectiveness of the proposed ABPEM in comparison with competing baselines.

adaptation, artificial intelligence, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2503.02221

Country: North America > United States > California > Los Angeles County > Los Angeles (0.14)

Genre: Research Report > Promising Solution (0.48)

Industry: Education (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.83)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Add feedback

Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention

Yuan, Jingyang, Gao, Huazuo, Dai, Damai, Luo, Junyu, Zhao, Liang, Zhang, Zhengyan, Xie, Zhenda, Wei, Y. X., Wang, Lean, Xiao, Zhiping, Wang, Yuqing, Ruan, Chong, Zhang, Ming, Liang, Wenfeng, Zeng, Wangding

arXiv.org Artificial IntelligenceFeb-16-2025

Long-context modeling is crucial for next-generation language models, yet the high computational cost of standard attention mechanisms poses significant computational challenges. Sparse attention offers a promising direction for improving efficiency while maintaining model capabilities. We present NSA, a Natively trainable Sparse Attention mechanism that integrates algorithmic innovations with hardware-aligned optimizations to achieve efficient long-context modeling. NSA employs a dynamic hierarchical sparse strategy, combining coarse-grained token compression with fine-grained token selection to preserve both global context awareness and local precision. Our approach advances sparse attention design with two key innovations: (1) We achieve substantial speedups through arithmetic intensity-balanced algorithm design, with implementation optimizations for modern hardware. (2) We enable end-to-end training, reducing pretraining computation without sacrificing model performance. As shown in Figure 1, experiments show the model pretrained with NSA maintains or exceeds Full Attention models across general benchmarks, long-context tasks, and instruction-based reasoning. Meanwhile, NSA achieves substantial speedups over Full Attention on 64k-length sequences across decoding, forward propagation, and backward propagation, validating its efficiency throughout the model lifecycle.

artificial intelligence, sparse attention, trainable sparse attention, (1 more...)

arXiv.org Artificial Intelligence

2502.11089

Genre: Research Report (0.69)

Technology: Information Technology > Artificial Intelligence (0.73)

Add feedback

RobustFT: Robust Supervised Fine-tuning for Large Language Models under Noisy Response

Luo, Junyu, Luo, Xiao, Ding, Kaize, Yuan, Jingyang, Xiao, Zhiping, Zhang, Ming

arXiv.org Artificial IntelligenceDec-19-2024

Supervised fine-tuning (SFT) plays a crucial role in adapting large language models (LLMs) to specific domains or tasks. However, as demonstrated by empirical experiments, the collected data inevitably contains noise in practical applications, which poses significant challenges to model performance on downstream tasks. Therefore, there is an urgent need for a noise-robust SFT framework to enhance model capabilities in downstream tasks. To address this challenge, we introduce a robust SFT framework (RobustFT) that performs noise detection and relabeling on downstream task data. For noise identification, our approach employs a multi-expert collaborative system with inference-enhanced models to achieve superior noise detection. In the denoising phase, we utilize a context-enhanced strategy, which incorporates the most relevant and confident knowledge followed by careful assessment to generate reliable annotations. Additionally, we introduce an effective data selection mechanism based on response entropy, ensuring only high-quality samples are retained for fine-tuning. Extensive experiments conducted on multiple LLMs across five datasets demonstrate RobustFT's exceptional performance in noisy scenarios.

arxiv preprint arxiv, large language model, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2412.14922

Country: North America > United States (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

GALA: Graph Diffusion-based Alignment with Jigsaw for Source-free Domain Adaptation

Luo, Junyu, Gu, Yiyang, Luo, Xiao, Ju, Wei, Xiao, Zhiping, Zhao, Yusheng, Yuan, Jingyang, Zhang, Ming

arXiv.org Artificial IntelligenceOct-21-2024

Source-free domain adaptation is a crucial machine learning topic, as it contains numerous applications in the real world, particularly with respect to data privacy. Existing approaches predominantly focus on Euclidean data, such as images and videos, while the exploration of non-Euclidean graph data remains scarce. Recent graph neural network (GNN) approaches can suffer from serious performance decline due to domain shift and label scarcity in source-free adaptation scenarios. In this study, we propose a novel method named Graph Diffusion-based Alignment with Jigsaw (GALA), tailored for source-free graph domain adaptation. To achieve domain alignment, GALA employs a graph diffusion model to reconstruct source-style graphs from target data. Specifically, a score-based graph diffusion model is trained using source graphs to learn the generative source styles. Then, we introduce perturbations to target graphs via a stochastic differential equation instead of sampling from a prior, followed by the reverse process to reconstruct source-style graphs. We feed the source-style graphs into an off-the-shelf GNN and introduce class-specific thresholds with curriculum learning, which can generate accurate and unbiased pseudo-labels for target graphs. Moreover, we develop a simple yet effective graph-mixing strategy named graph jigsaw to combine confident graphs and unconfident graphs, which can enhance generalization capabilities and robustness via consistency learning. Extensive experiments on benchmark datasets validate the effectiveness of GALA.

artificial intelligence, graph, machine learning, (18 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/TPAMI.2024.3416372

2410.16606

Country:

Asia (0.28)
North America > United States (0.28)

Genre:

Research Report > Promising Solution (0.48)
Research Report > New Finding (0.34)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Information Technology > Security & Privacy (0.86)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

SemiEvol: Semi-supervised Fine-tuning for LLM Adaptation

Luo, Junyu, Luo, Xiao, Chen, Xiusi, Xiao, Zhiping, Ju, Wei, Zhang, Ming

arXiv.org Artificial IntelligenceOct-17-2024

Supervised fine-tuning (SFT) is crucial in adapting large language models (LLMs) to a specific domain or task. However, only a limited amount of labeled data is available in practical applications, which poses a severe challenge for SFT in yielding satisfactory results. Therefore, a data-efficient framework that can fully exploit labeled and unlabeled data for LLM fine-tuning is highly anticipated. Towards this end, we introduce a semi-supervised fine-tuning framework named SemiEvol for LLM adaptation from a propagate-and-select manner. For knowledge propagation, SemiEvol adopts a bi-level approach, propagating knowledge from labeled data to unlabeled data through both in-weight and in-context methods. For knowledge selection, SemiEvol incorporates a collaborative learning mechanism, selecting higher-quality pseudo-response samples. We conducted experiments using GPT-4o-mini and Llama-3.1 on seven general or domain-specific datasets, demonstrating significant improvements in model performance on target data. Furthermore, we compared SemiEvol with SFT and self-evolution methods, highlighting its practicality in hybrid data scenarios.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2410.14745

Country:

North America > United States (0.46)
Asia (0.28)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

A Comprehensive Graph Pooling Benchmark: Effectiveness, Robustness and Generalizability

Wang, Pengyun, Luo, Junyu, Shen, Yanxin, Heng, Siyu, Luo, Xiao

arXiv.org Artificial IntelligenceJun-16-2024

Graph pooling has gained attention for its ability to obtain effective node and graph representations for various downstream tasks. Despite the recent surge in graph pooling approaches, there is a lack of standardized experimental settings and fair benchmarks to evaluate their performance. To address this issue, we have constructed a comprehensive benchmark that includes 15 graph pooling methods and 21 different graph datasets. This benchmark systematically assesses the performance of graph pooling methods in three dimensions, i.e., effectiveness, robustness, and generalizability. We first evaluate the performance of these graph pooling approaches across different tasks including graph classification, graph regression and node classification. Then, we investigate their performance under potential noise attacks and out-of-distribution shifts in real-world scenarios. We also involve detailed efficiency analysis and parameter analysis. Extensive experiments validate the strong capability and applicability of graph pooling approaches in various scenarios, which can provide valuable insights and guidance for deep geometric learning research. The source code of our benchmark is available at https://github.com/goose315/Graph_Pooling_Benchmark.

artificial intelligence, data mining, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2406.09031

Country: North America > United States (0.68)

Genre: Overview (0.93)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Information Technology (0.68)
Health & Medicine > Therapeutic Area > Oncology (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
(4 more...)

Add feedback

Towards Graph Contrastive Learning: A Survey and Beyond

Ju, Wei, Wang, Yifan, Qin, Yifang, Mao, Zhengyang, Xiao, Zhiping, Luo, Junyu, Yang, Junwei, Gu, Yiyang, Wang, Dongjie, Long, Qingqing, Yi, Siyu, Luo, Xiao, Zhang, Ming

arXiv.org Artificial IntelligenceMay-20-2024

In recent years, deep learning on graphs has achieved remarkable success in various domains. However, the reliance on annotated graph data remains a significant bottleneck due to its prohibitive cost and time-intensive nature. To address this challenge, self-supervised learning (SSL) on graphs has gained increasing attention and has made significant progress. SSL enables machine learning models to produce informative representations from unlabeled graph data, reducing the reliance on expensive labeled data. While SSL on graphs has witnessed widespread adoption, one critical component, Graph Contrastive Learning (GCL), has not been thoroughly investigated in the existing literature. Thus, this survey aims to fill this gap by offering a dedicated survey on GCL. We provide a comprehensive overview of the fundamental principles of GCL, including data augmentation strategies, contrastive modes, and contrastive optimization objectives. Furthermore, we explore the extensions of GCL to other aspects of data-efficient graph learning, such as weakly supervised learning, transfer learning, and related scenarios. We also discuss practical applications spanning domains such as drug discovery, genomics analysis, recommender systems, and finally outline the challenges and potential future directions in this field.

artificial intelligence, inductive learning, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2405.11868

Country:

Asia (0.46)
North America > United States > California (0.14)

Genre:

Research Report (1.00)
Overview (1.00)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.87)

Add feedback

CoRelation: Boosting Automatic ICD Coding Through Contextualized Code Relation Learning

Luo, Junyu, Wang, Xiaochen, Wang, Jiaqi, Chang, Aofei, Wang, Yaqing, Ma, Fenglong

arXiv.org Artificial IntelligenceFeb-23-2024

Automatic International Classification of Diseases (ICD) coding plays a crucial role in the extraction of relevant information from clinical notes for proper recording and billing. One of the most important directions for boosting the performance of automatic ICD coding is modeling ICD code relations. However, current methods insufficiently model the intricate relationships among ICD codes and often overlook the importance of context in clinical notes. In this paper, we propose a novel approach, a contextualized and flexible framework, to enhance the learning of ICD code representations. Our approach, unlike existing methods, employs a dependent learning paradigm that considers the context of clinical notes in modeling all possible code relations. We evaluate our approach on six public ICD coding datasets and the experimental results demonstrate the effectiveness of our approach compared to state-of-the-art baselines.

icd code, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2402.157

Country: North America > United States (0.14)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area (1.00)
Health & Medicine > Health Care Technology > Medical Record (1.00)
Health & Medicine > Health Care Providers & Services (0.99)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Recent Advances in Predictive Modeling with Electronic Health Records

Wang, Jiaqi, Luo, Junyu, Ye, Muchao, Wang, Xiaochen, Zhong, Yuan, Chang, Aofei, Huang, Guanjie, Yin, Ziyi, Xiao, Cao, Sun, Jimeng, Ma, Fenglong

arXiv.org Artificial IntelligenceFeb-1-2024

The development of electronic health records (EHR) systems has enabled the collection of a vast amount of digitized patient data. However, utilizing EHR data for predictive modeling presents several challenges due to its unique characteristics. With the advancements in machine learning techniques, deep learning has demonstrated its superiority in various applications, including healthcare. This survey systematically reviews recent advances in deep learning-based predictive models using EHR data. Specifically, we begin by introducing the background of EHR data and providing a mathematical definition of the predictive modeling task. We then categorize and summarize predictive deep models from multiple perspectives. Furthermore, we present benchmarks and toolkits relevant to predictive modeling in healthcare. Finally, we conclude this survey by discussing open challenges and suggesting promising directions for future research.

artificial intelligence, machine learning, prediction, (17 more...)

arXiv.org Artificial Intelligence

2402.01077

Country: North America > United States > Illinois (0.14)

Genre:

Research Report (1.00)
Overview (1.00)

Industry:

Health & Medicine > Health Care Technology > Medical Record (1.00)
Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (0.47)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback