Collaborating Authors

 Jin, Yilun


DH-RAG: A Dynamic Historical Context-Powered Retrieval-Augmented Generation Method for Multi-Turn Dialogue

arXiv.org Artificial Intelligence

Retrieval-Augmented Generation (RAG) systems have shown substantial benefits in applications such as question answering and multi-turn dialogue [22]. However, traditional RAG methods, while leveraging static knowledge bases, often overlook the potential of dynamic historical information in ongoing conversations. To bridge this gap, we introduce DH-RAG, a Dynamic Historical Context-Powered Retrieval-Augmented Generation Method for Multi-Turn Dialogue. DH-RAG is inspired by human cognitive processes that draw on both long-term memory and immediate historical context when formulating conversational responses [32]. DH-RAG is structured around two principal components: a History-Learning based Query Reconstruction Module, designed to generate effective queries by synthesizing current and prior interactions, and a Dynamic History Information Updating Module, which continually refreshes historical context throughout the dialogue. At the center of DH-RAG lies a Dynamic Historical Information database, which is further refined by three strategies within the Query Reconstruction Module: Historical Query Clustering, Hierarchical Matching, and Chain of Thought Tracking. Experimental evaluations show that DH-RAG significantly surpasses conventional models on several benchmarks, enhancing response relevance, coherence, and dialogue quality.
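To make the two modules concrete, here is a minimal sketch of the dialogue loop the abstract describes, assuming generic retriever and llm callables; the class and method names are hypothetical, and the paper's three query-reconstruction strategies (clustering, hierarchical matching, CoT tracking) are not reproduced.

```python
# Minimal sketch of a dynamic-history RAG loop (hypothetical names; not the
# authors' implementation).

class DynamicHistoryRAG:
    def __init__(self, retriever, llm):
        self.retriever = retriever   # static knowledge-base retriever: query -> list[str]
        self.llm = llm               # text-generation callable: prompt -> str
        self.history = []            # dynamic historical information store

    def reconstruct_query(self, user_turn):
        # History-learning query reconstruction (simplified): fold recent turns
        # into the current query so retrieval sees conversational context.
        recent = " ".join(h["query"] for h in self.history[-3:])
        return f"{recent} {user_turn}".strip()

    def respond(self, user_turn):
        query = self.reconstruct_query(user_turn)
        docs = self.retriever(query)                           # long-term memory
        recent_answers = [h["answer"] for h in self.history[-3:]]
        context = "\n".join(docs + recent_answers)             # static + dynamic context
        answer = self.llm(f"Context:\n{context}\n\nUser: {user_turn}\nAssistant:")
        self.history.append({"query": user_turn, "answer": answer})  # dynamic update
        return answer
```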


Enhancing Token Filtering Efficiency in Large Language Model Training with Collider

arXiv.org Artificial Intelligence

Token filtering has been proposed to enhance the utility of large language models (LLMs) by eliminating inconsequential tokens during training. While using fewer tokens should reduce computational workloads, existing studies have not succeeded in achieving higher efficiency. This is primarily due to the insufficient sparsity caused by filtering tokens only in the output layers, as well as inefficient sparse GEMM (General Matrix Multiplication) even when sparsity is sufficient. This paper presents Collider, a system unleashing the full efficiency of token filtering in LLM training. At its core, Collider filters the activations of inconsequential tokens across all layers to maintain sparsity. Additionally, it features an automatic workflow that transforms sparse GEMM into dimension-reduced dense GEMM for optimized efficiency. Evaluations on three LLMs (TinyLlama-1.1B, Qwen2.5-1.5B, and Phi1.5-1.4B) demonstrate that Collider reduces backpropagation time by up to 35.1% and end-to-end training time by up to 22.0% when filtering 40% of tokens. Utility assessments of training TinyLlama on 15B tokens indicate that Collider sustains the utility advancements of token filtering, improving model utility by a relative 16.3% compared to regular training while reducing training time from 4.7 days to 3.5 days on 8 GPUs. Collider is designed for easy integration into existing LLM training frameworks, allowing systems that already use token filtering to accelerate training with just one line of code.
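The sparse-to-dense transformation can be illustrated with a small PyTorch sketch (not Collider's actual code): gathering the rows of surviving tokens turns a masked sparse GEMM into a smaller dense one.

```python
# Illustrative sketch of dimension-reduced dense GEMM after token filtering.
import torch

def filtered_dense_gemm(activations, weight, keep_mask):
    """activations: [tokens, d_in], weight: [d_in, d_out],
    keep_mask: bool [tokens] marking consequential tokens."""
    kept = activations[keep_mask]   # [kept_tokens, d_in]: dense, but smaller
    return kept @ weight            # dense GEMM over the reduced dimension

tokens, d_in, d_out = 1024, 2048, 2048
x = torch.randn(tokens, d_in)
w = torch.randn(d_in, d_out)
mask = torch.rand(tokens) > 0.4     # e.g. filter ~40% of tokens
y = filtered_dense_gemm(x, w, mask) # equals the sparse GEMM restricted to kept rows
```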


mFabric: An Efficient and Scalable Fabric for Mixture-of-Experts Training

arXiv.org Artificial Intelligence

Mixture-of-Experts (MoE) models outperform conventional models by selectively activating different subnets, named experts, on a per-token basis. This gated computation generates dynamic communication patterns that cannot be determined beforehand, challenging existing GPU interconnects, which remain static during distributed training. In this paper, we advocate for a first-of-its-kind system, called mFabric, that unlocks topology reconfiguration during distributed MoE training. Towards this vision, we first perform a production measurement study and show that the MoE dynamic communication pattern has strong locality, alleviating the requirement of global reconfiguration. Based on this, we design and implement a regionally reconfigurable high-bandwidth domain on top of existing electrical interconnects using optical circuit switching (OCS), achieving scalability while maintaining rapid adaptability. We have built a fully functional mFabric prototype with commodity hardware and a customized collective communication runtime that trains state-of-the-art MoE models with in-training topology reconfiguration across 32 A100 GPUs. Large-scale packet-level simulations show that mFabric delivers performance comparable to a non-blocking fat-tree fabric while boosting the training cost efficiency (e.g., performance per dollar) of four representative MoE models by 1.2×-1.5× and 1.9×-2.3× at 100 Gbps and 400 Gbps link bandwidths, respectively.
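As a toy illustration of the locality-driven reconfiguration decision (all names hypothetical; the real system programs optical circuit switches rather than returning a Python list), one window of observed traffic can be reduced to a regional circuit plan:

```python
# Toy sketch: pick the heaviest intra-region GPU pairs for dedicated circuits.
from collections import Counter

def plan_regional_links(traffic, region, num_circuits):
    """traffic: Counter mapping (src_gpu, dst_gpu) -> bytes in this window.
    Keep only pairs inside the region and wire circuits to the heaviest."""
    local = Counter({p: b for p, b in traffic.items()
                     if p[0] in region and p[1] in region})
    return [pair for pair, _ in local.most_common(num_circuits)]

traffic = Counter({(0, 1): 900, (1, 2): 700, (0, 5): 50, (2, 3): 650})
print(plan_regional_links(traffic, region={0, 1, 2, 3}, num_circuits=2))
# -> [(0, 1), (1, 2)]: high-locality pairs get the high-bandwidth circuits
```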


Shopping MMLU: A Massive Multi-Task Online Shopping Benchmark for Large Language Models

arXiv.org Artificial Intelligence

Online shopping is a complex multi-task, few-shot learning problem with a wide and evolving range of entities, relations, and tasks. However, existing models and benchmarks are commonly tailored to specific tasks, falling short of capturing the full complexity of online shopping. Large Language Models (LLMs), with their multi-task and few-shot learning abilities, have the potential to profoundly transform online shopping by alleviating task-specific engineering efforts and by providing users with interactive conversations. Despite the potential, LLMs face unique challenges in online shopping, such as domain-specific concepts, implicit knowledge, and heterogeneous user behaviors. Motivated by the potential and challenges, we propose Shopping MMLU, a diverse multi-task online shopping benchmark derived from real-world Amazon data. Shopping MMLU consists of 57 tasks covering 4 major shopping skills: concept understanding, knowledge reasoning, user behavior alignment, and multi-linguality, and can thus comprehensively evaluate the abilities of LLMs as general shop assistants. With Shopping MMLU, we benchmark over 20 existing LLMs and uncover valuable insights about practices and prospects of building versatile LLM-based shop assistants. Shopping MMLU can be publicly accessed at https://github.com/KL4805/ShoppingMMLU. In addition, with Shopping MMLU, we host a competition in KDD Cup 2024 with over 500 participating teams. The winning solutions and the associated workshop can be accessed at our website https://amazon-kddcup24.github.io/.
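For intuition, a benchmark of this shape reduces to a per-task scoring loop like the sketch below; the file layout and field names are assumptions for illustration, not the repository's actual format.

```python
# Hypothetical per-task evaluation loop over JSONL task files.
import json

def evaluate(llm, task_files):
    scores = {}
    for path in task_files:
        with open(path) as f:
            examples = [json.loads(line) for line in f]
        correct = sum(
            llm(ex["prompt"]).strip() == ex["answer"] for ex in examples
        )
        scores[path] = correct / len(examples)
    return scores  # per-task accuracy, aggregated later by skill group
```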


PackVFL: Efficient HE Packing for Vertical Federated Learning

arXiv.org Artificial Intelligence

As an essential tool of secure distributed machine learning, vertical federated learning (VFL) based on homomorphic encryption (HE) suffers from severe efficiency problems due to data inflation and time-consuming operations. To this end, we propose PackVFL, an efficient VFL framework based on packed HE (PackedHE), to accelerate existing HE-based VFL algorithms. PackVFL packs multiple cleartexts into one ciphertext and supports single-instruction-multiple-data (SIMD)-style parallelism. We focus on designing a high-performance matrix multiplication (MatMult) method, since MatMult takes up most of the ciphertext computation time in HE-based VFL. Designing the MatMult method is also challenging for PackedHE, because a slight difference in the packing scheme can dominate its computation and communication costs; without domain-specific design, directly applying SOTA MatMult methods hardly achieves optimal performance. Therefore, we make a three-fold design: 1) we systematically explore the current design space of MatMult and quantify the complexity of existing approaches to provide guidance; 2) we propose a hybrid MatMult method according to the unique characteristics of VFL; 3) we adaptively apply our hybrid method in representative VFL algorithms, leveraging distinctive algorithmic properties to further improve efficiency. As the batch size, feature dimension, and model size of VFL scale up, PackVFL consistently delivers enhanced performance. Empirically, PackVFL propels existing VFL algorithms to new heights, achieving up to a 51.52X end-to-end speedup, a 34.51X greater speedup than the direct application of SOTA MatMult methods.
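To see why the packing scheme matters, the classic diagonal method computes a matrix-vector product using only the slot-wise multiplications and rotations that PackedHE schemes expose. The sketch below simulates it with numpy (no actual encryption); it is one baseline in the design space, not PackVFL's hybrid method.

```python
# Diagonal-method matvec: SIMD slot-wise multiplies plus rotations, simulated
# in cleartext with numpy.
import numpy as np

def diagonal_matvec(M, v):
    n = M.shape[0]
    result = np.zeros(n)
    for i in range(n):
        diag = np.array([M[j, (j + i) % n] for j in range(n)])  # i-th diagonal
        result += diag * np.roll(v, -i)   # slot-wise multiply + rotation by i
    return result

M, v = np.random.rand(4, 4), np.random.rand(4)
assert np.allclose(diagonal_matvec(M, v), M @ v)  # matches the plain matvec
```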


A Survey on Vertical Federated Learning: From a Layered Perspective

arXiv.org Artificial Intelligence

Vertical federated learning (VFL) is a promising category of federated learning for scenarios where data is vertically partitioned and distributed among parties. VFL enriches the description of samples using features from different parties to improve model capacity. Compared with horizontal federated learning, VFL is in most cases applied in commercial cooperation scenarios between companies, and therefore carries tremendous business value. In the past few years, VFL has attracted more and more attention in both academia and industry. In this paper, we systematically investigate the current work on VFL from a layered perspective. From the hardware layer to the vertical federated system layer, researchers contribute to various aspects of VFL. Moreover, the applications of VFL cover a wide range of areas, e.g., finance, healthcare, etc. At each layer, we categorize the existing work and explore the challenges, to facilitate further research and development of VFL. In particular, we design a novel MOSP tree taxonomy to analyze the core component of VFL, i.e., secure vertical federated machine learning algorithms. Our taxonomy considers four dimensions, i.e., the machine learning model (M), the protection object (O), the security model (S), and the privacy-preserving protocol (P), and provides a comprehensive investigation.
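To make the four MOSP dimensions concrete, they can be read as a simple record type; the example entry below is illustrative and not drawn from the survey's tables.

```python
# The MOSP taxonomy rendered as a record (illustrative example values).
from dataclasses import dataclass

@dataclass
class MOSPEntry:
    model: str               # M: machine learning model
    protection_object: str   # O: what is protected (features, labels, ...)
    security_model: str      # S: e.g. semi-honest vs. malicious adversaries
    protocol: str            # P: privacy-preserving protocol

entry = MOSPEntry(
    model="logistic regression",
    protection_object="features and labels",
    security_model="semi-honest",
    protocol="homomorphic encryption",
)
```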


Federated Learning without Full Labels: A Survey

arXiv.org Artificial Intelligence

Data privacy has become an increasingly important concern in real-world big data applications such as machine learning. To address this concern, federated learning (FL) has emerged as a promising solution for building effective machine learning models from decentralized and private data. Existing federated learning algorithms mainly tackle the supervised learning problem, where data are assumed to be fully labeled. In practice, however, fully labeled data is often hard to obtain, as participants may not have sufficient domain expertise, or may lack the motivation and tools to label their data. Therefore, the problem of federated learning without full labels is important in real-world FL applications. In this paper, we discuss how this problem can be solved with machine learning techniques that leverage unlabeled data. We present a survey of methods that combine FL with semi-supervised learning, self-supervised learning, and transfer learning. We also summarize the datasets used to evaluate FL methods without full labels. Finally, we highlight future directions in the context of FL without full labels.
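As one concrete instance of the semi-supervised FL recipe the survey covers, the sketch below has each client pseudo-label its unlabeled data with the current global model before a local update, with the server averaging weights as in FedAvg. It is a minimal illustration built on a hand-rolled logistic model, not any specific published method.

```python
# Federated semi-supervised learning via confident pseudo-labels (sketch).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def client_update(w, X_lab, y_lab, X_unlab, lr=0.1, threshold=0.9, steps=50):
    p = sigmoid(X_unlab @ w)                   # global-model predictions
    conf = np.maximum(p, 1 - p) >= threshold   # keep only confident guesses
    X = np.vstack([X_lab, X_unlab[conf]])
    y = np.concatenate([y_lab, (p[conf] > 0.5).astype(float)])
    for _ in range(steps):                     # local logistic-regression GD
        w = w - lr * X.T @ (sigmoid(X @ w) - y) / len(y)
    return w

def server_round(w, clients):
    # clients: list of (X_lab, y_lab, X_unlab) tuples; FedAvg of local weights
    return np.mean([client_update(w, *c) for c in clients], axis=0)
```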


CityNet: A Multi-city Multi-modal Dataset for Smart City Applications

arXiv.org Artificial Intelligence

Data-driven approaches have been applied to many problems in urban computing. In the research community, however, such approaches are commonly studied on data from limited sources, and are thus unable to characterize the complexity of urban data coming from multiple entities and the correlations among them. Consequently, an inclusive and multifaceted dataset is necessary to facilitate more extensive studies on urban computing. In this paper, we present CityNet, a multi-modal urban dataset containing data from 7 cities, each drawn from 3 data sources. We first present the generation process of CityNet as well as its basic properties. In addition, to facilitate the use of CityNet, we carry out extensive machine learning experiments, including spatio-temporal prediction, transfer learning, and reinforcement learning. The experimental results not only provide benchmarks for a wide range of tasks and methods, but also uncover internal correlations among cities and tasks within CityNet that, with adequate leverage, can improve performance on various tasks. With the benchmarking results and the correlations uncovered, we believe that CityNet can contribute to the field of urban computing by supporting research on many advanced topics.
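As an example of the spatio-temporal prediction tasks such a dataset supports, the sketch below builds sliding-window samples over a city grid and scores a historical-average baseline; the array shapes are assumptions for illustration, not CityNet's actual schema.

```python
# Sliding-window spatio-temporal prediction with a historical-average baseline.
import numpy as np

def make_windows(series, lookback):
    """series: [time, height, width] city grid; returns (X, y) sample pairs."""
    X = np.stack([series[t:t + lookback] for t in range(len(series) - lookback)])
    y = series[lookback:]
    return X, y

series = np.random.rand(100, 8, 8)   # e.g. 100 steps of gridded taxi demand
X, y = make_windows(series, lookback=6)
baseline = X.mean(axis=1)            # predict next frame as window average
rmse = np.sqrt(((baseline - y) ** 2).mean())
```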


Theoretically Improving Graph Neural Networks via Anonymous Walk Graph Kernels

arXiv.org Artificial Intelligence

Graph neural networks (GNNs) have achieved tremendous success in graph mining. However, the inability of GNNs to model substructures in graphs remains a significant drawback. Specifically, message-passing GNNs (MPGNNs), the prevailing type of GNN, have been theoretically shown to be unable to distinguish, detect, or count many graph substructures. While efforts have been made to address this limitation, existing works either rely on pre-defined substructure sets, and are thus less flexible, or lack theoretical insight. In this paper, we propose GSKN, a GNN model with a theoretically stronger ability to distinguish graph structures. Specifically, we design GSKN based on anonymous walks (AWs), a flexible class of substructure units, and derive it from the feature mappings of graph kernels (GKs). We theoretically show that GSKN provably extends the 1-WL test, and hence the maximally powerful MPGNNs, from both graph-level and node-level viewpoints. Correspondingly, we evaluate GSKN in a range of experiments, where it outperforms a wide variety of baselines, supporting the analysis.
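The anonymous-walk unit GSKN builds on is easy to state in code: a walk is relabeled by the first-occurrence index of each node, so structurally identical walks over different nodes map to the same pattern.

```python
# Anonymous walks: relabel each node by its first-occurrence index.
def anonymize(walk):
    first_seen = {}
    return tuple(first_seen.setdefault(node, len(first_seen)) for node in walk)

# Two walks over different nodes yield the same anonymous pattern (0, 1, 0, 2).
assert anonymize(["v2", "v5", "v2", "v9"]) == (0, 1, 0, 2)
assert anonymize(["v1", "v3", "v1", "v4"]) == (0, 1, 0, 2)
```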


Ternary Hashing

arXiv.org Artificial Intelligence

This paper proposes a novel ternary hash encoding for learning-to-hash methods, providing a principled and more efficient coding scheme that outperforms state-of-the-art binary hashing counterparts. Two kinds of axiomatic ternary logic, Kleene logic and Łukasiewicz logic, are adopted to calculate the Ternary Hamming Distance (THD) in both the learning/encoding and testing/querying phases. Our work demonstrates that, with an efficient implementation of ternary logic on standard binary machines, the proposed ternary hashing compares favorably to binary hashing methods, with consistent improvements in retrieval mean average precision (mAP) ranging from 1% to 5.9% on the CIFAR10, NUS-WIDE, and ImageNet100 datasets.
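For intuition, a ternary Hamming distance over codes in {-1, 0, +1} can be computed with a simple digit-wise cost table; the 0/0.5/1 costs below are an illustrative stand-in, since the paper derives the exact digit-wise semantics from Kleene and Łukasiewicz logic.

```python
# Illustrative ternary Hamming distance over {-1, 0, +1} codes.
import numpy as np

def ternary_hamming(a, b):
    a, b = np.asarray(a), np.asarray(b)
    agree = (a == b) & (a != 0)        # confident match costs 0
    uncertain = (a == 0) | (b == 0)    # an uncertain (0) digit costs 0.5
    return np.where(agree, 0.0, np.where(uncertain, 0.5, 1.0)).sum()

print(ternary_hamming([1, 0, -1, 1], [1, 1, -1, -1]))  # 0 + 0.5 + 0 + 1 = 1.5
```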