
Collaborating Authors: Lv, Jiancheng


Ferret: An Efficient Online Continual Learning Framework under Varying Memory Constraints

arXiv.org Artificial Intelligence

In the realm of high-frequency data streams, achieving real-time learning within varying memory constraints is paramount. This paper presents Ferret, a comprehensive framework designed to enhance the online accuracy of Online Continual Learning (OCL) algorithms while dynamically adapting to varying memory budgets. Ferret employs a fine-grained pipeline parallelism strategy combined with an iterative gradient compensation algorithm, ensuring seamless handling of high-frequency data with minimal latency and effectively counteracting the challenge of stale gradients in parallel training. To adapt to varying memory budgets, its automated model partitioning and pipeline planning optimize performance regardless of memory limitations. Extensive experiments across 20 benchmarks and 5 integrated OCL algorithms show Ferret's remarkable efficiency, achieving up to 3.7× lower memory overhead to reach the same online accuracy as competing methods. Furthermore, Ferret consistently outperforms these methods across diverse memory budgets, underscoring its superior adaptability. These findings position Ferret as a premier framework for efficient and adaptive OCL in real-time environments.
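The abstract does not spell out Ferret's gradient compensation rule, so the following is only a minimal sketch of the general idea it addresses: correcting a gradient computed at stale weights, here via generic first-order delay compensation in the style of DC-ASGD. The function name, the coefficient `lam`, and the toy tensors are illustrative assumptions, not Ferret's implementation.

```python
import numpy as np

def compensate_stale_gradient(g_stale, w_now, w_stale, lam=0.1):
    """Approximate the gradient at the current weights from a gradient
    computed at stale weights, using a diagonal-Hessian approximation:
    g(w_now) ~= g(w_stale) + lam * g_stale * g_stale * (w_now - w_stale)."""
    return g_stale + lam * g_stale * g_stale * (w_now - w_stale)

# Toy usage: a pipeline stage computed g_stale at w_stale, but the weights
# have since advanced to w_now; compensate before applying the update.
rng = np.random.default_rng(0)
w_stale = rng.normal(size=4)
w_now = w_stale - 0.05 * rng.normal(size=4)
g_stale = rng.normal(size=4)
print(compensate_stale_gradient(g_stale, w_now, w_stale))
```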


FAS: Fast ANN-SNN Conversion for Spiking Large Language Models

arXiv.org Artificial Intelligence

Spiking Large Language Models have been shown to be a promising alternative to LLMs in various scenarios. Existing methods for creating spiking LLMs, i.e., direct training and ANN-SNN conversion, often suffer from performance degradation and relatively high computational costs. To address these issues, we propose a novel Fast ANN-SNN conversion strategy (FAS) that transforms LLMs into spiking LLMs in two stages. The first stage employs full-parameter fine-tuning of pre-trained models, so it does not require any direct training from scratch. The second stage introduces a coarse-to-fine calibration method to reduce conversion errors and improve accuracy. Our experiments on both language and vision-language tasks across four different scales of LLMs demonstrate that FAS can achieve state-of-the-art performance with significantly reduced inference latency and computational costs. For example, FAS takes only 8 timesteps to achieve an accuracy 3% higher than that of the OPT-7B model, while reducing energy consumption by 96.63%.
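FAS's coarse-to-fine calibration is not detailed in the abstract, but the rate-coding correspondence that ANN-SNN conversion generally relies on is standard: an integrate-and-fire neuron simulated for T timesteps approximates a clipped ReLU. Below is a minimal NumPy sketch of that correspondence; the threshold and T are illustrative, and calibration would amount to tuning such thresholds per layer.

```python
import numpy as np

def if_neuron_rate(x, threshold, T=8):
    """Simulate an integrate-and-fire neuron with soft reset for T timesteps;
    the spike rate approximates min(max(x, 0), threshold) / threshold."""
    v = np.zeros_like(x)
    spikes = np.zeros_like(x)
    for _ in range(T):
        v = v + x                              # integrate constant input
        fired = v >= threshold
        spikes += fired
        v = np.where(fired, v - threshold, v)  # soft reset keeps residue
    return spikes / T

x = np.linspace(-1.0, 2.0, 7)
theta = 1.0
print(if_neuron_rate(x, theta, T=8) * theta)   # SNN output after 8 timesteps
print(np.clip(x, 0.0, theta))                  # ANN activation it approximates
```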


E-3SFC: Communication-Efficient Federated Learning with Double-way Features Synthesizing

arXiv.org Artificial Intelligence

The exponential growth in model sizes has significantly increased the communication burden in Federated Learning (FL). Existing methods that alleviate this burden by transmitting compressed gradients often incur high compression errors, which slow down the model's convergence. To simultaneously achieve high compression effectiveness and low compression errors, we study the gradient compression problem from a novel perspective. Specifically, we propose a systematic algorithm termed Extended Single-Step Synthetic Features Compressing (E-3SFC), which consists of three sub-components: the Single-Step Synthetic Features Compressor (3SFC), a double-way compression algorithm, and a communication budget scheduler. First, we regard a model's gradient computation as decompressing gradients from the corresponding inputs, and the inverse process as compressing the gradients. Based on this, we introduce a novel gradient compression method termed 3SFC, which uses the model itself as a decompressor, leveraging training priors such as model weights and objective functions. 3SFC compresses raw gradients into tiny synthetic features in a single-step simulation, incorporating error feedback to minimize overall compression errors. To further reduce communication overhead, 3SFC is extended to E-3SFC, enabling double-way compression and dynamic communication budget scheduling. Our theoretical analysis under both strongly convex and non-convex conditions demonstrates that 3SFC achieves linear and sub-linear convergence rates, respectively, with aggregation noise. Extensive experiments across six datasets and six models reveal that 3SFC outperforms state-of-the-art methods by up to 13.4% while reducing communication costs by 111.6 times. These findings suggest that 3SFC can significantly enhance communication efficiency in FL without compromising model performance.
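The core 3SFC idea, treating the model as a decompressor and fitting a tiny synthetic feature whose gradient reproduces the true gradient, can be sketched as follows. This toy PyTorch version uses an iterative gradient-matching fit for clarity, whereas the paper performs the compression in a single-step simulation; all names and hyperparameters here are illustrative, not the paper's.

```python
import torch

torch.manual_seed(0)
model = torch.nn.Linear(10, 2)
loss_fn = torch.nn.CrossEntropyLoss()

# True gradient on the real local batch: this is what we want to compress.
x_real, y_real = torch.randn(32, 10), torch.randint(0, 2, (32,))
loss_fn(model(x_real), y_real).backward()
g_true = [p.grad.detach().clone() for p in model.parameters()]
model.zero_grad()

# One tiny synthetic example plays the role of the compressed message.
x_syn = torch.randn(1, 10, requires_grad=True)
y_syn = torch.tensor([0])
opt = torch.optim.Adam([x_syn], lr=0.1)
for _ in range(100):
    opt.zero_grad()
    # "Decompress": the model's gradient on the synthetic feature.
    g_syn = torch.autograd.grad(loss_fn(model(x_syn), y_syn),
                                model.parameters(), create_graph=True)
    err = sum(((a - b) ** 2).sum() for a, b in zip(g_syn, g_true))
    err.backward()
    opt.step()
print(f"residual gradient error: {err.item():.4f}")
# Only x_syn is transmitted; the receiver recomputes the gradient from it.
# The residual (g_true - g_syn) would be carried forward via error feedback.
```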


How to Enable Effective Cooperation Between Humans and NLP Models: A Survey of Principles, Formalizations, and Beyond

arXiv.org Artificial Intelligence

Advancements in NLP research have been greatly propelled by large language models (LLMs), which have showcased exceptional abilities (Zhao et al., 2023; Laskar et al., 2024). These advancements are paving the way for the development of AI models that can behave as autonomous agents, working alongside humans to tackle intricate tasks. These models, for example, can cooperate with humans on data annotation (Klie et al., 2020; Li et al., 2023a; Huang et al., 2024c), information seeking (Deng et al., 2023a; Wang et al., 2023b; Zhang et al., 2024d), creative writing (Padmakumar and He, 2022; Akoury et al., 2020), and real-world problem solving (Mehta et al., 2023; Feng et al., 2024; Qian et al., 2024). Given all these elements, the specifics of how to formalize effective human-model cooperation toward collective outputs remain under-specified and scattered. Therefore, a comprehensive and systematic analysis of the underlying principles and formalizations of human-model cooperation is still absent. This gap in understanding presents a significant opportunity for advancement, enabling us to develop a deeper understanding of the fundamental basics that govern effective cooperation between humans and intelligent models. To fill this gap, in this survey, we take the first step toward summarizing the principles and formalizations of human-model cooperation.


Cross-model Transferability among Large Language Models on the Platonic Representations of Concepts

arXiv.org Artificial Intelligence

Understanding the inner workings of Large Language Models (LLMs) is a critical research frontier. Prior research has shown that a single LLM's concept representations can be captured as steering vectors (SVs), enabling the control of LLM behavior (e.g., towards generating harmful content). Our work takes a novel approach by exploring the intricate relationships between concept representations across different LLMs, drawing an intriguing parallel to Plato's Allegory of the Cave. In particular, we introduce a linear transformation method to bridge these representations and present three key findings: 1) Concept representations across different LLMs can be effectively aligned using simple linear transformations, enabling efficient cross-model transfer and behavioral control via SVs. 2) This linear transformation generalizes across concepts, facilitating alignment and control of SVs representing different concepts across LLMs. 3) A weak-to-strong transferability exists between LLM concept representations, whereby SVs extracted from smaller LLMs can effectively control the behavior of larger LLMs.
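Finding 1 amounts to a least-squares problem: learn a matrix W that maps steering vectors from model A's hidden space to model B's, then reuse W for unseen concepts (finding 2). Here is a NumPy sketch with random stand-ins for the SV matrices; in practice each row would be an SV extracted from the respective LLM, and the transferred vector would be added to model B's hidden states to steer its behavior.

```python
import numpy as np

rng = np.random.default_rng(0)
d_a, d_b, n_concepts = 64, 128, 200
sv_a = rng.normal(size=(n_concepts, d_a))            # SVs from model A
W_true = rng.normal(size=(d_a, d_b)) / np.sqrt(d_a)  # synthetic ground truth
sv_b = sv_a @ W_true + 0.01 * rng.normal(size=(n_concepts, d_b))

# Fit the linear transformation: find W with sv_a @ W ~= sv_b.
W, *_ = np.linalg.lstsq(sv_a, sv_b, rcond=None)

# Transfer an unseen concept's SV from A to B with the same map.
new_sv_a = rng.normal(size=(1, d_a))
transferred = new_sv_a @ W
print("max transfer error:", np.abs(transferred - new_sv_a @ W_true).max())
```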


Multi-view Granular-ball Contrastive Clustering

arXiv.org Artificial Intelligence

Previous multi-view contrastive learning methods typically operate at two scales: instance-level and cluster-level. Instance-level approaches construct positive and negative pairs based on sample correspondences, aiming to bring positive pairs closer and push negative pairs further apart in the latent space. Cluster-level methods focus on calculating cluster assignments for samples under each view and maximize view consensus by reducing distribution discrepancies, e.g., minimizing KL divergence or maximizing mutual information. However, these two types of methods either introduce false negatives, reducing model discriminability, or overlook local structures and cannot explicitly measure relationships between clusters across views. To this end, we propose a method named Multi-view Granular-ball Contrastive Clustering (MGBCC). MGBCC segments the sample set into coarse-grained granular balls and establishes associations between intra-view and cross-view granular balls. These associations are reinforced in a shared latent space, thereby achieving multi-granularity contrastive learning. Granular balls lie between instances and clusters, naturally preserving the local topological structure of the sample set. We conduct extensive experiments to validate the effectiveness of the proposed method.
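Granular-ball generation itself is typically a recursive 2-means split of the sample set until each ball is compact, which is what the sketch below implements. The stopping thresholds are illustrative; MGBCC's own splitting criterion and the contrastive objective built on top of the balls are not shown.

```python
import numpy as np
from sklearn.cluster import KMeans

def granular_balls(X, max_radius=1.5, min_size=8):
    """Recursively split the sample set into coarse-grained granular balls
    via 2-means until each ball is compact; returns (center, points) pairs."""
    center = X.mean(axis=0)
    radius = np.linalg.norm(X - center, axis=1).mean()
    if radius <= max_radius or len(X) <= min_size:
        return [(center, X)]
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
    balls = []
    for k in (0, 1):
        part = X[labels == k]
        if len(part) > 0:
            balls.extend(granular_balls(part, max_radius, min_size))
    return balls

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, size=(100, 2)) for c in (0.0, 5.0, 10.0)])
print(len(granular_balls(X)), "granular balls")
```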


Learning Locally, Revising Globally: Global Reviser for Federated Learning with Noisy Labels

arXiv.org Artificial Intelligence

The success of most federated learning (FL) methods heavily depends on label quality, which is often inaccessible in real-world scenarios, such as medicine, leading to the federated label-noise learning (F-LNL) problem. In this study, we observe that the global model of FL memorizes noisy labels slowly. Based on this observation, we propose a novel approach dubbed Global Reviser for Federated Learning with Noisy Labels (FedGR) to enhance the label-noise robustness of FL. In brief, FedGR employs three novel modules to achieve noisy-label sniffing and refining, local knowledge revising, and local model regularization. Specifically, the global model is adopted to infer local data proxies for global sample selection and to refine incorrect labels. To maximize the utilization of local knowledge, we leverage the global model to revise the local exponential moving average (EMA) model of each client and distill it into each client's model. Additionally, we introduce a global-to-local representation regularization to mitigate overfitting to noisy labels. Extensive experiments on three F-LNL benchmarks against seven baseline methods demonstrate the effectiveness of the proposed FedGR.
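The abstract names the revision step but not its exact rule, so the sketch below shows one plausible instantiation: update the client's EMA model as usual, then pull it toward the slowly-memorizing global model before it is distilled into the client. The coefficients `beta` and `alpha` and the update form are assumptions, not values from the paper.

```python
import numpy as np

def update_and_revise_ema(ema_w, local_w, global_w, beta=0.99, alpha=0.5):
    """One round of the EMA-revision idea: standard EMA update of the local
    model, followed by interpolation toward the global model."""
    ema_w = beta * ema_w + (1.0 - beta) * local_w     # local EMA update
    ema_w = alpha * global_w + (1.0 - alpha) * ema_w  # global revision
    return ema_w

rng = np.random.default_rng(0)
ema, local, glob = rng.normal(size=5), rng.normal(size=5), rng.normal(size=5)
print(update_and_revise_ema(ema, local, glob))
```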


Bridging the Gap between Learning and Inference for Diffusion-Based Molecule Generation

arXiv.org Artificial Intelligence

Drug discovery entails a comprehensive understanding of the molecular underpinnings of disease pathophysiology, followed by the identification and synthesis of chemical entities or biopharmaceuticals capable of selectively modulating the pertinent biological pathways (Sneader, 2005). Among the numerous traditional methods, screening from natural products and serendipitous discovery are the most renowned. The discovery of penicillin, an antibiotic, and artemisinin, an antimalarial (White, 1997), relied on the former method, while the repurposing of sildenafil for the treatment of erectile dysfunction (Eardley et al., 2002) owed to the latter approach. Subsequently, new biology-based and computer-assisted methods have achieved encouraging results (Mandal et al., 2009; Rognan, 2007; Batool et al., 2019). For instance, rational drug design lowers overall costs by targeting known protein pockets, and high-throughput screening (Mayr and Bojanic, 2009) enables faster identification of molecules with potential drug activity.


Effective and Efficient Adversarial Detection for Vision-Language Models via A Single Vector

arXiv.org Artificial Intelligence

Vision-Language Models (VLMs) are vulnerable to adversarial attacks, especially those from adversarial images, which are nevertheless under-explored in the literature. To facilitate research on this critical safety problem, we first construct a new laRge-scale Adversarial images dataset with Diverse hArmful Responses (RADAR), given that existing datasets are either small-scale or contain only limited types of harmful responses. With the new RADAR dataset, we further develop a novel and effective iN-time Embedding-based AdveRSarial Image DEtection (NEARSIDE) method, which exploits a single vector distilled from the hidden states of VLMs, which we call the attacking direction, to detect adversarial images among benign inputs. Extensive experiments with two victim VLMs, LLaVA and MiniGPT-4, demonstrate the effectiveness, efficiency, and cross-model transferability of our proposed method. Our code is available at https://github.com/mob-scu/RADAR-NEARSIDE
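Detection with a single vector reduces to a projection and a threshold. The sketch below uses random stand-ins: in NEARSIDE the attacking direction is distilled from VLM hidden states (e.g., by contrasting adversarial and benign inputs), and the threshold would be tuned on held-out data; every name and number here is illustrative.

```python
import numpy as np

def nearside_score(hidden_state, attacking_direction):
    """Project an input's hidden-state embedding onto the (unit-normalized)
    attacking direction; thresholding this score flags adversarial inputs."""
    d = attacking_direction / np.linalg.norm(attacking_direction)
    return float(hidden_state @ d)

rng = np.random.default_rng(0)
direction = rng.normal(size=4096)   # stand-in for the distilled vector
benign = rng.normal(size=4096)
# By construction, the adversarial embedding is shifted along the direction,
# so its score is larger than the benign one.
adversarial = benign + 2.0 * direction / np.linalg.norm(direction)
for name, h in [("benign", benign), ("adversarial", adversarial)]:
    print(name, round(nearside_score(h, direction), 2))
```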


Dishonesty in Helpful and Harmless Alignment

arXiv.org Artificial Intelligence

Humans tell lies when seeking rewards. Large language models (LLMs) are aligned to human values with reinforcement learning, where they receive rewards if they satisfy human preferences. We find that this also induces dishonesty in helpful and harmless alignment, where LLMs tell lies when generating harmless responses. Using the latest interpretability tools, we detect dishonesty, show how LLMs can be harmful if their honesty is increased, and analyze the phenomenon at the parameter level. Given these preliminaries and the hypothesis that reward seeking stimulates dishonesty, we theoretically show that this dishonesty can in turn degrade alignment performance, and we augment reward-seeking alignment with representation regularization. Experimental results, including GPT-4-evaluated win rates, perplexities, and case studies, demonstrate that we can train more honest, helpful, and harmless LLMs. We will open-source all our code and results upon this paper's acceptance.
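One way to read "augment reward-seeking alignment with representation regularization" is as an auxiliary loss that penalizes hidden states for drifting along a probe-derived dishonesty direction. The PyTorch sketch below is that reading only, with the direction, the penalty form, and `lam` all assumed rather than taken from the paper.

```python
import torch

def regularized_loss(align_loss, hidden, dishonesty_dir, lam=0.1):
    """Alignment loss plus a penalty on representation energy along a
    presumed 'dishonesty' direction in hidden space."""
    d = dishonesty_dir / dishonesty_dir.norm()
    dishonesty = (hidden @ d).pow(2).mean()
    return align_loss + lam * dishonesty

hidden = torch.randn(8, 512)    # batch of hidden states (stand-in)
direction = torch.randn(512)    # probe-derived direction (stand-in)
align_loss = torch.tensor(1.0)  # placeholder for the alignment/RLHF loss
print(regularized_loss(align_loss, hidden, direction))
```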