AITopics | Lin, Jun

Collaborating Authors

Lin, Jun

QuartDepth: Post-Training Quantization for Real-Time Depth Estimation on the Edge

Shen, Xuan, Ma, Weize, Liu, Jing, Yang, Changdi, Ding, Rui, Wang, Quanyi, Ding, Henghui, Niu, Wei, Wang, Yanzhi, Zhao, Pu, Lin, Jun, Gu, Jiuxiang

arXiv.org Artificial Intelligence, Mar-20-2025

Monocular Depth Estimation (MDE) has emerged as a pivotal task in computer vision, supporting numerous real-world applications. However, deploying accurate depth estimation models on resource-limited edge devices, especially Application-Specific Integrated Circuits (ASICs), is challenging due to the high computational and memory demands. Recent advancements in foundational depth estimation deliver impressive results but further amplify the difficulty of deployment on ASICs. To address this, we propose QuartDepth, which adopts post-training quantization to quantize MDE models with hardware acceleration for ASICs. Our approach involves quantizing both weights and activations to 4-bit precision, reducing the model size and computation cost. To mitigate the performance degradation, we introduce an activation polishing and compensation algorithm applied before and after activation quantization, as well as a weight reconstruction method for minimizing errors in weight quantization. Furthermore, we design a flexible and programmable hardware accelerator that supports kernel fusion and customized instruction programmability, enhancing throughput and efficiency. Experimental results demonstrate that our framework achieves competitive accuracy while enabling fast inference and higher energy efficiency on ASICs, bridging the gap between high-performance depth estimation and practical edge-device applicability. Code: https://github.com/shawnricecake/quart-depth

artificial intelligence, machine learning, quantization, (19 more...)

arXiv.org Artificial Intelligence

2503.16709

Country:

Asia (0.67)
North America > United States > New York (0.14)

Genre: Research Report > New Finding (0.48)

Industry:

Information Technology (0.46)
Semiconductors & Electronics (0.34)

Technology:

Information Technology > Artificial Intelligence > Vision > Image Understanding (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Chart-HQA: A Benchmark for Hypothetical Question Answering in Charts

Chen, Xiangnan, Fang, Yuancheng, Xiao, Qian, Li, Juncheng, Lin, Jun, Tang, Siliang, Yang, Yi, Zhuang, Yueting

arXiv.org Artificial Intelligence, Mar-7-2025

Multimodal Large Language Models (MLLMs) have garnered significant attention for their strong visual-semantic understanding. Most existing chart benchmarks evaluate MLLMs' ability to parse information from charts to answer questions. However, they overlook the inherent output biases of MLLMs, where models rely on their parametric memory to answer questions rather than genuinely understanding the chart content. To address this limitation, we introduce a novel Chart Hypothetical Question Answering (HQA) task, which imposes assumptions on the same question to compel models to engage in counterfactual reasoning based on the chart content. Furthermore, we introduce HAI, a human-AI interactive data synthesis approach that leverages the efficient text-editing capabilities of LLMs alongside human expert knowledge to generate diverse and high-quality HQA data at a low cost. Using HAI, we construct Chart-HQA, a challenging benchmark synthesized from publicly available data sources. Evaluation results on 18 MLLMs of varying model sizes reveal that current models face significant generalization challenges and exhibit imbalanced reasoning performance on the HQA task.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2503.04095

Country:

Asia (0.30)
North America > United States (0.28)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.71)

PreAdaptFWI: Pretrained-Based Adaptive Residual Learning for Full-Waveform Inversion Without Dataset Dependency

Dong, Xintong, Yuan, Zhengyi, Lin, Jun, Dong, Shiqi, Tong, Xunqian, Li, Yue

arXiv.org Artificial Intelligence, Feb-17-2025

Full-waveform inversion (FWI) is a method that utilizes seismic data to invert the physical parameters of subsurface media by minimizing the difference between simulated and observed waveforms. Due to its ill-posed nature, FWI is susceptible to getting trapped in local minima. Consequently, various research efforts have attempted to combine neural networks with FWI to stabilize the inversion process. This study presents a simple yet effective training framework that does not depend on a dataset and requires only moderate pre-training on a simple initial model to stabilize network outputs. During the transfer learning phase, the conventional FWI gradients simultaneously update both the neural network and the proposed adaptive residual learning module, which learns the residual mapping of large-scale distribution features in the network's output rather than directly fitting the target mapping. Through this synergistic training paradigm, the proposed algorithm effectively incorporates physically informed prior knowledge into a global representation of the stratigraphic distribution while also capturing subtle variations in inter-layer velocities within local details, thereby escaping local optima. Evaluations on two benchmark models under various conditions, including absent low-frequency data, noise interference, and differing initial models, along with corresponding ablation experiments, consistently demonstrate the superiority of the proposed approach.

artificial intelligence, inversion, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2502.11913

Genre: Research Report > New Finding (0.47)

Industry: Energy > Oil & Gas > Upstream (1.00)

Technology:

Information Technology > Scientific Computing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Learning to Solve Domain-Specific Calculation Problems with Knowledge-Intensive Programs Generator

Liu, Chengyuan, Wang, Shihang, Qing, Lizhi, Lin, Jun, Zhang, Ji, Wu, Fei, Kuang, Kun

arXiv.org Artificial Intelligence, Dec-12-2024

Domain Large Language Models (LLMs) are developed for domain-specific tasks based on general LLMs. However, some domain-specific tasks still require professional knowledge to supply the necessary expertise. In this paper, we investigate knowledge-intensive calculation problems. We find that math problems are challenging for LLMs when they involve complex domain-specific rules and knowledge documents rather than simple formulations of terminologies. Therefore, we propose a pipeline, named KIPG, that solves domain-specific calculation problems more effectively with a Knowledge-Intensive Programs Generator. It generates knowledge-intensive programs according to the domain-specific documents. For each query, key variables are extracted, and the outcomes that depend on domain knowledge are then calculated with the programs. Through iterative preference alignment, the code generator learns to improve its logical consistency with the domain knowledge. Taking the legal domain as an example, we have conducted experiments to demonstrate the effectiveness of our pipeline, along with extensive analysis of its modules. We also find that the code generator is adaptable to other domains without training on the new knowledge.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2412.09280

Country: Asia > China (0.28)

Genre: Research Report (0.50)

Industry: Law (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

LangGFM: A Large Language Model Alone Can be a Powerful Graph Foundation Model

Lin, Tianqianjin, Yan, Pengwei, Song, Kaisong, Jiang, Zhuoren, Kang, Yangyang, Lin, Jun, Yuan, Weikang, Cao, Junjie, Sun, Changlong, Liu, Xiaozhong

arXiv.org Artificial Intelligence, Oct-18-2024

Graph foundation models (GFMs) have recently gained significant attention. However, the unique data processing and evaluation setups employed by different studies hinder a deeper understanding of their progress. Additionally, current research tends to focus on specific subsets of graph learning tasks, such as structural tasks, node-level tasks, or classification tasks. As a result, these studies often incorporate specialized modules tailored to particular task types, losing their applicability to other graph learning tasks and contradicting the original intent of foundation models to be universal. Therefore, to enhance consistency, coverage, and diversity across domains, tasks, and research interests within the graph learning community in the evaluation of GFMs, we propose GFMBench, a systematic and comprehensive benchmark comprising 26 datasets. Moreover, we introduce LangGFM, a novel GFM that relies entirely on large language models. By revisiting and exploring effective graph textualization principles, as well as repurposing successful techniques from graph augmentation and graph self-supervised learning within the language space, LangGFM achieves performance on par with or exceeding the state of the art across GFMBench, offering new perspectives, experiences, and baselines to drive forward the evolution of GFMs.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2410.14961

Country: North America > United States (0.68)

Genre: Research Report (1.00)

Industry: Information Technology (0.48)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Can Large Language Models Grasp Legal Theories? Enhance Legal Reasoning with Insights from Multi-Agent Collaboration

Yuan, Weikang, Cao, Junjie, Jiang, Zhuoren, Kang, Yangyang, Lin, Jun, Song, Kaisong, Lin, Tianqianjin, Yan, Pengwei, Sun, Changlong, Liu, Xiaozhong

arXiv.org Artificial Intelligence, Oct-3-2024

Large Language Models (LLMs) can struggle to fully understand legal theories and perform complex legal reasoning tasks. In this study, we introduce a challenging task (confusing charge prediction) to better evaluate LLMs' understanding of legal theories and their reasoning capabilities. We also propose a novel framework: Multi-Agent framework for improving complex Legal Reasoning capability (MALR). MALR employs non-parametric learning, encouraging LLMs to automatically decompose complex legal tasks and mimic the human learning process to extract insights from legal rules, helping LLMs better understand legal theories and enhance their legal reasoning abilities. Extensive experiments on multiple real-world datasets demonstrate that the proposed framework effectively addresses complex reasoning issues in practical scenarios, paving the way for more reliable applications in the legal domain.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2410.02507

Country:

North America > United States (0.46)
North America > Canada (0.28)
Asia > Middle East > UAE (0.14)

Genre: Research Report > New Finding (0.87)

Industry: Law > Criminal Law (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Music Style Transfer With Diffusion Model

Huang, Hong, Wang, Yuyi, Li, Luyao, Lin, Jun

arXiv.org Artificial Intelligence, Apr-23-2024

Previous studies on music style transfer have mainly focused on one-to-one style conversion, which is relatively limited. When considering the conversion between multiple styles, previous methods required designing multiple modes to disentangle the complex style of the music, resulting in large computational costs and slow audio generation. The existing music style transfer methods generate spectrograms with artifacts, leading to significant noise in the generated audio. To address these issues, this study proposes a music style transfer framework based on diffusion models (DM) and uses spectrogram-based methods to achieve multi-to-multi music style transfer. The GuideDiff method is used to restore spectrograms to high-fidelity audio, accelerating audio generation speed and reducing noise in the generated audio. Experimental results show that our model has good performance in multi-mode music style transfer compared to the baseline and can generate high-quality audio in real-time on consumer-grade GPUs.

artificial intelligence, machine learning, style transfer, (19 more...)

arXiv.org Artificial Intelligence

2404.14771

Genre: Research Report > New Finding (0.48)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

An FPGA-Based Accelerator Enabling Efficient Support for CNNs with Arbitrary Kernel Sizes

Wang, Miaoxin, Wu, Xiao, Lin, Jun, Wang, Zhongfeng

arXiv.org Artificial Intelligence, Feb-22-2024

Convolutional neural networks (CNNs) with large kernels, drawing inspiration from the key operations of vision transformers (ViTs), have demonstrated impressive performance in various vision-based applications. To address the issue of computational efficiency degradation in existing designs for supporting large-kernel convolutions, an FPGA-based inference accelerator is proposed for the efficient deployment of CNNs with arbitrary kernel sizes. Firstly, a Z-flow method is presented to optimize the computing data flow by maximizing data reuse opportunity. Besides, the proposed design, incorporating the kernel-segmentation (Kseg) scheme, enables extended support for large-kernel convolutions, significantly reducing the storage requirements for overlapped data. Moreover, based on the analysis of typical block structures in emerging CNNs, vertical-fused (VF) and horizontal-fused (HF) methods are developed to optimize CNN deployments from both computation and transmission perspectives. The proposed hardware accelerator, evaluated on Intel Arria 10 FPGA, achieves up to 3.91 times better DSP efficiency than prior art on the same network. Particularly, it demonstrates efficient support for large-kernel CNNs, achieving throughputs of 169.68 GOPS and 244.55 GOPS for RepLKNet-31 and PyConvResNet-50, respectively, both of which are implemented on hardware for the first time.

accelerator, artificial intelligence, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2402.14307

Country: Asia > China (0.29)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.35)

Global Structure Knowledge-Guided Relation Extraction Method for Visually-Rich Document

Chen, Xiangnan, Xiao, Qian, Li, Juncheng, Dong, Duo, Lin, Jun, Liu, Xiaozhong, Tang, Siliang

arXiv.org Artificial Intelligence, Oct-27-2023

Visual Relation Extraction (VRE) is a powerful means of discovering relationships between entities within visually-rich documents. Existing methods often focus on manipulating entity features to find pairwise relations, yet neglect the more fundamental structural information that links disparate entity pairs together. The absence of global structure information may make the model struggle to learn long-range relations and easily predict conflicting results. To alleviate such limitations, we propose a GlObal Structure knowledge-guided relation Extraction (GOSE) framework. GOSE begins by generating preliminary relation predictions on entity pairs extracted from a scanned image of the document. Subsequently, global structural knowledge is captured from the preceding iterative predictions and then incorporated into the representations of the entities. This "generate-capture-incorporate" cycle is repeated multiple times, allowing entity representations and global structure knowledge to be mutually reinforced. Extensive experiments validate that GOSE not only outperforms existing methods in the standard fine-tuning setting but also shows superior cross-lingual learning capabilities; it even yields stronger data-efficient performance in the low-resource setting. The code for GOSE will be available at https://github.com/chenxn2020/GOSE.

artificial intelligence, information, natural language, (16 more...)

arXiv.org Artificial Intelligence

2305.13850

Country: Asia > China (0.14)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Natural Language (1.00)

Self-supervised Meta-Prompt Learning with Meta-Gradient Regularization for Few-shot Generalization

Pan, Kaihang, Li, Juncheng, Song, Hongye, Lin, Jun, Liu, Xiaozhong, Tang, Siliang

arXiv.org Artificial Intelligence, Oct-23-2023

Prompt tuning is a parameter-efficient method that learns soft prompts and conditions frozen language models to perform specific downstream tasks. Though effective, prompt tuning under few-shot settings relies heavily on a good initialization of soft prompts. It can also easily overfit to few-shot training samples, thereby undermining generalizability. Existing works leverage pre-training or supervised meta-learning to initialize soft prompts, but they fail to generalize data-efficiently to unseen downstream tasks. To address the above problems, this paper proposes a novel Self-sUpervised meta-Prompt learning framework with MEta-gradient Regularization for few-shot generalization (SUPMER). SUPMER leverages self-supervised meta-learning with a diverse set of well-designed meta-training tasks to learn a universal prompt initialization for efficient adaptation using only unlabeled data. Additionally, it jointly meta-learns a gradient regularization function to transform raw gradients into a domain-generalizable direction, thus alleviating the problem of overfitting. Extensive experiments show that SUPMER achieves better performance on different few-shot downstream tasks and also exhibits a stronger domain generalization ability. The code for SUPMER will be available at https://github.com/beepkh/SUPMER.

artificial intelligence, computational linguistic, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2303.12314

Country:

Europe (1.00)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
