AITopics | Tang, Jiaqi


Tang, Jiaqi


SURGEON: Memory-Adaptive Fully Test-Time Adaptation via Dynamic Activation Sparsity

Ma, Ke, Tang, Jiaqi, Guo, Bin, Dang, Fan, Liu, Sicong, Zhu, Zhui, Wu, Lei, Fang, Cheng, Chen, Ying-Cong, Yu, Zhiwen, Liu, Yunhao

arXiv.org Artificial Intelligence Mar-26-2025

Despite the growing integration of deep models into mobile terminals, the accuracy of these models declines significantly due to various deployment interferences. Test-time adaptation (TTA) has emerged to improve the performance of deep models by adapting them to unlabeled target data online. Yet, the significant memory cost, particularly in resource-constrained terminals, impedes the effective deployment of most backward-propagation-based TTA methods. To tackle memory constraints, we introduce SURGEON, a method that substantially reduces memory cost while preserving comparable accuracy improvements during fully test-time adaptation (FTTA) without relying on specific network architectures or modifications to the original training procedure. Specifically, we propose a novel dynamic activation sparsity strategy that directly prunes activations at layer-specific dynamic ratios during adaptation, allowing for flexible control of learning ability and memory cost in a data-sensitive manner. Within this strategy, two metrics, Gradient Importance and Layer Activation Memory, are considered to determine the layer-wise pruning ratios, reflecting accuracy contribution and memory efficiency, respectively. Experimentally, our method surpasses the baselines by not only reducing memory usage but also achieving superior accuracy, delivering SOTA performance across diverse datasets, architectures, and tasks.
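The idea of layer-specific activation pruning can be illustrated with a minimal sketch. The rule that combines the two scores below is an assumption made for illustration, not the formula from the paper, and the function names `pruning_ratios` and `prune_activations` are hypothetical. The sketch assumes that layers with low importance and high activation memory should be pruned more aggressively, and that pruning drops the smallest-magnitude activations.

```python
def pruning_ratios(importance, memory, max_ratio=0.9):
    """Hypothetical rule: prune more where memory share is high and importance share is low.

    `importance` and `memory` are per-layer positive scores (assumed inputs).
    """
    imp_total = sum(importance)
    mem_total = sum(memory)
    # Ratio of memory share to importance share for each layer.
    scores = [(m / mem_total) / (i / imp_total) for i, m in zip(importance, memory)]
    top = max(scores)
    # Scale so the most prunable layer gets `max_ratio`.
    return [max_ratio * s / top for s in scores]


def prune_activations(acts, ratio):
    """Zero out the `ratio` fraction of entries with the smallest magnitude."""
    n_drop = int(len(acts) * ratio)
    if n_drop == 0:
        return list(acts)
    order = sorted(range(len(acts)), key=lambda j: abs(acts[j]))
    drop = set(order[:n_drop])
    return [0.0 if j in drop else a for j, a in enumerate(acts)]


if __name__ == "__main__":
    ratios = pruning_ratios(importance=[1.0, 4.0], memory=[2.0, 2.0])
    print(ratios)
    print(prune_activations([0.1, -3.0, 0.5, 2.0], 0.5))
```

In a real system the zeroed activations would be stored in a sparse format so that the backward pass needs less memory; the sketch only shows how per-layer ratios could drive the pruning step.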

adaptation, artificial intelligence, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2503.20354

Country: Asia > China (0.68)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)


Activation-aware Probe-Query: Effective Key-Value Retrieval for Long-Context LLMs Inference

Xiao, Qingfa, Wang, Jiachuan, Li, Haoyang, Deng, Cheng, Tang, Jiaqi, Li, Shuangyin, Zhang, Yongqi, Wang, Jun, Chen, Lei

arXiv.org Artificial Intelligence Feb-19-2025

Recent advances in large language models (LLMs) have showcased exceptional performance in long-context tasks, while facing significant inference efficiency challenges with limited GPU memory. Existing solutions first used a sliding-window approach to accumulate a set of historical key-value (KV) pairs for reuse; later improvements selectively retain subsets of these pairs at each step. However, due to the sparse attention distribution across a long context, it is hard to identify and recall relevant KV pairs, as the attention is distracted by massive candidate pairs. Additionally, we found it promising to select representative tokens as probe-Query in each sliding window to effectively represent the entire context, which is an approach overlooked by existing methods. Thus, we propose ActQKV, a training-free, Activation-aware approach that dynamically determines probe-Query and leverages it to retrieve the relevant KV pairs for inference. Specifically, ActQKV monitors a token-level indicator, Activation Bias, within each context window, enabling the proper construction of probe-Query for retrieval at the pre-filling stage. To accurately recall the relevant KV pairs and minimize the irrelevant ones, we design a dynamic KV cut-off mechanism guided by information density across layers at the decoding stage. Experiments on the Long-Bench and $\infty$ Benchmarks demonstrate its state-of-the-art performance with competitive inference quality and resource efficiency.
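A rough sketch of the probe-query idea follows. The abstract does not define how Activation Bias is computed, so the per-token `indicator` values are taken as given inputs here, and the function names `build_probe_query` and `retrieve_kv` are hypothetical. The sketch assumes the probe is the average query vector of the highest-scoring tokens in a window, and that retrieval keeps the top-k keys by dot product; the dynamic cut-off described in the abstract is replaced by a fixed `k`.

```python
def build_probe_query(queries, indicator, n_probe):
    """Average the query vectors of the `n_probe` tokens with the largest indicator.

    `indicator` stands in for a token-level score such as Activation Bias (assumed given).
    """
    top = sorted(range(len(queries)), key=lambda t: indicator[t], reverse=True)[:n_probe]
    dim = len(queries[0])
    return [sum(queries[t][d] for t in top) / len(top) for d in range(dim)]


def retrieve_kv(probe, keys, k):
    """Return indices of the `k` keys with the highest dot product against the probe."""
    scores = [sum(p * x for p, x in zip(probe, key)) for key in keys]
    ranked = sorted(range(len(keys)), key=lambda j: scores[j], reverse=True)
    # Sort the kept indices so the retrieved pairs stay in context order.
    return sorted(ranked[:k])


if __name__ == "__main__":
    window_queries = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    probe = build_probe_query(window_queries, indicator=[0.1, 0.9, 0.8], n_probe=2)
    cached_keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, -1.0]]
    print(probe, retrieve_kv(probe, cached_keys, k=2))
```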

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2502.13542

Country: Asia > China (0.28)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)


Adaptivity and Convergence of Probability Flow ODEs in Diffusion Generative Models

Tang, Jiaqi, Yan, Yuling

arXiv.org Machine Learning Jan-30-2025

Score-based generative models, which transform noise into data by learning to reverse a diffusion process, have become a cornerstone of modern generative AI. This paper contributes to establishing theoretical guarantees for the probability flow ODE, a widely used diffusion-based sampler known for its practical efficiency. While a number of prior works address its general convergence theory, it remains unclear whether the probability flow ODE sampler can adapt to the low-dimensional structures commonly present in natural image data. We demonstrate that, with accurate score function estimation, the probability flow ODE sampler achieves a convergence rate of $O(k/T)$ in total variation distance (ignoring logarithmic factors), where $k$ is the intrinsic dimension of the target distribution and $T$ is the number of iterations. This dimension-free convergence rate improves upon existing results that scale with the typically much larger ambient dimension, highlighting the ability of the probability flow ODE sampler to exploit intrinsic low-dimensional structures in the target distribution for faster sampling.
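For readers unfamiliar with the sampler, here is a toy sketch of a probability flow ODE in one dimension. It uses the standard continuous-time form for a variance-preserving forward process, dx/dt = -0.5 * beta * (x + score(x, t)), integrated backward in time with plain Euler steps. The Gaussian target, the constant `beta`, and the exact score are assumptions chosen so the example is self-contained; the paper's discretization and its analysis may differ, and `num_steps` only loosely plays the role of $T$.

```python
import math


def marginal_variance(t, data_std, beta):
    """Variance of p_t when the data is N(0, data_std^2) under a VP forward process."""
    decay = math.exp(-beta * t)
    return data_std ** 2 * decay + (1.0 - decay)


def score(x, t, data_std, beta):
    """Exact score of the Gaussian marginal p_t; a real sampler uses a learned estimate."""
    return -x / marginal_variance(t, data_std, beta)


def probability_flow_sample(x_end, t_end, num_steps, data_std=2.0, beta=1.0):
    """Integrate the probability flow ODE backward from t_end to 0 with Euler steps."""
    dt = t_end / num_steps
    x, t = x_end, t_end
    for _ in range(num_steps):
        drift = -0.5 * beta * (x + score(x, t, data_std, beta))
        x -= dt * drift  # step backward in time
        t -= dt
    return x


if __name__ == "__main__":
    print(probability_flow_sample(x_end=1.0, t_end=5.0, num_steps=2000))
```

For this Gaussian toy case the ODE has a closed-form solution: a point at time `t_end` is rescaled by the square root of the ratio between the data variance and the marginal variance at `t_end`, which gives a simple check on the Euler integration.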

artificial intelligence, machine learning, natural language, (13 more...)

arXiv.org Machine Learning

2501.18863

Country:

Europe (0.28)
North America > United States > Wisconsin > Dane County > Madison (0.14)

Genre:

Overview (0.67)
Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Generation (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.66)


AdaShadow: Responsive Test-time Model Adaptation in Non-stationary Mobile Environments

Fang, Cheng, Liu, Sicong, Zhou, Zimu, Guo, Bin, Tang, Jiaqi, Ma, Ke, Yu, Zhiwen

arXiv.org Artificial Intelligence Oct-10-2024

On-device adaptation to continual, unpredictable domain shifts is essential for mobile applications like autonomous driving and augmented reality to deliver seamless user experiences in evolving environments. Test-time adaptation (TTA) emerges as a promising solution by tuning model parameters with unlabeled live data immediately before prediction. However, TTA's unique forward-backward-reforward pipeline notably increases the latency over standard inference, undermining the responsiveness in time-sensitive mobile applications. This paper presents AdaShadow, a responsive test-time adaptation framework for non-stationary mobile data distribution and resource dynamics via selective updates of adaptation-critical layers. Although the tactic is recognized in generic on-device training, TTA's unsupervised and online context presents unique challenges in estimating layer importance and latency, as well as scheduling the optimal layer update plan. AdaShadow addresses these challenges with a backpropagation-free assessor to rapidly identify critical layers, a unit-based runtime predictor to account for resource dynamics in latency estimation, and an online scheduler for prompt layer update planning. Also, AdaShadow incorporates a memory I/O-aware computation reuse scheme to further reduce latency in the reforward pass. Results show that AdaShadow achieves the best accuracy-latency balance under continual shifts. At low memory and energy costs, AdaShadow provides a 2x to 3.5x speedup (ms-level) over state-of-the-art TTA methods with comparable accuracy and a 14.8% to 25.4% accuracy boost over efficient supervised methods with similar latency.
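Selective layer updating under a latency budget can be pictured with a small scheduling sketch. The greedy rule below (pick layers by importance per millisecond until the budget is used) is an assumption for illustration and is not claimed to be AdaShadow's scheduler; `plan_layer_updates` and its inputs are hypothetical names, with the importance and latency estimates assumed to come from components like the assessor and runtime predictor described above.

```python
def plan_layer_updates(importance, latency_ms, budget_ms):
    """Greedily choose layers to update, ranked by importance per millisecond.

    Returns the chosen layer indices (ascending) and the total predicted latency.
    """
    order = sorted(
        range(len(importance)),
        key=lambda i: importance[i] / latency_ms[i],
        reverse=True,
    )
    chosen, used = [], 0.0
    for i in order:
        # Skip any layer that would push the plan over the latency budget.
        if used + latency_ms[i] <= budget_ms:
            chosen.append(i)
            used += latency_ms[i]
    return sorted(chosen), used


if __name__ == "__main__":
    print(plan_layer_updates([0.9, 0.2, 0.6], [4.0, 1.0, 2.0], budget_ms=3.0))
```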

artificial intelligence, latency, machine learning, (19 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3666025.3699339

2410.08256

Country: Asia > China (0.48)

Genre: Research Report > Promising Solution (0.34)

Industry:

Transportation > Ground > Road (0.48)
Information Technology > Robotics & Automation (0.34)

Technology:

Information Technology > Human Computer Interaction (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Hardware (0.95)
(4 more...)


GPT as Psychologist? Preliminary Evaluations for GPT-4V on Visual Affective Computing

Lu, Hao, Niu, Xuesong, Wang, Jiyao, Wang, Yin, Hu, Qingyong, Tang, Jiaqi, Zhang, Yuting, Yuan, Kaishen, Huang, Bin, Yu, Zitong, He, Dengbo, Deng, Shuiguang, Chen, Hao, Chen, Yingcong, Shan, Shiguang

arXiv.org Artificial Intelligence Apr-10-2024

Multimodal large language models (MLLMs) are designed to process and integrate information from multiple sources, such as text, speech, images, and videos. Despite their success in language understanding, it is critical to evaluate their performance on downstream tasks for better human-centric applications. This paper assesses the application of MLLMs with 5 crucial abilities for affective computing, spanning visual affective tasks and reasoning tasks. The results show that GPT-4V has high accuracy in facial action unit recognition and micro-expression detection, while its general facial expression recognition is not accurate. We also highlight the challenges of achieving fine-grained micro-expression recognition and the potential for further study. We demonstrate the versatility and potential of GPT-4V for handling advanced tasks in emotion recognition and related fields by integrating it with task-related agents for more complex tasks, such as heart rate estimation through signal processing. In conclusion, this paper provides valuable insights into the potential applications and challenges of MLLMs in human-centric computing. Our interesting examples are at https://github.com/EnVision-Research/GPT4Affectivity.

artificial intelligence, natural language, visual affective computing, (3 more...)

arXiv.org Artificial Intelligence

2403.05916

Genre: Research Report (0.69)

Technology:

Information Technology > Artificial Intelligence > Vision > Face Recognition (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Emotion (0.89)
Information Technology > Artificial Intelligence > Natural Language (0.87)


An Incremental Unified Framework for Small Defect Inspection

Tang, Jiaqi, Lu, Hao, Xu, Xiaogang, Wu, Ruizheng, Hu, Sixing, Zhang, Tong, Cheng, Tsz Wa, Ge, Ming, Chen, Ying-Cong, Tsung, Fugee

arXiv.org Artificial Intelligence Jan-24-2024

Artificial Intelligence (AI)-driven defect inspection is pivotal in industrial manufacturing. Yet, many methods, tailored to specific pipelines, grapple with diverse product portfolios and evolving processes. Addressing this, we present the Incremental Unified Framework (IUF), which can reduce the feature conflict problem when continuously integrating new objects in the pipeline, making it advantageous in object-incremental learning scenarios. Employing a state-of-the-art transformer, we introduce Object-Aware Self-Attention (OASA) to delineate distinct semantic boundaries. Semantic Compression Loss (SCL) is integrated to optimize non-primary semantic space, enhancing network adaptability for novel objects. Additionally, we prioritize retaining the features of established objects during weight updates. Demonstrating prowess in both image and pixel-level defect inspection, our approach achieves state-of-the-art performance, proving indispensable for dynamic and scalable industrial inspections. Our code will be released at https://github.com/jqtangust/IUF.

artificial intelligence, incremental unified framework, machine learning, (13 more...)

arXiv.org Artificial Intelligence

2312.08917

Country:

Asia > China (0.14)
Oceania > Australia (0.14)

Genre: Research Report (0.50)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)


High Dynamic Range Image Reconstruction via Deep Explicit Polynomial Curve Estimation

Tang, Jiaqi, Xu, Xiaogang, Hu, Sixing, Chen, Ying-Cong

arXiv.org Artificial Intelligence Jul-31-2023

Due to limited camera capacities, digital images usually have a narrower dynamic illumination range than real-world scene radiance. To resolve this problem, High Dynamic Range (HDR) reconstruction is proposed to recover the dynamic range to better represent real-world scenes. However, due to different physical imaging parameters, the tone-mapping functions between images and real radiance are highly diverse, which makes HDR reconstruction extremely challenging. Existing solutions cannot explicitly clarify the relationship between the tone-mapping function and the generated HDR image, but this relationship is vital when guiding the reconstruction of HDR images. To address this problem, we propose a method to explicitly estimate the tone-mapping function and its corresponding HDR image in one network. First, based on the characteristics of the tone-mapping function, we model the trend of the tone curve with a polynomial. To fit this curve, we use a learnable network to estimate the coefficients of the polynomial. This curve is automatically adjusted according to the tone space of the Low Dynamic Range (LDR) image and used to reconstruct the real HDR image. In addition, since no current dataset provides the relationship between the tone-mapping function and the LDR image, we construct a new dataset with both synthetic and real images. Extensive experiments show that our method generalizes well under different tone-mapping functions and achieves SOTA performance.
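The polynomial curve model can be illustrated with a short sketch. It assumes a network has already predicted the coefficients and that the curve is applied per pixel to normalized LDR intensities; the coefficient values and the function name `apply_tone_curve` are hypothetical, and the direction of the mapping (LDR to HDR) is an assumption about how the estimated curve is used.

```python
def apply_tone_curve(pixels, coeffs):
    """Evaluate c0 + c1*x + c2*x^2 + ... at each pixel using Horner's rule.

    `coeffs` lists coefficients from the constant term upward (assumed network output).
    """
    out = []
    for x in pixels:
        acc = 0.0
        for c in reversed(coeffs):
            acc = acc * x + c
        out.append(acc)
    return out


if __name__ == "__main__":
    ldr = [0.0, 0.5, 1.0]          # normalized LDR intensities
    coeffs = [0.0, 0.5, 1.5]       # hypothetical predicted coefficients
    print(apply_tone_curve(ldr, coeffs))
```

Because the curve is an explicit low-degree polynomial, its shape can be inspected or constrained directly, which is the kind of explicit relationship between tone curve and reconstructed image that the abstract emphasizes.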

artificial intelligence, machine learning, tone-mapping function, (16 more...)

arXiv.org Artificial Intelligence

doi: 10.3233/FAIA230533

2307.16426

Country: Asia > China (0.28)

Genre: Research Report (0.50)

Industry: Media > Photography (0.48)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
