Zhang, Wayne
Revisiting the Integration of Convolution and Attention for Vision Backbone
Zhu, Lei, Wang, Xinjiang, Zhang, Wayne, Lau, Rynson W. H.
Convolutions (Convs) and multi-head self-attentions (MHSAs) are typically considered alternatives to each other for building vision backbones. Although some works try to integrate both, they apply the two operators simultaneously at the finest pixel granularity. With Convs already responsible for per-pixel feature extraction, the question is whether we still need to include the heavy MHSAs at such a fine-grained level. In fact, this is the root cause of the scalability issue w.r.t. the input resolution for vision transformers. To address this important problem, we propose in this work to use MHSAs and Convs in parallel \textbf{at different granularity levels} instead. Specifically, in each layer, we use two different ways to represent an image: a fine-grained regular grid and a coarse-grained set of semantic slots. We apply different operations to these two representations: Convs to the grid for local features, and MHSAs to the slots for global features. A pair of fully differentiable soft clustering and dispatching modules is introduced to bridge the grid and set representations, thus enabling local-global fusion. Through extensive experiments on various vision tasks, we empirically verify the potential of the proposed integration scheme, named \textit{GLMix}: by offloading the burden of fine-grained features to light-weight Convs, it is sufficient to use MHSAs in a few (e.g., 64) semantic slots to match the performance of recent state-of-the-art backbones, while being more efficient. Our visualization results also demonstrate that the soft clustering module produces a meaningful semantic grouping effect with only IN1k classification supervision, which may induce better interpretability and inspire new weakly-supervised semantic segmentation approaches. Code will be available at \url{https://github.com/rayleizhu/GLMix}.
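To make the mixing scheme concrete, here is a minimal PyTorch sketch of a GLMix-style block. It assumes the soft clustering is a softmax assignment of pixel features to learnable slot prototypes and that dispatching reuses the same assignment; the actual GLMix module design may differ in these details.

```python
import torch
import torch.nn as nn

class GLMixBlockSketch(nn.Module):
    """Illustrative parallel local (Conv on the grid) + global (MHSA on slots) mixer.
    The slot prototypes and softmax clustering below are assumptions, not the
    exact GLMix design."""
    def __init__(self, dim, num_slots=64, num_heads=8):
        super().__init__()
        self.local = nn.Conv2d(dim, dim, 3, padding=1, groups=dim)  # light-weight conv branch
        self.slots = nn.Parameter(torch.randn(num_slots, dim))      # coarse-grained semantic slots
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.proj = nn.Conv2d(dim, dim, 1)

    def forward(self, x):                          # x: (B, C, H, W)
        B, C, H, W = x.shape
        grid = x.flatten(2).transpose(1, 2)        # (B, HW, C) fine-grained grid tokens
        # soft clustering: assign every pixel to the slots
        assign = (grid @ self.slots.t()).softmax(dim=-1)            # (B, HW, M)
        slots = torch.einsum("bnm,bnc->bmc", assign, grid)
        slots = slots / (assign.sum(dim=1).unsqueeze(-1) + 1e-6)    # weighted mean per slot
        # global mixing: MHSA over a few slots instead of HW pixels
        slots, _ = self.attn(slots, slots, slots)
        # dispatching: send slot features back onto the grid
        back = torch.einsum("bnm,bmc->bnc", assign, slots)
        back = back.transpose(1, 2).reshape(B, C, H, W)
        return x + self.local(x) + self.proj(back)
```

At stride 16 on a 224x224 input the grid has 196 tokens, yet the quadratic MHSA only ever operates on the 64 slots, which is what keeps the global branch cheap as the resolution grows.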
RelayAttention for Efficient Large Language Model Serving with Long System Prompts
Zhu, Lei, Wang, Xinjiang, Zhang, Wayne, Lau, Rynson W. H.
A practical large language model (LLM) service may involve a long system prompt, which specifies the instructions, examples, and knowledge documents of the task and is reused across requests. However, the long system prompt causes throughput/latency bottlenecks, as the cost of generating the next token grows with the sequence length. This paper aims to improve the efficiency of LLM services that involve long system prompts. Our key observation is that handling these system prompts requires heavily redundant memory accesses in existing causal attention computation algorithms. Specifically, for batched requests, the cached hidden states (\ie, key-value pairs) of the system prompt are transferred from off-chip DRAM to on-chip SRAM multiple times, once for each individual request. To eliminate such redundancy, we propose RelayAttention, an attention algorithm that reads these hidden states from DRAM exactly once for a batch of input tokens. RelayAttention is a free lunch: it maintains the generation quality while requiring no model retraining, as it is based on a mathematical reformulation of causal attention. We observe significant performance improvements on a production-level system, vLLM, after integrating RelayAttention; the improvements are even more profound with longer system prompts.
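The reformulation can be illustrated with a small PyTorch sketch: attention over the concatenated (system + request) KV cache equals a log-sum-exp-weighted combination of two partial attentions, and the system-prompt half reduces to a single matmul against KV tensors shared by the whole batch. This is a toy single-head decoding step, not the fused vLLM kernel.

```python
import torch

def partial_attn(q, k, v):
    """Attention of per-request queries against per-request KV.
    q: (B, D); k, v: (B, L, D). Returns the output (B, D) and logsumexp (B,)."""
    scores = torch.einsum("bd,bld->bl", q, k) / q.shape[-1] ** 0.5
    lse = torch.logsumexp(scores, dim=-1)
    out = torch.einsum("bl,bld->bd", torch.softmax(scores, dim=-1), v)
    return out, lse

def relay_attention(q, k_sys, v_sys, k_req, v_req):
    """q: (B, D); k_sys, v_sys: (L_sys, D) shared by the whole batch;
    k_req, v_req: (B, L_req, D) per-request KV cache."""
    # system-prompt attention: one GEMM against the *shared* KV, so the system
    # KV is read from memory once per batch instead of once per request
    s_sys = q @ k_sys.t() / q.shape[-1] ** 0.5          # (B, L_sys)
    lse_sys = torch.logsumexp(s_sys, dim=-1)
    out_sys = torch.softmax(s_sys, dim=-1) @ v_sys      # (B, D)
    # request-specific attention, as in ordinary batched decoding
    out_req, lse_req = partial_attn(q, k_req, v_req)
    # recombine with softmax-denominator weights (log-sum-exp trick)
    w = torch.sigmoid(lse_sys - lse_req).unsqueeze(-1)
    return w * out_sys + (1 - w) * out_req
```

As a sanity check, the output matches ordinary softmax attention computed over the concatenated system and request KV, which is the sense in which the reformulation is a free lunch.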
Panoptic Video Scene Graph Generation
Yang, Jingkang, Peng, Wenxuan, Li, Xiangtai, Guo, Zujin, Chen, Liangyu, Li, Bo, Ma, Zheng, Zhou, Kaiyang, Zhang, Wayne, Loy, Chen Change, Liu, Ziwei
Towards building comprehensive real-world visual perception systems, we propose and study a new problem called panoptic video scene graph generation (PVSG). PVSG is related to the existing video scene graph generation (VidSGG) problem, which focuses on temporal interactions between humans and objects grounded with bounding boxes in videos. However, the limitation of bounding boxes in detecting non-rigid objects and backgrounds often causes VidSGG to miss key details that are crucial for comprehensive video understanding. In contrast, PVSG requires nodes in scene graphs to be grounded by more precise, pixel-level segmentation masks, which facilitate holistic scene understanding. To advance research in this new area, we contribute the PVSG dataset, which consists of 400 videos (289 third-person + 111 egocentric videos) with a total of 150K frames labeled with panoptic segmentation masks as well as fine, temporal scene graphs. We also provide a variety of baseline methods and share useful design practices for future work.
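For intuition, here is a tiny sketch of the kind of annotation a panoptic video scene graph carries: mask-grounded nodes plus temporal relation triplets. The field names are purely illustrative and do not follow the released PVSG annotation schema.

```python
from dataclasses import dataclass
from typing import Dict, Tuple
import numpy as np

@dataclass
class MaskTube:
    """A scene-graph node: an object or background region grounded by
    per-frame pixel-level masks (illustrative fields, not the PVSG schema)."""
    node_id: int
    category: str
    masks: Dict[int, np.ndarray]    # frame index -> boolean mask of shape (H, W)

@dataclass
class Relation:
    """A temporal relation triplet between two nodes, valid over a frame span."""
    subject_id: int
    object_id: int
    predicate: str
    span: Tuple[int, int]           # (start frame, end frame)
```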
OpenOOD v1.5: Enhanced Benchmark for Out-of-Distribution Detection
Zhang, Jingyang, Yang, Jingkang, Wang, Pengyun, Wang, Haoqi, Lin, Yueqian, Zhang, Haoran, Sun, Yiyou, Du, Xuefeng, Zhou, Kaiyang, Zhang, Wayne, Li, Yixuan, Liu, Ziwei, Chen, Yiran, Li, Hai
Out-of-Distribution (OOD) detection is critical for the reliable operation of open-world intelligent systems. Despite the emergence of an increasing number of OOD detection methods, inconsistencies in their evaluation make it difficult to track the progress in this field. OpenOOD v1 initiated the unification of OOD detection evaluation but faced limitations in scalability and usability. In response, this paper presents OpenOOD v1.5, a significant improvement over its predecessor that ensures accurate, standardized, and user-friendly evaluation of OOD detection methodologies. Notably, OpenOOD v1.5 extends its evaluation capabilities to large-scale datasets such as ImageNet, investigates full-spectrum OOD detection, which is important yet underexplored, and introduces new features including an online leaderboard and an easy-to-use evaluator. This work also contributes in-depth analysis and insights derived from comprehensive experimental results, thereby enriching the knowledge pool of OOD detection methodologies. With these enhancements, OpenOOD v1.5 aims to drive advancements and offer a more robust and comprehensive evaluation benchmark for OOD detection research.
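As a rough illustration of what such an evaluator computes, the sketch below derives the two standard threshold-free metrics, AUROC and FPR@95, from detector scores; it is not the OpenOOD evaluator API, just the underlying arithmetic, assuming higher scores indicate in-distribution samples.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def ood_metrics(id_scores, ood_scores):
    """AUROC and FPR at 95% TPR for an OOD detector whose scores are
    higher for in-distribution (ID) samples."""
    labels = np.concatenate([np.ones_like(id_scores), np.zeros_like(ood_scores)])
    scores = np.concatenate([id_scores, ood_scores])
    auroc = roc_auc_score(labels, scores)
    threshold = np.percentile(id_scores, 5)          # keeps TPR = 95% on ID data
    fpr95 = float(np.mean(ood_scores >= threshold))  # OOD samples wrongly accepted
    return auroc, fpr95
```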
OpenOOD: Benchmarking Generalized Out-of-Distribution Detection
Yang, Jingkang, Wang, Pengyun, Zou, Dejian, Zhou, Zitang, Ding, Kunyuan, Peng, Wenxuan, Wang, Haoqi, Chen, Guangyao, Li, Bo, Sun, Yiyou, Du, Xuefeng, Zhou, Kaiyang, Zhang, Wayne, Hendrycks, Dan, Li, Yixuan, Liu, Ziwei
Out-of-distribution (OOD) detection is vital to safety-critical machine learning applications and has thus been extensively studied, with a plethora of methods developed in the literature. However, the field currently lacks a unified, strictly formulated, and comprehensive benchmark, which often results in unfair comparisons and inconclusive results. From the problem setting perspective, OOD detection is closely related to neighboring fields including anomaly detection (AD), open set recognition (OSR), and model uncertainty, since methods developed for one domain are often applicable to the others. To help the community improve evaluation and advance the field, we build a unified, well-structured codebase called OpenOOD, which implements over 30 methods developed in relevant fields and provides a comprehensive benchmark under the recently proposed generalized OOD detection framework. Through a comprehensive comparison of these methods, we are gratified to find that the field has progressed significantly over the past few years, with both preprocessing methods and orthogonal post-hoc methods showing strong potential. We invite readers to use and contribute to our OpenOOD codebase. The full experimental results are available in the accompanying table.
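Post-hoc methods of the kind benchmarked here attach a scoring function to a frozen classifier's logits. Two representative examples, maximum softmax probability and the energy score, are sketched below; OpenOOD's own implementations cover many more variants and calibration details.

```python
import torch
import torch.nn.functional as F

def msp_score(logits):
    """Maximum softmax probability: a classic post-hoc OOD score."""
    return F.softmax(logits, dim=-1).max(dim=-1).values

def energy_score(logits):
    """Energy-based post-hoc score; higher values indicate in-distribution."""
    return torch.logsumexp(logits, dim=-1)
```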
Maximum-and-Concatenation Networks
Xie, Xingyu, Kong, Hao, Wu, Jianlong, Zhang, Wayne, Liu, Guangcan, Lin, Zhouchen
While successful in many fields, deep neural networks (DNNs) still suffer from some open problems such as bad local minima and unsatisfactory generalization performance. In this work, we propose a novel architecture called Maximum-and-Concatenation Networks (MCN) to try to eliminate bad local minima and improve generalization ability. Remarkably, we prove that MCN has a very nice property: \emph{every local minimum of an $(l+1)$-layer MCN can be better than, or at least as good as, the global minima of the network consisting of its first $l$ layers}. In other words, by increasing the network depth, MCN can autonomously improve the quality of its local minima; what is more, \emph{it is easy to plug MCN into an existing deep model to equip it with the same property}. Finally, under mild conditions, we show that MCN can approximate certain continuous functions arbitrarily well with \emph{high efficiency}; that is, the covering number of MCN is much smaller than that of most existing DNNs such as deep ReLU networks. Based on this, we further provide a tight generalization bound to guarantee the inference ability of MCN when dealing with testing samples.
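As a concrete but deliberately loose reading of the architecture's name, the block below takes an element-wise maximum over two linear branches and concatenates the block input alongside the result, so a deeper network can always fall back on the shallower one's representation. The exact MCN parameterization in the paper may differ; treat this purely as an illustration.

```python
import torch
import torch.nn as nn

class MaxConcatBlockSketch(nn.Module):
    """Hypothetical maximum-and-concatenation block (an assumption, not the
    paper's exact layer): element-wise max of two branches, then concatenate
    the input so it is preserved for deeper layers."""
    def __init__(self, in_dim, hidden_dim):
        super().__init__()
        self.branch_a = nn.Linear(in_dim, hidden_dim)
        self.branch_b = nn.Linear(in_dim, hidden_dim)

    def forward(self, x):
        h = torch.maximum(self.branch_a(x), self.branch_b(x))  # "maximum"
        return torch.cat([h, x], dim=-1)                        # "concatenation"
```

Passing the input through untouched is one intuitive route to the stated property: a deeper model can always ignore the new branch and reproduce the shallower model's output.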
Gradual Network for Single Image De-raining
Huang, Zhe, Yu, Weijiang, Zhang, Wayne, Feng, Litong, Xiao, Nong
Most advances in single image de-raining face a key challenge: removing rain streaks of different scales and shapes while preserving image details. Existing single image de-raining approaches treat rain-streak removal directly as pixel-wise regression. However, they struggle to balance over-de-raining (e.g., removing texture details in rain-free regions) against under-de-raining (e.g., leaving rain streaks behind). In this paper, we first propose a coarse-to-fine network called Gradual Network (GraNet), consisting of a coarse stage and a fine stage, to tackle single image de-raining at different granularities. Specifically, to capture coarse-grained rain-streak characteristics (e.g., long and thick rain streaks/raindrops), the coarse stage exploits local-global spatial dependencies via a local-global sub-network composed of region-aware blocks. Taking as input the residual (i.e., the coarse de-rained result) between the rainy input image and the output of the coarse stage (i.e., the learnt rain mask), the fine stage continues to de-rain by removing fine-grained rain streaks (e.g., light rain streaks and water mist), producing a rain-free and well-reconstructed output image via a unified contextual merging sub-network with dense blocks and a merging block. Comprehensive experiments on synthetic and real data demonstrate that GraNet significantly outperforms state-of-the-art methods, removing rain streaks of various densities, scales, and shapes while keeping the details of rain-free regions well preserved.
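The coarse-to-fine control flow is easy to state in code. The sketch below wires the two stages together, with placeholder sub-networks standing in for the local-global and contextual merging designs described above.

```python
import torch.nn as nn

class GraNetSketch(nn.Module):
    """Two-stage coarse-to-fine de-raining skeleton; the sub-network
    internals here are placeholders, not the GraNet blocks themselves."""
    def __init__(self, coarse_stage: nn.Module, fine_stage: nn.Module):
        super().__init__()
        self.coarse_stage = coarse_stage   # predicts a coarse-grained rain mask
        self.fine_stage = fine_stage       # removes light streaks and water mist

    def forward(self, rainy):
        rain_mask = self.coarse_stage(rainy)       # learnt rain mask
        coarse_derained = rainy - rain_mask        # residual = coarse de-rained result
        return self.fine_stage(coarse_derained)    # final rain-free reconstruction

# e.g., with trivial stand-in stages:
# model = GraNetSketch(nn.Conv2d(3, 3, 3, padding=1), nn.Conv2d(3, 3, 3, padding=1))
```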
Recovery of Future Data via Convolution Nuclear Norm Minimization
Liu, Guangcan, Zhang, Wayne
This paper is about recovering unseen future data from a given sequence of historical samples, a problem we term \emph{future data recovery}---a significant problem closely related to time series forecasting. To address it, it is now prevalent to use deep neural networks, which are built upon the hypothesis that the desired evolution law can be learnt by feeding many observed samples to an overparameterized network. In practice, however, it is not always feasible to obtain a large number of training samples. To overcome this issue, we suggest a different methodology. Namely, we convert future data recovery into a more inclusive problem called \emph{sequential tensor completion} (STC), which is to restore a latent tensor with sequential structure from a sampling of its entries. Unlike the ordinary tensor completion problem studied in the majority of the literature, STC has a distinctive setup that allows the locations of missing entries to be distributed arbitrarily, integrating seamlessly the future values of time series into the framework of missing data. We then propose two methods to address STC: Discrete Fourier Transform based $\ell_1$ minimization ($\mathrm{DFT}_{\ell_1}$) and Convolution Nuclear Norm Minimization (CNNM). We provide theoretical results that guarantee the recovery performance of the proposed methods. Remarkably, our theory discloses an important message: under certain conditions, the unseen future values are indeed recoverable from the historical observations. Experiments on univariate time series, images and videos show encouraging results.
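The link between the two proposed methods can be seen in a few lines of NumPy: when the convolution is circular with a full-size kernel, the convolution matrix of a signal is circulant, so its nuclear norm equals the $\ell_1$ norm of the signal's DFT, which is exactly what $\mathrm{DFT}_{\ell_1}$ penalizes. The snippet below is a numerical illustration of that identity, not an implementation of the solvers.

```python
import numpy as np

def circulant(x):
    """Full-size circular convolution matrix of a 1-D signal."""
    return np.stack([np.roll(x, k) for k in range(len(x))], axis=1)

x = np.random.randn(8)
nuclear_norm = np.linalg.svd(circulant(x), compute_uv=False).sum()
dft_l1 = np.abs(np.fft.fft(x)).sum()
print(nuclear_norm, dft_l1)   # the two values agree up to floating-point error
```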