AITopics | Mao, Qi

Collaborating Authors

Mao, Qi


S$^2$AG-Vid: Enhancing Multi-Motion Alignment in Video Diffusion Models via Spatial and Syntactic Attention-Based Guidance

Li, Yuanhang, Mao, Qi, Chen, Lan, Fang, Zhen, Tian, Lei, Xiao, Xinyan, Jin, Libiao, Wu, Hua

arXiv.org Artificial Intelligence | Sep-23-2024

Recent advancements in text-to-video (T2V) generation using diffusion models have garnered significant attention. However, existing T2V models primarily focus on simple scenes featuring a single object performing a single motion. Challenges arise in scenarios involving multiple objects with distinct motions, often leading to incorrect video-text alignment between subjects and their corresponding motions. To address this challenge, we propose S$^2$AG-Vid, a training-free inference-stage optimization method that improves the alignment of multiple objects with their corresponding motions in T2V models. S$^2$AG-Vid initially applies a spatial position-based, cross-attention (CA) constraint in the early stages of the denoising process, facilitating multiple nouns distinctly attending to the correct subject regions. To enhance the motion-subject binding, we implement a syntax-guided contrastive constraint in the subsequent denoising phase, aimed at improving the correlations between the CA maps of verbs and their corresponding nouns. Both qualitative and quantitative evaluations demonstrate that the proposed framework significantly outperforms baseline approaches, producing higher-quality videos with improved subject-motion consistency.
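As a rough illustration of what a spatial cross-attention constraint can look like, the sketch below scores how much of a noun's attention mass falls outside its intended region. The loss form, the function name `attention_outside_mask_loss`, and the toy 2x4 map are assumptions made for illustration; the abstract does not specify the constraint used by S$^2$AG-Vid.

```python
def attention_outside_mask_loss(attn, mask):
    """Return 1 - (attention mass inside the mask) / (total attention mass).

    attn and mask are equally sized 2D lists; mask entries are 0 or 1.
    This loss form is an assumption, not the paper's stated objective.
    """
    total = sum(sum(row) for row in attn)
    if total == 0:
        return 1.0  # no attention anywhere: treat as fully misaligned
    inside = sum(
        a * m
        for attn_row, mask_row in zip(attn, mask)
        for a, m in zip(attn_row, mask_row)
    )
    return 1.0 - inside / total


# Hypothetical cross-attention map for one noun token and a mask
# covering the left half of the frame.
attn_noun = [[0.4, 0.3, 0.2, 0.1],
             [0.4, 0.3, 0.2, 0.1]]
left_half = [[1, 1, 0, 0],
             [1, 1, 0, 0]]

loss = attention_outside_mask_loss(attn_noun, left_half)
print(round(loss, 3))
```

In a training-free setting, a quantity of this kind would typically be differentiated with respect to the latent and used to nudge it during early denoising steps; that update step is omitted here.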

ca map, large language model, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2409.15259

Country:

Asia > China (0.14)
Europe > Germany (0.14)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.47)


Unifying Generation and Compression: Ultra-low bitrate Image Coding Via Multi-stage Transformer

Xue, Naifu, Mao, Qi, Wang, Zijian, Zhang, Yuan, Ma, Siwei

arXiv.org Artificial Intelligence | Mar-6-2024

Recent progress in generative compression technology has significantly improved the perceptual quality of compressed data. However, these advancements primarily focus on producing high-frequency details, often overlooking the ability of generative models to capture the prior distribution of image content, thus impeding further bitrate reduction in extreme compression scenarios (<0.05 bpp). Motivated by the capabilities of predictive language models for lossless compression, this paper introduces a novel Unified Image Generation-Compression (UIGC) paradigm, merging the processes of generation and compression. A key feature of the UIGC framework is the adoption of vector-quantized (VQ) image models for tokenization, alongside a multi-stage transformer designed to exploit spatial contextual information for modeling the prior distribution. As such, the dual-purpose framework effectively utilizes the learned prior for entropy estimation and assists in the regeneration of lost tokens. Extensive experiments demonstrate the superiority of the proposed UIGC framework over existing codecs in perceptual quality and human perception, particularly in ultra-low bitrate scenarios (<=0.03 bpp), pioneering a new direction in generative compression.
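To make the bitrate figures concrete, the following sketch shows how a prior over discrete tokens translates into bits per pixel (bpp): the ideal code length of a token with probability p is -log2(p) bits. The image size, token grid, codebook size, and probabilities are hypothetical values chosen for the arithmetic, not settings taken from the UIGC paper.

```python
import math


def bits_per_pixel(token_probs, width, height):
    """Ideal code length of all tokens (in bits) divided by the pixel count."""
    total_bits = sum(-math.log2(p) for p in token_probs)
    return total_bits / (width * height)


# Hypothetical setup: a 256x256 image tokenized into a 16x16 grid
# (256 tokens) drawn from a 1024-entry codebook.
num_tokens = 16 * 16
uniform = [1 / 1024] * num_tokens  # no prior knowledge: 10 bits per token
sharper = [1 / 64] * num_tokens    # a better-predicting prior: 6 bits per token

bpp_uniform = bits_per_pixel(uniform, 256, 256)  # about 0.039 bpp
bpp_sharper = bits_per_pixel(sharper, 256, 256)  # about 0.023 bpp
print(bpp_uniform, bpp_sharper)
```

The better the prior predicts each token, the fewer bits the entropy coder needs, which is why modeling the prior distribution matters in the sub-0.05 bpp regime.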

artificial intelligence, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2403.03736

Genre: Research Report (0.50)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)


Scalable Face Image Coding via StyleGAN Prior: Towards Compression for Human-Machine Collaborative Vision

Mao, Qi, Wang, Chongyu, Wang, Meng, Wang, Shiqi, Chen, Ruijie, Jin, Libiao, Ma, Siwei

arXiv.org Artificial Intelligence | Dec-25-2023

The accelerated proliferation of visual content and the rapid development of machine vision technologies bring significant challenges in delivering visual data on a gigantic scale, which shall be effectively represented to satisfy both human and machine requirements. In this work, we investigate how hierarchical representations derived from the advanced generative prior facilitate constructing an efficient scalable coding paradigm for human-machine collaborative vision. Our key insight is that by exploiting the StyleGAN prior, we can learn three-layered representations encoding hierarchical semantics, which are elaborately designed into the basic, middle, and enhanced layers, supporting machine intelligence and human visual perception in a progressive fashion. With the aim of achieving efficient compression, we propose the layer-wise scalable entropy transformer to reduce the redundancy between layers. Based on the multi-task scalable rate-distortion objective, the proposed scheme is jointly optimized to achieve optimal machine analysis performance, human perception experience, and compression ratio. We validate the proposed paradigm's feasibility in face image compression. Extensive qualitative and quantitative experimental results demonstrate the superiority of the proposed paradigm over the latest compression standard Versatile Video Coding (VVC) in terms of both machine analysis as well as human perception at extremely low bitrates ($<0.01$ bpp), offering new insights for human-machine collaborative compression.

artificial intelligence, machine learning, style vector, (17 more...)

arXiv.org Artificial Intelligence

2312.15622

Country: North America > United States > California > Los Angeles County > Los Angeles (0.14)

Genre: Research Report > New Finding (0.34)

Industry: Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision > Face Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)


MAG-Edit: Localized Image Editing in Complex Scenarios via Mask-Based Attention-Adjusted Guidance

Mao, Qi, Chen, Lan, Gu, Yuchao, Fang, Zhen, Shou, Mike Zheng

arXiv.org Artificial Intelligence | Dec-21-2023

However, localized editing in complex scenarios has not been well-studied in the literature, despite its growing real-world demands. Existing mask-based inpainting methods fall short of retaining the underlying structure within the edit region. Meanwhile, mask-free attention-based methods often exhibit editing leakage and misalignment in more complex compositions. In this work, we develop MAG-Edit, a training-free, inference-stage optimization method, which enables localized image editing in …

On the other hand, mask-free methods that utilize attention injection mechanisms such as Prompt-to-Prompt (P2P) [10] and Plug-and-Play (PnP) [28] can preserve the original image's structure and layout. Nevertheless, they struggle to precisely align the local editing region with the intended text in intricate scenarios, largely due to their reliance on the text prompts' localization capabilities. As a result, editing effects often extend beyond the intended area and impact incorrect regions, as shown in the fourth column of Figure 1.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2312.11396

Country: Asia (0.14)

Genre: Research Report (0.64)

Industry: Media > Photography (0.63)

Technology:

Information Technology > Artificial Intelligence > Natural Language (0.93)
Information Technology > Artificial Intelligence > Vision (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)


Latent Smooth Skeleton Embedding

Wang, Li (University of Illinois at Chicago) | Mao, Qi (HERE Company) | Tsang, Ivor W. (University of Technology Sydney)

AAAI Conferences | Feb-14-2017

Learning a smooth skeleton in a low-dimensional space from noisy data becomes important in computer vision and computational biology. Existing methods assume that the manifold constructed from the data is smooth, but they lack the ability to model skeleton structures from noisy data. To overcome this issue, we propose a novel probabilistic structured learning model to learn the density of latent embedding given high-dimensional data and its neighborhood graph. The embedded points that form a smooth skeleton structure are obtained by maximum a posteriori (MAP) estimation. Our analysis shows that the resulting similarity matrix is sparse and unique, and its associated kernel has eigenvalues that follow a power law distribution, which leads to the embeddings of a smooth skeleton. The model is extended to learn a sparse similarity matrix when the graph structure is unknown. Extensive experiments demonstrate the effectiveness of the proposed methods on various datasets by comparing them with existing methods.

constraint, oncology, optimization problem, (20 more...)

AAAI Conferences

Thirty-First AAAI Conference on Artificial Intelligence

Country: North America > United States > Illinois (0.14)

Industry: Health & Medicine > Therapeutic Area > Oncology (0.94)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.94)


A Novel Regularized Principal Graph Learning Framework on Explicit Graph Representation

Mao, Qi, Wang, Li, Tsang, Ivor W., Sun, Yijun

arXiv.org Machine Learning | Jan-17-2016

Many scientific datasets are of high dimension, and the analysis usually requires visual manipulation by retaining the most important structures of data. Principal curve is a widely used approach for this purpose. However, many existing methods work only for data with structures that are not self-intersected, which is quite restrictive for real applications. A few methods can overcome the above problem, but they either require complicated human-made rules for a specific task with lack of convergence guarantee and adaption flexibility to different tasks, or cannot obtain explicit structures of data. To address these issues, we develop a new regularized principal graph learning framework that captures the local information of the underlying graph structure based on reversed graph embedding. As showcases, models that can learn a spanning tree or a weighted undirected $\ell_1$ graph are proposed, and a new learning algorithm is developed that learns a set of principal points and a graph structure from data, simultaneously. The new algorithm is simple with guaranteed convergence. We then extend the proposed framework to deal with large-scale data. Experimental results on various synthetic and six real world datasets show that the proposed method compares favorably with baselines and can uncover the underlying structure correctly.
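One of the graph structures mentioned above is a spanning tree. The sketch below builds a minimum spanning tree over a small set of points with Prim's algorithm, as a standalone illustration of that single ingredient. It does not implement reversed graph embedding or the joint learning of principal points, and the function name and sample points are hypothetical.

```python
import math


def minimum_spanning_tree(points):
    """Prim's algorithm on the complete Euclidean graph.

    Returns a list of (parent, child) index pairs, one per tree edge.
    """
    n = len(points)
    if n == 0:
        return []
    in_tree = [False] * n
    best_dist = [math.inf] * n   # cheapest known connection to the tree
    best_parent = [-1] * n       # tree vertex providing that connection
    best_dist[0] = 0.0           # start growing the tree from point 0
    edges = []
    for _ in range(n):
        # Pick the vertex outside the tree that is cheapest to attach.
        u = min((i for i in range(n) if not in_tree[i]),
                key=lambda i: best_dist[i])
        in_tree[u] = True
        if best_parent[u] >= 0:
            edges.append((best_parent[u], u))
        # Relax distances of the remaining vertices through u.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best_dist[v]:
                    best_dist[v] = d
                    best_parent[v] = u
    return edges


# Hypothetical 2D points arranged in a T shape.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (1.0, 1.0)]
tree = minimum_spanning_tree(points)
print(tree)
```

In a principal-graph setting, a tree like this would be recomputed over the learned principal points rather than the raw data, alternating with an update of those points.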

graph, oncology, optimization problem, (20 more...)

arXiv.org Machine Learning

1512.02752

Country:

North America > United States > New York (0.14)
North America > United States > Illinois (0.14)

Genre: Research Report (0.50)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)


A Split-Merge Framework for Comparing Clusterings

Xiang, Qiaoliang, Mao, Qi, Chai, Kian Ming, Chieu, Hai Leong, Tsang, Ivor, Zhao, Zhendong

arXiv.org Machine Learning | Sep-4-2012

Clustering evaluation measures are frequently used to evaluate the performance of algorithms. However, most measures are not properly normalized and ignore some information in the inherent structure of clusterings. We model the relation between two clusterings as a bipartite graph and propose a general component-based decomposition formula based on the components of the graph. Most existing measures are examples of this formula. In order to satisfy consistency in the component, we further propose a split-merge framework for comparing clusterings of different data sets. Our framework gives measures that are conditionally normalized, and it can make use of data point information, such as feature vectors and pairwise distances. We use an entropy-based instance of the framework and a coreference resolution data set to demonstrate empirically the utility of our framework over other measures.
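The bipartite-graph view described above can be sketched as follows: each cluster from either clustering is a node, two clusters are linked when they share a data point, and the connected components are the units a component-based measure would be decomposed over. Only the graph construction is shown; the paper's decomposition formula and split-merge measures are not reproduced, and the function name and labels are hypothetical.

```python
from collections import defaultdict


def bipartite_components(labels_a, labels_b):
    """Group clusters of two clusterings into connected components.

    Nodes are ("A", cluster_id) and ("B", cluster_id); an edge joins two
    clusters whenever at least one data point belongs to both.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(x, y):
        root_x, root_y = find(x), find(y)
        if root_x != root_y:
            parent[root_x] = root_y

    # Each data point contributes one edge between its two clusters.
    for a, b in zip(labels_a, labels_b):
        union(("A", a), ("B", b))

    groups = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical clusterings of six data points.
labels_a = [0, 0, 1, 1, 2, 2]
labels_b = [0, 1, 1, 1, 2, 2]
components = bipartite_components(labels_a, labels_b)
print(components)
```

Here clusters A0, A1, B0, and B1 end up in one component because they overlap through shared points, while A2 and B2 match exactly and form a component of their own.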

artificial intelligence, machine learning, subcomponent, (15 more...)

arXiv.org Machine Learning

1206.6475

Country:

Europe > United Kingdom > Scotland (0.14)
Europe > Netherlands (0.14)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.94)


Parameter-Free Spectral Kernel Learning

Mao, Qi, Tsang, Ivor W.

arXiv.org Machine Learning | Mar-15-2012

Due to the growing ubiquity of unlabeled data, learning with unlabeled data is attracting increasing attention in machine learning. In this paper, we propose a novel semi-supervised kernel learning method which can seamlessly combine manifold structure of unlabeled data and Regularized Least-Squares (RLS) to learn a new kernel. Interestingly, the new kernel matrix can be obtained analytically with the use of spectral decomposition of graph Laplacian matrix. Hence, the proposed algorithm does not require any numerical optimization solvers. Moreover, by maximizing kernel target alignment on labeled data, we can also learn model parameters automatically with a closed-form solution. For a given graph Laplacian matrix, our proposed method does not need to tune any model parameter including the tradeoff parameter in RLS and the balance parameter for unlabeled data. Extensive experiments on ten benchmark datasets show that our proposed two-stage parameter-free spectral kernel learning algorithm can obtain comparable performance with fine-tuned manifold regularization methods in transductive setting, and outperform multiple kernel learning in supervised setting.
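Kernel target alignment, the criterion maximized on labeled data above, has a standard definition: the Frobenius inner product of the kernel matrix K with the label matrix yy^T, normalized by both Frobenius norms. The sketch below computes it for binary labels in {-1, +1}. It covers only the alignment score, not the paper's closed-form parameter solution or its spectral kernel construction, and the example matrices are hypothetical.

```python
import math


def kernel_target_alignment(K, y):
    """Compute <K, yy^T>_F / (||K||_F * ||yy^T||_F) for labels in {-1, +1}."""
    n = len(y)
    inner = sum(K[i][j] * y[i] * y[j] for i in range(n) for j in range(n))
    norm_k = math.sqrt(sum(K[i][j] ** 2 for i in range(n) for j in range(n)))
    norm_y = sum(v * v for v in y)  # ||yy^T||_F equals y^T y
    return inner / (norm_k * norm_y)


# Hypothetical labels for four points, two per class.
y = [1, 1, -1, -1]

# A kernel equal to yy^T is perfectly aligned with the labels.
ideal = [[yi * yj for yj in y] for yi in y]
# The identity kernel carries no class structure.
identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]

align_ideal = kernel_target_alignment(ideal, y)
align_identity = kernel_target_alignment(identity, y)
print(align_ideal, align_identity)
```

A higher score means the kernel's similarity structure agrees more closely with the class labels, so the score can be used to choose kernel parameters without a separate validation loop.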

artificial intelligence, kernel, machine learning, (14 more...)

arXiv.org Machine Learning

1203.3495

Country: Asia (0.14)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (0.96)
