AITopics | Wang, Zhouxia

Wang, Zhouxia

Image Conductor: Precision Control for Interactive Video Synthesis

Li, Yaowei, Wang, Xintao, Zhang, Zhaoyang, Wang, Zhouxia, Yuan, Ziyang, Xie, Liangbin, Zou, Yuexian, Shan, Ying

arXiv.org Artificial Intelligence, Jun-21-2024

Filmmaking and animation production often require sophisticated techniques for coordinating camera transitions and object movements, typically involving labor-intensive real-world capturing. Despite advancements in generative AI for video creation, achieving precise control over motion for interactive video asset generation remains challenging. To this end, we propose Image Conductor, a method for precise control of camera transitions and object movements to generate video assets from a single image. A well-cultivated training strategy is proposed to separate distinct camera and object motion using camera LoRA weights and object LoRA weights. To further address cinematographic variations from ill-posed trajectories, we introduce a camera-free guidance technique during inference, enhancing object movements while eliminating camera transitions. Additionally, we develop a trajectory-oriented video motion data curation pipeline for training. Quantitative and qualitative experiments demonstrate our method's precision and fine-grained control in generating motion-controllable videos from images, advancing the practical application of interactive video synthesis. The project webpage is available at https://liyaowei-stu.github.io/project/ImageConductor/
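The abstract's idea of keeping camera motion and object motion in separate LoRA weights can be illustrated with the generic LoRA update, in which a frozen weight W is adjusted by a low-rank product scaled by alpha / rank. The sketch below is a toy illustration under stated assumptions, not the authors' implementation: the adapter names `camera` and `object`, the helper `effective_weight`, and the idea of switching adapters on and off by name are all hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Matrix = List[List[float]]
# (B, A, alpha, rank): the LoRA update is (alpha / rank) * B @ A
Adapter = Tuple[Matrix, Matrix, float, int]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product for small nested-list matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_delta(adapter: Adapter) -> Matrix:
    """Low-rank weight update contributed by one adapter."""
    b, a, alpha, rank = adapter
    scale = alpha / rank
    return [[scale * v for v in row] for row in matmul(b, a)]


def effective_weight(w: Matrix, adapters: Dict[str, Adapter],
                     active: Sequence[str]) -> Matrix:
    """Frozen weight plus the updates of the adapters named in `active`."""
    out = [row[:] for row in w]  # copy so the frozen weight is never mutated
    for name in active:
        delta = lora_delta(adapters[name])
        out = [[x + d for x, d in zip(r, dr)] for r, dr in zip(out, delta)]
    return out


# Hypothetical rank-1 adapters on a 2x2 frozen weight.
W: Matrix = [[1.0, 0.0], [0.0, 1.0]]
ADAPTERS: Dict[str, Adapter] = {
    "camera": ([[1.0], [0.0]], [[0.0, 1.0]], 1.0, 1),
    "object": ([[0.0], [1.0]], [[1.0, 0.0]], 1.0, 1),
}

camera_only = effective_weight(W, ADAPTERS, ["camera"])
object_only = effective_weight(W, ADAPTERS, ["object"])
```

Because each adapter is stored separately from the frozen weight, either motion type can be applied or dropped without retraining the base model, which is the property the abstract relies on.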

artificial intelligence, machine learning, natural language, (15 more...)

arXiv.org Artificial Intelligence

2406.15339

Country: Asia (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.48)

MotionCtrl: A Unified and Flexible Motion Controller for Video Generation

Wang, Zhouxia, Yuan, Ziyang, Wang, Xintao, Chen, Tianshui, Xia, Menghan, Luo, Ping, Shan, Ying

arXiv.org Artificial Intelligence, Dec-6-2023

Motions in a video primarily consist of camera motion, induced by camera movement, and object motion, resulting from object movement. Accurate control of both camera and object motion is essential for video generation. However, existing works either mainly focus on one type of motion or do not clearly distinguish between the two, limiting their control capabilities and diversity. Therefore, this paper presents MotionCtrl, a unified and flexible motion controller for video generation designed to effectively and independently control camera and object motion. The architecture and training strategy of MotionCtrl are carefully devised, taking into account the inherent properties of camera motion, object motion, and imperfect training data. Compared to previous methods, MotionCtrl offers three main advantages: 1) It effectively and independently controls camera motion and object motion, enabling more fine-grained motion control and facilitating flexible and diverse combinations of both types of motion. 2) Its motion conditions are determined by camera poses and trajectories, which are appearance-free and minimally impact the appearance or shape of objects in generated videos. 3) It is a relatively generalizable model that can adapt to a wide array of camera poses and trajectories once trained. Extensive qualitative and quantitative experiments have been conducted to demonstrate the superiority of MotionCtrl over existing methods.
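To make the notion of "appearance-free" motion conditions concrete, the sketch below builds two purely geometric inputs: a per-frame camera pose and an object trajectory. It is a minimal illustration under stated assumptions: flattening a 3x4 [R|t] matrix into 12 numbers per frame, encoding a trajectory as per-frame pixel displacements, and the function names `pan_right_poses` and `trajectory_to_displacements` are all hypothetical choices, not details confirmed by the abstract.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def pan_right_poses(num_frames: int, step: float) -> List[List[float]]:
    """Camera poses for a sideways pan: identity rotation, translation along +x.

    Each pose is a 3x4 [R|t] matrix flattened row by row into 12 floats.
    """
    poses = []
    for i in range(num_frames):
        tx = i * step
        rt = [
            [1.0, 0.0, 0.0, tx],
            [0.0, 1.0, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
        ]
        poses.append([v for row in rt for v in row])
    return poses


def trajectory_to_displacements(points: Sequence[Point]) -> List[Point]:
    """Turn a sequence of (x, y) positions into frame-to-frame displacements."""
    return [(x1 - x0, y1 - y0) for (x0, y0), (x1, y1) in zip(points, points[1:])]


camera_condition = pan_right_poses(num_frames=4, step=0.5)
object_condition = trajectory_to_displacements([(0.0, 0.0), (1.0, 2.0), (3.0, 2.0)])
```

Neither condition contains pixel values, so in this toy setup the motion signal carries no information about what the objects look like, only about how the camera and objects move.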

artificial intelligence, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2312.03641

Country: Asia > China (0.14)

Genre: Research Report (0.82)

Industry:

Media > Television (1.00)
Media > Photography (1.00)
Media > Film (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)

Recurrent Attentional Reinforcement Learning for Multi-Label Image Recognition

Chen, Tianshui (Sun Yat-sen University) | Wang, Zhouxia (Sun Yat-sen University) | Li, Guanbin (Sun Yat-sen University) | Lin, Liang (Sun Yat-sen University)

AAAI Conferences, Feb-8-2018

Recognizing multiple labels of images is a fundamental but challenging task in computer vision, and remarkable progress has been attained by localizing semantic-aware image regions and predicting their labels with deep convolutional neural networks. However, the hypothesis region (region proposal) localization step in existing multi-label image recognition pipelines usually incurs redundant computation cost, e.g., generating hundreds of meaningless proposals with non-discriminative information and extracting their features, and the modeling of spatial contextual dependencies among the localized regions is often ignored or over-simplified. To resolve these issues, this paper proposes a recurrent attention reinforcement learning framework that iteratively discovers a sequence of attentional and informative regions related to different semantic objects and then predicts label scores conditioned on these regions. In addition, our method explicitly models long-term dependencies among these attentional regions, which helps to capture semantic label co-occurrence and thus facilitates multi-label recognition. Extensive experiments and comparisons on two large-scale benchmarks (i.e., PASCAL VOC and MS-COCO) show that our model outperforms existing state-of-the-art methods in both recognition performance and efficiency, while also explicitly associating image-level semantic labels with specific object regions.
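The iterative loop described in the abstract (attend to a region, score labels for it, carry state forward, move to the next region) can be sketched as follows. This is a toy illustration under loud assumptions: `toy_step` is a hypothetical lookup table standing in for the learned network and attention policy, and fusing per-region scores with an element-wise maximum is an assumed choice, not something the abstract specifies.

```python
from typing import Any, Callable, List, Tuple

Region = Tuple[int, int, int, int]  # (x0, y0, x1, y1) in pixels
StepFn = Callable[[Region, Any], Tuple[List[float], Region, Any]]


def run_episode(initial_region: Region, step_fn: StepFn, num_steps: int,
                num_labels: int) -> Tuple[List[float], List[Region]]:
    """Visit `num_steps` regions and fuse their label scores.

    `step_fn(region, state)` returns (label scores for this region,
    next region to attend to, updated recurrent state).
    """
    region, state = initial_region, None
    fused = [float("-inf")] * num_labels
    visited: List[Region] = []
    for _ in range(num_steps):
        visited.append(region)
        scores, region, state = step_fn(region, state)
        # Assumed fusion rule: keep the best score seen for each label.
        fused = [max(f, s) for f, s in zip(fused, scores)]
    return fused, visited


def toy_step(region: Region, state: Any) -> Tuple[List[float], Region, Any]:
    """Hypothetical stand-in for the model: fixed scores and a fixed next region."""
    table = {
        (0, 0, 50, 50): ([0.9, 0.1, 0.2], (50, 50, 100, 100)),
        (50, 50, 100, 100): ([0.2, 0.8, 0.1], (0, 0, 50, 50)),
    }
    scores, next_region = table[region]
    steps_taken = (state or 0) + 1  # the "recurrent state" here is just a counter
    return scores, next_region, steps_taken


label_scores, attended = run_episode((0, 0, 50, 50), toy_step, num_steps=2, num_labels=3)
```

Only two regions are scored in this example, in contrast to the hundreds of proposals the abstract attributes to proposal-based pipelines; in the paper the region sequence is learned with reinforcement learning rather than read from a table.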

deep learning, iteration, neural network, (21 more...)

AAAI Conferences

Thirty-Second AAAI Conference on Artificial Intelligence

Country: Asia > China (0.14)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition > Image Matching (0.62)
