Jiang, Zhongyu
Bayesian Optimization for Controlled Image Editing via LLMs
Cai, Chengkun, Liu, Haoliang, Zhao, Xu, Jiang, Zhongyu, Zhang, Tianfang, Wu, Zongkai, Hwang, Jenq-Neng, Belongie, Serge, Li, Lei
In the rapidly evolving field of image generation, achieving precise control over generated content and maintaining semantic consistency remain significant challenges, particularly for methods that rely on grounding techniques or require model fine-tuning. To address these challenges, we propose BayesGenie, an off-the-shelf approach that integrates Large Language Models (LLMs) with Bayesian Optimization to enable precise and user-friendly image editing. Our method allows users to modify images through natural language descriptions, without manual area marking, while preserving the original image's semantic integrity. Unlike existing techniques that require extensive pre-training or fine-tuning, our approach is model-agnostic and adapts readily across different LLMs. BayesGenie employs an adapted Bayesian optimization strategy to automatically refine the inference-process parameters, achieving high-precision image editing with minimal user intervention. Through extensive experiments across diverse scenarios, we demonstrate that our framework significantly outperforms existing methods in both editing accuracy and semantic preservation, as validated with different LLMs including Claude 3 and GPT-4.
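A minimal sketch of the idea the abstract describes (not the authors' code): treat diffusion inference parameters as a search space and let Bayesian optimization maximize an LLM-provided edit-quality score. Here `run_diffusion_edit` and `llm_score` are hypothetical placeholders for an image-editing pipeline and an LLM judge, and the parameter ranges are assumptions.

```python
# Hedged sketch: Bayesian optimization over diffusion inference parameters,
# scored by an LLM judge. Placeholder functions are hypothetical.
from skopt import gp_minimize
from skopt.space import Real

# source_image, edit_prompt: the user's input image and natural-language instruction (assumed given)

def objective(params):
    guidance_scale, image_strength = params
    edited = run_diffusion_edit(              # hypothetical: run the edit with these parameters
        source_image, edit_prompt,
        guidance_scale=guidance_scale,
        strength=image_strength,
    )
    score = llm_score(edited, edit_prompt)    # hypothetical: LLM rates edit accuracy + semantic preservation
    return -score                             # gp_minimize minimizes, so negate the score

search_space = [Real(1.0, 15.0, name="guidance_scale"),
                Real(0.2, 0.9, name="image_strength")]
result = gp_minimize(objective, search_space, n_calls=20, random_state=0)
best_guidance, best_strength = result.x
```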
PackDiT: Joint Human Motion and Text Generation via Mutual Prompting
Jiang, Zhongyu, Chai, Wenhao, Zhou, Zhuoran, Yang, Cheng-Yen, Huang, Hsiang-Wei, Hwang, Jenq-Neng
Human motion generation has advanced markedly with the advent of diffusion models. Most recent studies have concentrated on generating motion sequences based on text prompts, commonly referred to as text-to-motion generation. However, the bidirectional generation of motion and text, enabling tasks such as motion-to-text alongside text-to-motion, has been largely unexplored. This capability is essential for aligning diverse modalities and supports unconditional generation. In this paper, we introduce PackDiT, the first diffusion-based generative model capable of performing various tasks simultaneously, including motion generation, motion prediction, text generation, text-to-motion, motion-to-text, and joint motion-text generation. Our core innovation leverages mutual blocks to integrate multiple diffusion transformers (DiTs) across different modalities seamlessly. We train PackDiT on the HumanML3D dataset, achieving state-of-the-art text-to-motion performance with an FID score of 0.106, along with superior results in motion prediction and in-between tasks. Our experiments further demonstrate that diffusion models are effective for motion-to-text generation, achieving performance comparable to that of autoregressive models.
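A minimal sketch, not the released PackDiT code, of one way the abstract's "mutual blocks" could couple a motion stream and a text stream: each modality cross-attends to the other with a residual connection. Dimensions and layer choices are assumptions.

```python
# Hedged sketch of a mutual block coupling two diffusion-transformer streams.
import torch
import torch.nn as nn

class MutualBlock(nn.Module):
    def __init__(self, dim: int = 512, heads: int = 8):
        super().__init__()
        self.motion_to_text = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.text_to_motion = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm_motion = nn.LayerNorm(dim)
        self.norm_text = nn.LayerNorm(dim)

    def forward(self, motion_tokens, text_tokens):
        # Each stream queries the other modality and keeps a residual connection.
        m, _ = self.motion_to_text(motion_tokens, text_tokens, text_tokens)
        t, _ = self.text_to_motion(text_tokens, motion_tokens, motion_tokens)
        return self.norm_motion(motion_tokens + m), self.norm_text(text_tokens + t)

motion = torch.randn(2, 196, 512)   # (batch, motion frames, feature dim)
text = torch.randn(2, 77, 512)      # (batch, text tokens, feature dim)
motion, text = MutualBlock()(motion, text)
```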
Graph Canvas for Controllable 3D Scene Generation
Liu, Libin, Chen, Shen, Jia, Sen, Shi, Jingzhe, Jiang, Zhongyu, Jin, Can, Wu, Zongkai, Hwang, Jenq-Neng, Li, Lei
Spatial intelligence is foundational to AI systems that interact with the physical world, particularly in 3D scene generation and spatial comprehension. Current methodologies for 3D scene generation often rely heavily on predefined datasets, and struggle to adapt dynamically to changing spatial relationships. In this paper, we introduce GraphCanvas3D, a programmable, extensible, and adaptable framework for controllable 3D scene generation. Leveraging in-context learning, GraphCanvas3D enables dynamic adaptability without the need for retraining, supporting flexible and customizable scene creation. Our framework employs hierarchical, graph-driven scene descriptions, representing spatial elements as graph nodes and establishing coherent relationships among objects in 3D environments. Unlike conventional approaches, which are constrained in adaptability and often require predefined input masks or retraining for modifications, GraphCanvas3D allows for seamless object manipulation and scene adjustments on the fly. Additionally, GraphCanvas3D supports 4D scene generation, incorporating temporal dynamics to model changes over time. Experimental results and user studies demonstrate that GraphCanvas3D enhances usability, flexibility, and adaptability for scene generation. Our code and models are available on the project website: https://github.com/ILGLJ/Graph-Canvas.
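An illustrative sketch (not the project's API) of a graph-driven scene description as the abstract characterizes it: objects as nodes, spatial relations as edges, so a scene edit becomes a small graph update rather than a full regeneration. Class and field names here are hypothetical.

```python
# Hedged sketch: a hierarchical scene graph for controllable 3D scene editing.
from dataclasses import dataclass, field

@dataclass
class SceneNode:
    name: str
    category: str
    position: tuple = (0.0, 0.0, 0.0)   # world-space (x, y, z); an assumed convention

@dataclass
class SceneGraph:
    nodes: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)   # (subject, relation, object) triples

    def add(self, node: SceneNode):
        self.nodes[node.name] = node

    def relate(self, subject: str, relation: str, obj: str):
        self.edges.append((subject, relation, obj))

scene = SceneGraph()
scene.add(SceneNode("table", "furniture", (0.0, 0.0, 0.0)))
scene.add(SceneNode("lamp", "lighting", (0.0, 0.8, 0.0)))
scene.relate("lamp", "on_top_of", "table")

# On-the-fly edit: moving the table only updates one node; the relations persist.
scene.nodes["table"].position = (1.0, 0.0, 0.5)
```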
Human Motion Instruction Tuning
Li, Lei, Jia, Sen, Wang, Jianhao, Jiang, Zhongyu, Zhou, Feng, Dai, Ju, Zhang, Tianfang, Wu, Zongkai, Hwang, Jenq-Neng
This paper presents LLaMo (Large Language and Human Motion Assistant), a multimodal framework for human motion instruction tuning. In contrast to conventional instruction-tuning approaches that convert non-linguistic inputs, such as video or motion sequences, into language tokens, LLaMo retains motion in its native form for instruction tuning. This method preserves motion-specific details that are often diminished in tokenization, thereby improving the model's ability to interpret complex human behaviors. By processing both video and motion data alongside textual inputs, LLaMo enables a flexible, human-centric analysis. Experimental evaluations across high-complexity domains, including human behaviors and professional activities, indicate that LLaMo effectively captures domain-specific knowledge, enhancing comprehension and prediction in motion-intensive scenarios. We hope LLaMo offers a foundation for future multimodal AI systems with broad applications, from sports analytics to behavioral prediction. Our code and models are available on the project website: https://github.com/ILGLJ/LLaMo.
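A minimal sketch (an assumed architecture, not the released LLaMo code) of the abstract's idea of keeping motion in its native form: a projector maps continuous motion features into the language model's embedding space instead of discretizing them into language tokens. The dimensions follow HumanML3D-style per-frame features but are otherwise assumptions.

```python
# Hedged sketch: project raw motion features into an LLM's embedding space.
import torch
import torch.nn as nn

class MotionProjector(nn.Module):
    def __init__(self, motion_dim: int = 263, llm_dim: int = 4096):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(motion_dim, llm_dim), nn.GELU(), nn.Linear(llm_dim, llm_dim)
        )

    def forward(self, motion):            # motion: (batch, frames, motion_dim)
        return self.proj(motion)          # continuous "motion embeddings" for the LLM

motion_feats = torch.randn(1, 120, 263)   # per-frame motion features, kept continuous
motion_embeds = MotionProjector()(motion_feats)
# motion_embeds would be concatenated with text token embeddings before the LLM.
```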
The Role of Deductive and Inductive Reasoning in Large Language Models
Cai, Chengkun, Zhao, Xu, Liu, Haoliang, Jiang, Zhongyu, Zhang, Tianfang, Wu, Zongkai, Hwang, Jenq-Neng, Li, Lei
Large Language Models (LLMs) have achieved substantial progress in artificial intelligence, particularly in reasoning tasks. However, their reliance on static prompt structures, coupled with limited dynamic reasoning capabilities, often constrains their adaptability to complex and evolving problem spaces. In this paper, we propose the Deductive and InDuctive (DID) method, which enhances LLM reasoning by dynamically integrating both deductive and inductive reasoning within the prompt construction process. Drawing inspiration from cognitive science, the DID approach mirrors human adaptive reasoning mechanisms, offering a flexible framework that allows the model to adjust its reasoning pathways based on task context and performance. We empirically validate the efficacy of DID on established datasets such as AIW and MR-GSM8K, as well as on our custom dataset, Holiday Puzzle, which comprises a set of holiday date calculation challenges. By leveraging DID's hybrid prompt strategy, we demonstrate significant improvements in both solution accuracy and reasoning quality, achieved without imposing substantial computational overhead. Our findings suggest that DID provides a more robust and cognitively aligned framework for reasoning in LLMs, contributing to the development of advanced LLM-driven problem-solving strategies informed by cognitive science models. "The measure of intelligence is the ability to change."
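A minimal sketch, not the authors' implementation, of one way the hybrid deductive/inductive prompting the abstract describes could be wired: first ask the model to induce a rule from worked examples, then ask it to apply that rule deductively to the target question. `call_llm` is a hypothetical wrapper around whichever LLM API is in use.

```python
# Hedged sketch: two-pass prompt construction mixing induction and deduction.

INDUCTIVE_STEP = (
    "Here are solved examples:\n{examples}\n"
    "Induce the general rule or pattern these examples follow."
)
DEDUCTIVE_STEP = (
    "Using the rule you just induced:\n{rule}\n"
    "Apply it step by step to answer: {question}"
)

def hybrid_reasoning(examples: str, question: str) -> str:
    rule = call_llm(INDUCTIVE_STEP.format(examples=examples))               # induction pass
    answer = call_llm(DEDUCTIVE_STEP.format(rule=rule, question=question))  # deduction pass
    return answer
```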
The 2nd Workshop on Maritime Computer Vision (MaCVi) 2024
Kiefer, Benjamin, Žust, Lojze, Kristan, Matej, Perš, Janez, Teršek, Matija, Wiliem, Arnold, Messmer, Martin, Yang, Cheng-Yen, Huang, Hsiang-Wei, Jiang, Zhongyu, Kuo, Heng-Cheng, Mei, Jie, Hwang, Jenq-Neng, Stadler, Daniel, Sommer, Lars, Huang, Kaer, Zheng, Aiguo, Chong, Weitu, Lertniphonphan, Kanokphan, Xie, Jun, Chen, Feng, Li, Jian, Wang, Zhepeng, Zedda, Luca, Loddo, Andrea, Di Ruberto, Cecilia, Vu, Tuan-Anh, Nguyen-Truong, Hai, Ha, Tan-Sang, Pham, Quan-Dung, Yeung, Sai-Kit, Feng, Yuan, Thien, Nguyen Thanh, Tian, Lixin, Kuan, Sheng-Yao, Ho, Yuan-Hao, Rodriguez, Angel Bueno, Carrillo-Perez, Borja, Klein, Alexander, Alex, Antje, Steiniger, Yannik, Sattler, Felix, Solano-Carrillo, Edgardo, Fabijanić, Matej, Šumunec, Magdalena, Kapetanović, Nadir, Michel, Andreas, Gross, Wolfgang, Weinmann, Martin
The 2nd Workshop on Maritime Computer Vision (MaCVi) 2024 addresses maritime computer vision for Unmanned Aerial Vehicles (UAV) and Unmanned Surface Vehicles (USV). Three challenge categories are considered: (i) UAV-based Maritime Object Tracking with Re-identification, (ii) USV-based Maritime Obstacle Segmentation and Detection, (iii) USV-based Maritime Boat Tracking. The USV-based Maritime Obstacle Segmentation and Detection category features three sub-challenges, including a new embedded challenge addressing efficient inference on real-world embedded devices. This report offers a comprehensive overview of the findings from the challenges. We provide both statistical and qualitative analyses, evaluating trends from over 195 submissions. All datasets, evaluation code, and the leaderboard are available to the public at https://macvi.org/workshop/macvi24.
Efficient Domain Adaptation via Generative Prior for 3D Infant Pose Estimation
Zhou, Zhuoran, Jiang, Zhongyu, Chai, Wenhao, Yang, Cheng-Yen, Li, Lei, Hwang, Jenq-Neng
Although 3D human pose estimation has seen impressive progress in recent years, only a few works focus on infants, whose bone lengths differ from adults' and for whom only limited data are available. Directly applying adult pose estimation models to the infant domain typically yields low performance and suffers from out-of-distribution issues. Moreover, the difficulty of collecting infant pose data heavily constrains learning-based models that lift 2D poses to 3D. To deal with small datasets, domain adaptation and data augmentation are commonly used techniques. Following this paradigm, we take advantage of an optimization-based method that utilizes generative priors to predict 3D infant keypoints from 2D keypoints without the need for large training datasets. We further apply a guided diffusion model to adapt 3D adult poses to the infant domain, supplementing the small datasets. We also show that our method, ZeDO-i, attains efficient domain adaptation even when only a small amount of data is available. Quantitatively, our model achieves state-of-the-art MPJPE performance of 43.6 mm on the SyRIP dataset and 21.2 mm on the MINI-RGBD dataset.
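A minimal sketch (under stated assumptions, not the ZeDO-i code) of the optimization-with-a-generative-prior idea: optimize a latent so that the prior's 3D pose, reprojected through the camera, matches the observed 2D keypoints. `pose_prior` and `camera` are hypothetical stand-ins for the paper's pretrained prior and camera model, and the latent size is assumed.

```python
# Hedged sketch: lift 2D keypoints to 3D by optimizing against a generative pose prior.
import torch

def lift_2d_to_3d(keypoints_2d, pose_prior, camera, steps: int = 500, lr: float = 1e-2):
    latent = torch.zeros(1, 128, requires_grad=True)        # assumed latent size
    optimizer = torch.optim.Adam([latent], lr=lr)
    for _ in range(steps):
        joints_3d = pose_prior(latent)                       # prior keeps the pose plausible
        reprojected = camera.project(joints_3d)              # hypothetical 3D -> 2D projection
        loss = ((reprojected - keypoints_2d) ** 2).mean()    # 2D reprojection error
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return pose_prior(latent).detach()
```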
Back to Optimization: Diffusion-based Zero-Shot 3D Human Pose Estimation
Jiang, Zhongyu, Zhou, Zhuoran, Li, Lei, Chai, Wenhao, Yang, Cheng-Yen, Hwang, Jenq-Neng
Learning-based methods have dominated 3D human pose estimation (HPE), achieving significantly better performance than traditional optimization-based methods on most benchmarks. Nonetheless, in-the-wild 3D HPE remains the biggest challenge for learning-based models, whether based on 2D-3D lifting, image-to-3D, or diffusion, since the trained networks implicitly learn camera intrinsic parameters and domain-specific 3D human pose distributions and tend to estimate poses as a statistical average. Optimization-based methods, on the other hand, solve each case individually and can therefore predict more diverse and sophisticated human poses in the wild. Combining the advantages of optimization-based and learning-based methods, we propose the Zero-shot Diffusion-based Optimization (ZeDO) pipeline for 3D HPE to address cross-domain and in-the-wild 3D HPE. Our multi-hypothesis ZeDO achieves state-of-the-art (SOTA) performance on Human3.6M, with a minMPJPE of 51.4 mm, without training on any 2D-3D or image-3D pairs. Moreover, our single-hypothesis ZeDO achieves SOTA performance on the 3DPW dataset with a PA-MPJPE of 40.3 mm in cross-dataset evaluation, even outperforming learning-based methods trained on 3DPW.
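A minimal sketch (not the released ZeDO pipeline) of the multi-hypothesis idea the abstract mentions: sample several candidate 3D poses from a diffusion prior, refine each against the 2D evidence, and keep the best. `sample_pose_prior`, `refine_with_prior`, and `camera` are hypothetical placeholders.

```python
# Hedged sketch: multi-hypothesis selection by 2D reprojection error.
import torch

def multi_hypothesis_lift(keypoints_2d, camera, num_hypotheses: int = 50):
    best_pose, best_err = None, float("inf")
    for _ in range(num_hypotheses):
        init_3d = sample_pose_prior()                               # hypothetical: draw a plausible 3D pose
        refined = refine_with_prior(init_3d, keypoints_2d, camera)  # hypothetical optimization step
        err = ((camera.project(refined) - keypoints_2d) ** 2).mean()
        if err < best_err:
            best_pose, best_err = refined, err
    return best_pose
```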