AITopics | Huang, Xiaoshui

Collaborating Authors

Huang, Xiaoshui

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

PSReg: Prior-guided Sparse Mixture of Experts for Point Cloud Registration

Huang, Xiaoshui, Huang, Zhou, Zuo, Yifan, Gong, Yongshun, Zhang, Chengdong, Liu, Deyang, Fang, Yuming

arXiv.org Artificial IntelligenceJan-13-2025

The discriminative feature is crucial for point cloud registration. Recent methods improve the feature discriminative by distinguishing between non-overlapping and overlapping region points. However, they still face challenges in distinguishing the ambiguous structures in the overlapping regions. Therefore, the ambiguous features they extracted resulted in a significant number of outlier matches from overlapping regions. To solve this problem, we propose a prior-guided SMoE-based registration method to improve the feature distinctiveness by dispatching the potential correspondences to the same experts. Specifically, we propose a prior-guided SMoE module by fusing prior overlap and potential correspondence embeddings for routing, assigning tokens to the most suitable experts for processing. In addition, we propose a registration framework by a specific combination of Transformer layer and prior-guided SMoE module. The proposed method not only pays attention to the importance of locating the overlapping areas of point clouds, but also commits to finding more accurate correspondences in overlapping areas. Our extensive experiments demonstrate the effectiveness of our method, achieving state-of-the-art registration recall (95.7\%/79.3\%) on the 3DMatch/3DLoMatch benchmark. Moreover, we also test the performance on ModelNet40 and demonstrate excellent performance.

artificial intelligence, machine learning, registration, (13 more...)

arXiv.org Artificial Intelligence

2501.07762

Country: Asia > China (0.28)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Add feedback

LLaMA-Berry: Pairwise Optimization for O1-like Olympiad-Level Mathematical Reasoning

Zhang, Di, Wu, Jianbo, Lei, Jingdi, Che, Tong, Li, Jiatong, Xie, Tong, Huang, Xiaoshui, Zhang, Shufei, Pavone, Marco, Li, Yuqiang, Ouyang, Wanli, Zhou, Dongzhan

arXiv.org Artificial IntelligenceNov-21-2024

This paper presents an advanced mathematical problem-solving framework, LLaMA-Berry, for enhancing the mathematical reasoning ability of Large Language Models (LLMs). The framework combines Monte Carlo Tree Search (MCTS) with iterative Self-Refine to optimize the reasoning path and utilizes a pairwise reward model to evaluate different paths globally. By leveraging the self-critic and rewriting capabilities of LLMs, Self-Refine applied to MCTS (SR-MCTS) overcomes the inefficiencies and limitations of conventional step-wise and greedy search algorithms by fostering a more efficient exploration of solution spaces. Pairwise Preference Reward Model~(PPRM), inspired by Reinforcement Learning from Human Feedback (RLHF), is then used to model pairwise preferences between solutions, utilizing an Enhanced Borda Count (EBC) method to synthesize these preferences into a global ranking score to find better answers. This approach addresses the challenges of scoring variability and non-independent distributions in mathematical reasoning tasks. The framework has been tested on general and advanced benchmarks, showing superior performance in terms of search efficiency and problem-solving capability compared to existing methods like ToT and rStar, particularly in complex Olympiad-level benchmarks, including GPQA, AIME24 and AMC23.

arxiv preprint arxiv, large language model, machine learning, (21 more...)

arXiv.org Artificial Intelligence

2410.02884

Country:

Asia (0.28)
North America > United States > California (0.14)

Genre: Research Report (0.64)

Industry: Government (0.36)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
(2 more...)

Add feedback

DiffPano: Scalable and Consistent Text to Panorama Generation with Spherical Epipolar-Aware Diffusion

Ye, Weicai, Ji, Chenhao, Chen, Zheng, Gao, Junyao, Huang, Xiaoshui, Zhang, Song-Hai, Ouyang, Wanli, He, Tong, Zhao, Cairong, Zhang, Guofeng

arXiv.org Artificial IntelligenceOct-31-2024

Diffusion-based methods have achieved remarkable achievements in 2D image or 3D object generation, however, the generation of 3D scenes and even $360^{\circ}$ images remains constrained, due to the limited number of scene datasets, the complexity of 3D scenes themselves, and the difficulty of generating consistent multi-view images. To address these issues, we first establish a large-scale panoramic video-text dataset containing millions of consecutive panoramic keyframes with corresponding panoramic depths, camera poses, and text descriptions. Then, we propose a novel text-driven panoramic generation framework, termed DiffPano, to achieve scalable, consistent, and diverse panoramic scene generation. Specifically, benefiting from the powerful generative capabilities of stable diffusion, we fine-tune a single-view text-to-panorama diffusion model with LoRA on the established panoramic video-text dataset. We further design a spherical epipolar-aware multi-view diffusion model to ensure the multi-view consistency of the generated panoramic images. Extensive experiments demonstrate that DiffPano can generate scalable, consistent, and diverse panoramic images with given unseen text descriptions and camera poses.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2410.24203

Country: North America > United States > Hawaii (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

Add feedback

Accessing GPT-4 level Mathematical Olympiad Solutions via Monte Carlo Tree Self-refine with LLaMa-3 8B

Zhang, Di, Huang, Xiaoshui, Zhou, Dongzhan, Li, Yuqiang, Ouyang, Wanli

arXiv.org Artificial IntelligenceJun-13-2024

This paper introduces the MCT Self-Refine (MCTSr) algorithm, an innovative integration of Large Language Models (LLMs) with Monte Carlo Tree Search (MCTS), designed to enhance performance in complex mathematical reasoning tasks. Addressing the challenges of accuracy and reliability in LLMs, particularly in strategic and mathematical reasoning, MCTSr leverages systematic exploration and heuristic self-refine mechanisms to improve decision-making frameworks within LLMs. The algorithm constructs a Monte Carlo search tree through iterative processes of Selection, self-refine, self-evaluation, and Backpropagation, utilizing an improved Upper Confidence Bound (UCB) formula to optimize the exploration-exploitation balance. Extensive experiments demonstrate MCTSr's efficacy in solving Olympiad-level mathematical problems, significantly improving success rates across multiple datasets, including GSM8K, GSM Hard, MATH, and Olympiad-level benchmarks, including Math Odyssey, AIME, and Olympiad-Bench. The study advances the application of LLMs in complex reasoning tasks and sets a foundation for future AI integration, enhancing decision-making accuracy and reliability in LLM-driven applications.

large language model, machine learning, node, (16 more...)

arXiv.org Artificial Intelligence

2406.07394

Genre: Research Report (0.82)

Industry: Energy > Oil & Gas > Upstream (0.49)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

A Comprehensive Survey on 3D Content Generation

Liu, Jian, Huang, Xiaoshui, Huang, Tianyu, Chen, Lu, Hou, Yuenan, Tang, Shixiang, Liu, Ziwei, Ouyang, Wanli, Zuo, Wangmeng, Jiang, Junjun, Liu, Xianming

arXiv.org Artificial IntelligenceFeb-2-2024

Recent years have witnessed remarkable advances in artificial intelligence generated content(AIGC), with diverse input modalities, e.g., text, image, video, audio and 3D. The 3D is the most close visual modality to real-world 3D environment and carries enormous knowledge. The 3D content generation shows both academic and practical values while also presenting formidable technical challenges. This review aims to consolidate developments within the burgeoning domain of 3D content generation. Specifically, a new taxonomy is proposed that categorizes existing approaches into three types: 3D native generative methods, 2D prior-based 3D generative methods, and hybrid 3D generative methods. The survey covers approximately 60 papers spanning the major techniques. Besides, we discuss limitations of current 3D content generation techniques, and point out open challenges as well as promising directions for future work. Accompanied with this survey, we have established a project website where the resources on 3D content generation research are provided. The project page is available at https://github.com/hitcslj/Awesome-AIGC-3D.

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2402.01166

Country: Asia > China (0.28)

Genre: Overview (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Uni3D-LLM: Unifying Point Cloud Perception, Generation and Editing with Large Language Models

Liu, Dingning, Huang, Xiaoshui, Hou, Yuenan, Wang, Zhihui, Yin, Zhenfei, Gong, Yongshun, Gao, Peng, Ouyang, Wanli

arXiv.org Artificial IntelligenceJan-9-2024

In this paper, we introduce Uni3D-LLM, a unified framework that leverages a Large Language Model (LLM) to integrate tasks of 3D perception, generation, and editing within point cloud scenes. This framework empowers users to effortlessly generate and modify objects at specified locations within a scene, guided by the versatility of natural language descriptions. Uni3D-LLM harnesses the expressive power of natural language to allow for precise command over the generation and editing of 3D objects, thereby significantly enhancing operational flexibility and controllability. By mapping point cloud into the unified representation space, Uni3D-LLM achieves cross-application functionality, enabling the seamless execution of a wide array of tasks, ranging from the accurate instantiation of 3D objects to the diverse requirements of interactive design. Through a comprehensive suite of rigorous experiments, the efficacy of Uni3D-LLM in the comprehension, generation, and editing of point cloud has been validated. Additionally, we have assessed the impact of integrating a point cloud perception module on the generation and editing processes, confirming the substantial potential of our approach for practical applications.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2402.03327

Country:

Oceania > Australia (0.14)
Asia > China (0.14)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

3DAxiesPrompts: Unleashing the 3D Spatial Task Capabilities of GPT-4V

Liu, Dingning, Dong, Xiaomeng, Zhang, Renrui, Luo, Xu, Gao, Peng, Huang, Xiaoshui, Gong, Yongshun, Wang, Zhihui

arXiv.org Artificial IntelligenceDec-15-2023

In this work, we present a new visual prompting method called 3DAxiesPrompts (3DAP) to unleash the capabilities of GPT-4V in performing 3D spatial tasks. Our investigation reveals that while GPT-4V exhibits proficiency in discerning the position and interrelations of 2D entities through current visual prompting techniques, its abilities in handling 3D spatial tasks have yet to be explored. In our approach, we create a 3D coordinate system tailored to 3D imagery, complete with annotated scale information. By presenting images infused with the 3DAP visual prompt as inputs, we empower GPT-4V to ascertain the spatial positioning information of the given 3D target image with a high degree of precision. Through experiments, We identified three tasks that could be stably completed using the 3DAP method, namely, 2D to 3D Point Reconstruction, 2D to 3D point matching, and 3D Object Detection. We perform experiments on our proposed dataset 3DAP-Data, the results from these experiments validate the efficacy of 3DAP-enhanced GPT-4V inputs, marking a significant stride in 3D spatial task execution.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2312.09738

Country: Asia > China (0.46)

Genre: Research Report > New Finding (0.66)

Industry: Health & Medicine (0.93)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(2 more...)

Add feedback

PCRDiffusion: Diffusion Probabilistic Models for Point Cloud Registration

Wu, Yue, Yuan, Yongzhe, Fan, Xiaolong, Huang, Xiaoshui, Gong, Maoguo, Miao, Qiguang

arXiv.org Artificial IntelligenceDec-10-2023

We propose a new framework that formulates point cloud registration as a denoising diffusion process from noisy transformation to object transformation. During training stage, object transformation diffuses from ground-truth transformation to random distribution, and the model learns to reverse this noising process. In sampling stage, the model refines randomly generated transformation to the output result in a progressive way. We derive the variational bound in closed form for training and provide implementations of the model. Our work provides the following crucial findings: (i) In contrast to most existing methods, our framework, Diffusion Probabilistic Models for Point Cloud Registration (PCRDiffusion) does not require repeatedly update source point cloud to refine the predicted transformation. (ii) Point cloud registration, one of the representative discriminative tasks, can be solved by a generative way and the unified probabilistic formulation. Finally, we discuss and provide an outlook on the application of diffusion model in different scenarios for point cloud registration. Experimental results demonstrate that our model achieves competitive performance in point cloud registration. In correspondence-free and correspondence-based scenarios, PCRDifussion can both achieve exceeding 50\% performance improvements.

artificial intelligence, deep learning, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2312.06063

Country: Asia > China (0.16)

Genre: Research Report > New Finding (0.48)

Industry: Energy (0.36)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.84)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Experts Weights Averaging: A New General Training Scheme for Vision Transformers

Huang, Yongqi, Ye, Peng, Huang, Xiaoshui, Li, Sheng, Chen, Tao, He, Tong, Ouyang, Wanli

arXiv.org Artificial IntelligenceAug-25-2023

Structural re-parameterization is a general training scheme for Convolutional Neural Networks (CNNs), which achieves performance improvement without increasing inference cost. As Vision Transformers (ViTs) are gradually surpassing CNNs in various visual tasks, one may question: if a training scheme specifically for ViTs exists that can also achieve performance improvement without increasing inference cost? Recently, Mixture-of-Experts (MoE) has attracted increasing attention, as it can efficiently scale up the capacity of Transformers at a fixed cost through sparsely activated experts. Considering that MoE can also be viewed as a multi-branch structure, can we utilize MoE to implement a ViT training scheme similar to structural re-parameterization? In this paper, we affirmatively answer these questions, with a new general training strategy for ViTs. Specifically, we decouple the training and inference phases of ViTs. During training, we replace some Feed-Forward Networks (FFNs) of the ViT with specially designed, more efficient MoEs that assign tokens to experts by random uniform partition, and perform Experts Weights Averaging (EWA) on these MoEs at the end of each iteration. After training, we convert each MoE into an FFN by averaging the experts, transforming the model back into original ViT for inference. We further provide a theoretical analysis to show why and how it works. Comprehensive experiments across various 2D and 3D visual tasks, ViT architectures, and datasets validate the effectiveness and generalizability of the proposed training scheme. Besides, our training scheme can also be applied to improve performance when fine-tuning ViTs. Lastly, but equally important, the proposed EWA technique can significantly improve the effectiveness of naive MoE in various 2D visual small datasets and 3D visual tasks.

artificial intelligence, arxiv preprint arxiv, machine learning, (14 more...)

arXiv.org Artificial Intelligence

2308.06093

Country:

Europe > Netherlands (0.14)
Asia (0.14)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)

Add feedback

Unsupervised Deep Probabilistic Approach for Partial Point Cloud Registration

Mei, Guofeng, Tang, Hao, Huang, Xiaoshui, Wang, Weijie, Liu, Juan, Zhang, Jian, Van Gool, Luc, Wu, Qiang

arXiv.org Artificial IntelligenceMar-23-2023

Deep point cloud registration methods face challenges to partial overlaps and rely on labeled data. To address these issues, we propose UDPReg, an unsupervised deep probabilistic registration framework for point clouds with partial overlaps. Specifically, we first adopt a network to learn posterior probability distributions of Gaussian mixture models (GMMs) from point clouds. To handle partial point cloud registration, we apply the Sinkhorn algorithm to predict the distribution-level correspondences under the constraint of the mixing weights of GMMs. To enable unsupervised learning, we design three distribution consistency-based losses: self-consistency, cross-consistency, and local contrastive. The self-consistency loss is formulated by encouraging GMMs in Euclidean and feature spaces to share identical posterior distributions. The cross-consistency loss derives from the fact that the points of two partially overlapping point clouds belonging to the same clusters share the cluster centroids. The cross-consistency loss allows the network to flexibly learn a transformation-invariant posterior distribution of two aligned point clouds. The local contrastive loss facilitates the network to extract discriminative local features. Our UDPReg achieves competitive performance on the 3DMatch/3DLoMatch and ModelNet/ModelLoNet benchmarks.

artificial intelligence, machine learning, registration, (13 more...)

arXiv.org Artificial Intelligence

2303.1329

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback