Wang, Chen
SuperPC: A Single Diffusion Model for Point Cloud Completion, Upsampling, Denoising, and Colorization
Du, Yi, Zhao, Zhipeng, Su, Shaoshu, Golluri, Sharath, Zheng, Haoze, Yao, Runmao, Wang, Chen
Point cloud (PC) processing tasks, such as completion, upsampling, denoising, and colorization, are crucial in applications like autonomous driving and 3D reconstruction. Despite substantial advancements, prior approaches often address each of these tasks independently, with separate models focused on individual defects. However, this isolated approach overlooks the fact that defects like incompleteness, low resolution, noise, and lack of color frequently coexist, with each defect influencing and correlating with the others. Simply applying these models sequentially compounds the errors of each model and increases computational cost. To address these challenges, we introduce SuperPC, the first unified diffusion model capable of handling all four tasks concurrently. Our approach employs a three-level-conditioned diffusion framework, enhanced by a novel spatial-mix-fusion strategy, to leverage the correlations among these four defects for simultaneous, efficient processing. We show that SuperPC outperforms state-of-the-art specialized models, as well as their combination, on all four individual tasks.
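As a rough illustration of the kind of conditioned reverse-diffusion update a unified model like this performs, here is a minimal DDIM-style step in PyTorch. The `eps_model` interface and the way the degraded cloud is fed in are hypothetical stand-ins for SuperPC's three-level conditioning and spatial-mix-fusion, not the published architecture:

```python
# Minimal sketch: one deterministic (DDIM, eta = 0) denoising step on a
# point cloud x_t of shape (B, N, 6) = xyz + rgb, conditioned on the
# degraded input cloud. `eps_model` is a hypothetical noise predictor.
import torch

def denoise_step(eps_model, x_t, cond_pc, t, alphas_cumprod):
    a_t = alphas_cumprod[t]
    a_prev = alphas_cumprod[t - 1] if t > 0 else torch.tensor(1.0)
    eps = eps_model(x_t, cond_pc, t)        # noise prediction, conditioned
    x0_pred = (x_t - (1 - a_t).sqrt() * eps) / a_t.sqrt()
    return a_prev.sqrt() * x0_pred + (1 - a_prev).sqrt() * eps
```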
Mind the Memory Gap: Unveiling GPU Bottlenecks in Large-Batch LLM Inference
Recasens, Pol G., Agullo, Ferran, Zhu, Yue, Wang, Chen, Lee, Eun Kyung, Tardieu, Olivier, Torres, Jordi, Berral, Josep Ll.
Large language models have been widely adopted across diverse tasks, but the auto-regressive nature of their generation often leads to inefficient resource utilization during inference. While batching is commonly used to increase throughput, performance gains plateau beyond a certain batch size, especially with smaller models, a phenomenon the existing literature typically explains as a shift to the compute-bound regime. In this paper, through an in-depth GPU-level analysis, we reveal that large-batch inference remains memory-bound, with most of the GPU's compute capability underutilized because DRAM bandwidth saturation is the primary bottleneck. To address this, we propose a Batching Configuration Advisor (BCA) that optimizes memory allocation, reducing GPU memory requirements with minimal impact on throughput. The freed memory and underutilized compute capability can then be leveraged by concurrent workloads; specifically, we use model replication to improve serving throughput and GPU utilization. Our findings challenge conventional assumptions about LLM inference and offer new insights and practical strategies for improving resource utilization, particularly for smaller language models.
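As a back-of-envelope illustration of why batched decode can stay memory-bound, the roofline sketch below compares arithmetic intensity against the GPU's ridge point. The hardware numbers (A100-class) and the per-sequence KV-cache size are assumptions for illustration; the paper's GPU-level profiling is the actual evidence:

```python
# Roofline-style estimate for one decode step of an FP16 7B-parameter model.
PEAK_FLOPS = 312e12          # FP16 tensor-core peak, FLOP/s (assumed A100)
HBM_BW = 2.0e12              # DRAM bandwidth, bytes/s (assumed)
RIDGE = PEAK_FLOPS / HBM_BW  # ~156 FLOPs/byte needed to be compute-bound

def decode_intensity(batch, params=7e9, kv_bytes_per_seq=5e8):
    """FLOPs per DRAM byte: weights are read once per step, the KV cache
    once per sequence, and each weight contributes one MAC per sequence."""
    flops = 2 * params * batch
    bytes_moved = 2 * params + batch * kv_bytes_per_seq
    return flops / bytes_moved

for b in (1, 8, 64, 256):
    ai = decode_intensity(b)
    print(f"batch={b:4d}  intensity={ai:6.1f} FLOP/B  "
          f"{'compute-bound' if ai > RIDGE else 'memory-bound'}")
```

Because the KV-cache traffic grows with the batch size, the intensity saturates far below the ridge point, which matches the paper's observation that large batches remain memory-bound.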
BEHAVIOR Robot Suite: Streamlining Real-World Whole-Body Manipulation for Everyday Household Activities
Jiang, Yunfan, Zhang, Ruohan, Wong, Josiah, Wang, Chen, Ze, Yanjie, Yin, Hang, Gokmen, Cem, Song, Shuran, Wu, Jiajun, Fei-Fei, Li
Real-world household tasks present significant challenges for mobile manipulation robots. An analysis of existing robotics benchmarks reveals that successful task performance hinges on three key whole-body control capabilities: bimanual coordination, stable and precise navigation, and extensive end-effector reachability. Achieving these capabilities requires careful hardware design, but the resulting system complexity further complicates visuomotor policy learning. To address these challenges, we introduce the BEHAVIOR Robot Suite (BRS), a comprehensive framework for whole-body manipulation in diverse household tasks. Built on a bimanual, wheeled robot with a 4-DoF torso, BRS integrates a cost-effective whole-body teleoperation interface for data collection and a novel algorithm for learning whole-body visuomotor policies. We evaluate BRS on five challenging household tasks that not only emphasize the three core capabilities but also introduce additional complexities, such as long-range navigation, interaction with articulated and deformable objects, and manipulation in confined spaces. We believe that BRS's integrated robotic embodiment, data collection interface, and learning framework mark a significant step toward enabling real-world whole-body manipulation for everyday household tasks. BRS is open-sourced at https://behavior-robot-suite.github.io/
Implicit Cross-Lingual Rewarding for Efficient Multilingual Preference Alignment
Yang, Wen, Wu, Junhong, Wang, Chen, Zong, Chengqing, Zhang, Jiajun
Direct Preference Optimization (DPO) has become a prominent method for aligning Large Language Models (LLMs) with human preferences. While DPO has enabled significant progress in aligning English LLMs, multilingual preference alignment is hampered by data scarcity. To address this, we propose a novel approach that captures learned preferences from well-aligned English models via implicit rewards and transfers them to other languages through iterative training. Specifically, we derive an implicit reward model from the logits of an English DPO-aligned model and its corresponding reference model. This reward model is then leveraged to annotate preference relations in cross-lingual instruction-following pairs, using English instructions to evaluate multilingual responses. The annotated data is subsequently used for multilingual DPO fine-tuning, facilitating preference knowledge transfer from English to other languages. Fine-tuning Llama3 for two iterations resulted in a 12.72% average improvement in Win Rate and a 5.97% increase in Length Control Win Rate across all training languages on the X-AlpacaEval leaderboard. Our findings demonstrate that leveraging existing English-aligned models can enable efficient and effective multilingual preference alignment, significantly reducing the need for extensive multilingual preference data. The code is available at https://github.com/ZNLP/Implicit-Cross-Lingual-Rewarding
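The implicit reward here is the standard DPO identity r(x, y) = beta * log(pi(y|x) / pi_ref(y|x)). Below is a minimal PyTorch sketch of using it to label a cross-lingual preference pair; the helper names and HuggingFace-style model interface are assumptions, not the authors' released code:

```python
# Sketch of cross-lingual preference annotation via the DPO implicit reward.
import torch
import torch.nn.functional as F

def seq_logprob(model, prompt_ids, response_ids):
    """Sum of response-token log-probs under teacher forcing."""
    ids = torch.cat([prompt_ids, response_ids], dim=-1)
    logits = model(ids).logits[:, :-1, :]
    logps = F.log_softmax(logits, dim=-1)[:, prompt_ids.size(-1) - 1 :, :]
    return logps.gather(-1, response_ids.unsqueeze(-1)).squeeze(-1).sum(-1)

def annotate_pair(policy, ref, prompt_en, resp_a, resp_b, beta=0.1):
    """Return (chosen, rejected) among two multilingual responses, scored
    against the English instruction by the English DPO-aligned policy."""
    r_a = beta * (seq_logprob(policy, prompt_en, resp_a)
                  - seq_logprob(ref, prompt_en, resp_a))
    r_b = beta * (seq_logprob(policy, prompt_en, resp_b)
                  - seq_logprob(ref, prompt_en, resp_b))
    return (resp_a, resp_b) if r_a >= r_b else (resp_b, resp_a)
```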
Benchmarking LLMs for Political Science: A United Nations Perspective
Liang, Yueqing, Yang, Liangwei, Wang, Chen, Xia, Congying, Meng, Rui, Xu, Xiongxiao, Wang, Haoran, Payani, Ali, Shu, Kai
Large Language Models (LLMs) have achieved significant advances in natural language processing, yet their potential for high-stakes political decision-making remains largely unexplored. This paper addresses this gap by focusing on the application of LLMs to the United Nations (UN) decision-making process, where the stakes are particularly high and political decisions can have far-reaching consequences. We introduce a novel dataset comprising publicly available UN Security Council (UNSC) records from 1994 to 2024, including draft resolutions, voting records, and diplomatic speeches. Using this dataset, we propose the United Nations Benchmark (UNBench), the first comprehensive benchmark designed to evaluate LLMs across four interconnected political science tasks: co-penholder judgment, representative voting simulation, draft adoption prediction, and representative statement generation. These tasks span the three stages of the UN decision-making process (drafting, voting, and discussing) and aim to assess LLMs' ability to understand and simulate political dynamics. Our experimental analysis demonstrates the potential and challenges of applying LLMs in this domain, providing insights into their strengths and limitations in political science. This work contributes to the growing intersection of AI and political science, opening new avenues for research and practical applications in global governance. The UNBench repository can be accessed at: https://github.com/yueqingliang1/UNBench.
"See the World, Discover Knowledge": A Chinese Factuality Evaluation for Large Vision Language Models
Gu, Jihao, Wang, Yingyao, Bu, Pi, Wang, Chen, Wang, Ziming, Song, Tengtao, Wei, Donglai, Yuan, Jiale, Zhao, Yingxiu, He, Yancheng, Li, Shilong, Liu, Jiaheng, Cao, Meng, Song, Jun, Tan, Yingshui, Li, Xiang, Su, Wenbo, Zheng, Zhicheng, Zhu, Xiaoyong, Zheng, Bo
The evaluation of factual accuracy in large vision language models (LVLMs) has lagged behind their rapid development, making it difficult to fully reflect these models' knowledge capacity and reliability. In this paper, we introduce the first factuality-based visual question-answering benchmark in Chinese, named ChineseSimpleVQA, aimed at assessing the visual factuality of LVLMs across 8 major topics and 56 subtopics. The key features of this benchmark include a focus on the Chinese language, diverse knowledge types, multi-hop question construction, high-quality data, static consistency, and easy evaluation via short answers. Moreover, we contribute a rigorous data construction pipeline and decouple visual factuality into two parts: seeing the world (i.e., object recognition) and discovering knowledge. This decoupling allows us to analyze the capability boundaries and execution mechanisms of LVLMs. We then evaluate 34 advanced open-source and closed-source models, revealing critical performance gaps in this field.
VL-Nav: Real-time Vision-Language Navigation with Spatial Reasoning
Du, Yi, Fu, Taimeng, Chen, Zhuoqun, Li, Bowen, Su, Shaoshu, Zhao, Zhipeng, Wang, Chen
Vision-language navigation in unknown environments is crucial for mobile robots. In scenarios such as household assistance and rescue, a mobile robot must understand a human command such as "find a person wearing black". We present a novel vision-language navigation (VL-Nav) system that integrates efficient spatial reasoning on low-power robots. Unlike prior methods that rely on a single image-level feature-similarity score to guide the robot, our method integrates pixel-wise vision-language features with curiosity-driven exploration. This approach enables robust navigation to human-instructed instances across diverse environments. We deploy VL-Nav on a four-wheel mobile robot and evaluate its performance through comprehensive navigation tasks in both indoor and outdoor environments, spanning different scales and semantic complexities. Remarkably, VL-Nav operates at a real-time frequency of 30 Hz on a Jetson Orin NX, highlighting its ability to conduct efficient vision-language navigation. Results show that VL-Nav achieves an overall success rate of 86.3%, outperforming previous methods by 44.15%.
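A minimal sketch of the goal-scoring idea, combining a pixel-wise vision-language similarity map with a curiosity bonus for unvisited map cells. The weights and helper structure are hypothetical; the paper's exact formulation may differ:

```python
import numpy as np

def score_goals(goals, vl_sim_map, visit_counts, w_vl=0.7, w_cur=0.3):
    """goals: (K, 2) array of (x, y) map cells; returns the best goal."""
    scores = []
    for gx, gy in goals:
        vl = vl_sim_map[gy, gx]                         # pixel-wise VL similarity
        curiosity = 1.0 / (1.0 + visit_counts[gy, gx])  # favor unexplored cells
        scores.append(w_vl * vl + w_cur * curiosity)
    return goals[int(np.argmax(scores))]
```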
Latent Swap Joint Diffusion for Long-Form Audio Generation
Dai, Yusheng, Wang, Chenxi, Li, Chang, Wang, Chen, Du, Jun, Li, Kewei, Wang, Ruoyu, Ma, Jiefeng, Sun, Lei, Gao, Jianqing
Previous work on long-form audio generation using global-view diffusion or iterative generation demands significant training or inference costs. While recent advances in multi-view joint diffusion for panoramic generation offer an efficient alternative, they struggle with spectrum generation, suffering severe overlap distortions and high cross-view consistency costs. We first explore this phenomenon through the connectivity inheritance of latent maps and find that averaging operations excessively smooth the high-frequency components of the latent map. To address these issues, we propose Swap Forward (SaFa), a frame-level latent-swap framework that synchronizes multiple diffusions to produce globally coherent long-form audio with richer spectral detail in a forward-only manner. At its core, a bidirectional Self-Loop Latent Swap is applied between adjacent views, leveraging the stepwise diffusion trajectory to adaptively enhance high-frequency components without disrupting low-frequency ones. Furthermore, to ensure cross-view consistency, a unidirectional Reference-Guided Latent Swap is applied between the reference and the non-overlapping regions of each subview during the early stages, providing centralized trajectory guidance. Quantitative and qualitative experiments demonstrate that SaFa significantly outperforms existing joint-diffusion methods and even training-based long-audio generation models. Moreover, it also adapts well to panoramic generation, achieving comparable state-of-the-art performance with greater efficiency and model generalizability. The project page is available at https://swapforward.github.io/.
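To make the core operation concrete, here is a toy version of a frame-level latent swap on the overlap between two adjacent views at one diffusion step: instead of averaging the overlap (which smooths high frequencies), alternating frames are exchanged. The interleaving pattern is an illustrative stand-in for SaFa's Self-Loop Latent Swap, not the exact published scheme:

```python
import torch

def self_loop_swap(z_left, z_right, overlap):
    """z_*: (C, T) latent views; swap every other frame in the shared
    overlap region in place, rather than averaging the two views."""
    a = z_left[..., -overlap:]            # trailing frames of the left view
    b = z_right[..., :overlap]            # leading frames of the right view
    idx = torch.arange(overlap) % 2 == 0  # even-indexed frames get swapped
    a_even = a[..., idx].clone()
    a[..., idx] = b[..., idx]
    b[..., idx] = a_even
    return z_left, z_right
```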
Nearly Tight Bounds for Exploration in Streaming Multi-armed Bandits with Known Optimality Gap
Karpov, Nikolai, Wang, Chen
We investigate the sample-memory-pass trade-offs for pure exploration in multi-pass streaming multi-armed bandits (MABs) with *a priori* knowledge of the optimality gap $\Delta_{[2]}$. Here, and throughout, the optimality gap $\Delta_{[i]}$ is defined as the mean reward gap between the best and the $i$-th best arms. A recent line of results by Jin, Huang, Tang, and Xiao [ICML'21] and Assadi and Wang [COLT'24] has shown that when $\Delta_{[2]}$ is not known, a pass complexity of $\Theta(\log(1/\Delta_{[2]}))$ (up to $\log\log(1/\Delta_{[2]})$ terms) is necessary and sufficient to obtain the *worst-case optimal* sample complexity of $O(n/\Delta^{2}_{[2]})$ with a single-arm memory. However, our understanding of multi-pass algorithms with known $\Delta_{[2]}$ is still limited. Here, the key open problem is how many passes are required to achieve the instance-sensitive sample complexity, i.e., $O(\sum_{i=2}^{n} 1/\Delta^2_{[i]})$ arm pulls, with a sublinear memory size. In this work, we show that the "right answer" to this question is $\Theta(\log{n})$ passes (up to $\log\log{n}$ terms). We first present a lower bound showing that any algorithm that finds the best arm with slightly sublinear memory (a memory of $o(n/\text{polylog}(n))$ arms) and $O(\sum_{i=2}^{n} 1/\Delta^{2}_{[i]} \cdot \log{n})$ arm pulls has to make $\Omega(\frac{\log{n}}{\log\log{n}})$ passes over the stream. We then give a nearly matching algorithm that, assuming knowledge of $\Delta_{[2]}$, finds the best arm with $O(\sum_{i=2}^{n} 1/\Delta^2_{[i]} \cdot \log{n})$ arm pulls and a *single-arm* memory.
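For intuition about the single-arm memory model, here is a textbook-style single-pass king-challenger duel that assumes the gap delta = $\Delta_{[2]}$ is known; each duel uses O(log(n)/delta^2) pulls per side, so the pass uses O(n log n / delta^2) pulls in total. This is only a sketch of the memory model, not the paper's multi-pass algorithm, and the constants and margin test are illustrative:

```python
import math

def best_arm_single_pass(stream, delta, n):
    """stream: iterable of arms, each a zero-argument pull() returning a
    reward in [0, 1]; only one arm (the current 'king') is ever stored."""
    m = math.ceil(16 * math.log(n) / delta ** 2)  # pulls per side of a duel
    arms = iter(stream)
    king = next(arms)
    for challenger in arms:
        mu_k = sum(king() for _ in range(m)) / m
        mu_c = sum(challenger() for _ in range(m)) / m
        if mu_c > mu_k + delta / 2:   # dethrone only on a clear margin
            king = challenger
    return king
```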
Solving the Content Gap in Roblox Game Recommendations: LLM-Based Profile Generation and Reranking
Wang, Chen, Wei, Xiaokai, Jiang, Yexi, Ong, Frank, Gao, Kevin, Yu, Xiao, Hui, Zheng, Yoon, Se-eun, Yu, Philip, Gong, Michelle
With the vast and dynamic user-generated content on Roblox, creating effective game recommendations requires a deep understanding of game content. Traditional recommendation models struggle with the inconsistent and sparse nature of game text features such as titles and descriptions. Recent advancements in large language models (LLMs) offer opportunities to enhance recommendation systems by analyzing in-game text data. This paper addresses two challenges: generating high-quality, structured text features for games without extensive human annotation, and validating that these features improve recommendation relevance. We propose an approach that extracts in-game text and uses LLMs to infer attributes such as genre and gameplay objectives from raw player interactions. Additionally, we introduce an LLM-based re-ranking mechanism to assess the effectiveness of the generated text features, enhancing personalization and user satisfaction. Beyond recommendations, our approach supports applications such as user-engagement-based integrity detection, which is already deployed in production. This scalable framework demonstrates the potential of in-game text understanding to improve recommendation quality on Roblox and to adapt recommendations to its unique, user-generated ecosystem.
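A hedged sketch of the re-ranking step: the LLM-generated profile fields and the player's recent plays are rendered into a prompt, and the model is asked to reorder the candidate slate. `llm_complete`, the field names, and the prompt format are hypothetical; the paper does not publish its exact prompts:

```python
def rerank(llm_complete, user_history, candidates):
    """candidates: list of dicts with LLM-generated 'genre' and 'objective'
    fields; llm_complete is any text-completion callable."""
    lines = [f"{i}. {c['title']} | genre: {c['genre']} | goal: {c['objective']}"
             for i, c in enumerate(candidates)]
    prompt = ("A player recently played: " + ", ".join(user_history) + "\n"
              "Rank these candidate games for this player, best first.\n"
              "Answer with the indices only, comma-separated.\n"
              + "\n".join(lines))
    order = [int(tok) for tok in llm_complete(prompt).split(",")]
    return [candidates[i] for i in order]
```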