Wang, Yue
Extrapolated Urban View Synthesis Benchmark
Han, Xiangyu, Jia, Zhen, Li, Boyi, Wang, Yan, Ivanovic, Boris, You, Yurong, Liu, Lingjie, Wang, Yue, Pavone, Marco, Feng, Chen, Li, Yiming
Photorealistic simulators are essential for the training and evaluation of vision-centric autonomous vehicles (AVs). At their core is Novel View Synthesis (NVS), a crucial capability that generates diverse unseen viewpoints to accommodate the broad and continuous pose distribution of AVs. Recent advances in radiance fields, such as 3D Gaussian Splatting, achieve photorealistic rendering at real-time speeds and have been widely used in modeling large-scale driving scenes. However, their performance is commonly evaluated using an interpolated setup with highly correlated training and test views. In contrast, extrapolation, where test views largely deviate from training views, remains underexplored, limiting progress in generalizable simulation technology. To address this gap, we leverage publicly available AV datasets with multiple traversals, multiple vehicles, and multiple cameras to build the first Extrapolated Urban View Synthesis (EUVS) benchmark. On this benchmark, we conduct quantitative and qualitative evaluations of state-of-the-art Gaussian Splatting methods across different difficulty levels. Our results show that Gaussian Splatting is prone to overfitting to training views. Moreover, incorporating diffusion priors and improving geometry cannot fundamentally improve NVS under large view changes, highlighting the need for more robust approaches and large-scale training. We have released our data to help advance self-driving and urban robotics simulation technology.
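The interpolated-versus-extrapolated gap described above is, operationally, just a difference in which camera poses are held out before computing image metrics. As a minimal sketch (with a hypothetical directory layout; the released benchmark defines its own splits and tooling), PSNR over renders of a held-out extrapolated traversal can be compared against the usual interpolated split:

```python
# Minimal sketch of an extrapolated-view evaluation loop. The directory layout
# ("renders/<split>", "gt/<split>") is hypothetical; the released EUVS benchmark
# defines its own splits and tooling.
from pathlib import Path

import numpy as np
from PIL import Image


def psnr(pred: np.ndarray, gt: np.ndarray, max_val: float = 255.0) -> float:
    """Peak signal-to-noise ratio between two same-sized uint8 images."""
    mse = np.mean((pred.astype(np.float64) - gt.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(max_val ** 2 / mse)


def evaluate_split(render_dir: Path, gt_dir: Path) -> float:
    """Average PSNR of rendered images against ground truth for one split."""
    scores = [
        psnr(np.asarray(Image.open(render_dir / p.name)), np.asarray(Image.open(p)))
        for p in sorted(gt_dir.glob("*.png"))
    ]
    return float(np.mean(scores))


if __name__ == "__main__":
    # Comparing the two splits exposes the interpolation/extrapolation gap.
    for split in ("interpolated", "extrapolated"):
        print(split, evaluate_split(Path("renders") / split, Path("gt") / split))
```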
Wavelet Diffusion Neural Operator
Hu, Peiyan, Wang, Rui, Zheng, Xiang, Zhang, Tao, Feng, Haodong, Feng, Ruiqi, Wei, Long, Wang, Yue, Ma, Zhi-Ming, Wu, Tailin
Simulating and controlling physical systems described by partial differential equations (PDEs) are crucial tasks across science and engineering. Recently, diffusion generative models have emerged as a competitive class of methods for these tasks due to their ability to capture long-term dependencies and model high-dimensional states. However, diffusion models typically struggle with handling system states with abrupt changes and generalizing to higher resolutions. In this work, we propose the Wavelet Diffusion Neural Operator (WDNO), a novel PDE simulation and control framework that enhances the handling of these complexities. WDNO comprises two key innovations. First, WDNO performs diffusion-based generative modeling in the wavelet domain over the entire trajectory to handle abrupt changes and long-term dependencies effectively. Second, to address poor generalization across different resolutions, one of the fundamental challenges in modeling physical systems, we introduce multi-resolution training. We validate WDNO on five physical systems, including the 1D advection equation, three challenging systems with abrupt changes (the 1D Burgers' equation, the 1D compressible Navier-Stokes equations, and a 2D incompressible fluid), and the real-world ERA5 dataset. WDNO demonstrates superior performance on both simulation and control tasks over state-of-the-art methods, with significant improvements in long-term and detail prediction accuracy. Remarkably, in the challenging 2D high-dimensional and indirect control task of reducing smoke leakage, WDNO reduces the leakage by 33.2% compared to the second-best baseline.
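The core idea, generative modeling of whole trajectories in the wavelet domain, can be illustrated independently of the diffusion model itself. The sketch below is illustrative only, not the authors' implementation; it assumes PyWavelets and a toy Burgers-like trajectory, and shows the forward and inverse transforms that would wrap such a generative model:

```python
# A minimal sketch of the wavelet-domain idea behind WDNO (illustrative code,
# not the authors' implementation): a whole space-time trajectory is mapped to
# wavelet coefficients, a generative model (omitted here) would operate on
# those coefficients, and the result is mapped back to the physical domain.
import numpy as np
import pywt


def to_wavelet(traj: np.ndarray, wavelet: str = "db4", level: int = 2):
    """Multi-level 2D wavelet decomposition of a (time, space) trajectory."""
    coeffs = pywt.wavedec2(traj, wavelet=wavelet, level=level)
    flat, slices = pywt.coeffs_to_array(coeffs)  # single array a network can consume
    return flat, slices


def from_wavelet(flat: np.ndarray, slices, wavelet: str = "db4") -> np.ndarray:
    """Inverse transform back to the physical space-time domain."""
    coeffs = pywt.array_to_coeffs(flat, slices, output_format="wavedec2")
    return pywt.waverec2(coeffs, wavelet=wavelet)


if __name__ == "__main__":
    # Toy Burgers-like trajectory with a sharp moving front (abrupt change).
    t = np.linspace(0, 1, 64)[:, None]
    x = np.linspace(-1, 1, 128)[None, :]
    traj = np.tanh((x - 0.5 * t) / 0.02)

    flat, slices = to_wavelet(traj)
    # A diffusion model would be trained to generate `flat`; here we only
    # check that the round trip is lossless up to numerical error.
    recon = from_wavelet(flat, slices)[: traj.shape[0], : traj.shape[1]]
    print(np.abs(recon - traj).max())
```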
Multi-cam Multi-map Visual Inertial Localization: System, Validation and Dataset
Han, Fuzhang, Wei, Yufei, Jiao, Yanmei, Zhang, Zhuqing, Pan, Yiyuan, Huang, Wenjun, Tang, Li, Yin, Huan, Ding, Xiaqing, Xiong, Rong, Wang, Yue
Map-based localization is crucial for the autonomous movement of robots as it provides real-time positional feedback. However, existing VINS and SLAM systems cannot be directly integrated into the robot's control loop. Although VINS offers high-frequency position estimates, it suffers from drift in long-term operation, while the drift-free trajectory output by SLAM is post-processed with loop correction, which is non-causal: in practical control, it is impossible to update the current pose with future information. Furthermore, existing SLAM evaluation systems measure accuracy after aligning the entire trajectory, which overlooks the transformation error between the odometry start frame and the ground truth frame. To address these issues, we propose a multi-cam multi-map visual inertial localization system, which provides real-time, causal and drift-free position feedback to the robot control loop. Additionally, we analyze the error composition of map-based localization systems and propose a set of evaluation metrics suitable for measuring causal localization performance. To validate our system, we design a multi-camera IMU hardware setup and collect a long-term, challenging campus dataset. Experimental results demonstrate that the proposed system achieves higher real-time localization accuracy. To foster community development, both the system and the dataset have been made open source at https://github.com/zoeylove/Multi-cam-Multi-map-VILO/tree/main.
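The evaluation point above, that aligning the entire trajectory hides the start-frame transformation error, is easy to see numerically. The toy sketch below is my own illustration rather than the paper's metric definition: it compares a post-hoc Kabsch-aligned translation error with a causal, first-pose-only variant on a synthetic trajectory carrying a start-frame yaw error.

```python
# Hypothetical illustration: post-hoc whole-trajectory alignment absorbs the
# start-frame error, while a causal metric (first-pose alignment only) exposes it.
import numpy as np


def align_full(est: np.ndarray, gt: np.ndarray) -> np.ndarray:
    """Best-fit rigid alignment (Kabsch, no scale) of est onto gt."""
    mu_e, mu_g = est.mean(0), gt.mean(0)
    H = (est - mu_e).T @ (gt - mu_g)
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return (est - mu_e) @ R.T + mu_g


def ate_rmse(est: np.ndarray, gt: np.ndarray) -> float:
    """Root-mean-square translation error between matched poses."""
    return float(np.sqrt(np.mean(np.sum((est - gt) ** 2, axis=1))))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    gt = np.cumsum(rng.normal(size=(500, 3)), axis=0)           # ground-truth path

    theta = np.deg2rad(5.0)                                     # start-frame yaw error
    R_err = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                      [np.sin(theta),  np.cos(theta), 0.0],
                      [0.0, 0.0, 1.0]])
    est = (R_err @ gt.T).T + rng.normal(scale=0.02, size=gt.shape)

    print("post-hoc aligned ATE:", ate_rmse(align_full(est, gt), gt))   # optimistic
    print("causal (first pose) :", ate_rmse(est - est[0] + gt[0], gt))  # larger
```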
InfiniCube: Unbounded and Controllable Dynamic 3D Driving Scene Generation with World-Guided Video Models
Lu, Yifan, Ren, Xuanchi, Yang, Jiawei, Shen, Tianchang, Wu, Zhangjie, Gao, Jun, Wang, Yue, Chen, Siheng, Chen, Mike, Fidler, Sanja, Huang, Jiahui
Generating simulatable and controllable 3D scenes is an essential task for a wide spectrum of applications, including mixed reality, robotics, and the training and testing of autonomous vehicles (AV) [25, 33]. In particular, the requirements of AV applications have introduced new challenges for 3D generative models in driving scenarios, posing the following key desiderata: (1) fidelity and consistency, to ensure that the generated scenes support photo-realistic rendering while preserving consistent appearance and geometry for reliable and stable physics simulation; (2) large scale, to generate scenes at map level for traffic simulation; and (3) controllability, to allow easy manipulation of the scene layout, appearance, and ego-car behaviors for curating adversarial scenarios. Previous methods for scene generation either suffer from limited scales or lack geometric and appearance consistency along generated sequences. In contrast, we leverage the recent advancements in scalable 3D representation and video models to achieve large dynamic scene generation that allows flexible controls through HD maps, vehicle bounding boxes, and text descriptions. First, we construct a map-conditioned sparse-voxel-based 3D generative model to unleash its power for unbounded voxel world generation. Then, we re-purpose a video model and ground it on the voxel world through a set of carefully designed pixel-aligned guidance buffers, synthesizing a consistent appearance. Finally, we propose a fast feed-forward approach that employs both voxel and pixel branches to lift the dynamic videos to dynamic 3D Gaussians with controllable objects.
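As a rough intuition for what a pixel-aligned guidance buffer might contain, the toy sketch below projects colored voxel centers through a pinhole camera into depth and color buffers. The function name, buffer choice, and camera model are assumptions for illustration, not InfiniCube's actual conditioning scheme.

```python
# Toy sketch of pixel-aligned guidance buffers: voxel centers carrying semantic
# colors are splatted through a pinhole camera into image-space depth and color
# maps that could condition a video model. Illustrative only.
import numpy as np


def render_guidance_buffers(voxels_xyz, voxels_rgb, K, T_cam_world, hw=(180, 320)):
    """Z-buffered splat of voxel centers into depth and color guidance buffers."""
    H, W = hw
    depth = np.full((H, W), np.inf)
    color = np.zeros((H, W, 3))

    # World -> camera -> pixel coordinates.
    pts_h = np.concatenate([voxels_xyz, np.ones((len(voxels_xyz), 1))], axis=1)
    cam = (T_cam_world @ pts_h.T).T[:, :3]
    in_front = cam[:, 2] > 0.1
    cam, rgb = cam[in_front], voxels_rgb[in_front]
    uv = (K @ cam.T).T
    uv = (uv[:, :2] / uv[:, 2:3]).astype(int)

    valid = (uv[:, 0] >= 0) & (uv[:, 0] < W) & (uv[:, 1] >= 0) & (uv[:, 1] < H)
    for (u, v), z, c in zip(uv[valid], cam[valid, 2], rgb[valid]):
        if z < depth[v, u]:                 # keep the nearest voxel per pixel
            depth[v, u], color[v, u] = z, c
    return depth, color
```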
HUGSIM: A Real-Time, Photo-Realistic and Closed-Loop Simulator for Autonomous Driving
Zhou, Hongyu, Lin, Longzhong, Wang, Jiabao, Lu, Yichong, Bai, Dongfeng, Liu, Bingbing, Wang, Yue, Geiger, Andreas, Liao, Yiyi
In the past few decades, autonomous driving algorithms have made significant progress in perception, planning, and control. However, evaluating individual components does not fully reflect the performance of entire systems, highlighting the need for more holistic assessment methods. This motivates the development of HUGSIM, a closed-loop, photo-realistic, and real-time simulator for evaluating autonomous driving algorithms. We achieve this by lifting captured 2D RGB images into 3D space via 3D Gaussian Splatting, improving the rendering quality for closed-loop scenarios, and building the closed-loop environment. In terms of rendering, we tackle challenges of novel view synthesis in closed-loop scenarios, including viewpoint extrapolation and 360-degree vehicle rendering. Beyond novel view synthesis, HUGSIM further enables the full closed simulation loop, dynamically updating the ego and actor states and observations based on control commands. Moreover, HUGSIM offers a comprehensive benchmark across more than 70 sequences from KITTI-360, Waymo, nuScenes, and PandaSet, along with over 400 varying scenarios, providing a fair and realistic evaluation platform for existing autonomous driving algorithms. HUGSIM not only serves as an intuitive evaluation benchmark but also unlocks the potential for fine-tuning autonomous driving algorithms in a photorealistic closed-loop setting.
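The closed-loop structure can be summarized as a short control loop: render an observation from the current ego state, query the driving algorithm, and advance ego and actor states. The sketch below is a stripped-down illustration with placeholder render and policy callables, not HUGSIM's API.

```python
# Stripped-down closed-loop sketch (not HUGSIM's interface): the simulator
# renders an observation, the driving algorithm returns a control command, and
# ego and actor states advance before the next render.
import numpy as np


def step_ego(state, accel, steer, dt=0.1, wheelbase=2.7):
    """Kinematic bicycle update; state = [x, y, yaw, speed]."""
    x, y, yaw, v = state
    x += v * np.cos(yaw) * dt
    y += v * np.sin(yaw) * dt
    yaw += v / wheelbase * np.tan(steer) * dt
    v = max(0.0, v + accel * dt)
    return np.array([x, y, yaw, v])


def run_closed_loop(render_fn, policy_fn, actors, steps=100):
    """render_fn(ego, actors) -> image; policy_fn(image) -> (accel, steer)."""
    ego = np.array([0.0, 0.0, 0.0, 5.0])
    for _ in range(steps):
        obs = render_fn(ego, actors)              # e.g., Gaussian-splat rendering
        accel, steer = policy_fn(obs)             # the algorithm under evaluation
        ego = step_ego(ego, accel, steer)
        actors = [step_ego(a, 0.0, 0.0) for a in actors]  # scripted: constant speed
    return ego
```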
Break the ID-Language Barrier: An Adaption Framework for Sequential Recommendation
Yu, Xiaohan, Zhang, Li, Zhao, Xin, Wang, Yue
The recent breakthrough of large language models (LLMs) in natural language processing has sparked exploration in recommendation systems, however, their limited domain-specific knowledge remains a critical bottleneck. Specifically, LLMs lack key pieces of information crucial for sequential recommendations, such as user behavior patterns. To address this critical gap, we propose IDLE-Adapter, a novel framework that integrates pre-trained ID embeddings, rich in domain-specific knowledge, into LLMs to improve recommendation accuracy. IDLE-Adapter acts as a bridge, transforming sparse user-item interaction data into dense, LLM-compatible representations through a Pre-trained ID Sequential Model, Dimensionality Alignment, Layer-wise Embedding Refinement, and Layer-wise Distribution Alignment. Furthermore, IDLE-Adapter demonstrates remarkable flexibility by seamlessly integrating ID embeddings from diverse ID-based sequential models and LLM architectures. Extensive experiments across various datasets demonstrate the superiority of IDLE-Adapter, achieving over 10% and 20% improvements in HitRate@5 and NDCG@5 metrics, respectively, compared to state-of-the-art methods.
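A minimal PyTorch sketch of the adapter idea is given below; the module name, shapes, and the choice of a linear projection followed by per-layer MLP refinement are assumptions made for illustration rather than the released IDLE-Adapter code.

```python
# Assumed sketch of projecting pre-trained ID embeddings to an LLM's hidden
# size (dimensionality alignment) and refining them per LLM layer. The real
# IDLE-Adapter additionally performs layer-wise distribution alignment.
import torch
import torch.nn as nn


class IDToLLMAdapter(nn.Module):
    def __init__(self, id_dim: int, llm_dim: int, num_llm_layers: int):
        super().__init__()
        self.align = nn.Linear(id_dim, llm_dim)   # dimensionality alignment
        self.refine = nn.ModuleList([              # layer-wise embedding refinement
            nn.Sequential(nn.LayerNorm(llm_dim), nn.Linear(llm_dim, llm_dim), nn.GELU())
            for _ in range(num_llm_layers)
        ])

    def forward(self, id_emb: torch.Tensor) -> list[torch.Tensor]:
        """id_emb: (batch, seq, id_dim) from a pre-trained ID sequential model.
        Returns one tensor per LLM layer to be fused with that layer's states."""
        h = self.align(id_emb)
        return [block(h) for block in self.refine]


if __name__ == "__main__":
    adapter = IDToLLMAdapter(id_dim=64, llm_dim=768, num_llm_layers=12)
    outs = adapter(torch.randn(2, 20, 64))
    print(len(outs), outs[0].shape)   # 12 tensors of shape (2, 20, 768)
```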
AnyECG: Foundational Models for Electrocardiogram Analysis
Wang, Yue, Cao, Xu, Hu, Yaojun, Ying, Haochao, Rehg, James Matthew, Sun, Jimeng, Wu, Jian, Chen, Jintai
Electrocardiogram (ECG), a non-invasive and affordable tool for cardiac monitoring, is highly sensitive for detecting acute heart attacks. However, due to the lengthy nature of ECG recordings, numerous machine learning methods have been developed for automated heart disease detection to reduce human workload. Despite these efforts, performance remains suboptimal. A key obstacle is the inherent complexity of ECG data, which includes heterogeneity (e.g., varying sampling rates), high levels of noise, demographic-related pattern shifts, and intricate rhythm-event associations. To overcome these challenges, this paper introduces AnyECG, a foundational model designed to extract robust representations from any real-world ECG data. Specifically, a tailored ECG Tokenizer encodes each fixed-duration ECG fragment into a token and, guided by proxy tasks, converts noisy, continuous ECG features into discrete, compact, and clinically meaningful local rhythm codes. These codes encapsulate basic morphological, frequency, and demographic information (e.g., sex), effectively mitigating signal noise. We further pre-train AnyECG to learn rhythmic pattern associations across ECG tokens, enabling the capture of cardiac event semantics. By being jointly pre-trained on diverse ECG data sources, AnyECG is capable of generalizing across a wide range of downstream tasks where ECG signals are recorded from various devices and scenarios. Experimental results in anomaly detection, arrhythmia detection, corrupted lead generation, and ultra-long ECG signal analysis demonstrate that AnyECG learns common ECG knowledge from data and significantly outperforms cutting-edge methods in each respective task.
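The tokenization step, mapping fixed-duration ECG fragments to discrete codes, can be sketched with a plain nearest-neighbour vector quantizer. The codebook here is random and the function is illustrative only; AnyECG's tokenizer is learned with proxy tasks rather than fixed in this way.

```python
# Illustrative-only sketch of fragment-level ECG tokenization: fixed-duration
# fragments are normalized and assigned the index of the nearest codebook entry.
import numpy as np


def tokenize_ecg(signal: np.ndarray, fs: int, frag_sec: float, codebook: np.ndarray):
    """signal: (num_samples,) single-lead ECG; returns one code index per fragment."""
    frag_len = int(fs * frag_sec)
    num_frags = len(signal) // frag_len
    frags = signal[: num_frags * frag_len].reshape(num_frags, frag_len)
    frags = (frags - frags.mean(axis=1, keepdims=True)) / (
        frags.std(axis=1, keepdims=True) + 1e-8
    )
    # Nearest-neighbour assignment to the codebook (vector quantization).
    d = ((frags[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return d.argmin(axis=1)


if __name__ == "__main__":
    fs, frag_sec = 250, 0.5
    codebook = np.random.default_rng(0).normal(size=(256, int(fs * frag_sec)))
    ecg = np.sin(np.linspace(0, 60 * np.pi, fs * 30))   # toy 30-second recording
    tokens = tokenize_ecg(ecg, fs, frag_sec, codebook)
    print(tokens.shape, tokens[:8])                     # 60 tokens, one per fragment
```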
BEV-ODOM: Reducing Scale Drift in Monocular Visual Odometry with BEV Representation
Wei, Yufei, Lu, Sha, Han, Fuzhang, Xiong, Rong, Wang, Yue
Monocular visual odometry (MVO) is vital in autonomous navigation and robotics, providing a cost-effective and flexible motion tracking solution, but the inherent scale ambiguity in monocular setups often leads to cumulative errors over time. In this paper, we present BEV-ODOM, a novel MVO framework leveraging the Bird's Eye View (BEV) representation to address scale drift. Unlike existing approaches, BEV-ODOM integrates a depth-based perspective-view (PV) to BEV encoder, a correlation feature extraction neck, and a CNN-MLP-based decoder, enabling it to estimate motion across three degrees of freedom without the need for depth supervision or complex optimization techniques. Our framework reduces scale drift in long-term sequences and achieves accurate motion estimation across various datasets, including NCLT, Oxford, and KITTI; notably, it achieves low scale drift using only pose supervision with the BEV representation.
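A rough PyTorch sketch of the three-stage design named above (PV-to-BEV encoder, correlation neck, CNN-MLP decoder) is shown below; the layer choices and shapes are assumptions for illustration, not the authors' architecture.

```python
# Assumed sketch of a BEV-based odometry network: encode two frames into BEV
# features, correlate them, and regress 3-DoF planar motion (dx, dy, dyaw).
import torch
import torch.nn as nn


class BEVOdometrySketch(nn.Module):
    def __init__(self, bev_ch: int = 64, bev_hw: int = 32):
        super().__init__()
        # Placeholder PV->BEV encoder: a conv stack standing in for a
        # depth-based lifting module.
        self.pv_to_bev = nn.Sequential(
            nn.Conv2d(3, bev_ch, 7, stride=4, padding=3), nn.ReLU(),
            nn.AdaptiveAvgPool2d(bev_hw),
        )
        self.neck = nn.Sequential(nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
                                  nn.AdaptiveAvgPool2d(8))
        self.head = nn.Sequential(nn.Flatten(), nn.Linear(32 * 8 * 8, 128),
                                  nn.ReLU(), nn.Linear(128, 3))  # (dx, dy, dyaw)

    def forward(self, img_t: torch.Tensor, img_t1: torch.Tensor) -> torch.Tensor:
        f0 = self.pv_to_bev(img_t)                   # BEV features at time t
        f1 = self.pv_to_bev(img_t1)                  # BEV features at time t+1
        corr = (f0 * f1).sum(dim=1, keepdim=True)    # per-cell correlation map
        return self.head(self.neck(corr))            # 3-DoF planar motion


if __name__ == "__main__":
    model = BEVOdometrySketch()
    out = model(torch.randn(2, 3, 256, 512), torch.randn(2, 3, 256, 512))
    print(out.shape)   # (2, 3)
```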
AutoKaggle: A Multi-Agent Framework for Autonomous Data Science Competitions
Li, Ziming, Zang, Qianbo, Ma, David, Guo, Jiawei, Zheng, Tuney, Liu, Minghao, Niu, Xinyao, Wang, Yue, Yang, Jian, Liu, Jiaheng, Zhong, Wanjun, Zhou, Wangchunshu, Huang, Wenhao, Zhang, Ge
Data science tasks involving tabular data present complex challenges that require sophisticated problem-solving approaches. We propose AutoKaggle, a powerful and user-centric framework that assists data scientists in completing daily data pipelines through a collaborative multi-agent system. AutoKaggle implements an iterative development process that combines code execution, debugging, and comprehensive unit testing to ensure code correctness and logic consistency. The framework offers highly customizable workflows, allowing users to intervene at each phase, thus integrating automated intelligence with human expertise. Our universal data science toolkit, comprising validated functions for data cleaning, feature engineering, and modeling, forms the foundation of this solution, enhancing productivity by streamlining common tasks. We selected 8 Kaggle competitions to simulate data processing workflows in real-world application scenarios. Evaluation results demonstrate that AutoKaggle achieves a validation submission rate of 0.85 and a comprehensive score of 0.82 in typical data science pipelines, demonstrating its effectiveness and practicality in handling complex data science tasks.
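The iterative development process can be abstracted as a per-phase loop over code writing, execution, debugging, and unit testing. The sketch below passes the agents in as plain callables and is only an abstraction of that loop, not AutoKaggle's actual phase or agent interfaces.

```python
# Abstracted sketch of an iterative code-develop-debug-test loop; AutoKaggle's
# real agents, phases, and toolkit are richer than this illustration.
def run_phase(write_code, debug_code, run_tests, execute, max_rounds: int = 3):
    """write_code() -> str; debug_code(code, err) -> str;
    run_tests(code) -> (bool, str); execute(code) -> (bool, str)."""
    code = write_code()
    for _ in range(max_rounds):
        ok, err = execute(code)
        if ok:
            passed, report = run_tests(code)
            if passed:
                return code                  # phase complete, hand off artifacts
            err = report                     # failing tests feed the next round
        code = debug_code(code, err)         # debugger agent revises the code
    raise RuntimeError("phase did not converge within the round budget")
```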
HumanEval-V: Evaluating Visual Understanding and Reasoning Abilities of Large Multimodal Models Through Coding Tasks
Zhang, Fengji, Wu, Linquan, Bai, Huiyu, Lin, Guancheng, Li, Xiao, Yu, Xiao, Wang, Yue, Chen, Bei, Keung, Jacky
Coding tasks have been valuable for evaluating Large Language Models (LLMs), as they demand the comprehension of high-level instructions, complex reasoning, and the implementation of functional programs -- core capabilities for advancing Artificial General Intelligence. Despite the progress in Large Multimodal Models (LMMs), which extend LLMs with visual perception and understanding capabilities, there remains a notable lack of coding benchmarks that rigorously assess these models, particularly in tasks that emphasize visual reasoning. To address this gap, we introduce HumanEval-V, a novel and lightweight benchmark specifically designed to evaluate LMMs' visual understanding and reasoning capabilities through code generation. HumanEval-V includes 108 carefully crafted, entry-level Python coding tasks derived from platforms like CodeForces and Stack Overflow. Each task is adapted by modifying the context and algorithmic patterns of the original problems, with visual elements redrawn to ensure distinction from the source, preventing potential data leakage. LMMs are required to complete the code solution based on the provided visual context and a predefined Python function signature outlining the task requirements. Every task is equipped with meticulously handcrafted test cases to ensure a thorough and reliable evaluation of model-generated solutions. We evaluate 19 state-of-the-art LMMs using HumanEval-V, uncovering significant challenges. Proprietary models like GPT-4o achieve only 13% pass@1 and 36.4% pass@10, while open-weight models with 70B parameters score below 4% pass@1. Ablation studies further reveal the limitations of current LMMs in vision reasoning and coding capabilities. These results underscore key areas for future research to enhance LMMs' capabilities. We have open-sourced our code and benchmark at https://github.com/HumanEval-V/HumanEval-V-Benchmark.
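The pass@1 and pass@10 figures quoted above are presumably computed with the standard unbiased estimator used for functional-correctness code benchmarks, pass@k = E[1 - C(n-c, k)/C(n, k)] over n generations per task with c of them passing; a small sketch of that estimator:

```python
# Standard unbiased pass@k estimator (numerically stable product form),
# shown here as background for the quoted pass@1/pass@10 numbers.
import numpy as np


def pass_at_k(n: int, c: int, k: int) -> float:
    """Probability that at least one of k sampled solutions passes,
    given n generations of which c pass the tests."""
    if n - c < k:
        return 1.0
    return float(1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1)))


if __name__ == "__main__":
    # e.g., 20 generations per task, 3 of which pass the handcrafted tests
    print(round(pass_at_k(n=20, c=3, k=1), 3), round(pass_at_k(n=20, c=3, k=10), 3))
```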