AITopics | Wang, Yunlong

Wang, Yunlong

Efficient End-to-end Visual Localization for Autonomous Driving with Decoupled BEV Neural Matching

Miao, Jinyu, Wen, Tuopu, Luo, Ziang, Qian, Kangan, Fu, Zheng, Wang, Yunlong, Jiang, Kun, Yang, Mengmeng, Huang, Jin, Zhong, Zhihua, Yang, Diange

arXiv.org Artificial Intelligence, Mar-2-2025

Accurate localization plays an important role in high-level autonomous driving systems. Conventional map matching-based localization methods solve for poses by explicitly matching map elements with sensor observations; they are generally sensitive to perception noise and therefore require costly hyper-parameter tuning. In this paper, we propose an end-to-end localization neural network which directly estimates vehicle poses from surrounding images, without explicitly matching perception results with HD maps. To ensure efficiency and interpretability, a decoupled BEV neural matching-based pose solver is proposed, which estimates poses in a differentiable sampling-based matching module. Moreover, the sampling space is hugely reduced by decoupling the feature representation affected by each DoF of poses. The experimental results demonstrate that the proposed network is capable of performing decimeter-level localization with mean absolute errors of 0.19m, 0.13m and 0.39

Visual localization serves as a vital component in high-level Autonomous Driving (AD) systems due to its ability to estimate vehicle poses with an economical sensor suite. In recent decades, several works have achieved extraordinary success in terms of localization accuracy and robustness [1]. A plethora of scene maps has been developed in the domain of visual localization research, yielding varying degrees of pose estimation accuracy [1]. In conventional robotic systems, visual localization systems often employ geo-tagged frames [2], [3] and visual landmark maps [4].
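The idea of a differentiable, sampling-based matching step that treats each DoF separately can be illustrated with a toy sketch. This is not the paper's network: the score functions, grid ranges, units, and the names `soft_argmax` and `decoupled_pose_estimate` are all assumptions made for illustration. It shows a softmax-weighted expectation over candidate values per DoF, and how sampling each DoF on its own 1-D grid needs far fewer candidates than a joint grid.

```python
import numpy as np

def soft_argmax(candidates, scores, temperature=1.0):
    """Softmax-weighted expectation of candidate values (differentiable in the scores)."""
    candidates = np.asarray(candidates, dtype=float)
    logits = np.asarray(scores, dtype=float) / temperature
    logits = logits - logits.max()  # subtract the max for numerical stability
    weights = np.exp(logits)
    weights /= weights.sum()
    return float(np.dot(weights, candidates))

def decoupled_pose_estimate(score_fns, candidate_grids, temperature=1.0):
    """Estimate each DoF independently from its own 1-D candidate grid."""
    return {
        name: soft_argmax(grid, score_fns[name](np.asarray(grid)), temperature)
        for name, grid in candidate_grids.items()
    }

# Hypothetical candidate grids (ranges and units are illustrative only).
grids = {
    "longitudinal": np.linspace(-2.0, 2.0, 41),
    "lateral": np.linspace(-1.0, 1.0, 41),
    "yaw": np.linspace(-5.0, 5.0, 41),
}

# Toy matching scores that peak at a made-up "true" pose; a real system
# would obtain scores by comparing image features with map features.
true_pose = {"longitudinal": 0.5, "lateral": -0.2, "yaw": 1.0}
score_fns = {
    name: (lambda x, t=t: -((x - t) ** 2) / 0.01) for name, t in true_pose.items()
}

estimate = decoupled_pose_estimate(score_fns, grids)

# Candidate counts: joint 3-DoF grid versus three separate 1-D grids.
joint_samples = 41 ** 3
decoupled_samples = 41 * 3
```

In this toy setup the joint grid would need 68921 candidates, while the decoupled grids need 123, which is the kind of reduction the abstract alludes to.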

artificial intelligence, localization, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2503.00862

Country: Asia > China (0.15)

Genre: Research Report > New Finding (0.34)

Industry:

Transportation > Ground > Road (1.00)
Information Technology > Robotics & Automation (0.91)
Automobiles & Trucks (0.91)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

PriorMotion: Generative Class-Agnostic Motion Prediction with Raster-Vector Motion Field Priors

Qian, Kangan, Jiao, Xinyu, Shi, Yining, Wang, Yunlong, Luo, Ziang, Fu, Zheng, Jiang, Kun, Yang, Diange

arXiv.org Artificial Intelligence, Dec-5-2024

Reliable perception of spatial and motion information is crucial for safe autonomous navigation. Traditional approaches typically fall into two categories: object-centric and class-agnostic methods. Object-centric methods often struggle with missed detections, leading to inaccuracies in motion prediction. Many class-agnostic methods focus heavily on encoder design and overlook important priors like rigidity and temporal consistency, leading to suboptimal performance, particularly with sparse LiDAR data in distant regions. To address these issues, we propose PriorMotion, a generative framework that extracts rasterized and vectorized scene representations to model spatio-temporal priors. Our model comprises a BEV encoder, a Raster-Vector prior Encoder, and a Spatio-Temporal prior Generator, improving both spatial and temporal consistency in motion prediction. Additionally, we introduce a standardized evaluation protocol for class-agnostic motion prediction. Experiments on the nuScenes dataset show that PriorMotion achieves state-of-the-art performance, with further validation on advanced FMCW LiDAR confirming its robustness.

machine learning, natural language, prediction, (16 more...)

arXiv.org Artificial Intelligence

2412.0402

Country: Asia (0.28)

Genre: Research Report (0.50)

Industry: Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(2 more...)

Using Large Language Models to Assist Video Content Analysis: An Exploratory Study of Short Videos on Depression

Liu, Jiaying, Wang, Yunlong, Lyu, Yao, Su, Yiheng, Niu, Shuo, Xu, Xuhai Orson, Zhang, Yan

arXiv.org Artificial Intelligence, Jul-4-2024

Despite the growing interest in leveraging Large Language Models (LLMs) for content analysis, current studies have primarily focused on text-based content. In the present work, we explored the potential of LLMs in assisting video content analysis by conducting a case study that followed a new workflow of LLM-assisted multimodal content analysis. The workflow encompasses codebook design, prompt engineering, LLM processing, and human evaluation. We strategically crafted annotation prompts to get LLM Annotations in structured form and explanation prompts to generate LLM Explanations for a better understanding of LLM reasoning and transparency. To test the LLM's video annotation capabilities, we analyzed 203 keyframes extracted from 25 YouTube short videos about depression. We compared the LLM Annotations with those of two human coders and found that the LLM has higher accuracy in object and activity Annotations than in emotion and genre Annotations. Moreover, we identified the potential and limitations of the LLM's capabilities in annotating videos. Based on the findings, we explore opportunities and challenges for future research and improvements to the workflow. We also discuss ethical concerns surrounding future studies based on LLM-assisted video analysis.
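The comparison of LLM Annotations against human coders by annotation type can be sketched as a per-category accuracy computation. This is a minimal illustration only: the function name `accuracy_by_category`, the `(keyframe_id, category)` key layout, and the labels below are made up and do not come from the study's data or codebook.

```python
from collections import defaultdict

def accuracy_by_category(llm_labels, human_labels):
    """Return, per category, the fraction of items where the LLM label equals the human label.

    Both arguments map (keyframe_id, category) -> label. Items missing from
    llm_labels count as disagreements.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for key, human in human_labels.items():
        category = key[1]
        totals[category] += 1
        if llm_labels.get(key) == human:
            hits[category] += 1
    return {category: hits[category] / totals[category] for category in totals}

# Made-up labels for two keyframes and two annotation categories.
human = {
    (1, "object"): "phone",
    (2, "object"): "bed",
    (1, "emotion"): "sad",
    (2, "emotion"): "neutral",
}
llm = {
    (1, "object"): "phone",
    (2, "object"): "bed",
    (1, "emotion"): "sad",
    (2, "emotion"): "sad",
}

per_category = accuracy_by_category(llm, human)
```

With two human coders, one option would be to run this once per coder, or against labels the coders have reconciled; which of these the study used is not stated in the abstract.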

artificial intelligence, large language model, natural language, (14 more...)

arXiv.org Artificial Intelligence

2406.19528

Country:

North America > United States (1.00)
Asia (0.68)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)

Genre:

Workflow (0.91)
Research Report > New Finding (0.68)

Industry: Health & Medicine > Therapeutic Area (0.47)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

6-DoF Grasp Detection in Clutter with Enhanced Receptive Field and Graspable Balance Sampling

Wang, Hanwen, Zhang, Ying, Wang, Yunlong, Li, Jian

arXiv.org Artificial Intelligence, Jul-1-2024

6-DoF grasp detection of small-scale grasps is crucial for robots to perform specific tasks. This paper focuses on enhancing the recognition capability of small-scale grasping, aiming to improve the overall accuracy of grasping prediction results and the generalization ability of the network. We propose an enhanced receptive field method that includes a multi-radii cylinder grouping module and a passive attention module. This method enhances the receptive field area within the graspable space and strengthens the learning of graspable features. Additionally, we design a graspable balance sampling module based on a segmentation network, which enables the network to focus on features of small objects, thereby improving the recognition capability of small-scale grasping. Our network achieves state-of-the-art performance on the GraspNet-1Billion dataset, with an overall improvement of approximately 10% in average precision@k (AP). Furthermore, we deployed our grasp detection model on a PyBullet grasping platform, which validates the effectiveness of our method.

artificial intelligence, machine learning, module, (18 more...)

arXiv.org Artificial Intelligence

2407.01209

Country:

Asia > China (0.15)
North America > United States > Texas (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

ToolEENet: Tool Affordance 6D Pose Estimation

Wang, Yunlong, Zhang, Lei, Tu, Yuyang, Zhang, Hui, Bai, Kaixin, Chen, Zhaopeng, Zhang, Jianwei

arXiv.org Artificial Intelligence, Apr-5-2024

The exploration of robotic dexterous hands utilizing tools has recently attracted considerable attention. A significant challenge in this field is the precise awareness of a tool's pose when grasped, as occlusion by the hand often degrades the quality of the estimation. Additionally, the tool's overall pose often fails to accurately represent the contact interaction, thereby limiting the effectiveness of vision-guided, contact-dependent activities. To overcome this limitation, we present the innovative TOOLEE dataset, which, to the best of our knowledge, is the first to feature affordance segmentation of a tool's end-effector (EE) along with its defined 6D pose based on its usage. Furthermore, we propose the ToolEENet framework for accurate 6D pose estimation of the tool's EE. This framework begins by segmenting the tool's EE from raw RGBD data, then uses a diffusion model-based pose estimator for 6D pose estimation at a category-specific level. Addressing the issue of symmetry in pose estimation, we introduce a symmetry-aware pose representation that enhances the consistency of pose estimation. Our approach excels in this field, demonstrating high levels of precision and generalization. Furthermore, it shows great promise for application in contact-based manipulation scenarios. All data and code are available on the project website: https://yuyangtu.github.io/projectToolEENet.html

artificial intelligence, estimation, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2404.04193

Country: Asia (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Vision > Video Understanding (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

A Survey on Monocular Re-Localization: From the Perspective of Scene Map Representation

Miao, Jinyu, Jiang, Kun, Wen, Tuopu, Wang, Yunlong, Jia, Peijing, Zhao, Xuhe, Cheng, Qian, Xiao, Zhongyang, Huang, Jin, Zhong, Zhihua, Yang, Diange

arXiv.org Artificial Intelligence, Jan-12-2024

Monocular Re-Localization (MRL) is a critical component in autonomous applications, estimating 6 degree-of-freedom ego poses w.r.t. the scene map based on monocular images. In recent decades, significant progress has been made in the development of MRL techniques. Numerous algorithms have accomplished extraordinary success in terms of localization accuracy and robustness. In MRL, scene maps are represented in various forms, and they determine how MRL methods work and how they perform. However, to the best of our knowledge, existing surveys do not provide systematic reviews of the relationship between MRL solutions and the scene map representations they use. This survey fills the gap by comprehensively reviewing MRL methods from such a perspective, promoting further research. 1) We commence by delving into the problem definition of MRL, exploring current challenges, and comparing our survey with existing ones. 2) Many well-known MRL methods are reviewed and categorized into five classes according to the representation form of the map they use, i.e., geo-tagged frames, visual landmarks, point clouds, vectorized semantic maps, and neural network-based maps. 3) To quantitatively and fairly compare MRL methods that use various maps, we introduce some public datasets and provide the performance of some state-of-the-art MRL methods. The strengths and weaknesses of MRL methods with different maps are analyzed. 4) We finally introduce some topics of interest in this field and give personal opinions. This survey can serve as valuable reference material for MRL, and a continuously updated summary of this survey is publicly available to the community at: https://github.com/jinyummiao/map-in-mono-reloc.

artificial intelligence, data mining, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2311.15643

Country:

Asia (0.45)
Europe (0.45)
North America > United States > California (0.14)

Genre: Overview (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
(4 more...)

Towards Trustworthy Explanation: On Causal Rationalization

Zhang, Wenbo, Wu, Tong, Wang, Yunlong, Cai, Yong, Cai, Hengrui

arXiv.org Machine Learning, Sep-8-2023

With recent advances in natural language processing, rationalization has become an essential self-explaining diagram to disentangle the black box by selecting a subset of input texts to account for the major variation in prediction. Yet, existing association-based approaches to rationalization cannot identify true rationales when two or more snippets are highly inter-correlated and thus provide a similar contribution to prediction accuracy, so-called spuriousness. To address this limitation, we incorporate two causal desiderata, non-spuriousness and efficiency, into rationalization from the causal inference perspective. We formally define a series of probabilities of causation based on a newly proposed structural causal model of rationalization, with its theoretical identification established as the main component of learning necessary and sufficient rationales. The superior performance of the proposed causal rationalization is demonstrated on real-world review and medical datasets with extensive experiments compared to state-of-the-art methods.

artificial intelligence, machine learning, natural language, (17 more...)

arXiv.org Machine Learning

2306.14115

Country: North America > United States > California > Orange County > Irvine (0.14)

Genre: Research Report > New Finding (0.93)

Industry: Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)

Poses as Queries: Image-to-LiDAR Map Localization with Transformers

Miao, Jinyu, Jiang, Kun, Wang, Yunlong, Wen, Tuopu, Xiao, Zhongyang, Fu, Zheng, Yang, Mengmeng, Liu, Maolin, Yang, Diange

arXiv.org Artificial Intelligence, May-7-2023

High-precision vehicle localization with commercial setups is a crucial technique for high-level autonomous driving tasks. Localization with a monocular camera in a LiDAR map is a newly emerged approach that achieves a promising balance between cost and accuracy, but estimating pose by finding correspondences between such cross-modal sensor data is challenging, which degrades localization accuracy. In this paper, we address the problem by proposing a novel Transformer-based neural network to register 2D images into a 3D LiDAR map in an end-to-end manner. Poses are implicitly represented as high-dimensional feature vectors called pose queries and can be iteratively updated by interacting with the retrieved relevant information from cross-modal features using an attention mechanism in a proposed POse Estimator Transformer (POET) module. Moreover, we apply a multiple hypotheses aggregation method that estimates the final poses by performing parallel optimization on multiple randomly initialized pose queries to reduce the network uncertainty. Comprehensive analysis and experimental results on a public benchmark show that the proposed image-to-LiDAR map localization network can achieve state-of-the-art performance in challenging cross-modal localization tasks.

artificial intelligence, localization, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2305.04298

Country: Asia > China (0.14)

Genre: Research Report (0.64)

Industry: Transportation (0.48)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

RePrompt: Automatic Prompt Editing to Refine AI-Generative Art Towards Precise Expressions

Wang, Yunlong, Shen, Shuyuan, Lim, Brian Y.

arXiv.org Artificial Intelligence, Mar-19-2023

Generative AI models have shown impressive ability to produce images with text prompts, which could benefit creativity in visual art creation and self-expression. However, it is unclear how precisely the generated images express contexts and emotions from the input texts. We explored the emotional expressiveness of AI-generated images and developed RePrompt, an automatic method to refine text prompts toward precise expression of the generated images. Inspired by crowdsourced editing strategies, we curated intuitive text features, such as the number and concreteness of nouns, and trained a proxy model to analyze the feature effects on the AI-generated image. With model explanations of the proxy model, we curated a rubric to adjust text prompts to optimize image generation for precise emotion expression. We conducted simulation and user studies, which showed that RePrompt significantly improves the emotional expressiveness of AI-generated images, especially for negative emotions.

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3544548.3581402

2302.09466

Country:

Europe > Germany (0.31)
Asia > Singapore (0.28)
North America > United States (0.28)

Genre:

Research Report > New Finding (1.00)
Questionnaire & Opinion Survey (1.00)
Research Report > Experimental Study (0.68)

Industry: Health & Medicine > Therapeutic Area > Psychiatry/Psychology > Mental Health (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)
(2 more...)

SalienTrack: providing salient information for semi-automated self-tracking feedback with model explanations

Wang, Yunlong, Liu, Jiaying, Park, Homin, Schultz-McArdle, Jordan, Rosenthal, Stephanie, Lim, Brian Y

arXiv.org Artificial Intelligence, Sep-21-2021

Self-tracking can improve people's awareness of their unhealthy behaviors to provide insights towards behavior change. Prior work has explored how self-trackers reflect on their logged data, but it remains unclear how much they learn from the tracking feedback, and which information is more useful. Indeed, the feedback can still be overwhelming, and making it concise can improve learning by increasing focus and reducing interpretation burden. We conducted a field study of mobile food logging with two feedback modes (manual journaling and automatic annotation of food images) and identified learning differences regarding nutrition, assessment, behavioral, and contextual information. We propose a Self-Tracking Feedback Saliency Framework to define when to provide feedback, on which specific information, why those details, and how to present them (as manual inquiry or automatic feedback). We propose SalienTrack to implement these requirements. Using the data collected from the user study, we trained a machine learning model to predict whether a user would learn from each tracked event. Using explainable AI (XAI) techniques, we identified the most salient features per instance and why they lead to positive learning outcomes. We discuss implications for learnability in self-tracking, and how adding model explainability expands opportunities for improving feedback experience.

health & medicine, information, neural network, (21 more...)

arXiv.org Artificial Intelligence

2109.10231

Country: North America > United States > Texas > Travis County > Austin (0.14)

Genre:

Research Report > New Finding (1.00)
Questionnaire & Opinion Survey (1.00)
Research Report > Experimental Study (0.93)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Consumer Health (1.00)
Information Technology (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
