
Collaborating Authors

 Wang, Yuhan


Dexterous Non-Prehensile Manipulation for Ungraspable Object via Extrinsic Dexterity

arXiv.org Artificial Intelligence

Objects with large base areas become ungraspable when they exceed the end-effector's maximum aperture. Existing approaches address this limitation through extrinsic dexterity, which exploits environmental features for non-prehensile manipulation. While grippers have shown some success in this domain, dexterous hands offer superior flexibility and manipulation capabilities that enable richer environmental interactions, though they present greater control challenges. Here we present ExDex, a dexterous arm-hand system that leverages reinforcement learning to perform non-prehensile manipulation and grasp otherwise ungraspable objects. Our system learns two strategic manipulation sequences: relocating objects from the table center to the edge for direct grasping, or to a wall where extrinsic dexterity enables grasping through environmental interaction. We validate our approach through extensive experiments on dozens of diverse household objects, demonstrating both superior performance and generalization to novel objects. Furthermore, we transfer the learned policies from simulation to a real-world robot system without additional training, further demonstrating their applicability in real-world scenarios. Project website: https://tangty11.github.io/ExDex/.


Causal invariant geographic network representations with feature and structural distribution shifts

arXiv.org Machine Learning

Existing methods learn geographic network representations with deep graph neural networks (GNNs) under the i.i.d. assumption. However, the spatial heterogeneity and temporal dynamics of geographic data make the out-of-distribution (OOD) generalisation problem particularly salient. These GNN-based methods are especially sensitive to distribution shifts (feature and structural shifts) between training and testing data, which are the main cause of the OOD generalisation problem. Selection biases and environmental effects induce spurious correlations between invariant and background representations, making the model more likely to learn background representations. Existing approaches focus on background representation changes driven by shifts in the feature distributions of nodes between the training and test data, while ignoring changes in the proportional distribution of heterogeneous and homogeneous neighbour nodes, which we refer to as structural distribution shifts. We propose a feature-structure mixed invariant representation learning (FSM-IRL) model that accounts for both feature distribution shifts and structural distribution shifts. To address structural distribution shifts, we introduce a sampling method based on causal attention that encourages the model to identify nodes with strong causal relationships to the labels or nodes that are more similar to the target node. Inspired by the Hilbert-Schmidt independence criterion, we implement a reweighting strategy that maximises the orthogonality of the node representations, thereby mitigating spurious correlations among them and suppressing the learning of background representations. Our experiments demonstrate that FSM-IRL exhibits strong learning capabilities in OOD scenarios on both geographic and social network datasets.
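
The reweighting strategy above is inspired by the Hilbert-Schmidt independence criterion (HSIC). As background, a minimal sketch of the standard biased empirical HSIC estimator between two sets of node representations is shown below; how FSM-IRL wires this into its reweighting is only described at the level of the abstract, so the code illustrates the criterion itself, not the paper's implementation.

```python
import numpy as np

def rbf_kernel(X, sigma=1.0):
    """Pairwise RBF kernel matrix for the rows of X."""
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-d2 / (2.0 * sigma**2))

def hsic(X, Y, sigma=1.0):
    """Biased empirical HSIC estimate between samples X and Y (rows are samples)."""
    n = X.shape[0]
    K, L = rbf_kernel(X, sigma), rbf_kernel(Y, sigma)
    H = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

# HSIC is near zero for independent representation sets and grows with dependence.
rng = np.random.default_rng(0)
Z_invariant = rng.normal(size=(128, 16))
Z_background = rng.normal(size=(128, 16))        # independent of Z_invariant
print(hsic(Z_invariant, Z_background))
print(hsic(Z_invariant, Z_invariant + 0.1 * rng.normal(size=(128, 16))))
```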


MEAT: Multiview Diffusion Model for Human Generation on Megapixels with Mesh Attention

arXiv.org Artificial Intelligence

Multiview diffusion models have shown considerable success in image-to-3D generation for general objects. However, when applied to human data, existing methods have yet to deliver promising results, largely due to the challenges of scaling multiview attention to higher resolutions. In this paper, we explore human multiview diffusion models at the megapixel level and introduce a solution called mesh attention to enable training at 1024x1024 resolution. Using a clothed human mesh as a central coarse geometric representation, the proposed mesh attention leverages rasterization and projection to establish direct cross-view coordinate correspondences. This approach significantly reduces the complexity of multiview attention while maintaining cross-view consistency. Building on this foundation, we devise a mesh attention block and combine it with keypoint conditioning to create our human-specific multiview diffusion model, MEAT. In addition, we present valuable insights into applying multiview human motion videos for diffusion training, addressing the longstanding issue of data scarcity. Extensive experiments show that MEAT effectively generates dense, consistent multiview human images at the megapixel level, outperforming existing multiview diffusion methods.
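
Mesh attention rests on a simple geometric fact: projecting the same mesh vertex into every view yields direct cross-view pixel correspondences. The sketch below illustrates only that projection step with a pinhole camera model and made-up intrinsics and extrinsics; MEAT's actual rasterization and attention blocks are not reproduced here.

```python
import numpy as np

def project(vertices, K, R, t):
    """Project Nx3 world-space mesh vertices into one camera view.

    K: 3x3 intrinsics, R: 3x3 rotation, t: (3,) translation (world -> camera).
    Returns Nx2 pixel coordinates and per-vertex depth.
    """
    cam = vertices @ R.T + t                 # world -> camera coordinates
    uv = cam @ K.T                           # camera -> homogeneous pixel coordinates
    return uv[:, :2] / uv[:, 2:3], cam[:, 2]

# Hypothetical setup: the same mesh vertex projected into two views gives a pair of
# pixel locations, i.e. a direct cross-view correspondence that attention can use
# instead of searching over all pixel pairs.
K = np.array([[1000.0, 0.0, 512.0], [0.0, 1000.0, 512.0], [0.0, 0.0, 1.0]])
R1, t1 = np.eye(3), np.array([0.0, 0.0, 3.0])
theta = np.pi / 6                            # second camera rotated 30 degrees about the y-axis
R2 = np.array([[np.cos(theta), 0.0, np.sin(theta)],
               [0.0, 1.0, 0.0],
               [-np.sin(theta), 0.0, np.cos(theta)]])
t2 = np.array([0.0, 0.0, 3.0])

vertices = np.random.default_rng(0).normal(scale=0.5, size=(6890, 3))  # stand-in for a body mesh
uv1, _ = project(vertices, K, R1, t1)
uv2, _ = project(vertices, K, R2, t2)
# uv1[i] and uv2[i] are corresponding pixels for vertex i across the two views.
```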


Retrieval Models Aren't Tool-Savvy: Benchmarking Tool Retrieval for Large Language Models

arXiv.org Artificial Intelligence

Tool learning aims to augment large language models (LLMs) with diverse tools, enabling them to act as agents for solving practical tasks. Because tool-using LLMs have limited context length, adopting information retrieval (IR) models to select useful tools from large toolsets is a critical first step. However, the performance of IR models on tool retrieval tasks remains underexplored and unclear. Most tool-use benchmarks simplify this step by manually pre-annotating a small set of relevant tools for each task, which is far from real-world scenarios. In this paper, we propose ToolRet, a heterogeneous tool retrieval benchmark comprising 7.6k diverse retrieval tasks and a corpus of 43k tools collected from existing datasets. We benchmark six types of models on ToolRet. Surprisingly, even models with strong performance on conventional IR benchmarks exhibit poor performance on ToolRet. This low retrieval quality degrades the task pass rate of tool-use LLMs. As a further step, we contribute a large-scale training dataset of over 200k instances, which substantially improves the tool retrieval ability of IR models.
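
The abstract does not spell out ToolRet's scoring, so as a hedged illustration of how retrieval quality over such tasks is typically measured, here is a recall@k evaluation loop with hypothetical field names:

```python
from typing import Dict, List

def recall_at_k(ranked_tool_ids: List[str], relevant_tool_ids: List[str], k: int = 10) -> float:
    """Fraction of relevant tools that appear in the top-k retrieved results."""
    top_k = set(ranked_tool_ids[:k])
    hits = sum(1 for t in relevant_tool_ids if t in top_k)
    return hits / max(len(relevant_tool_ids), 1)

def evaluate(tasks: List[Dict], k: int = 10) -> float:
    """Macro-average recall@k over retrieval tasks.

    Each task is a dict with hypothetical keys:
      "ranking":  tool ids returned by the retriever, best first
      "relevant": gold tool ids for the task query
    """
    scores = [recall_at_k(t["ranking"], t["relevant"], k) for t in tasks]
    return sum(scores) / len(scores)

tasks = [
    {"ranking": ["weather_api", "calculator", "translator"], "relevant": ["calculator"]},
    {"ranking": ["stock_quote", "news_search"], "relevant": ["flight_booker"]},
]
print(evaluate(tasks, k=2))   # 0.5: the first task's tool is found, the second is missed
```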


IMM-MOT: A Novel 3D Multi-object Tracking Framework with Interacting Multiple Model Filter

arXiv.org Artificial Intelligence

3D Multi-Object Tracking (MOT) provides the trajectories of surrounding objects, assisting robots or vehicles in smarter path planning and obstacle avoidance. Existing 3D MOT methods based on the Tracking-by-Detection framework typically use a single motion model to track an object throughout its lifetime. However, objects may change their motion patterns as the surrounding environment changes. In this paper, we introduce IMM-MOT, which employs an Interacting Multiple Model (IMM) filter to accurately fit the complex motion patterns of individual objects, overcoming the single-model limitation of existing approaches. In addition, we incorporate a Damping Window mechanism into trajectory lifecycle management, leveraging the continuous association status of trajectories to control their creation and termination and reducing the number of overlooked low-confidence true targets. Furthermore, we propose a Distance-Based Score Enhancement module, which sharpens the separation between false positives and true positives by adjusting detection scores, thereby improving the effectiveness of the Score Filter. On the nuScenes validation set, IMM-MOT outperforms most other single-modality models that use 3D point clouds, achieving an AMOTA of 73.8%. Our project is available at https://github.com/Ap01lo/IMM-MOT.
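
As background on the core component, a minimal generic one-cycle Interacting Multiple Model (IMM) update is sketched below in numpy: several motion-model-conditioned Kalman filters run in parallel, their estimates are mixed according to mode probabilities, and the mode probabilities are updated from the measurement likelihoods. The motion models, noise matrices, and transition probabilities in a real tracker are design choices not specified by the abstract.

```python
import numpy as np

def kf_step(x, P, z, F, Q, H, R):
    """One Kalman predict + update; returns posterior state, covariance, and measurement likelihood."""
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    y = z - H @ x_pred                                   # innovation
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)
    x_post = x_pred + K @ y
    P_post = (np.eye(len(x)) - K @ H) @ P_pred
    lik = np.exp(-0.5 * y @ np.linalg.solve(S, y)) / np.sqrt(np.linalg.det(2.0 * np.pi * S))
    return x_post, P_post, lik

def imm_step(states, covs, mu, PI, z, models, H, R):
    """One IMM cycle: mix, filter per model, update mode probabilities, combine."""
    M = len(models)
    c = PI.T @ mu                                        # normalizers for mixing
    mix = (PI * mu[:, None]) / c[None, :]                # mix[i, j] = P(was model i | model j now)
    mixed = []
    for j in range(M):
        x0 = sum(mix[i, j] * states[i] for i in range(M))
        P0 = sum(mix[i, j] * (covs[i] + np.outer(states[i] - x0, states[i] - x0)) for i in range(M))
        mixed.append((x0, P0))
    liks = np.zeros(M)
    for j, (F, Q) in enumerate(models):
        states[j], covs[j], liks[j] = kf_step(mixed[j][0], mixed[j][1], z, F, Q, H, R)
    mu = liks * c
    mu = mu / mu.sum()                                   # updated mode probabilities
    x = sum(mu[j] * states[j] for j in range(M))         # combined estimate for output
    P = sum(mu[j] * (covs[j] + np.outer(states[j] - x, states[j] - x)) for j in range(M))
    return states, covs, mu, x, P

# Toy usage: two constant-velocity models that differ only in process noise.
dt = 0.5
F = np.array([[1.0, dt], [0.0, 1.0]])
H = np.array([[1.0, 0.0]])
R = np.array([[0.25]])
models = [(F, 0.01 * np.eye(2)), (F, 1.0 * np.eye(2))]  # "smooth" vs "maneuvering"
PI = np.array([[0.95, 0.05], [0.05, 0.95]])             # mode transition probabilities
states, covs, mu = [np.zeros(2), np.zeros(2)], [np.eye(2), np.eye(2)], np.array([0.5, 0.5])
states, covs, mu, x, P = imm_step(states, covs, mu, PI, np.array([1.2]), models, H, R)
print(mu, x)
```

A typical configuration pairs, for example, a constant-velocity model with a constant-acceleration or constant-turn model, so the mode probabilities shift automatically as an object's motion pattern changes.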


Neuro-Conceptual Artificial Intelligence: Integrating OPM with Deep Learning to Enhance Question Answering Quality

arXiv.org Artificial Intelligence

Knowledge representation and reasoning are critical challenges in Artificial Intelligence (AI), particularly in integrating neural and symbolic approaches to achieve explainable and transparent AI systems. Traditional knowledge representation methods often fall short of capturing complex processes and state changes. We introduce Neuro-Conceptual Artificial Intelligence (NCAI), a specialization of the neuro-symbolic AI approach that integrates conceptual modeling using Object-Process Methodology (OPM) ISO 19450:2024 with deep learning to enhance question-answering (QA) quality. By converting natural language text into OPM models using in-context learning, NCAI leverages the expressive power of OPM to represent complex OPM elements (processes, objects, and states) beyond what traditional triplet-based knowledge graphs can easily capture. This rich structured knowledge representation improves reasoning transparency and answer accuracy in an OPM-QA system. We further propose transparency evaluation metrics to quantitatively measure how faithfully the predicted reasoning aligns with OPM-based conceptual logic. Our experiments demonstrate that NCAI outperforms traditional methods, highlighting its potential for advancing neuro-symbolic AI by providing rich knowledge representations, measurable transparency, and improved reasoning.
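
To make the contrast with triplet-based knowledge graphs concrete, the toy sketch below models OPM-style processes, objects, and states as plain data structures; the names and fields are hypothetical illustrations, not NCAI's internal schema or the OPM ISO 19450:2024 metamodel.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# A conventional knowledge-graph fact is a flat triple:
Triple = Tuple[str, str, str]                           # e.g. ("water", "state", "gaseous")

@dataclass
class OPMObject:
    name: str
    states: List[str] = field(default_factory=list)     # possible states of the object

@dataclass
class OPMProcess:
    name: str
    consumes: List[str] = field(default_factory=list)   # objects consumed by the process
    yields: List[str] = field(default_factory=list)     # objects created by the process
    changes: List[Tuple[str, str, str]] = field(default_factory=list)
    # each entry: (object name, state before, state after)

# "Boiling transforms water from liquid to gaseous" as an OPM-style element:
water = OPMObject("water", states=["liquid", "gaseous"])
boiling = OPMProcess("boiling", changes=[("water", "liquid", "gaseous")])
# The same fact flattened to triples loses the coupling between the before and after states:
triples: List[Triple] = [("boiling", "affects", "water"), ("water", "state", "gaseous")]
```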


A Deep Learning Framework Integrating CNN and BiLSTM for Financial Systemic Risk Analysis and Prediction

arXiv.org Artificial Intelligence

This study proposes a deep learning model that combines a convolutional neural network (CNN) with a bidirectional long short-term memory network (BiLSTM) for discriminant analysis of financial systemic risk. The model first uses the CNN to extract local patterns from multidimensional financial market features and then models bidirectional temporal dependencies with the BiLSTM, characterizing how systemic risk evolves in both spatial features and temporal dynamics. The experiments are based on real financial datasets. The results show that the model significantly outperforms traditional single models (such as BiLSTM, CNN, Transformer, and TCN) in accuracy, recall, and F1 score, reaching an F1 score of 0.88 and indicating strong discriminative ability. This suggests that combining CNN and BiLSTM can both capture the complex patterns of market data and handle the long-term dependencies in time series data. In addition, this study explores the model's robustness to data noise and high-dimensional inputs, providing support for intelligent financial risk management. Future work will further optimize the model structure, introduce methods such as reinforcement learning and multimodal data analysis, and improve the model's efficiency and generalization ability to cope with more complex financial environments.
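
The architecture described, a CNN front end for local pattern extraction feeding a BiLSTM over the time axis, follows a common pattern; a minimal PyTorch sketch is given below, assuming inputs of shape (batch, time steps, market features) and a binary risk label. Layer sizes are illustrative and not taken from the paper.

```python
import torch
import torch.nn as nn

class CNNBiLSTM(nn.Module):
    """1D CNN over the time axis followed by a bidirectional LSTM and a classifier head."""
    def __init__(self, n_features: int, hidden: int = 64, n_classes: int = 2):
        super().__init__()
        self.conv = nn.Sequential(                       # local pattern extraction
            nn.Conv1d(n_features, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(),
        )
        self.bilstm = nn.LSTM(64, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, n_classes)     # 2x for the two LSTM directions

    def forward(self, x):                                # x: (batch, time, features)
        h = self.conv(x.transpose(1, 2))                 # Conv1d expects (batch, channels, time)
        h, _ = self.bilstm(h.transpose(1, 2))            # back to (batch, time, channels)
        return self.head(h[:, -1])                       # classify from the last time step

model = CNNBiLSTM(n_features=12)
logits = model(torch.randn(8, 30, 12))                   # 8 samples, 30 time steps, 12 market features
print(logits.shape)                                      # torch.Size([8, 2])
```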


Leveraging Convolutional Neural Network-Transformer Synergy for Predictive Modeling in Risk-Based Applications

arXiv.org Artificial Intelligence

With the development of the financial industry, credit default prediction, an important task in financial risk management, has received increasing attention. Traditional credit default prediction methods mostly rely on machine learning models such as decision trees and random forests, but these methods are limited in processing complex data and capturing latent risk patterns. To this end, this paper proposes a deep learning model that combines convolutional neural networks (CNN) and a Transformer for credit default prediction. The model couples the CNN's strength in local feature extraction with the Transformer's ability to model global dependencies, effectively improving the accuracy and robustness of credit default prediction. Experiments on public credit default datasets show that the CNN+Transformer model outperforms traditional machine learning models such as random forests and XGBoost on multiple evaluation metrics, including accuracy, AUC, and the KS statistic, demonstrating its strength in modeling complex financial data. Further analysis shows that appropriate optimizer selection and learning rate tuning play a vital role in model performance. In addition, ablation experiments verify the advantage of combining CNN and Transformer and confirm their complementarity for credit default prediction. This study offers a new approach to credit default prediction and supports risk assessment and intelligent decision-making in the financial field. Future research can further improve prediction performance and generalization ability by introducing more unstructured data and refining the model architecture.
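
A minimal PyTorch sketch of the described CNN-plus-Transformer pairing is shown below; treating the tabular credit attributes as a one-dimensional sequence, as well as the specific layer dimensions, are assumptions made for illustration rather than the paper's exact design.

```python
import torch
import torch.nn as nn

class CNNTransformer(nn.Module):
    """Conv1d local feature extractor feeding a Transformer encoder, then a default-probability head."""
    def __init__(self, n_features: int, d_model: int = 64, n_heads: int = 4, n_layers: int = 2):
        super().__init__()
        self.conv = nn.Conv1d(1, d_model, kernel_size=3, padding=1)   # treat the feature vector as a 1D sequence
        layer = nn.TransformerEncoderLayer(d_model, n_heads, dim_feedforward=128, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)         # global dependency modeling
        self.head = nn.Linear(d_model, 1)

    def forward(self, x):                              # x: (batch, n_features) tabular credit attributes
        h = self.conv(x.unsqueeze(1))                  # (batch, d_model, n_features)
        h = self.encoder(h.transpose(1, 2))            # (batch, n_features, d_model)
        return torch.sigmoid(self.head(h.mean(dim=1)))  # pooled -> default probability

model = CNNTransformer(n_features=23)
prob = model(torch.randn(16, 23))                      # e.g. 23 attributes per credit applicant
print(prob.shape)                                      # torch.Size([16, 1])
```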


RS-GPT4V: A Unified Multimodal Instruction-Following Dataset for Remote Sensing Image Understanding

arXiv.org Artificial Intelligence

Remote sensing image (RSI) understanding is undergoing a profound paradigm shift driven by multi-modal large language models (MLLMs): from learning a domain model (LaDM) to learning a pre-trained general foundation model followed by an adaptive domain model (LaGD). Under the new LaGD paradigm, the old datasets that drove advances in RSI understanding over the last decade are no longer suitable for the new tasks. We argue that a new dataset must be designed to support tasks with the following features: 1) Generalization: training the model to learn knowledge shared among tasks and to adapt to different tasks; 2) Understanding complex scenes: training the model to understand the fine-grained attributes of the objects of interest and to describe the scene in natural language; 3) Reasoning: training the model to perform high-level visual reasoning. In this paper, we design a high-quality, diversified, and unified multimodal instruction-following dataset for RSI understanding, produced with GPT-4V and existing datasets, which we call RS-GPT4V. To achieve generalization, we use (Question, Answer) pairs derived from GPT-4V via instruction following to unify tasks such as captioning and localization. To handle complex scenes, we propose a hierarchical instruction description with a local strategy, in which the fine-grained attributes of objects and their spatial relationships are described, and a global strategy, in which all the local information is integrated into a detailed instruction description. To support reasoning, we design multi-turn QA pairs that provide the model with reasoning ability. The empirical results show that MLLMs fine-tuned on RS-GPT4V can describe fine-grained information. The dataset is available at: https://github.com/GeoX-Lab/RS-GPT4V.
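
For concreteness, a hypothetical RS-GPT4V-style record combining the unified (Question, Answer) format, the local/global hierarchical description, and a multi-turn reasoning exchange might look like the Python dict below; the keys and example wording are invented for illustration, since the abstract does not specify the on-disk schema.

```python
sample = {
    "image": "rsi_000123.png",                         # hypothetical remote sensing image id
    "conversations": [
        # unified instruction-following QA covering captioning/localization-style tasks
        {"question": "Describe the scene in this image.",
         "answer": "A coastal port with two cargo ships docked along the eastern pier."},
        # local strategy: fine-grained attributes and spatial relations of objects of interest
        {"question": "What is located north of the larger ship?",
         "answer": "A row of rectangular storage containers arranged in three lines."},
        # multi-turn pair intended to elicit higher-level visual reasoning
        {"question": "Is this port likely used for container shipping? Why?",
         "answer": "Yes; the stacked containers and crane-like structures near the pier suggest container handling."},
    ],
}
```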


Unveiling Incomplete Modality Brain Tumor Segmentation: Leveraging Masked Predicted Auto-Encoder and Divergence Learning

arXiv.org Artificial Intelligence

Brain tumor segmentation remains a significant challenge, particularly for multi-modal magnetic resonance imaging (MRI), where missing modality images are common in clinical settings and lead to reduced segmentation accuracy. To address this issue, we propose a novel strategy called masked predicted pre-training, which enables robust feature learning from incomplete modality data. In the fine-tuning phase, we additionally use a knowledge distillation technique to align features between complete and missing modality data, further enhancing model robustness. Notably, we employ the Hölder pseudo-divergence instead of the Kullback-Leibler divergence (KLD) as the distillation loss, offering improved mathematical interpretability and properties. Extensive experiments on the BraTS2018 and BraTS2020 datasets demonstrate significant performance improvements over existing state-of-the-art methods.
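
Masked predicted pre-training on multi-modal MRI amounts to hiding whole modality channels and learning to predict from what remains; the sketch below shows only that masking step for a batch of 4-modality volumes (the usual BraTS sequences T1, T1ce, T2, FLAIR). The paper's masking schedule, prediction head, and Hölder pseudo-divergence distillation loss are not reproduced here.

```python
import torch

def mask_modalities(x: torch.Tensor, keep_prob: float = 0.5):
    """Randomly zero out whole modality channels to simulate missing MRI sequences.

    x: (batch, modalities, D, H, W), e.g. modalities = 4 for T1, T1ce, T2, FLAIR.
    Returns the masked volume and the per-sample keep mask (at least one modality is kept).
    """
    b, m = x.shape[:2]
    keep = torch.rand(b, m) < keep_prob
    # guarantee at least one modality survives per sample
    empty = ~keep.any(dim=1)
    keep[empty, torch.randint(m, (int(empty.sum()),))] = True
    return x * keep.view(b, m, 1, 1, 1).to(x.dtype), keep

x = torch.randn(2, 4, 32, 32, 32)          # toy batch of 4-modality volumes
x_masked, keep = mask_modalities(x)
print(keep)                                 # which modalities remain visible per sample
```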