AITopics | Yin, Hao

Collaborating Authors

Yin, Hao

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Lifting the Veil on Visual Information Flow in MLLMs: Unlocking Pathways to Faster Inference

Yin, Hao, Si, Guangzong, Wang, Zilei

arXiv.org Artificial IntelligenceMar-17-2025

Multimodal large language models (MLLMs) improve performance on vision-language tasks by integrating visual features from pre-trained vision encoders into large language models (LLMs). However, how MLLMs process and utilize visual information remains unclear. In this paper, a shift in the dominant flow of visual information is uncovered: (1) in shallow layers, strong interactions are observed between image tokens and instruction tokens, where most visual information is injected into instruction tokens to form cross-modal semantic representations; (2) in deeper layers, image tokens primarily interact with each other, aggregating the remaining visual information to optimize semantic representations within visual modality. Based on these insights, we propose Hierarchical Modality-Aware Pruning (HiMAP), a plug-and-play inference acceleration method that dynamically prunes image tokens at specific layers, reducing computational costs by approximately 65% without sacrificing performance. Our findings offer a new understanding of visual information processing in MLLMs and provide a state-of-the-art solution for efficient inference.

information, large language model, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2503.13108

Country: Asia > China (0.14)

Genre:

Research Report > New Finding (0.34)
Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)

Add feedback

ClearSight: Visual Signal Enhancement for Object Hallucination Mitigation in Multimodal Large language Models

Yin, Hao, Si, Guangzong, Wang, Zilei

arXiv.org Artificial IntelligenceMar-17-2025

Contrastive decoding strategies are widely used to mitigate object hallucinations in multimodal large language models (MLLMs). By reducing over-reliance on language priors, these strategies ensure that generated content remains closely grounded in visual inputs, producing contextually accurate outputs. Since contrastive decoding requires no additional training or external tools, it offers both computational efficiency and versatility, making it highly attractive. However, these methods present two main limitations: (1) bluntly suppressing language priors can compromise coherence and accuracy of generated content, and (2) processing contrastive inputs adds computational load, significantly slowing inference speed. To address these challenges, we propose Visual Amplification Fusion (VAF), a plug-and-play technique that enhances attention to visual signals within the model's middle layers, where modality fusion predominantly occurs. This approach enables more effective capture of visual features, reducing the model's bias toward language modality. Experimental results demonstrate that VAF significantly reduces hallucinations across various MLLMs without affecting inference speed, while maintaining coherence and accuracy in generated outputs.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2503.13107

Country: Asia > China (0.14)

Genre: Research Report > New Finding (0.87)

Industry: Health & Medicine > Consumer Health (0.34)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Add feedback

A Decade of Action Quality Assessment: Largest Systematic Survey of Trends, Challenges, and Future Directions

Yin, Hao, Parmar, Paritosh, Xu, Daoliang, Zhang, Yang, Zheng, Tianyou, Fu, Weiwei

arXiv.org Artificial IntelligenceFeb-4-2025

Action Quality Assessment (AQA) -- the ability to quantify the quality of human motion, actions, or skill levels and provide feedback -- has far-reaching implications in areas such as low-cost physiotherapy, sports training, and workforce development. As such, it has become a critical field in computer vision & video understanding over the past decade. Significant progress has been made in AQA methodologies, datasets, & applications, yet a pressing need remains for a comprehensive synthesis of this rapidly evolving field. In this paper, we present a thorough survey of the AQA landscape, systematically reviewing over 200 research papers using the preferred reporting items for systematic reviews & meta-analyses (PRISMA) framework. We begin by covering foundational concepts & definitions, then move to general frameworks & performance metrics, & finally discuss the latest advances in methodologies & datasets. This survey provides a detailed analysis of research trends, performance comparisons, challenges, & future directions. Through this work, we aim to offer a valuable resource for both newcomers & experienced researchers, promoting further exploration & progress in AQA. Data are available at https://haoyin116.github.io/Survey_of_AQA/

assessment, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2502.02817

Country: Asia > China > Anhui Province (0.14)

Genre: Overview (1.00)

Industry:

Leisure & Entertainment > Sports > Olympic Games (1.00)
Education (1.00)
Health & Medicine > Therapeutic Area (0.92)
Health & Medicine > Health Care Technology (0.67)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(3 more...)

Add feedback

Perturbation Ontology based Graph Attention Networks

Wang, Yichen, Wang, Jie, Wang, Fulin, Li, Xiang, Yin, Hao, Raj, Bhiksha

arXiv.org Artificial IntelligenceNov-27-2024

In recent years, graph representation learning has undergone a paradigm shift, driven by the emergence and proliferation of graph neural networks (GNNs) and their heterogeneous counterparts. Heterogeneous GNNs have shown remarkable success in extracting low-dimensional embeddings from complex graphs that encompass diverse entity types and relationships. While meta-path-based techniques have long been recognized for their ability to capture semantic affinities among nodes, their dependence on manual specification poses a significant limitation. In contrast, matrix-focused methods accelerate processing by utilizing structural cues but often overlook contextual richness. In this paper, we challenge the current paradigm by introducing ontology as a fundamental semantic primitive within complex graphs. Our goal is to integrate the strengths of both matrix-centric and meta-path-based approaches into a unified framework. We propose perturbation Ontology-based Graph Attention Networks (POGAT), a novel methodology that combines ontology subgraphs with an advanced self-supervised learning paradigm to achieve a deep contextual understanding. The core innovation of POGAT lies in our enhanced homogeneous perturbing scheme designed to generate rigorous negative samples, encouraging the model to explore minimal contextual features more thoroughly. Through extensive empirical evaluations, we demonstrate that POGAT significantly outperforms state-of-the-art baselines, achieving a groundbreaking improvement of up to 10.78\% in F1-score for the critical task of link prediction and 12.01\% in Micro-F1 for the critical task of node classification.

artificial intelligence, graph attention network, machine learning, (1 more...)

arXiv.org Artificial Intelligence

2411.1852

Genre: Research Report (0.69)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (1.00)
Information Technology > Artificial Intelligence > Machine Learning (0.87)

Add feedback

Navigating Spatial Inequities in Freight Truck Crash Severity via Counterfactual Inference in Los Angeles

Wang, Yichen, Yin, Hao, Yang, Yifan, Zhao, Chenyang, Wang, Siqin

arXiv.org Artificial IntelligenceNov-26-2024

Freight truck-related crashes pose significant challenges, leading to substantial economic losses, injuries, and fatalities, with pronounced spatial disparities across different regions. This study adopts a transport geography perspective to examine spatial justice concerns by employing deep counterfactual inference models to analyze how socioeconomic disparities, road infrastructure, and environmental conditions influence the geographical distribution and severity of freight truck crashes. By integrating road network datasets, socioeconomic attributes, and crash records from the Los Angeles metropolitan area, this research provides a nuanced spatial analysis of how different communities are disproportionately impacted. The results reveal significant spatial disparities in crash severity across areas with varying population densities, income levels, and minority populations, highlighting the pivotal role of infrastructural and environmental improvements in mitigating these disparities. The findings offer insights into targeted, location-specific policy interventions, suggesting enhancements in road infrastructure, lighting, and traffic control systems, particularly in low-income and minority-concentrated areas. This research contributes to the literature on transport geography and spatial equity by providing data-driven insights into effective measures for reducing spatial injustices associated with freight truck-related crashes.

artificial intelligence, machine learning, severity, (11 more...)

arXiv.org Artificial Intelligence

2411.17554

Country: North America > United States > California (0.68)

Genre: Research Report > New Finding (0.93)

Industry:

Transportation > Infrastructure & Services (1.00)
Transportation > Ground > Road (1.00)
Health & Medicine (1.00)
Government > Regional Government > North America Government > United States Government (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)

Add feedback

Learning for Semantic Knowledge Base-Guided Online Feature Transmission in Dynamic Channels

Gao, Xiangyu, Sun, Yaping, Wei, Dongyu, Xu, Xiaodong, Chen, Hao, Yin, Hao, Cui, Shuguang

arXiv.org Artificial IntelligenceNov-30-2023

With the proliferation of edge computing, efficient AI inference on edge devices has become essential for intelligent applications such as autonomous vehicles and VR/AR. In this context, we address the problem of efficient remote object recognition by optimizing feature transmission between mobile devices and edge servers. We propose an online optimization framework to address the challenge of dynamic channel conditions and device mobility in an end-to-end communication system. Our approach builds upon existing methods by leveraging a semantic knowledge base to drive multi-level feature transmission, accounting for temporal factors and dynamic elements throughout the transmission process. To solve the online optimization problem, we design a novel soft actor-critic-based deep reinforcement learning system with a carefully designed reward function for real-time decision-making, overcoming the optimization difficulty of the NP-hard problem and achieving the minimization of semantic loss while respecting latency constraints. Numerical results showcase the superiority of our approach compared to traditional greedy methods under various system setups.

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2311.18316

Country: Asia > China (0.48)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.87)

Add feedback

Ground-Challenge: A Multi-sensor SLAM Dataset Focusing on Corner Cases for Ground Robots

Yin, Jie, Yin, Hao, Liang, Conghui, Zhang, Zhengyou

arXiv.org Artificial IntelligenceJul-7-2023

High-quality datasets can speed up breakthroughs and reveal potential developing directions in SLAM research. To support the research on corner cases of visual SLAM systems, this paper presents Ground-Challenge: a challenging dataset comprising 36 trajectories with diverse corner cases such as aggressive motion, severe occlusion, changing illumination, few textures, pure rotation, motion blur, wheel suspension, etc. The dataset was collected by a ground robot with multiple sensors including an RGB-D camera, an inertial measurement unit (IMU), a wheel odometer and a 3D LiDAR. All of these sensors were well-calibrated and synchronized, and their data were recorded simultaneously. To evaluate the performance of cutting-edge SLAM systems, we tested them on our dataset and demonstrated that these systems are prone to drift and fail on specific sequences. We will release the full dataset and relevant materials upon paper publication to benefit the research community. For more information, visit our project website at https://github.com/sjtuyinjie/Ground-Challenge.

artificial intelligence, dataset, press release, (19 more...)

arXiv.org Artificial Intelligence

2307.0389

Country:

North America > United States (0.14)
Europe (0.14)
Asia > China (0.14)

Genre:

Press Release (0.54)
Research Report (0.50)

Industry: Transportation (0.46)

Technology: Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.46)

Add feedback

Robust multi-agent coordination via evolutionary generation of auxiliary adversarial attackers

Yuan, Lei, Zhang, Zi-Qian, Xue, Ke, Yin, Hao, Chen, Feng, Guan, Cong, Li, Li-He, Qian, Chao, Yu, Yang

arXiv.org Artificial IntelligenceMay-10-2023

Cooperative multi-agent reinforcement learning (CMARL) has shown to be promising for many real-world applications. Previous works mainly focus on improving coordination ability via solving MARL-specific challenges (e.g., non-stationarity, credit assignment, scalability), but ignore the policy perturbation issue when testing in a different environment. This issue hasn't been considered in problem formulation or efficient algorithm design. To address this issue, we firstly model the problem as a limited policy adversary Dec-POMDP (LPA-Dec-POMDP), where some coordinators from a team might accidentally and unpredictably encounter a limited number of malicious action attacks, but the regular coordinators still strive for the intended goal. Then, we propose Robust Multi-Agent Coordination via Evolutionary Generation of Auxiliary Adversarial Attackers (ROMANCE), which enables the trained policy to encounter diversified and strong auxiliary adversarial attacks during training, thus achieving high robustness under various policy perturbations. Concretely, to avoid the ego-system overfitting to a specific attacker, we maintain a set of attackers, which is optimized to guarantee the attackers high attacking quality and behavior diversity. The goal of quality is to minimize the ego-system coordination effect, and a novel diversity regularizer based on sparse action is applied to diversify the behaviors among attackers. The ego-system is then paired with a population of attackers selected from the maintained attacker set, and alternately trained against the constantly evolving attackers. Extensive experiments on multiple scenarios from SMAC indicate our ROMANCE provides comparable or better robustness and generalization ability than other baselines.

artificial intelligence, machine learning, reinforcement learning, (13 more...)

arXiv.org Artificial Intelligence

2305.05909

Genre: Research Report (1.00)

Industry:

Government > Military (0.49)
Information Technology > Security & Privacy (0.49)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.34)

Add feedback

Sky-GVINS: a Sky-segmentation Aided GNSS-Visual-Inertial System for Robust Navigation in Urban Canyons

Yin, Jie, Li, Tao, Yin, Hao, Yu, Wenxian, Zou, Danping

arXiv.org Artificial IntelligenceApr-8-2023

Integrating Global Navigation Satellite Systems (GNSS) in Simultaneous Localization and Mapping (SLAM) systems draws increasing attention to a global and continuous localization solution. Nonetheless, in dense urban environments, GNSS-based SLAM systems will suffer from the Non-Line-Of-Sight (NLOS) measurements, which might lead to a sharp deterioration in localization results. In this paper, we propose to detect the sky area from the up-looking camera to improve GNSS measurement reliability for more accurate position estimation. We present Sky-GVINS: a sky-aware GNSS-Visual-Inertial system based on a recent work called GVINS. Specifically, we adopt a global threshold method to segment the sky regions and non-sky regions in the fish-eye sky-pointing image and then project satellites to the image using the geometric relationship between satellites and the camera. After that, we reject satellites in non-sky regions to eliminate NLOS signals. We investigated various segmentation algorithms for sky detection and found that the Otsu algorithm reported the highest classification rate and computational efficiency, despite the algorithm's simplicity and ease of implementation. To evaluate the effectiveness of Sky-GVINS, we built a ground robot and conducted extensive real-world experiments on campus. Experimental results show that our method improves localization accuracy in both open areas and dense urban environments compared to the baseline method. Finally, we also conduct a detailed analysis and point out possible further directions for future research. For detailed information, visit our project website at https://github.com/SJTU-ViSYS/Sky-GVINS.

artificial intelligence, machine learning, sky-gvin, (16 more...)

arXiv.org Artificial Intelligence

doi: 10.1080/10095020.2023.2191649

2304.04007

Country:

Asia > China (0.49)
Europe > United Kingdom > England (0.14)

Genre: Research Report > New Finding (0.48)

Industry:

Education (0.68)
Information Technology (0.46)

Technology:

Information Technology > Geographic Information Systems (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
(2 more...)

Add feedback

A multi view multi stage and multi window framework for pulmonary artery segmentation from CT scans

Liu, ZeYu, Wang, Yi, Wen, Jing, Zhang, Yong, Yin, Hao, Guo, Chao, Wang, ZhongYu

arXiv.org Artificial IntelligenceSep-14-2022

This is the technical report of the 9th place in the final result of PARSE2022 Challenge. We solve the segmentation problem of the pulmonary artery by using a two-stage method based on a 3D CNN network. The coarse model is used to locate the ROI, and the fine model is used to refine the segmentation result. In addition, in order to improve the segmentation performance, we adopt multi-view and multi-window level method, at the same time we employ a fine-tune strategy to mitigate the impact of inconsistent labeling.

artificial intelligence, machine learning, segmentation, (14 more...)

arXiv.org Artificial Intelligence

2209.03918

Country: Asia > China (0.36)

Genre: Research Report (0.40)

Industry: Health & Medicine > Diagnostic Medicine > Imaging (0.52)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback