 Lin, Hongbin


PiSA: A Self-Augmented Data Engine and Training Strategy for 3D Understanding with Large Models

arXiv.org Artificial Intelligence

3D Multimodal Large Language Models (MLLMs) have recently made substantial advancements. However, their potential remains untapped, primarily due to the limited quantity and suboptimal quality of 3D datasets. Current approaches attempt to transfer knowledge from 2D MLLMs to expand 3D instruction data, but still face modality and domain gaps. To this end, we introduce PiSA-Engine (Point-Self-Augmented-Engine), a new framework for generating instruction point-language datasets enriched with 3D spatial semantics. We observe that existing 3D MLLMs offer a comprehensive understanding of point clouds for annotation, while 2D MLLMs excel at cross-validation by providing complementary information. By integrating holistic 2D and 3D insights from off-the-shelf MLLMs, PiSA-Engine enables a continuous cycle of high-quality data generation. We select PointLLM as the baseline and adopt this co-evolution training framework to develop an enhanced 3D MLLM, termed PointLLM-PiSA. Additionally, we identify limitations in previous 3D benchmarks, which often feature coarse language captions and insufficient category diversity, resulting in inaccurate evaluations. To address this gap, we further introduce PiSA-Bench, a comprehensive 3D benchmark covering six key aspects with detailed and diverse labels. Experimental results demonstrate PointLLM-PiSA's state-of-the-art performance in zero-shot 3D object captioning and generative classification on our PiSA-Bench, where it reaches scores of 46.45% (+8.33%) and 63.75% (+16.25%), respectively. We will release the code, datasets, and benchmark.
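
A minimal sketch of what such a self-augmentation cycle could look like in Python; the annotator, renderer, and validator callables are hypothetical stand-ins for the off-the-shelf 3D/2D MLLMs described in the abstract, not the authors' implementation:

```python
# Hypothetical sketch of a PiSA-style self-augmentation cycle (not the paper's code).
from typing import Callable, List, Tuple

def pisa_cycle(
    point_clouds: List[object],
    annotate_3d: Callable[[object], str],            # 3D MLLM: point cloud -> caption
    render_views: Callable[[object], List[object]],  # point cloud -> 2D renderings
    validate_2d: Callable[[object, str], float],     # 2D MLLM: (image, caption) -> score
    accept_threshold: float = 0.8,                   # illustrative acceptance cutoff
) -> List[Tuple[object, str]]:
    """One cycle: the 3D model annotates, the 2D model cross-validates."""
    accepted = []
    for pc in point_clouds:
        caption = annotate_3d(pc)                    # candidate 3D-grounded caption
        views = render_views(pc)                     # complementary 2D evidence
        score = sum(validate_2d(v, caption) for v in views) / max(len(views), 1)
        if score >= accept_threshold:                # keep only cross-validated pairs
            accepted.append((pc, caption))
    return accepted  # feed back into instruction tuning, then repeat the cycle
```

Retraining on the accepted pairs and rerunning the cycle is what makes the data generation continuous.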


Towards Multi-dimensional Explanation Alignment for Medical Classification

arXiv.org Artificial Intelligence

The lack of interpretability in medical image analysis has significant ethical and legal implications. Existing interpretable methods in this domain face several challenges, including dependency on specific models, difficulty of understanding and visualization, and issues of efficiency. To address these limitations, we propose a novel framework called Med-MICN (Medical Multi-dimensional Interpretable Concept Network). Med-MICN aligns explanations across multiple dimensions, including neural-symbolic reasoning, concept semantics, and saliency maps, offering advantages over current interpretable methods. Its strengths include high prediction accuracy, interpretability across multiple dimensions, and automation through an end-to-end concept labeling process that reduces the human annotation effort required when working with new datasets. To demonstrate the effectiveness and interpretability of Med-MICN, we apply it to four benchmark datasets and compare it with baselines. The results clearly demonstrate the superior performance and interpretability of our Med-MICN.
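
As a rough illustration of prediction-through-concepts (one of the dimensions Med-MICN aligns), here is a minimal concept-bottleneck classifier in PyTorch; the class and parameter names are invented for this sketch, which simplifies away the paper's neural-symbolic and saliency components:

```python
# Minimal concept-bottleneck sketch (an illustration of the general idea,
# not Med-MICN itself).
import torch
import torch.nn as nn

class ConceptBottleneckClassifier(nn.Module):
    def __init__(self, backbone: nn.Module, feat_dim: int, n_concepts: int, n_classes: int):
        super().__init__()
        self.backbone = backbone                             # any feature extractor
        self.to_concepts = nn.Linear(feat_dim, n_concepts)   # concept scores
        self.to_classes = nn.Linear(n_concepts, n_classes)   # decision from concepts only

    def forward(self, x):
        feats = self.backbone(x)
        concepts = torch.sigmoid(self.to_concepts(feats))    # each unit = one named concept
        logits = self.to_classes(concepts)                   # prediction depends only on concepts
        return logits, concepts                              # concepts expose "why" with "what"

# Toy usage: inspect `concepts` (and the weights of `to_classes`) to read off
# which concepts drove a given prediction.
model = ConceptBottleneckClassifier(nn.Flatten(), feat_dim=28 * 28, n_concepts=16, n_classes=2)
logits, concepts = model(torch.randn(4, 1, 28, 28))
```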


End-to-End Learning of Deep Visuomotor Policy for Needle Picking

arXiv.org Artificial Intelligence

Needle picking is a challenging manipulation task in robot-assisted surgery due to the small, slender shape of needles, their variation in shape and size, and the demand for millimeter-level control. Prior works, which rely heavily on needle priors (e.g., geometric models), are hard to scale to unseen needle variations. In this paper, we present the first end-to-end learning method to train a deep visuomotor policy for needle picking. Concretely, we propose DreamerfD, which maximally leverages demonstrations to improve the learning efficiency of a state-of-the-art model-based reinforcement learning method, DreamerV2. Because the Variational Auto-Encoder (VAE) in DreamerV2 is difficult to scale to high-resolution images, we propose Dynamic Spotlight Adaptation to represent control-related visual signals in a low-resolution image space. We also propose Virtual Clutch to reduce the performance degradation caused by the significant error between prior and posterior encoded states at the beginning of a rollout. We conducted extensive experiments in simulation to evaluate the performance, robustness, in-domain variation adaptation, and effectiveness of the individual components of our method. Our method, trained with 8k demonstration timesteps and 140k online policy timesteps, achieves a remarkable success rate of 80%. Furthermore, it generalizes to unseen in-domain variations, including needle variations and image disturbance, highlighting its robustness and versatility. Code and videos are available at https://sites.google.com/view/DreamerfD.
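
Two of these ideas lend themselves to a short, hedged sketch: biasing learning updates toward demonstration data, and a "virtual clutch" that withholds actions while the latent state estimate warms up. The function names, buffer layout, and ratios below are illustrative assumptions, not the paper's code:

```python
# Hedged sketch of two ideas from the abstract (illustrative names and values).
import random

def sample_mixed_batch(demo_buffer, online_buffer, demo_ratio=0.25, batch_size=32):
    """Bias world-model/policy updates toward demonstrations by mixing buffers.
    Assumes online_buffer holds at least batch_size transitions."""
    n_demo = int(batch_size * demo_ratio)
    batch = random.sample(demo_buffer, min(n_demo, len(demo_buffer)))
    batch += random.sample(online_buffer, batch_size - len(batch))
    return batch

def act_with_virtual_clutch(policy_action, neutral_action, step, clutch_steps=10):
    """Output a neutral (no-op) action early in a rollout, while prior and
    posterior latent estimates are still likely to disagree."""
    return neutral_action if step < clutch_steps else policy_action
```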


SSIM-Variation-Based Complexity Optimization for Versatile Video Coding

arXiv.org Artificial Intelligence

Versatile Video Coding (VVC) achieves better overall performance than High Efficiency Video Coding (HEVC). The Quadtree with Nested Multi-Type Tree (QTMT) coding block structure substantially enhances video coding quality in VVC, but this coding gain comes at the cost of greater coding complexity. This letter therefore proposes a Fast Decision Scheme based on Structural Similarity Index Metric Variation (FDS-SSIMV) to address the problem. First, the Structural Similarity Index Metric Variation (SSIMV) characteristic among the sub coding units of a split mode is illustrated. Next, SSIMV measurement strategies are designed for the different split modes. Then, the desired split modes are selected according to their SSIMV values. Experimental results show that the proposed method achieves 64.74% average encoding Time Saving (TS) with a 2.79% Bjøntegaard Delta Bit Rate (BDBR), outperforming the benchmarks.
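
To make the SSIMV idea concrete, here is a hedged NumPy sketch that computes a single-window SSIM per sub coding unit of a candidate split mode and takes the variance across sub-CUs; the split layout, reference block, and constants are illustrative assumptions, not the letter's exact measure:

```python
# Illustrative SSIM-variation computation over the sub-CUs of one split mode.
import numpy as np

def ssim_global(x, y, data_range=255.0):
    """Single-window SSIM between two equal-size blocks (standard formula)."""
    c1, c2 = (0.01 * data_range) ** 2, (0.03 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cxy = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cxy + c2)) / ((mx**2 + my**2 + c1) * (vx + vy + c2))

def ssim_variation(block, ref, split):
    """Variance of per-sub-CU SSIM for a candidate split mode.
    `split` yields (row-slice, col-slice) pairs describing the sub-CUs."""
    scores = [ssim_global(block[r, c], ref[r, c]) for r, c in split]
    return float(np.var(scores))

# Example: horizontal binary split of a 32x32 CU (toy data).
h, w = 32, 32
split_bt_h = [(slice(0, h // 2), slice(0, w)), (slice(h // 2, h), slice(0, w))]
cu = np.random.rand(h, w) * 255
ref = cu + np.random.randn(h, w)   # stand-in for a reconstructed/predicted block
print(ssim_variation(cu, ref, split_bt_h))  # low variation -> homogeneous sub-CUs
```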


Fast, Robust, and Versatile Event Detection through HMM Belief State Gradient Measures

arXiv.org Artificial Intelligence

Event detection is a critical feature in data-driven systems, as it assists with the identification of nominal and anomalous behavior. It is increasingly relevant in robotics as robots operate with greater autonomy in increasingly unstructured environments. In this work, we present an accurate, robust, fast, and versatile measure for skill and anomaly identification. A theoretical proof establishes the link between the derivative of the log-likelihood of the HMM filtered belief state and the latest emission probabilities. The key insight is this inverse relationship, which allows gradient analysis of the filtered log-likelihood to be used for skill and anomaly identification. Our measure showed better performance across all metrics than related state-of-the-art works. The result is broadly applicable to domains that use HMMs for event detection.
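
The filtering quantity behind such a measure is standard: with a scaled forward recursion, the increment of the cumulative log-likelihood at time t equals log p(z_t | z_{1:t-1}), which is driven directly by the latest emission probabilities. A small NumPy sketch (toy matrices and an illustrative threshold, not the paper's implementation) of flagging anomalies from that increment:

```python
# Forward filtering with scaling; per-step log-likelihood increments serve as a
# gradient-like event-detection signal (toy example, illustrative threshold).
import numpy as np

def filtered_loglik_increments(pi, A, B, obs):
    """pi: (S,) initial distribution; A: (S,S) transition matrix;
    B: (S,O) emission matrix; obs: sequence of observation indices.
    Returns log c_t = log p(z_t | z_{1:t-1}) for each step."""
    belief = pi * B[:, obs[0]]
    c = belief.sum()
    belief /= c
    incs = [np.log(c)]
    for z in obs[1:]:
        belief = (A.T @ belief) * B[:, z]   # predict, then weight by emission
        c = belief.sum()                    # c = p(z_t | z_{1:t-1})
        belief /= c                         # rescale the filtered belief state
        incs.append(np.log(c))
    return np.array(incs)

pi = np.array([0.6, 0.4])
A = np.array([[0.9, 0.1], [0.2, 0.8]])
B = np.array([[0.7, 0.2, 0.1], [0.1, 0.2, 0.7]])
obs = [0, 0, 2, 2, 1]
incs = filtered_loglik_increments(pi, A, B, obs)
anomalies = incs < np.log(0.05)   # sharp drops in the increment flag anomalies
print(incs, anomalies)
```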