AITopics | Yu, Junzhi

Robo-MUTUAL: Robotic Multimodal Task Specification via Unimodal Learning

Li, Jianxiong, Wang, Zhihao, Zheng, Jinliang, Zhou, Xiaoai, Wang, Guanming, Song, Guanglu, Liu, Yu, Liu, Jingjing, Zhang, Ya-Qin, Yu, Junzhi, Zhan, Xianyuan

arXiv.org Artificial Intelligence, Oct-2-2024

Multimodal task specification is essential for enhanced robotic performance, where Cross-modality Alignment enables the robot to holistically understand complex task instructions. Directly annotating multimodal instructions for model training proves impractical, due to the sparsity of paired multimodal data. In this study, we demonstrate that by leveraging unimodal instructions abundant in real data, we can effectively teach robots to learn multimodal task specifications. First, we endow the robot with strong Cross-modality Alignment capabilities, by pretraining a robotic multimodal encoder using extensive out-of-domain data. Then, we employ two Collapse and Corrupt operations to further bridge the remaining modality gap in the learned multimodal representation. This approach projects different modalities of identical task goal as interchangeable representations, thus enabling accurate robotic operations within a well-aligned multimodal latent space. Evaluation across more than 130 tasks and 4000 evaluations on both simulated LIBERO benchmark and real robot platforms showcases the superior capabilities of our proposed framework, demonstrating significant advantage in overcoming data constraints in robotic learning. Website: zh1hao.wang/Robo_MUTUAL

artificial intelligence, modality, robo-mutual, (14 more...)

arXiv: 2410.01529

Country:

North America > Canada > Ontario > Toronto (0.14)
Europe > Netherlands (0.14)

Genre: Research Report (0.70)

Technology: Information Technology > Artificial Intelligence > Robots (1.00)

EF-Calib: Spatiotemporal Calibration of Event- and Frame-Based Cameras Using Continuous-Time Trajectories

Wang, Shaoan, Xin, Zhanhua, Hu, Yaoqing, Li, Dongyue, Zhu, Mingzhu, Yu, Junzhi

arXiv.org Artificial Intelligence, May-27-2024

Event camera, a bio-inspired asynchronous triggered camera, offers promising prospects for fusion with frame-based cameras owing to its low latency and high dynamic range. However, calibrating stereo vision systems that incorporate both event and frame-based cameras remains a significant challenge. In this letter, we present EF-Calib, a spatiotemporal calibration framework for event- and frame-based cameras using continuous-time trajectories. A novel calibration pattern applicable to both camera types and the corresponding event recognition algorithm is proposed. Leveraging the asynchronous nature of events, a derivable piece-wise B-spline to represent camera pose continuously is introduced, enabling calibration for intrinsic parameters, extrinsic parameters, and time offset, with analytical Jacobians provided. Various experiments are carried out to evaluate the calibration performance of EF-Calib, including calibration experiments for intrinsic parameters, extrinsic parameters, and time offset. Experimental results show that EF-Calib achieves the most accurate intrinsic parameters compared to current SOTA, the close accuracy of the extrinsic parameters compared to the frame-based results, and accurate time offset estimation. EF-Calib provides a convenient and accurate toolbox for calibrating the system that fuses events and frames. The code of this paper will also be open-sourced at: https://github.com/wsakobe/EF-Calib.

artificial intelligence, event camera, frame-based camera, (17 more...)

arXiv: 2405.17278

Country:

Europe (0.93)
North America > United States (0.46)
Asia > China (0.28)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Frame-Oriented Architecture (1.00)

CylinderTag: An Accurate and Flexible Marker for Cylinder-Shape Objects Pose Estimation Based on Projective Invariants

Wang, Shaoan, Zhu, Mingzhu, Hu, Yaoqing, Li, Dongyue, Yuan, Fusong, Yu, Junzhi

arXiv.org Artificial Intelligence, Oct-20-2023

High-precision pose estimation based on visual markers has been a thriving research topic in the field of computer vision. However, the suitability of traditional flat markers on curved objects is limited due to the diverse shapes of curved surfaces, which hinders the development of high-precision pose estimation for curved objects. Therefore, this paper proposes a novel visual marker called CylinderTag, which is designed for developable curved surfaces such as cylindrical surfaces. CylinderTag is a cyclic marker that can be firmly attached to objects with a cylindrical shape. Leveraging the manifold assumption, the cross-ratio in projective invariance is utilized for encoding in the direction of zero curvature on the surface. Additionally, to facilitate the usage of CylinderTag, we propose a heuristic search-based marker generator and a high-performance recognizer as well. Moreover, an all-encompassing evaluation of CylinderTag properties is conducted by means of extensive experimentation, covering detection rate, detection speed, dictionary size, localization jitter, and pose estimation accuracy. CylinderTag showcases superior detection performance from varying view angles in comparison to traditional visual markers, accompanied by higher localization accuracy. Furthermore, CylinderTag boasts real-time detection capability and an extensive marker dictionary, offering enhanced versatility and practicality in a wide range of applications. Experimental results demonstrate that the CylinderTag is a highly promising visual marker for use on cylindrical-like surfaces, thus offering important guidance for future research on high-precision visual localization of cylinder-shaped objects. The code is available at: https://github.com/wsakobe/CylinderTag.
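The abstract above relies on the cross-ratio being a projective invariant. As a rough illustration of that property only (not the CylinderTag encoder itself), the sketch below checks numerically that the cross-ratio of four collinear points is unchanged by a 1-D projective map. The point coordinates, the matrix `H`, and the function names are all hypothetical values chosen for the example.

```python
def cross_ratio(a, b, c, d):
    """Cross-ratio (A, B; C, D) of four collinear points given as 1-D coordinates."""
    return ((c - a) * (d - b)) / ((c - b) * (d - a))


def projective_map(x, h):
    """Apply the 1-D projective map x -> (h00*x + h01) / (h10*x + h11)."""
    (h00, h01), (h10, h11) = h
    return (h00 * x + h01) / (h10 * x + h11)


# Hypothetical positions of four marker features along a straight line.
points = [0.0, 1.0, 3.0, 7.0]

# Hypothetical invertible 2x2 matrix standing in for a camera projection
# restricted to that line (determinant 2.0*1.5 - 1.0*0.3 = 2.7, nonzero).
H = ((2.0, 1.0), (0.3, 1.5))

mapped = [projective_map(x, H) for x in points]

before = cross_ratio(*points)
after = cross_ratio(*mapped)

# The two values agree up to floating-point rounding.
print(before, after)
```

Because the value survives the projection, a ratio measured in the image can in principle be compared against the ratio designed into the marker, which is what makes it usable as a code.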

artificial intelligence, cylindertag, video understanding, (18 more...)

doi: 10.1109/TVCG.2024.3350901

arXiv: 2310.1332

Country:

Asia (0.96)
North America > Canada > Quebec (0.14)
North America > Canada > British Columbia (0.14)
Europe > United Kingdom > England (0.14)

Genre: Research Report > New Finding (0.34)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision > Video Understanding (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.86)

A Dimensional Structure based Knowledge Distillation Method for Cross-Modal Learning

Si, Lingyu, Dong, Hongwei, Qiang, Wenwen, Yu, Junzhi, Zhai, Wenlong, Zheng, Changwen, Xu, Fanjiang, Sun, Fuchun

arXiv.org Artificial Intelligence, Jun-28-2023

Due to limitations in data quality, some essential visual tasks are difficult to perform independently. Introducing previously unavailable information to transfer informative dark knowledge has been a common way to solve such hard tasks. However, research on why transferred knowledge works has not been extensively explored. To address this issue, in this paper, we discover the correlation between feature discriminability and dimensional structure (DS) by analyzing and observing features extracted from simple and hard tasks. On this basis, we express DS using deep channel-wise correlation and intermediate spatial distribution, and propose a novel cross-modal knowledge distillation (CMKD) method for better supervised cross-modal learning (CML) performance. The proposed method enforces output features to be channel-wise independent and intermediate ones to be uniformly distributed, thereby learning semantically irrelevant features from the hard task to boost its accuracy. This is especially useful in specific applications where the performance gap between dual modalities is relatively large. Furthermore, we collect a real-world CML dataset to promote community development. The dataset contains more than 10,000 paired optical and radar images and is continuously being updated. Experimental results on real-world and benchmark datasets validate the effectiveness of the proposed method.

artificial intelligence, machine learning, natural language, (17 more...)

arXiv: 2306.15977

Country: Asia > Singapore (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Simultaneously Calibration of Multi Hand-Eye Robot System Based on Graph

Zhou, Zishun, Ma, Liping, Liu, Xilong, Cao, Zhiqiang, Yu, Junzhi

arXiv.org Artificial Intelligence, May-3-2023

Precise calibration is the basis for the vision-guided robot system to achieve high-precision operations. Systems with multiple eyes (cameras) and multiple hands (robots) are particularly sensitive to calibration errors, such as micro-assembly systems. Most existing methods focus on the calibration of a single unit of the whole system, such as poses between hand and eye, or between two hands. These methods can be used to determine the relative pose between each unit, but the serialized incremental calibration strategy cannot avoid the problem of error accumulation in a large-scale system. Instead of focusing on a single unit, this paper models the multi-eye and multi-hand system calibration problem as a graph and proposes a method based on the minimum spanning tree and graph optimization. This method can automatically plan the serialized optimal calibration strategy in accordance with the system settings to get coarse calibration results initially. Then, with these initial values, the closed-loop constraints are introduced to carry out global optimization. Simulation experiments demonstrate the performance of the proposed algorithm under different noises and various hand-eye configurations. In addition, experiments on real robot systems are presented to further verify the proposed method.
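The abstract above models the hands and eyes as a graph and uses a minimum spanning tree to plan a calibration order. As a minimal sketch of that general idea, assuming each edge carries a single scalar cost, the code below runs Kruskal's algorithm on a small graph. The node names, the edge costs, and the choice of Kruskal's algorithm are assumptions made for illustration; the abstract does not state how the paper defines edge weights or builds the tree.

```python
def minimum_spanning_tree(nodes, edges):
    """Kruskal's algorithm. `edges` is a list of (cost, node_u, node_v) tuples."""
    parent = {n: n for n in nodes}

    def find(n):
        # Follow parent links to the set representative, halving paths as we go.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    tree = []
    for cost, u, v in sorted(edges):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:  # the edge joins two separate components
            parent[root_u] = root_v
            tree.append((u, v, cost))
    return tree


# Hypothetical system: two robots ("hands") and two cameras ("eyes").
nodes = ["hand_1", "hand_2", "eye_1", "eye_2"]

# Hypothetical pairwise calibration costs (lower means a more reliable pairing).
edges = [
    (0.2, "hand_1", "eye_1"),
    (0.5, "hand_1", "eye_2"),
    (0.3, "hand_2", "eye_2"),
    (0.9, "hand_2", "eye_1"),
    (0.4, "eye_1", "eye_2"),
]

tree = minimum_spanning_tree(nodes, edges)
total_cost = sum(cost for _, _, cost in tree)

for u, v, cost in tree:
    print(u, v, cost)
```

In this toy graph the tree keeps three of the five edges. Each edge left out of the tree closes a loop in the graph, which is the kind of redundancy a later global optimization step could exploit.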

artificial intelligence, calibration, optimization problem, (18 more...)

arXiv: 2305.02518

Country: Asia > China (1.00)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.34)
