
Collaborating Authors

 Xiang, Yu


V-HOP: Visuo-Haptic 6D Object Pose Tracking

arXiv.org Artificial Intelligence

Humans naturally integrate vision and haptics for robust object perception during manipulation. The loss of either modality significantly degrades performance. Inspired by this multisensory integration, prior object pose estimation research has attempted to combine visual and haptic/tactile feedback. Although these works demonstrate improvements in controlled environments or on synthetic datasets, they often underperform vision-only approaches in real-world settings due to poor generalization across diverse grippers, sensor layouts, and sim-to-real gaps. Furthermore, they typically estimate the object pose for each frame independently, resulting in less coherent tracking over sequences in real-world deployments. To address these limitations, we introduce a novel unified haptic representation that effectively handles multiple gripper embodiments. Building on this representation, we introduce a new visuo-haptic transformer-based object pose tracker that seamlessly integrates visual and haptic input. We validate our framework on our dataset and the Feelsight dataset, demonstrating significant performance improvements on challenging sequences. Notably, our method achieves superior generalization and robustness across novel embodiments, objects, and sensor types (both taxel-based and vision-based tactile sensors). In real-world experiments, we demonstrate that our approach outperforms state-of-the-art visual trackers by a large margin. We further show that we can achieve precise manipulation tasks by incorporating our real-time object tracking results into motion plans, underscoring the advantages of visuo-haptic perception. Our model and dataset will be made open source upon acceptance of the paper. Project website: https://lhy.xyz/projects/v-hop/
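To make the fusion idea concrete, here is a minimal sketch (not the authors' released code) of how visual and haptic features could be combined as token sequences in a shared transformer encoder that regresses a pose update per frame; the feature dimensions, the 9-dimensional pose parameterization (translation plus a 6D rotation representation), and the module names are illustrative assumptions.

```python
# Hypothetical sketch of visuo-haptic token fusion with a transformer encoder.
import torch
import torch.nn as nn

class VisuoHapticFusion(nn.Module):
    def __init__(self, d_model=256, n_heads=8, n_layers=4):
        super().__init__()
        # Project each modality into a shared token space (input dims assumed).
        self.vis_proj = nn.Linear(512, d_model)   # e.g. visual patch features
        self.hap_proj = nn.Linear(64, d_model)    # e.g. unified haptic features
        enc_layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, n_layers)
        self.pose_head = nn.Linear(d_model, 9)    # 3 translation + 6D rotation

    def forward(self, vis_tokens, hap_tokens):
        # vis_tokens: (B, Nv, 512), hap_tokens: (B, Nh, 64)
        tokens = torch.cat([self.vis_proj(vis_tokens),
                            self.hap_proj(hap_tokens)], dim=1)
        fused = self.encoder(tokens)
        # Pool the fused tokens and predict a pose update for the tracked object.
        return self.pose_head(fused.mean(dim=1))
```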


iTeach: Interactive Teaching for Robot Perception using Mixed Reality

arXiv.org Artificial Intelligence

We introduce iTeach, a Mixed Reality (MR) framework to improve robot perception through real-time interactive teaching. By allowing human instructors to dynamically label robot RGB data, iTeach improves both the accuracy and adaptability of robot perception to new scenarios. The framework supports on-the-fly data collection and labeling, enhancing model performance and generalization. Applied to door and handle detection for household tasks, iTeach integrates a HoloLens app with an interactive YOLO model. Furthermore, we introduce the IRVLUTD DoorHandle dataset. DH-YOLO, our efficient detection model, significantly enhances the accuracy and efficiency of door and handle detection, highlighting the potential of MR to make robotic systems more capable and adaptive in real-world environments. The project page is available at https://irvlutd.github.io/iTeach.
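The interactive teaching loop can be pictured as follows; this is a hypothetical sketch, not the iTeach implementation, and the `detector.predict` / `detector.finetune` interface and the batch-size trigger are invented for illustration.

```python
# Hypothetical sketch of an on-the-fly interactive teaching loop.
from dataclasses import dataclass, field

@dataclass
class LabeledFrame:
    image_path: str
    boxes: list                     # [(class_id, x, y, w, h), ...] from the MR labeler

@dataclass
class InteractiveSession:
    dataset: list = field(default_factory=list)

    def add_correction(self, frame: LabeledFrame):
        # Each human correction becomes a new training example.
        self.dataset.append(frame)

    def should_update(self, batch_size: int = 16) -> bool:
        # Fine-tune once enough new labels have accumulated.
        return len(self.dataset) >= batch_size and len(self.dataset) % batch_size == 0

def teaching_loop(detector, stream, session: InteractiveSession):
    for image_path, human_boxes in stream:      # frames paired with MR labels
        detector.predict(image_path)            # show current detections to the user
        if human_boxes:                         # the user corrected something
            session.add_correction(LabeledFrame(image_path, human_boxes))
        if session.should_update():
            detector.finetune(session.dataset)  # brief on-the-fly model update
```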


Isometric Immersion Learning with Riemannian Geometry

arXiv.org Artificial Intelligence

Manifold learning has been proven to be an effective method for capturing the implicit intrinsic structure of non-Euclidean data, and one of its primary challenges is keeping the data representations distortion-free (isometric). However, no existing manifold learning method provides a theoretical guarantee of isometry. Inspired by Nash's isometric embedding theorem, we introduce a new concept called isometric immersion learning based on Riemannian geometry principles. Following this concept, we propose an unsupervised neural network-based model that simultaneously achieves metric and manifold learning by integrating Riemannian geometry priors. In addition, we theoretically derive and algorithmically implement a maximum likelihood estimation-based training method for the new model. In simulation experiments, we compare the new model with state-of-the-art baselines on various 3-D geometry datasets, demonstrating significantly superior performance on multiple evaluation metrics. Moreover, we apply the Riemannian metric learned by the new model to downstream prediction tasks in real-world scenarios, improving accuracy by an average of 8.8%.
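For intuition, a generic distance-preservation penalty of the kind isometry-seeking embeddings use is sketched below; this is not the paper's maximum-likelihood objective or its learned Riemannian metric, just the simplest way to express "distortion-free" as a loss.

```python
# Generic isometry (distance-preserving) penalty for an embedding network.
import torch

def isometry_loss(x, z):
    """Penalize mismatch between pairwise distances in input space (x)
    and pairwise distances in the learned embedding (z)."""
    dx = torch.cdist(x, x)   # (N, N) distances serving as a proxy for manifold distances
    dz = torch.cdist(z, z)   # (N, N) distances in the embedding space
    return ((dx - dz) ** 2).mean()
```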


Autonomous Exploration and Semantic Updating of Large-Scale Indoor Environments with Mobile Robots

arXiv.org Artificial Intelligence

We introduce a new robotic system that enables a mobile robot to autonomously explore an unknown environment, build a semantic map of the environment, and subsequently update the semantic map to reflect environment changes, such as location changes of objects. Our system leverages a LiDAR scanner for 2D occupancy grid mapping and an RGB-D camera for object perception. We introduce a semantic map representation that combines a 2D occupancy grid map for geometry with a topological map for object semantics. This map representation enables us to effectively update the semantics by deleting nodes from or adding nodes to the topological map. Our system has been tested on a Fetch robot. The robot can semantically map a 93 m x 90 m floor and update the semantic map once objects are moved in the environment.
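An illustrative (not the authors') data structure for such a hybrid map could look like this, with the topological object nodes kept separate from the occupancy grid so that semantic updates reduce to adding or removing nodes:

```python
# Illustrative semantic map: 2D occupancy grid for geometry plus a
# topological graph of object nodes for semantics (names are assumptions).
from dataclasses import dataclass, field

@dataclass
class ObjectNode:
    node_id: int
    label: str                # e.g. "chair"
    position: tuple           # (x, y) in the map frame

@dataclass
class SemanticMap:
    occupancy: list                               # 2D grid from LiDAR mapping
    nodes: dict = field(default_factory=dict)     # topological map of objects

    def add_object(self, node: ObjectNode):
        self.nodes[node.node_id] = node

    def remove_object(self, node_id: int):
        # Called when an object is no longer observed at its mapped location.
        self.nodes.pop(node_id, None)
```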


RobotFingerPrint: Unified Gripper Coordinate Space for Multi-Gripper Grasp Synthesis

arXiv.org Artificial Intelligence

We introduce a novel representation, the unified gripper coordinate space, for grasp synthesis with multiple grippers. The space is the 2D surface of a sphere in 3D, using longitude and latitude as its coordinates, and it is shared across all robotic grippers. We propose a new algorithm to map the palm surface of a gripper into the unified gripper coordinate space, and design a conditional variational autoencoder to predict the unified gripper coordinates given an input object. The predicted unified gripper coordinates establish correspondences between the gripper and the object, which can be used in an optimization problem to solve for the grasp pose and the finger joints for grasp synthesis. We demonstrate that using the unified gripper coordinate space improves the success rate and diversity of grasp synthesis for multiple grippers.
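For reference, the longitude/latitude convention on a unit sphere can be written as a small helper; this sketch only shows the coordinate conversion and assumes points already lie on the sphere, while the paper's actual mapping of palm-surface points into these coordinates is not reproduced here.

```python
# Standard conversion from unit-sphere points to (longitude, latitude).
import numpy as np

def sphere_to_lon_lat(p):
    """p: (..., 3) unit vectors. Returns longitude in [-pi, pi] and
    latitude in [-pi/2, pi/2]."""
    x, y, z = p[..., 0], p[..., 1], p[..., 2]
    lon = np.arctan2(y, x)
    lat = np.arcsin(np.clip(z, -1.0, 1.0))
    return lon, lat
```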


Mining Invariance from Nonlinear Multi-Environment Data: Binary Classification

arXiv.org Artificial Intelligence

Making predictions in an unseen environment given data from multiple training environments is a challenging task. We approach this problem from an invariance perspective, focusing on binary classification to shed light on general nonlinear data generation mechanisms. We identify a unique form of invariance that exists solely in the binary setting and allows us to train models that are invariant across environments. We provide sufficient conditions for such invariance and show that it is robust even when environmental conditions vary greatly. Our formulation admits a causal interpretation, allowing us to compare it with various frameworks. Finally, we propose a heuristic prediction method and conduct experiments using real and synthetic datasets.


Adapting Pre-Trained Vision Models for Novel Instance Detection and Segmentation

arXiv.org Artificial Intelligence

Novel Instance Detection and Segmentation (NIDS) aims at detecting and segmenting novel object instances given a few examples of each instance. We propose a unified framework (NIDS-Net) comprising object proposal generation, embedding creation for both instance templates and proposal regions, and embedding matching for instance label assignment. Leveraging recent advancements in large vision models, we utilize Grounding DINO and the Segment Anything Model (SAM) to obtain object proposals with accurate bounding boxes and masks. Central to our approach is the generation of high-quality instance embeddings. We utilize foreground feature averages of patch embeddings from the DINOv2 ViT backbone, followed by refinement through a weight adapter mechanism that we introduce. We show experimentally that our weight adapter can adjust the embeddings locally within their feature space and effectively limit overfitting. This methodology enables a straightforward matching strategy, resulting in significant performance gains. Our framework surpasses current state-of-the-art methods, demonstrating notable improvements of 22.3, 46.2, 10.3, and 24.0 in average precision (AP) across four detection datasets. In instance segmentation tasks on seven core datasets of the BOP challenge, our method outperforms the top RGB methods by 3.6 AP and remains competitive with the best RGB-D method. Code is available at: https://github.com/YoungSean/NIDS-Net
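A hedged sketch of the embedding-and-matching step follows; feature extraction with DINOv2 and the weight adapter are omitted, and the shapes and names are illustrative rather than the released code.

```python
# Illustrative foreground-averaged embeddings and cosine-similarity matching.
import numpy as np

def foreground_embedding(patch_feats, fg_mask):
    # patch_feats: (P, D) patch embeddings; fg_mask: (P,) boolean foreground mask
    return patch_feats[fg_mask].mean(axis=0)

def match_proposals(proposal_embs, template_embs, template_labels):
    # proposal_embs: (M, D), template_embs: (T, D)
    a = proposal_embs / np.linalg.norm(proposal_embs, axis=1, keepdims=True)
    b = template_embs / np.linalg.norm(template_embs, axis=1, keepdims=True)
    sims = a @ b.T                               # (M, T) cosine similarities
    # Assign each proposal the label of its most similar instance template.
    return [template_labels[i] for i in sims.argmax(axis=1)]
```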


Training-Conditional Coverage Bounds under Covariate Shift

arXiv.org Machine Learning

The conformal prediction methodology has recently been generalized to the covariate shift setting, namely, the setting where the covariate distribution changes between the training and test data. In this paper, we study the training-conditional coverage properties of a range of conformal prediction methods under covariate shift via a weighted version of the Dvoretzky-Kiefer-Wolfowitz (DKW) inequality tailored to distribution change. The result for the split conformal method is almost assumption-free, while the results for the full conformal and jackknife+ methods rely on strong assumptions, including the uniform stability of the training algorithm.
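For context, the weighted split conformal prediction set under covariate shift (the standard construction of Tibshirani et al., 2019, whose training-conditional coverage is the object of study here) can be stated as follows; the notation is illustrative, with $w$ the test-to-train covariate likelihood ratio and $s_i = s(X_i, Y_i)$ the calibration scores.

```latex
\widehat{C}(x) = \Big\{ y : s(x, y) \le
  \mathrm{Quantile}_{1-\alpha}\Big( \sum_{i=1}^{n} p_i(x)\,\delta_{s_i}
  + p_{n+1}(x)\,\delta_{+\infty} \Big) \Big\},
\qquad
p_i(x) = \frac{w(X_i)}{\sum_{j=1}^{n} w(X_j) + w(x)},
\quad
p_{n+1}(x) = \frac{w(x)}{\sum_{j=1}^{n} w(X_j) + w(x)}.
```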


Training-Conditional Coverage Bounds for Uniformly Stable Learning Algorithms

arXiv.org Machine Learning

The training-conditional coverage performance of conformal prediction is known to be empirically sound. Recently, there have been efforts to support this observation with theoretical guarantees. Training-conditional coverage bounds for jackknife+ and full-conformal prediction regions have been established via the notion of $(m,n)$-stability by Liang and Barber [2023]. Although this notion is weaker than uniform stability, it is not clear how to evaluate it for practical models. In this paper, we study the training-conditional coverage bounds of full-conformal, jackknife+, and CV+ prediction regions from a uniform stability perspective, which is known to hold for empirical risk minimization over reproducing kernel Hilbert spaces with convex regularization. We derive coverage bounds for finite-dimensional models by a concentration argument for the (estimated) predictor function, and compare the bounds with existing ones under ridge regression.


Deep Dependency Networks and Advanced Inference Schemes for Multi-Label Classification

arXiv.org Machine Learning

We present a unified framework called deep dependency networks (DDNs) that combines dependency networks and deep learning architectures for multi-label classification, with a particular emphasis on image and video data. The primary advantage of dependency networks is their ease of training, in contrast to other probabilistic graphical models like Markov networks. In particular, when combined with deep learning architectures, they provide an intuitive, easy-to-use loss function for multi-label classification. A drawback of DDNs compared to Markov networks is their lack of advanced inference schemes, necessitating the use of Gibbs sampling. To address this challenge, we propose novel inference schemes based on local search and integer linear programming for computing the most likely assignment to the labels given observations. We evaluate our novel methods on three video datasets (Charades, TACoS, Wetlab) and three image datasets (MS-COCO, PASCAL VOC, NUS-WIDE), comparing their performance with (a) basic neural architectures and (b) neural architectures combined with Markov networks equipped with advanced inference and learning techniques. Our results demonstrate the superiority of our new DDN methods over the two competing approaches.
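As a generic illustration of the local-search style of inference mentioned above (not the paper's exact algorithm), a greedy single-label-flip search over a multi-label assignment might look like this, where `score` stands in for the model's log-probability of a label vector given the observation:

```python
# Greedy local search for a high-scoring multi-label assignment.
def local_search(score, n_labels, init=None, max_iters=100):
    labels = list(init) if init is not None else [0] * n_labels
    best = score(labels)
    for _ in range(max_iters):
        improved = False
        for i in range(n_labels):
            labels[i] ^= 1                   # tentatively flip label i
            s = score(labels)
            if s > best:
                best, improved = s, True     # keep the improving flip
            else:
                labels[i] ^= 1               # revert the flip
        if not improved:
            break                            # local optimum reached
    return labels, best
```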