High-Precision Transformer-Based Visual Servoing for Humanoid Robots in Aligning Tiny Objects
Xue, Jialong, Gao, Wei, Wang, Yu, Ji, Chao, Zhao, Dongdong, Yan, Shi, Zhang, Shiwu
High-precision alignment of tiny objects remains a common and critical challenge for humanoid robots in real-world environments. To address this problem, this paper proposes a vision-based framework for precisely estimating and controlling the relative position between a handheld tool and a target object on a humanoid robot, e.g., a screwdriver tip and a screw head slot. By fusing images from the robot's head and torso cameras with its head joint angles, the proposed Transformer-based visual servoing method effectively corrects the handheld tool's positional errors, especially at close distances. Experiments on M4-M8 screws demonstrate an average convergence error of 0.8-1.3 mm and a success rate of 93%-100%. Comparative analysis validates that this high-precision tiny-object alignment capability is enabled by the Distance Estimation Transformer architecture and the Multi-Perception-Head mechanism proposed in this paper.
Timing-Driven Global Placement by Efficient Critical Path Extraction
Shi, Yunqi, Xu, Siyuan, Kai, Shixiong, Lin, Xi, Xue, Ke, Yuan, Mingxuan, Qian, Chao
Initially, vanilla DREAMPlace [20] is run to distribute the cells within the layout. Subsequently, we perform a path-level timing analysis every m rounds to extract critical paths and update the pin-to-pin loss. This involves calling report_timing_endpoint(n, 1), where n denotes the number of all failing endpoints, to collect data on critical paths. As we traverse these paths, each pin pair (i, j) involved is added to a maintained set P, unless it has already been included. To address the path-sharing effect, the weight w_{(i,j)} of each pin pair is dynamically updated as follows:

w_{(i,j)} = \begin{cases} w_0, & \text{if } (i, j) \notin P, \\ w_{(i,j)} + w_1 \cdot (\mathrm{slack} / \mathrm{WNS}), & \text{otherwise}, \end{cases}    (9)

where w_0 and w_1 are hyperparameters, and slack denotes the negative slack of the respective critical path. The pin-to-pin attraction loss PP(x, y) of the layout is then computed as:

\mathrm{PP}(x, y) = \sum_{(i,j) \in P} w_{(i,j)} \, Q(i, j),    (10)

with Q(i, j) and w_{(i,j)} defined in Eqs. 8 and 9, respectively. After defining the loss function properly, we implement a CUDA kernel for the PP loss to enable GPU acceleration.
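The weight update of Eq. 9 and the attraction loss of Eq. 10 can be sketched in plain Python. This is an illustration only: names and data layout are assumptions, the paper's implementation is a CUDA kernel, and Q(i, j) is stood in for here by a squared Euclidean distance between pin positions in place of the quadratic term of Eq. 8.

```python
def update_pin_pair_weights(paths, weights, w0, w1, wns):
    """Dynamic weight update of Eq. 9.

    `paths` is a list of (pin_pairs, slack) tuples, one per extracted
    critical path; `weights` maps pin pairs to their current weight and
    plays the role of the maintained set P (a pair is in P iff it is a
    key). `wns` is the worst negative slack of the design.
    """
    for pin_pairs, slack in paths:
        for pair in pin_pairs:
            if pair not in weights:
                # (i, j) not yet in P: initialize with w0.
                weights[pair] = w0
            else:
                # Path-sharing effect: accumulate, scaled by slack criticality.
                weights[pair] += w1 * (slack / wns)
    return weights

def pin_to_pin_loss(weights, positions):
    """PP loss of Eq. 10: weighted sum over pin pairs in P.

    `positions` maps pin names to (x, y) coordinates; squared Euclidean
    distance stands in for Q(i, j).
    """
    loss = 0.0
    for (i, j), w in weights.items():
        (xi, yi), (xj, yj) = positions[i], positions[j]
        loss += w * ((xi - xj) ** 2 + (yi - yj) ** 2)
    return loss
```

Running the update twice on the same pair shows both branches of Eq. 9: the first traversal initializes the weight to w0, and subsequent traversals of paths sharing that pair add w1 * (slack / WNS).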
- North America > United States > California > San Francisco County > San Francisco (0.15)
- Asia > China (0.15)
- Asia > Taiwan (0.14)
- Europe (0.14)
Reviews: One-Sided Unsupervised Domain Mapping
This paper tackles the problem of unsupervised domain mapping. The paper introduces a new constraint, which compares pairs of samples and enforces high cross-domain correlation between the matching pairwise distances computed in each domain. An alternative to the pairwise distance is provided for cases in which only one data sample is accessible at a time: the same rationale is applied by splitting each image and comparing the distances between its left/right or top/bottom halves in both domains. The final model is trained by combining the previously introduced losses (adversarial loss and circularity loss) with the new distance loss, showing that the new constraint is effective and enables one-directional mapping.
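A minimal sketch of the two variants of the distance constraint described above, under stated assumptions: images are flat per-pixel lists, the distance is an unnormalized mean L1 distance, and the original paper additionally standardizes each domain's distances by their mean and standard deviation, which is omitted here.

```python
def l1_distance(a, b):
    """Mean absolute difference between two flattened images."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def pairwise_distance_loss(src_batch, mapped_batch):
    """Pairwise variant: distances between samples in the source domain
    should match the distances between their mapped counterparts."""
    loss, n = 0.0, len(src_batch)
    for i in range(n):
        for j in range(i + 1, n):
            d_src = l1_distance(src_batch[i], src_batch[j])
            d_map = l1_distance(mapped_batch[i], mapped_batch[j])
            loss += abs(d_src - d_map)
    return loss

def self_distance_loss(src_img, mapped_img, width):
    """One-sample variant: compare the distance between the left and
    right halves of an image before and after mapping. Images are flat
    row-major lists with rows of length `width`."""
    def halves(img):
        left, right = [], []
        for r in range(0, len(img), width):
            row = img[r:r + width]
            left.extend(row[:width // 2])
            right.extend(row[width // 2:])
        return left, right
    sl, sr = halves(src_img)
    ml, mr = halves(mapped_img)
    return abs(l1_distance(sl, sr) - l1_distance(ml, mr))
```

The pairwise term goes to zero when the mapping preserves inter-sample distances exactly, which is the behavior the constraint rewards.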
Dexterous Grasp Transformer
Xu, Guo-Hao, Wei, Yi-Lin, Zheng, Dian, Wu, Xiao-Ming, Zheng, Wei-Shi
In this work, we propose a novel discriminative framework for dexterous grasp generation, named Dexterous Grasp TRansformer (DGTR), capable of predicting a diverse set of feasible grasp poses by processing the object point cloud with only one forward pass. We formulate dexterous grasp generation as a set prediction task and design a transformer-based grasping model for it. However, we identify that this set prediction paradigm encounters several optimization challenges in the field of dexterous grasping, which restrict its performance. To address these issues, we propose progressive strategies for both the training and testing phases. First, the dynamic-static matching training (DSMT) strategy is presented to enhance optimization stability during the training phase. Second, we introduce adversarial-balanced test-time adaptation (AB-TTA) with a pair of adversarial losses to improve grasping quality during the testing phase. Experimental results on the DexGraspNet dataset demonstrate the capability of DGTR to predict dexterous grasp poses with both high quality and diversity. Notably, while maintaining high quality, the diversity of grasp poses predicted by DGTR significantly outperforms previous works on multiple metrics without any data pre-processing. Code is available at https://github.com/iSEE-Laboratory/DGTR .
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Robots (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)
Geometrically Aligned Transfer Encoder for Inductive Transfer in Regression Tasks
Ko, Sung Moon, Lee, Sumin, Jeong, Dae-Woong, Lim, Woohyung, Han, Sehui
Transfer learning is a crucial technique for handling a small amount of data that is potentially related to other, more abundant data. However, most existing methods focus on classification tasks using image and language datasets. Therefore, to extend the transfer learning scheme to regression tasks, we propose a novel transfer technique based on differential geometry, namely the Geometrically Aligned Transfer Encoder (GATE). In this method, we interpret the latent vectors from the model as lying on a curved Riemannian manifold. We find a proper diffeomorphism between pairs of tasks to ensure that every arbitrary point maps to a locally flat coordinate in the overlapping region, allowing the transfer of knowledge from the source to the target data. This also serves as an effective regularizer, encouraging well-behaved extrapolation. In this article, we demonstrate that GATE outperforms conventional methods and exhibits stable behavior in both the latent space and extrapolation regions for various molecular graph datasets.
- North America > United States > New York > New York County > New York City (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Tail of Distribution GAN (TailGAN): Generative-Adversarial-Network-Based Boundary Formation
Generative Adversarial Networks (GANs) are a powerful methodology that can be used for unsupervised anomaly detection, where current techniques have limitations such as accurately detecting anomalies near the tail of a distribution. GANs generally do not guarantee the existence of a probability density and are susceptible to mode collapse, while few GANs use likelihood to reduce mode collapse. In this paper, we create a GAN-based tail formation model for anomaly detection, the Tail of distribution GAN (TailGAN), to generate samples on the tail of the data distribution and detect anomalies near the support boundary. TailGAN leverages GANs for anomaly detection and employs maximum entropy regularization. Using GANs that learn the probability of the underlying distribution improves the anomaly detection methodology by allowing us to devise a generator for boundary samples and to use this model to characterize anomalies. TailGAN addresses supports with disjoint components and achieves competitive performance on images. We evaluate TailGAN on identifying Out-of-Distribution (OoD) data; its performance on MNIST, CIFAR-10, Baggage X-Ray, and OoD data is competitive with methods from the literature.
- North America > United States (0.04)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
- Europe > United Kingdom > England > Greater London > London (0.04)
- Europe > Sweden > Stockholm > Stockholm (0.04)
Graph Pattern Loss based Diversified Attention Network for Cross-Modal Retrieval
Chen, Xueying, Zhang, Rong, Zhan, Yibing
Cross-modal retrieval aims to enable a flexible retrieval experience by combining multimedia data such as image, video, text, and audio. One core task of unsupervised approaches is to mine the correlations among different object representations so as to achieve satisfactory retrieval performance without requiring expensive labels. In this paper, we propose a Graph Pattern Loss based Diversified Attention Network (GPLDAN) for unsupervised cross-modal retrieval that deeply analyzes correlations among representations. First, we propose a diversified attention feature projector that considers the interaction between different representations to generate multiple representations of an instance. Then, we design a novel graph pattern loss to explore the correlations among different representations; in this graph, all possible distances between different representations are considered. In addition, a modality classifier is added to explicitly declare the corresponding modalities of features before fusion and to guide the network toward stronger discrimination ability. We test GPLDAN on four public datasets. Compared with state-of-the-art cross-modal retrieval methods, the experimental results demonstrate the performance and competitiveness of GPLDAN.
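As an illustration of the graph-pattern idea (a hedged sketch, not the authors' exact formulation), one can treat each modality's multiple diversified representations of an instance as nodes of a fully connected graph and compare every corresponding pairwise distance across the two modalities' graphs. All names and the choice of Euclidean distance are assumptions for this sketch.

```python
from itertools import combinations

def euclidean(a, b):
    """Euclidean distance between two representation vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def graph_pattern_loss(image_reps, text_reps):
    """Compare the fully connected distance graphs of two modalities.

    `image_reps` and `text_reps` each hold the multiple representations
    of the same instance in one modality (aligned by index). Every edge
    (pairwise distance) in the image graph is matched against the
    corresponding edge in the text graph.
    """
    loss = 0.0
    for i, j in combinations(range(len(image_reps)), 2):
        d_img = euclidean(image_reps[i], image_reps[j])
        d_txt = euclidean(text_reps[i], text_reps[j])
        loss += abs(d_img - d_txt)
    return loss
```

The loss vanishes when the two modalities exhibit identical distance patterns over their representation graphs, which is the kind of cross-modal correlation the abstract describes.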
- Asia > Singapore (0.04)
- Asia > China > Zhejiang Province > Hangzhou (0.04)
- Asia > China > Anhui Province > Hefei (0.04)