Goto

Collaborating Authors

 Chang, Hyung Jin


Collaborative Learning for 3D Hand-Object Reconstruction and Compositional Action Recognition from Egocentric RGB Videos Using Superquadrics

arXiv.org Artificial Intelligence

With the availability of egocentric 3D hand-object interaction datasets, there is increasing interest in developing unified models for hand-object pose estimation and action recognition. However, existing methods still struggle to recognise seen actions on unseen objects due to the limitations in representing object shape and movement using 3D bounding boxes. Additionally, the reliance on object templates at test time limits their generalisability to unseen objects. To address these challenges, we propose to leverage superquadrics as an alternative 3D object representation to bounding boxes and demonstrate their effectiveness on both template-free object reconstruction and action recognition tasks. Moreover, as we find that pure appearance-based methods can outperform the unified methods, the potential benefits from 3D geometric information remain unclear. Therefore, we study the compositionality of actions by considering a more challenging task where the training combinations of verbs and nouns do not overlap with the testing split. We extend H2O and FPHA datasets with compositional splits and design a novel collaborative learning framework that can explicitly reason about the geometric relations between hands and the manipulated object. Through extensive quantitative and qualitative evaluations, we demonstrate significant improvements over the state-of-the-arts in (compositional) action recognition.


Switching Temporary Teachers for Semi-Supervised Semantic Segmentation

arXiv.org Artificial Intelligence

The teacher-student framework, prevalent in semi-supervised semantic segmentation, mainly employs the exponential moving average (EMA) to update a single teacher's weights based on the student's. However, EMA updates raise a problem in that the weights of the teacher and student are getting coupled, causing a potential performance bottleneck. Furthermore, this problem may become more severe when training with more complicated labels such as segmentation masks but with few annotated data. This paper introduces Dual Teacher, a simple yet effective approach that employs dual temporary teachers aiming to alleviate the coupling problem for the student. The temporary teachers work in shifts and are progressively improved, so consistently prevent the teacher and student from becoming excessively close. Specifically, the temporary teachers periodically take turns generating pseudo-labels to train a student model and maintain the distinct characteristics of the student model for each epoch. Consequently, Dual Teacher achieves competitive performance on the PASCAL VOC, Cityscapes, and ADE20K benchmarks with remarkably shorter training times than state-of-the-art methods. Moreover, we demonstrate that our approach is model-agnostic and compatible with both CNN- and Transformer-based models. Code is available at \url{https://github.com/naver-ai/dual-teacher}.


Clothes Grasping and Unfolding Based on RGB-D Semantic Segmentation

arXiv.org Artificial Intelligence

Clothes grasping and unfolding is a core step in robotic-assisted dressing. Most existing works leverage depth images of clothes to train a deep learning-based model to recognize suitable grasping points. These methods often utilize physics engines to synthesize depth images to reduce the cost of real labeled data collection. However, the natural domain gap between synthetic and real images often leads to poor performance of these methods on real data. Furthermore, these approaches often struggle in scenarios where grasping points are occluded by the clothing item itself. To address the above challenges, we propose a novel Bi-directional Fractal Cross Fusion Network (BiFCNet) for semantic segmentation, enabling recognition of graspable regions in order to provide more possibilities for grasping. Instead of using depth images only, we also utilize RGB images with rich color features as input to our network in which the Fractal Cross Fusion (FCF) module fuses RGB and depth data by considering global complex features based on fractal geometry. To reduce the cost of real data collection, we further propose a data augmentation method based on an adversarial strategy, in which the color and geometric transformations simultaneously process RGB and depth data while maintaining the label correspondence. Finally, we present a pipeline for clothes grasping and unfolding from the perspective of semantic segmentation, through the addition of a strategy for grasp point selection from segmentation regions based on clothing flatness measures, while taking into account the grasping direction. We evaluate our BiFCNet on the public dataset NYUDv2 and obtained comparable performance to current state-of-the-art models. We also deploy our model on a Baxter robot, running extensive grasping and unfolding experiments as part of our ablation studies, achieving an 84% success rate.


Class-Attentive Diffusion Network for Semi-Supervised Classification

arXiv.org Machine Learning

We propose Aggregation with Class-Attentive Diffusion (AggCAD), a novel aggregation scheme for semi-supervised classification on graphs, which enables the model to embed more favorable node representations for better class separation. To this end, we propose a novel Class-Attentive Diffusion (CAD) which strengthens attention to intra-class nodes and attenuates attention to inter-class nodes. In contrast to the existing diffusion methods with a transition matrix determined solely by the graph structure, CAD considers both the node features and the graph structure with the design of the class-attentive transition matrix which utilizes the classifier. In addition, we further propose an adaptive scheme for AggCAD that leverages different reflection ratios of the diffusion result for each node depending on the local class-context. As the main advantage, AggCAD alleviates the problem of undesired mixing of inter-class features caused by discrepancies between node labels and the graph structure. Built on AggCAD, we construct Class-Attentive Diffusion Network for semi-supervised classification. Comprehensive experiments demonstrate the validity of AggCAD and the results show that the proposed method significantly outperforms the state-of-the-art methods on three benchmark datasets.


Implications of Human Irrationality for Reinforcement Learning

arXiv.org Artificial Intelligence

Recent work in the behavioural sciences has begun to overturn the long-held belief that human decision making is irrational, suboptimal and subject to biases. This turn to the rational suggests that human decision making may be a better source of ideas for constraining how machine learning problems are defined than would otherwise be the case. One promising idea concerns human decision making that is dependent on apparently irrelevant aspects of the choice context. Previous work has shown that by taking into account choice context and making relational observations, people can maximize expected value. Other work has shown that Partially observable Markov decision processes (POMDPs) are a useful way to formulate human-like decision problems. Here, we propose a novel POMDP model for contextual choice tasks and show that, despite the apparent irrationalities, a reinforcement learner can take advantage of the way that humans make decisions. We suggest that human irrationalities may offer a productive source of inspiration for improving the design of AI architectures and machine learning methods.


Symmetric Graph Convolutional Autoencoder for Unsupervised Graph Representation Learning

arXiv.org Machine Learning

In contrast to the existing graph au-toencoders with asymmetric decoder parts, the proposed autoencoder has a newly designed decoder which builds a completely symmetric autoencoder form. F or the reconstruction of node features, the decoder is designed based on Laplacian sharpening as the counterpart of Laplacian smoothing of the encoder, which allows utilizing the graph structure in the whole processes of the proposed autoen-coder architecture. In order to prevent the numerical instability of the network caused by the Laplacian sharpening introduction, we further propose a new numerically stable form of the Laplacian sharpening by incorporating the signed graphs. In addition, a new cost function which finds a latent representation and a latent affinity matrix simultaneously is devised to boost the performance of image clustering tasks. The experimental results on clustering, link prediction and visualization tasks strongly support that the proposed model is stable and outperforms various state-of-the-art algorithms. 1. Introduction A graph, which consists of a set of nodes and edges, is a powerful tool to seek the geometric structure of data. There are various applications using graphs in the machine learning and data mining fields such as node clustering [26], dimensionality reduction [1], social network analysis [15], chemical property prediction of a molecular graph [7], and image segmentation [30]. However, conventional methods for analyzing a graph have several problems such as low computational efficiency due to eigendecomposition or singular value decomposition, or only showing a shallow relationship between nodes. In recent years, an emerging field called geometric deep learning [2], generalizes deep neural network models to (a) VGAE [13] (b) MGAE [35] (c) Proposed autoencoder Figure 1: Architectures of existing graph convolutional au-toencoders and proposed one. A, X, H and W denote the affinity matrix (structure of graph), node attributes, latent representations and the learnable weight of network respectively.


DAC-h3: A Proactive Robot Cognitive Architecture to Acquire and Express Knowledge About the World and the Self

arXiv.org Artificial Intelligence

This paper introduces a cognitive architecture for a humanoid robot to engage in a proactive, mixed-initiative exploration and manipulation of its environment, where the initiative can originate from both the human and the robot. The framework, based on a biologically-grounded theory of the brain and mind, integrates a reactive interaction engine, a number of state-of-the-art perceptual and motor learning algorithms, as well as planning abilities and an autobiographical memory. The architecture as a whole drives the robot behavior to solve the symbol grounding problem, acquire language capabilities, execute goal-oriented behavior, and express a verbal narrative of its own experience in the world. We validate our approach in human-robot interaction experiments with the iCub humanoid robot, showing that the proposed cognitive architecture can be applied in real time within a realistic scenario and that it can be used with naive users.


MAGAN: Margin Adaptation for Generative Adversarial Networks

arXiv.org Machine Learning

We propose the Margin Adaptation for Generative Adversarial Networks (MAGANs) algorithm, a novel training procedure for GANs to improve stability and performance by using an adaptive hinge loss function. We estimate the appropriate hinge loss margin with the expected energy of the target distribution, and derive principled criteria for when to update the margin. We prove that our method converges to its global optimum under certain assumptions. Evaluated on the task of unsupervised image generation, the proposed training procedure is simple yet robust on a diverse set of data, and achieves qualitative and quantitative improvements compared to the state-of-the-art.