

 graduate school


Preliminary Prototyping of Avoidance Behaviors Triggered by a User's Physical Approach to a Robot

Yonezawa, Tomoko, Yamazoe, Hirotake, Fujino, Atsuo, Suhara, Daigo, Tamamoto, Takaya, Nishiguchi, Yuto

arXiv.org Artificial Intelligence

Human-robot interaction frequently involves physical proximity or contact. In human-human settings, people flexibly accept, reject, or tolerate such approaches depending on the relationship and context. We explore the design of a robot's rejective internal state and corresponding avoidance behaviors, such as withdrawing or pushing away, when a person approaches. We model the accumulation and decay of discomfort as a function of interpersonal distance, and implement tolerance (endurance) and limit-exceeding avoidance driven by the Dominance axis of the PAD affect model. The behaviors and their intensities are realized on an arm robot. Results illustrate a coherent pipeline from internal state parameters to graded endurance motions and, once a limit is crossed, to avoidance actions.
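As a rough illustration of how discomfort might accumulate and decay with interpersonal distance, the sketch below assumes linear accumulation inside a comfort radius and exponential decay outside it. Every name and value (`comfort_radius`, `gain`, `decay`, `limit`) is a hypothetical placeholder rather than a parameter from the paper, and the Dominance-driven modulation described in the abstract is not modeled.

```python
from dataclasses import dataclass


@dataclass
class DiscomfortModel:
    # All parameters are illustrative assumptions, not values from the paper.
    comfort_radius: float = 1.0  # metres; approaching closer builds discomfort
    gain: float = 2.0            # accumulation per metre of intrusion per second
    decay: float = 0.5           # fractional decay per second once the person backs off
    limit: float = 1.0           # level at which endurance gives way to avoidance
    level: float = 0.0           # current accumulated discomfort

    def step(self, distance: float, dt: float) -> str:
        """Advance the state by dt seconds and return a behavior label."""
        intrusion = max(0.0, self.comfort_radius - distance)
        if intrusion > 0.0:
            # Person is inside the comfort radius: discomfort accumulates.
            self.level += self.gain * intrusion * dt
        else:
            # Person is outside: discomfort decays toward zero.
            self.level = max(0.0, self.level - self.decay * self.level * dt)
        if self.level >= self.limit:
            return "avoid"
        return "endure" if self.level > 0.0 else "neutral"
```

Calling `step` repeatedly with a small distance moves the label from "neutral" through "endure" to "avoid"; a real controller would presumably map `level` to graded motion intensity rather than to three discrete labels.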


Multi-View Graph Convolution Network for Internal Talent Recommendation Based on Enterprise Emails

Kim, Soo Hyun, Kim, Jang-Hyun

arXiv.org Artificial Intelligence

Internal talent recommendation is a critical strategy for organizational continuity, yet conventional approaches suffer from structural limitations, often overlooking qualified candidates by relying on the narrow perspective of a few managers. To address this challenge, we propose a novel framework that models two distinct dimensions of an employee's position fit from email data: WHAT they do (semantic similarity of tasks) and HOW they work (structural characteristics of their interactions and collaborations). These dimensions are represented as independent graphs and adaptively fused using a Dual Graph Convolutional Network (GCN) with a gating mechanism. Experiments show that our proposed gating-based fusion model significantly outperforms other fusion strategies and a heuristic baseline, achieving a top performance of 40.9% on Hit@100. The model also demonstrates high interpretability by learning distinct, context-aware fusion strategies for different job families. For example, it learned to prioritize relational (HOW) data for 'sales and marketing' job families while applying a balanced approach for 'research' job families. This research offers a quantitative and comprehensive framework for internal talent discovery, minimizing the risk of candidate omission inherent in traditional methods. Its primary contribution lies in its ability to empirically determine the optimal fusion ratio between task alignment (WHAT) and collaborative patterns (HOW), which is required for employees to succeed in the new positions, thereby offering important practical implications.
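To make the gating idea and the Hit@K metric concrete, here is a minimal sketch. It assumes a single scalar sigmoid gate computed from the concatenated WHAT and HOW embeddings; the abstract does not give the exact gate form, so `gated_fusion`, `gate_weights`, and `gate_bias` are hypothetical names, and the GCN layers that would produce the embeddings are omitted.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(h_what, h_how, gate_weights, gate_bias):
    """Fuse two equal-length embeddings with a scalar gate g in (0, 1).

    Assumed form: g = sigmoid(w . [h_what ; h_how] + b),
    fused = g * h_what + (1 - g) * h_how.
    """
    concat = list(h_what) + list(h_how)
    g = sigmoid(sum(w * x for w, x in zip(gate_weights, concat)) + gate_bias)
    fused = [g * a + (1.0 - g) * b for a, b in zip(h_what, h_how)]
    return fused, g


def hit_at_k(ranked_candidates, true_candidate, k: int) -> float:
    """Return 1.0 if the true candidate appears in the top-k ranking, else 0.0."""
    return 1.0 if true_candidate in ranked_candidates[:k] else 0.0
```

Under this assumed form, a gate value near 0 means the HOW embedding dominates and a value near 0.5 means the two views are weighted equally, which is one way a learned gate could be read as a per-job-family fusion ratio.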


Efficient On-Chip Implementation of 4D Radar-Based 3D Object Detection on Hailo-8L

Byun, Woong-Chan, Paek, Dong-Hee, Song, Seung-Hyun, Kong, Seung-Hyun

arXiv.org Artificial Intelligence

4D radar has attracted attention in autonomous driving due to its ability to enable robust 3D object detection even under adverse weather conditions. To practically deploy such technologies, it is essential to achieve real-time processing within low-power embedded environments. Addressing this, we present the first on-chip implementation of a 4D radar-based 3D object detection model on the Hailo-8L AI accelerator. Although conventional 3D convolutional neural network (CNN) architectures require 5D inputs, the Hailo-8L only supports 4D tensors, posing a significant challenge. To overcome this limitation, we introduce a tensor transformation method that reshapes 5D inputs into 4D formats during the compilation process, enabling direct deployment without altering the model structure. The proposed system achieves 46.47% AP_3D and 52.75% AP_BEV, maintaining comparable accuracy to GPU-based models while achieving an inference speed of 13.76 Hz. These results demonstrate the applicability of 4D radar-based perception technologies to autonomous driving systems.
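The 5D-to-4D idea can be illustrated with a plain array reshape. The sketch below assumes a `(batch, channels, depth, height, width)` layout and folds depth into the channel axis; it is a generic NumPy illustration, not the paper's compile-time transformation or any Hailo toolchain API, and the function names are hypothetical.

```python
import numpy as np


def fold_depth_into_channels(x: np.ndarray) -> np.ndarray:
    """Reshape (N, C, D, H, W) into (N, C*D, H, W) without copying values around."""
    n, c, d, h, w = x.shape
    return x.reshape(n, c * d, h, w)


def unfold_depth_from_channels(y: np.ndarray, channels: int, depth: int) -> np.ndarray:
    """Inverse of fold_depth_into_channels for known channel and depth sizes."""
    n, cd, h, w = y.shape
    if cd != channels * depth:
        raise ValueError("channel axis does not equal channels * depth")
    return y.reshape(n, channels, depth, h, w)
```

With this layout, element `x[n, c, d, h, w]` lands at `y[n, c * D + d, h, w]`, so the reshape is lossless; how convolutions over the folded axis are made equivalent to the original 3D operations is a separate question that this sketch does not address.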


Curiosity-Driven Reinforcement Learning from Human Feedback

Sun, Haoran, Chai, Yekun, Wang, Shuohuan, Sun, Yu, Wu, Hua, Wang, Haifeng

arXiv.org Artificial Intelligence

Reinforcement learning from human feedback (RLHF) has proven effective in aligning large language models (LLMs) with human preferences, but often at the cost of reduced output diversity. This trade-off between diversity and alignment quality remains a significant challenge. Drawing inspiration from curiosity-driven exploration in reinforcement learning, we introduce curiosity-driven RLHF (CD-RLHF), a framework that incorporates intrinsic rewards for novel states, alongside traditional sparse extrinsic rewards, to optimize both output diversity and alignment quality. We demonstrate the effectiveness of CD-RLHF through extensive experiments on a range of tasks, including text summarization and instruction following. Our approach achieves significant gains in diversity on multiple diversity-oriented metrics while maintaining alignment with human preferences comparable to standard RLHF. We make our code publicly available at https://github.com/ernie-research/CD-RLHF.
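The combination of dense intrinsic rewards with a sparse extrinsic reward can be sketched as follows. This example substitutes a simple count-based novelty bonus for whatever curiosity estimator CD-RLHF actually uses, and it assumes the extrinsic (reward-model) score arrives only at the final step; `shaped_rewards`, `eta`, and the state representation are all hypothetical.

```python
import math
from collections import Counter


def shaped_rewards(states, extrinsic_final, eta=0.1, counts=None):
    """Return per-step rewards: eta * novelty bonus, plus the extrinsic score at the end.

    Novelty here is 1 / sqrt(visit count), a count-based stand-in for a
    learned curiosity signal (an assumption, not the paper's estimator).
    """
    counts = Counter() if counts is None else counts
    rewards = []
    for i, state in enumerate(states):
        counts[state] += 1
        intrinsic = 1.0 / math.sqrt(counts[state])
        reward = eta * intrinsic
        if i == len(states) - 1:
            # Sparse extrinsic reward is assumed to be given only at the last step.
            reward += extrinsic_final
        rewards.append(reward)
    return rewards
```

Repeated states earn a shrinking bonus, so a policy optimized on these rewards is nudged toward less-visited states while the final extrinsic term still anchors it to the preference signal.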


Reflexive Input-Output Causality Mechanisms

Kayawake, Ryotaro, Miida, Haruto, Sano, Shunsuke, Onda, Issei, Abe, Kazuki, Watanabe, Masahiro, Galipon, Josephine, Tadakuma, Riichiro, Tadakuma, Kenjiro

arXiv.org Artificial Intelligence

This paper explores the concept of reflexive actuation, examining how robots may leverage both internal and external stimuli to trigger changes in the motion, performance, or physical characteristics of the robot, such as its size, shape, or configuration. These changes may in turn be re-used sequentially as input to drive further adaptations. Drawing inspiration from biological systems, where reflexes are an essential component of the response to environmental changes, reflexive actuation is critical to enable robots to adapt to diverse situations and perform complex tasks. The underlying principles of reflexive actuation are analyzed, with examples provided from existing implementations such as contact-sensitive reflexive arms, physical counters, and their applications. The paper also outlines future directions and challenges for advancing this research area, emphasizing its significance in the development of adaptive, responsive robotic systems.


HTML-LSTM: Information Extraction from HTML Tables in Web Pages using Tree-Structured LSTM

Kawamura, Kazuki, Yamamoto, Akihiro

arXiv.org Artificial Intelligence

In this paper, we propose a novel method for extracting information from HTML tables with similar contents but different structures. We aim to integrate multiple HTML tables into a single table for retrieving information contained in various Web pages. The method extends tree-structured LSTM, a neural network for tree-structured data, to extract both the linguistic and the structural information in HTML data. We evaluate the proposed method through experiments using real data published on the WWW.


Principal Geodesic Analysis for Probability Measures under the Optimal Transport Metric

Cuturi, Marco (Graduate School of Informatics, Kyoto University)

Neural Information Processing Systems

Given a family of probability measures in P(X), the space of probability measures on a Hilbert space X, our goal in this paper is to highlight one or more curves in P(X) that efficiently summarize that family. We propose to study this problem under the optimal transport (Wasserstein) geometry, using curves that are restricted to be geodesic segments under that metric. We show that concepts that play a key role in Euclidean PCA, such as data centering or orthogonality of principal directions, find a natural equivalent in the optimal transport geometry, using Wasserstein means and differential geometry. The implementation of these ideas is, however, computationally challenging. To achieve scalable algorithms that can handle thousands of measures, we propose to use a relaxed definition for geodesics and regularized optimal transport distances. The interest of our approach is demonstrated on images seen either as shapes or color histograms.


Data-Driven Prediction of Seismic Intensity Distributions Featuring Hybrid Classification-Regression Models

Mizutani, Koyu, Mitarai, Haruki, Miyazaki, Kakeru, Kumano, Soichiro, Yamasaki, Toshihiko

arXiv.org Artificial Intelligence

Earthquakes are among the most immediate and deadly natural disasters that humans face. Accurately forecasting the extent of earthquake damage and assessing potential risks can be instrumental in saving numerous lives. In this study, we developed linear regression models capable of predicting seismic intensity distributions based on earthquake parameters: location, depth, and magnitude. Because the approach is completely data-driven, it can predict intensity distributions without geographical information. The dataset comprises seismic intensity data from earthquakes that occurred in the vicinity of Japan between 1997 and 2020, specifically containing 1,857 instances of earthquakes with a magnitude of 5.0 or greater, sourced from the Japan Meteorological Agency. We trained both regression and classification models and combined them into a hybrid model that takes advantage of both. The proposed model outperformed commonly used Ground Motion Prediction Equations (GMPEs) in terms of the correlation coefficient, F1 score, and MCC. Furthermore, the proposed model can predict even abnormal seismic intensity distributions, a task at which conventional GMPEs often struggle.


Computations for This World and out of This World

Communications of the ACM

In many ways, my career has been chasing chances to do mathematics. When I enrolled at college, I majored in math. Later, I added a second major, chemical physics, which seemed to offer better career options than pure mathematics. When I was accepted to graduate programs for mathematics and chemical physics, I moved into chemical physics. One nice aspect of chemical physics is that it seems to be 80% physics and 20% math.


R-Cut: Enhancing Explainability in Vision Transformers with Relationship Weighted Out and Cut

Niu, Yingjie, Ding, Ming, Ge, Maoning, Karlsson, Robin, Zhang, Yuxiao, Takeda, Kazuya

arXiv.org Artificial Intelligence

Transformer-based models have gained popularity in the field of natural language processing (NLP) and are extensively utilized in computer vision tasks and multi-modal models such as GPT4. This paper presents a novel method to enhance the explainability of Transformer-based image classification models. Our method aims to improve trust in classification results and empower users to gain a deeper understanding of the model for downstream tasks by providing visualizations of class-specific maps. We introduce two modules: the "Relationship Weighted Out" and the "Cut" modules. The "Relationship Weighted Out" module focuses on extracting class-specific information from intermediate layers, enabling us to highlight relevant features. Additionally, the "Cut" module performs fine-grained feature decomposition, taking into account factors such as position, texture, and color. By integrating these modules, we generate dense class-specific visual explainability maps. We validate our method with extensive qualitative and quantitative experiments on the ImageNet dataset. Furthermore, we conduct a large number of experiments on the LRN dataset, specifically designed for automatic driving danger alerts, to evaluate the explainability of our method in complex backgrounds. The results demonstrate a significant improvement over previous methods. Moreover, we conduct ablation experiments to validate the effectiveness of each module. Through these experiments, we are able to confirm the respective contributions of each module, thus solidifying the overall effectiveness of our proposed approach.