Chen, Fei
ShuffleGate: An Efficient and Self-Polarizing Feature Selection Method for Large-Scale Deep Models in Industry
Huang, Yihong, Chu, Chen, Zhang, Fan, Chen, Fei, Lin, Yu, Li, Ruiduan, Li, Zhihao
Deep models in industrial applications, such as deep recommendation systems, rely on thousands of features for accurate predictions. While new features are introduced to capture evolving user behavior, outdated or redundant features often remain, significantly increasing storage and computational costs. To address this issue, feature selection methods are widely adopted to identify and remove less important features. However, existing approaches face two major challenges: (1) they often require complex hyperparameter (Hp) tuning, making them difficult to employ in practice, and (2) they fail to produce well-separated feature importance scores, which complicates straightforward feature removal. Moreover, the impact of removing unimportant features can only be evaluated by retraining the model, a time-consuming and resource-intensive process that severely hinders efficient feature selection. To solve these challenges, we propose a novel feature selection approach, ShuffleGate. In particular, it shuffles all feature values across instances simultaneously and uses a gating mechanism that allows the model to dynamically learn the weights for combining the original and shuffled inputs. Notably, it can generate well-separated feature importance scores and estimate post-removal performance without retraining the model, while introducing only a single Hp. Experiments on four public datasets show that our approach outperforms state-of-the-art methods in feature selection for model retraining. Moreover, it has been successfully integrated into the daily iteration of Bilibili's search models across various scenarios, where it significantly reduces feature set size (up to 60%+) and computational resource usage (up to 20%+), while maintaining comparable performance.
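The shuffle-and-gate mechanism described in the abstract can be sketched in a few lines: each feature column is permuted across instances (destroying its relationship to the label while preserving its marginal distribution), and a learnable per-feature gate mixes the original and shuffled inputs. The NumPy sketch below is illustrative only; function and variable names are ours, and the paper's actual training objective and single hyperparameter are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

def shuffle_gate_forward(X, gate_logits):
    """Mix each feature column with a within-batch shuffled copy.

    X: (batch, n_features) input matrix.
    gate_logits: (n_features,) learnable logits; sigmoid gives gate g in (0, 1).
    Returns the gated mix g * x + (1 - g) * shuffle(x), per feature.
    """
    g = 1.0 / (1.0 + np.exp(-gate_logits))  # per-feature gate
    # Shuffle every feature column independently across instances,
    # destroying the feature-label relationship but keeping marginals.
    X_shuf = np.empty_like(X)
    for j in range(X.shape[1]):
        X_shuf[:, j] = rng.permutation(X[:, j])
    return g * X + (1.0 - g) * X_shuf, g

X = rng.normal(size=(8, 3))
mixed, g = shuffle_gate_forward(X, np.array([4.0, 0.0, -4.0]))
# A large positive logit keeps the original feature almost intact;
# a large negative logit replaces it almost entirely with shuffled values.
print(np.round(g, 3))
```

After training, a gate driven toward 0 signals a feature the model is content to see shuffled, i.e., an unimportant one.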
Muscle Activation Estimation by Optimizing the Musculoskeletal Model for Personalized Strength and Conditioning Training
Wu, Xi, Li, Chenzui, Zou, Kehan, Xi, Ning, Chen, Fei
Musculoskeletal models are pivotal in the domains of rehabilitation and resistance training to analyze muscle conditions. However, individual variability in musculoskeletal parameters and the immeasurability of some internal biomechanical variables pose significant obstacles to accurate personalized modelling. Furthermore, muscle activation estimation can be challenging due to the inherent redundancy of the musculoskeletal system, where multiple muscles drive a single joint. This study develops a whole-body musculoskeletal model for strength and conditioning training and calibrates relevant muscle parameters with an electromyography-based optimization method. By utilizing the personalized musculoskeletal model, muscle activation can be subsequently estimated to analyze the performance of exercises. Bench press and deadlift are chosen for experimental verification to affirm the efficacy of this approach.
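Because multiple muscles drive a single joint, the activation estimate is underdetermined. A common way to illustrate this redundancy resolution (a minimal stand-in, not necessarily the exact optimization used in this study) is a minimum-norm least-squares solution subject to the net joint torque:

```python
import numpy as np

def estimate_activations(moment_arms, f_max, joint_torque):
    """Resolve muscle redundancy with a minimum-norm least-squares solution.

    moment_arms:  (n_joints, n_muscles) moment-arm matrix r (m).
    f_max:        (n_muscles,) maximum isometric forces (N).
    joint_torque: (n_joints,) required net joint torques (Nm).
    Solves r @ (f_max * a) = tau for the smallest-norm activation vector a,
    then clips to the physiological range [0, 1].
    """
    M = moment_arms * f_max            # torque produced at full activation
    a = np.linalg.pinv(M) @ joint_torque
    return np.clip(a, 0.0, 1.0)

# Toy single-joint example: two synergistic elbow flexors.
r = np.array([[0.04, 0.02]])          # moment arms (m)
f_max = np.array([1000.0, 500.0])     # maximum isometric forces (N)
tau = np.array([20.0])                # desired net torque (Nm)
a = estimate_activations(r, f_max, tau)
print(a)  # the stronger, better-levered muscle takes more of the load
```

In practice the EMG-based calibration described above would personalize `f_max` and related parameters before such an estimate is made.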
Human-Like Robot Impedance Regulation Skill Learning from Human-Human Demonstrations
Li, Chenzui, Wu, Xi, Liu, Junjia, Teng, Tao, Chen, Yiming, Calinon, Sylvain, Caldwell, Darwin, Chen, Fei
Humans are experts in collaborating with others physically by regulating compliance behaviors based on the perception of their partners' states and the task requirements. Enabling robots to develop proficiency in human collaboration skills can facilitate more efficient human-robot collaboration (HRC). This paper introduces an innovative impedance regulation skill learning framework for achieving HRC in multiple physical collaborative tasks. The framework is designed to adjust the robot compliance to the human partner's states while adhering to reference trajectories provided by human-human demonstrations. Specifically, electromyography (EMG) signals from human muscles are collected and analyzed to extract limb impedance, representing compliance behaviors during demonstrations. Human endpoint motions are captured and represented using a probabilistic learning method to create reference trajectories and corresponding impedance profiles. Meanwhile, an LSTM-based module is implemented to develop task-oriented impedance regulation policies by mapping the muscle synergistic contributions between two demonstrators. Finally, we propose a whole-body impedance controller for a human-like robot, coordinating joint outputs to achieve the desired impedance and reference trajectory during task execution. Experimental validation was conducted through a collaborative transportation task and two interactive Tai Chi pushing-hands tasks, demonstrating superior performance from the perspective of interactive forces compared to a constant impedance control method. Collaborative robots (cobots) have emerged as a solution for more efficient human-robot collaboration (HRC) in both industrial and domestic scenarios. Co-manipulation outperforms fully robotic manipulation by offering enhanced flexibility and effectiveness, while surpassing fully human manipulation by reducing labor costs, maintaining concentration, and minimizing errors due to fatigue [1].
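For context, a basic Cartesian impedance law underlies such controllers. The toy NumPy sketch below (names and gains are illustrative, not taken from the paper) shows where a learned, time-varying stiffness profile would enter:

```python
import numpy as np

def impedance_force(x, xd, x_ref, xd_ref, K, D):
    """Cartesian impedance law: the commanded endpoint force tracks a
    reference trajectory with stiffness K and damping D. A learned,
    time-varying impedance profile would supply K (and D) at each step.
    """
    return K @ (x_ref - x) + D @ (xd_ref - xd)

# 2-D example: stiff tracking along x, compliant along y.
K = np.diag([300.0, 50.0])   # N/m   (stand-in for a learned profile)
D = np.diag([30.0, 10.0])    # Ns/m
f = impedance_force(
    x=np.array([0.10, 0.00]), xd=np.zeros(2),
    x_ref=np.array([0.12, 0.05]), xd_ref=np.zeros(2),
    K=K, D=D)
print(f)  # larger restoring force along the stiff axis
```

Regulating `K` online from the partner's estimated state is precisely what the EMG-driven, LSTM-based policy in the framework above is responsible for.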
LP-ICP: General Localizability-Aware Point Cloud Registration for Robust Localization in Extreme Unstructured Environments
Yue, Haosong, Xu, Qingyuan, Chen, Fei, Pan, Jia, Chen, Weihai
The Iterative Closest Point (ICP) algorithm is a crucial component of LiDAR-based SLAM algorithms. However, its performance can be negatively affected in unstructured environments that lack features and geometric structures, leading to low accuracy and poor robustness in localization and mapping. It is known that degeneracy caused by the lack of geometric constraints can lead to errors in 6-DOF pose estimation along ill-conditioned directions. Therefore, there is a need for a broader and more fine-grained degeneracy detection and handling method. This paper proposes a new point cloud registration framework, LP-ICP, that combines point-to-line and point-to-plane distance metrics in the ICP algorithm with localizability detection and handling. LP-ICP consists of a localizability detection module and an optimization module. The localizability detection module performs localizability analysis by utilizing the correspondences between edge points (with low local smoothness) and lines, and between planar points (with high local smoothness) and planes, from the scan to the map. This allows the localizability contribution of individual correspondence constraints to be assessed over a broader range of degeneracy cases. The optimization module adds additional soft and hard constraints to the optimization equations based on the localizability category. This allows the pose to be constrained along ill-conditioned directions, with updates either tending towards the constraint value or leaving the initial estimate unchanged, which improves accuracy and reduces fluctuations. The proposed method is extensively evaluated through experiments on both simulation and real-world datasets, demonstrating accuracy higher than or comparable to state-of-the-art methods. The dataset and code of this paper will also be open-sourced at https://github.com/xuqingyuan2000/LP-ICP.
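The degeneracy idea can be made concrete with a small sketch: each point-to-plane correspondence constrains translation only along its plane normal, so the translational block of the Gauss-Newton Hessian is the sum of normal outer products, and near-zero eigenvalues flag ill-conditioned directions. This NumPy sketch is an illustration of that principle, not the paper's full localizability analysis:

```python
import numpy as np

def translational_localizability(normals, eps=1e-3):
    """Detect degenerate translation directions from point-to-plane constraints.

    Each plane correspondence constrains translation only along its normal,
    so the translational block of the Gauss-Newton Hessian is sum(n n^T).
    Eigenvalues below eps flag directions the environment cannot constrain.
    """
    H = sum(np.outer(n, n) for n in normals)
    w, V = np.linalg.eigh(H)                      # ascending eigenvalues
    degenerate = [V[:, i] for i in range(3) if w[i] < eps]
    return w, degenerate

# A long corridor: all observed planes are walls or floor, none faces along x.
normals = [np.array([0.0, 1.0, 0.0])] * 5 + [np.array([0.0, 0.0, 1.0])] * 5
w, deg = translational_localizability(normals)
print(np.round(w, 3), len(deg))  # one near-zero eigenvalue: x is degenerate
```

An optimizer aware of this analysis can then, as described above, constrain updates along the degenerate direction instead of letting them drift.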
Multi-class Decoding of Attended Speaker Direction Using Electroencephalogram and Audio Spatial Spectrum
Zhang, Yuanming, Lu, Jing, Chen, Fei, Du, Haoliang, Gao, Xia, Lin, Zhibin
Decoding the directional focus of an attended speaker from listeners' electroencephalogram (EEG) signals is essential for developing brain-computer interfaces to improve the quality of life for individuals with hearing impairment. Previous works have concentrated on binary directional focus decoding, i.e., determining whether the attended speaker is on the left or right side of the listener. However, a more precise decoding of the exact direction of the attended speaker is necessary for effective speech processing. Additionally, audio spatial information has not been effectively leveraged, resulting in suboptimal decoding results. In this paper, it is found that on the recently presented dataset with 14-class directional focus, models relying exclusively on EEG inputs exhibit significantly lower accuracy when decoding the directional focus in both leave-one-subject-out and leave-one-trial-out scenarios. By integrating audio spatial spectra with EEG features, the decoding accuracy can be effectively improved. The CNN, LSM-CNN, and Deformer models are employed to decode the directional focus from listeners' EEG signals and audio spatial spectra. The proposed Sp-EEG-Deformer model achieves notable 14-class decoding accuracies of 55.35% and 57.19% in leave-one-subject-out and leave-one-trial-out scenarios, respectively, with a decision window of 1 second. Experimental results indicate that decoding accuracy increases as the number of alternative directions decreases. These findings suggest the efficacy of our proposed dual-modal directional focus decoding strategy.
Human-Humanoid Robots Cross-Embodiment Behavior-Skill Transfer Using Decomposed Adversarial Learning from Demonstration
Liu, Junjia, Li, Zhuo, Yu, Minghao, Dong, Zhipeng, Calinon, Sylvain, Caldwell, Darwin, Chen, Fei
Humanoid robots are envisioned as embodied intelligent agents capable of performing a wide range of human-level loco-manipulation tasks, particularly in scenarios requiring strenuous and repetitive labor. However, learning these skills is challenging due to the high degrees of freedom of humanoid robots, and collecting sufficient training data for humanoids is a laborious process. Given the rapid introduction of new humanoid platforms, a cross-embodiment framework that allows generalizable skill transfer is becoming increasingly critical. To address this, we propose a transferable framework that reduces the data bottleneck by using a unified digital human model as a common prototype and bypassing the need for re-training on every new robot platform. The model learns behavior primitives from human demonstrations through adversarial imitation, and the complex robot structures are decomposed into functional components, each trained independently and dynamically coordinated. Task generalization is achieved through a human-object interaction graph, and skills are transferred to different robots via embodiment-specific kinematic motion retargeting and dynamic fine-tuning. Our framework is validated on five humanoid robots with diverse configurations, demonstrating stable loco-manipulation and highlighting its effectiveness in reducing data requirements and increasing the efficiency of skill transfer across platforms.
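Kinematic motion retargeting, the first transfer step mentioned above, can be illustrated in its simplest form: scaling endpoint displacements by the ratio of robot reach to human reach before dynamics-aware fine-tuning. The following NumPy sketch is a deliberately minimal illustration with made-up names and numbers, not the paper's retargeting pipeline:

```python
import numpy as np

def retarget_endpoint_traj(human_traj, human_reach, robot_reach, robot_base):
    """Map a human endpoint trajectory onto a robot with different limb
    lengths by scaling displacements about the trajectory start. This is
    a common first step before embodiment-specific fine-tuning.
    """
    scale = robot_reach / human_reach
    start = human_traj[0]
    return robot_base + scale * (human_traj - start)

# A short human reaching motion (3 samples, xyz in meters), mapped onto a
# robot whose arm reach is half the demonstrator's.
human = np.array([[0.0, 0.0, 1.0], [0.2, 0.0, 1.1], [0.4, 0.0, 1.2]])
robot = retarget_endpoint_traj(human, human_reach=0.7, robot_reach=0.35,
                               robot_base=np.array([0.1, 0.0, 0.5]))
print(robot)
```

A real cross-embodiment pipeline would additionally enforce joint limits and dynamic feasibility, which is what the fine-tuning stage described above addresses.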
Deep Learning-based Non-Intrusive Multi-Objective Speech Assessment Model with Cross-Domain Features
Zezario, Ryandhimas E., Fu, Szu-Wei, Chen, Fei, Fuh, Chiou-Shann, Wang, Hsin-Min, Tsao, Yu
In this study, we propose a cross-domain multi-objective speech assessment model called MOSA-Net. Speech assessment metrics are indicators that quantitatively measure specific attributes of speech signals, but listening-test-based evaluation is prohibitive and may not always be feasible. Hence, several objective evaluation metrics have been developed as surrogates for human listening tests [6]-[31]. To attain higher assessment accuracy, MBNet adopts the BiasNet architecture to compensate for the biased scores of a certain judge [49]. Meanwhile, different acoustic features are used as input to the assessment model to consider information from different acoustic domains [51], [52]. MOSA-Net is designed to estimate speech assessment metrics under a multi-task learning criterion comprising two stages, the first of which includes a series of signal processing units designed to convert speech waveforms into cross-domain features. For example, the derived QIA-SE model can improve PESQ by 0.301 (2.478 in unseen noise environments) and LCC by 0.021 (0.985 vs. 0.964 in seen noise environments) over a CNN-based baseline SE model. Index Terms: non-intrusive speech assessment models, deep learning, multi-objective learning, speech enhancement.
GARField: Addressing the visual Sim-to-Real gap in garment manipulation with mesh-attached radiance fields
Delehelle, Donatien, Caldwell, Darwin G., Chen, Fei
While humans intuitively manipulate garments and other textile items swiftly and accurately, it is a significant challenge for robots. A factor crucial to human performance is the ability to imagine, a priori, the intended result of the manipulation intents and hence develop predictions on the garment pose. That ability allows us to plan from highly obstructed states, adapt our plans as we collect more information and react swiftly to unforeseen circumstances. Conversely, robots struggle to establish such intuitions and form tight links between plans and observations. We can partly attribute this to the high cost of obtaining densely labelled data for textile manipulation, both in quality and quantity. The problem of data collection is a long-standing issue in data-based approaches to garment manipulation. As of today, generating high-quality and labelled garment manipulation data is mainly attempted through advanced data capture procedures that create simplified state estimations from real-world observations. However, this work proposes a novel approach to the problem by generating real-world observations from object states. To achieve this, we present GARField (Garment Attached Radiance Field), the first differentiable rendering architecture, to our knowledge, for data generation from simulated states stored as triangle meshes. Code is available on https://ddonatien.github.io/garfield-website/
Pathologist-like explainable AI for interpretable Gleason grading in prostate cancer
Mittmann, Gesa, Laiouar-Pedari, Sara, Mehrtens, Hendrik A., Haggenmüller, Sarah, Bucher, Tabea-Clara, Chanda, Tirtha, Gaisa, Nadine T., Wagner, Mathias, Klamminger, Gilbert Georg, Rau, Tilman T., Neppl, Christina, Compérat, Eva Maria, Gocht, Andreas, Hämmerle, Monika, Rupp, Niels J., Westhoff, Jula, Krücken, Irene, Seidl, Maximillian, Schürch, Christian M., Bauer, Marcus, Solass, Wiebke, Tam, Yu Chun, Weber, Florian, Grobholz, Rainer, Augustyniak, Jaroslaw, Kalinski, Thomas, Hörner, Christian, Mertz, Kirsten D., Döring, Constanze, Erbersdobler, Andreas, Deubler, Gabriele, Bremmer, Felix, Sommer, Ulrich, Brodhun, Michael, Griffin, Jon, Lenon, Maria Sarah L., Trpkov, Kiril, Cheng, Liang, Chen, Fei, Levi, Angelique, Cai, Guoping, Nguyen, Tri Q., Amin, Ali, Cimadamore, Alessia, Shabaik, Ahmed, Manucha, Varsha, Ahmad, Nazeel, Messias, Nidia, Sanguedolce, Francesca, Taheri, Diana, Baraban, Ezra, Jia, Liwei, Shah, Rajal B., Siadat, Farshid, Swarbrick, Nicole, Park, Kyung, Hassan, Oudai, Sakhaie, Siamak, Downes, Michelle R., Miyamoto, Hiroshi, Williamson, Sean R., Holland-Letz, Tim, Schneider, Carolin V., Kather, Jakob Nikolas, Tolkach, Yuri, Brinker, Titus J.
The aggressiveness of prostate cancer, the most common cancer in men worldwide, is primarily assessed based on histopathological data using the Gleason scoring system. While artificial intelligence (AI) has shown promise in accurately predicting Gleason scores, these predictions often lack inherent explainability, potentially leading to distrust in human-machine interactions. To address this issue, we introduce a novel dataset of 1,015 tissue microarray core images, annotated by an international group of 54 pathologists. The annotations provide detailed localized pattern descriptions for Gleason grading in line with international guidelines. Utilizing this dataset, we develop an inherently explainable AI system based on a U-Net architecture that provides predictions leveraging pathologists' terminology. This approach circumvents post-hoc explainability methods while maintaining or exceeding the performance of methods trained directly for Gleason pattern segmentation (Dice score: 0.713 $\pm$ 0.003 trained on explanations vs. 0.691 $\pm$ 0.010 trained on Gleason patterns). By employing soft labels during training, we capture the intrinsic uncertainty in the data, yielding strong results in Gleason pattern segmentation even in the context of high interobserver variability. With the release of this dataset, we aim to encourage further research into segmentation in medical tasks with high levels of subjectivity and to advance the understanding of pathologists' reasoning processes.
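Training with soft labels, as mentioned above, amounts to keeping annotator disagreement as fractional target values rather than collapsing it into a single hard mask. A minimal sketch of a soft-label Dice score (an illustrative stand-in, not the exact loss used in the study) looks like this:

```python
import numpy as np

def soft_dice(pred, target, eps=1e-6):
    """Dice coefficient that accepts soft (probabilistic) labels, so
    annotator disagreement can be kept as fractional target values
    instead of being forced into a single hard mask.
    """
    inter = np.sum(pred * target)
    return (2.0 * inter + eps) / (np.sum(pred) + np.sum(target) + eps)

# Four pixels; the soft target encodes the fraction of annotators marking
# each pixel as a given Gleason pattern (values here are made up).
hard = np.array([1.0, 1.0, 0.0, 0.0])
soft = np.array([0.9, 0.6, 0.1, 0.0])
pred = np.array([1.0, 1.0, 0.0, 0.0])
print(round(soft_dice(pred, hard), 3), round(soft_dice(pred, soft), 3))
```

A confident prediction scores perfectly against the hard consensus but is penalized against the soft target exactly where the annotators disagreed, which is what lets the model absorb interobserver variability.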
GyroCopter: Differential Bearing Measuring Trajectory Planner for Tracking and Localizing Radio Frequency Sources
Chen, Fei, Rezatofighi, S. Hamid, Ranasinghe, Damith C.
Autonomous aerial vehicles can provide efficient and effective solutions for radio frequency (RF) source tracking and localization problems, with applications ranging from wildlife conservation to search and rescue operations. Existing lightweight, low-cost, bearing-measurement-based methods with a single antenna-receiver sensor system configuration necessitate in situ rotations, leading to substantial measurement acquisition times that restrict searchable areas and the number of measurements. We propose the GyroCopter for this task. Our approach plans the trajectory of a multi-rotor unmanned aerial vehicle (UAV) whilst utilizing UAV flight dynamics to execute a constant gyration motion to derive "pseudo-bearing" measurements for tracking RF sources. The gyration-based pseudo-bearing approach: i) significantly reduces the limitations associated with in situ rotation-based bearing measurements; and ii) capitalizes on the simplicity, affordability, and lightweight nature of signal-strength measurement acquisition hardware to estimate bearings. This method distinguishes itself from other pseudo-bearing approaches by eliminating the need for additional hardware, maintaining simplicity, light weight, and cost-effectiveness. To validate our approach, we derived the optimal rotation speed and conducted extensive simulations and field missions with our GyroCopter to track and localize multiple RF sources. The results confirm the effectiveness of our method, highlighting its potential as a practical and rapid solution for RF source localization tasks.
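The pseudo-bearing idea can be illustrated with a toy model: as the UAV gyrates, a directional antenna's received power peaks when it faces the source, so weighting each heading by received power and taking the circular mean recovers a bearing estimate. This NumPy sketch uses an idealized gain pattern and made-up names; it is not the paper's estimator:

```python
import numpy as np

def pseudo_bearing(headings, rssi):
    """Estimate the bearing to an RF source from signal-strength samples
    taken during a gyration: weight each heading by its (shifted) received
    power and take the circular mean of the headings.
    """
    w = rssi - rssi.min()                  # non-negative weights
    c = np.sum(w * np.cos(headings))
    s = np.sum(w * np.sin(headings))
    return np.arctan2(s, c)

# Synthetic gyration: power peaks when the antenna faces 60 degrees.
true_bearing = np.deg2rad(60.0)
headings = np.linspace(0.0, 2.0 * np.pi, 90, endpoint=False)
rssi = np.cos(headings - true_bearing)     # idealized antenna gain pattern
print(round(np.rad2deg(pseudo_bearing(headings, rssi)), 1))
```

Real measurements would add noise and a non-ideal gain pattern, which is why trajectory planning and the derived optimal rotation speed matter in practice.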