 tpe



Enhancing Speech Emotion Recognition via Fine-Tuning Pre-Trained Models and Hyper-Parameter Optimisation

Golbaghi, Aryan, Zhou, Shuo

arXiv.org Artificial Intelligence

We propose a workflow for speech emotion recognition (SER) that combines pre-trained representations with automated hyperparameter optimisation (HPO). Using the SpeechBrain wav2vec2-base model fine-tuned on IEMOCAP as the encoder, we compare two HPO strategies, Gaussian Process Bayesian Optimisation (GP-BO) and Tree-structured Parzen Estimators (TPE), under an identical four-dimensional search space and 15-trial budget, with balanced class accuracy (BCA) on the German EmoDB corpus as the objective. All experiments run on 8 CPU cores with 32 GB RAM. GP-BO achieves 0.96 BCA in 11 minutes, and TPE (Hyperopt implementation) attains 0.97 in 15 minutes. In contrast, grid search requires 143 trials and 1,680 minutes to exceed 0.9 BCA, and the best AutoSpeech 2020 baseline reports only 0.85 in 30 minutes on GPU. For cross-lingual generalisation, an EmoDB-trained HPO-tuned model improves zero-shot accuracy by 0.25 on CREMA-D and 0.26 on RAVDESS. Results show that efficient HPO with pre-trained encoders delivers competitive SER on commodity CPUs. Source code for this work is available at: https://github.com/youngaryan/speechbrain-emotion-hpo.
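The abstract uses balanced class accuracy (BCA) as the optimisation objective but does not define it. A minimal sketch follows, assuming the common definition (the mean of per-class recall, as in macro-averaged recall); the function name and the toy labels are illustrative and are not taken from the paper or its repository.

```python
from collections import defaultdict


def balanced_class_accuracy(y_true, y_pred):
    """Mean of per-class recall, so every class counts equally.

    Assumed definition; the paper may compute BCA differently.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    correct = defaultdict(int)  # correctly predicted samples per true class
    total = defaultdict(int)    # all samples per true class
    for truth, guess in zip(y_true, y_pred):
        total[truth] += 1
        if truth == guess:
            correct[truth] += 1
    recalls = [correct[label] / total[label] for label in total]
    return sum(recalls) / len(recalls)


# Toy, imbalanced example (hypothetical labels): 4 "anger", 2 "sadness".
y_true = ["anger", "anger", "anger", "anger", "sadness", "sadness"]
y_pred = ["anger", "anger", "anger", "anger", "sadness", "anger"]
bca = balanced_class_accuracy(y_true, y_pred)
plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

In this toy case the recalls are 1.0 for "anger" and 0.5 for "sadness", so BCA is 0.75 while plain accuracy is 5/6. The gap shows why a class-balanced metric is a reasonable objective on a corpus with uneven class counts.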



Tree-Structured Parzen Estimator Can Solve Black-Box Combinatorial Optimization More Efficiently

Abe, Kenshin, Wang, Yunzhuo, Watanabe, Shuhei

arXiv.org Artificial Intelligence

Tree-structured Parzen estimator (TPE) is a versatile hyperparameter optimization (HPO) method supported by popular HPO tools. Since these HPO tools have been developed in line with the trend of deep learning (DL), problem setups common in the DL domain, such as multi-objective optimization and multi-fidelity optimization, have been discussed for TPE. However, the practical applications of HPO are not limited to DL, and black-box combinatorial optimization is actively utilized in some domains, e.g., chemistry and biology. As combinatorial optimization has been an untouched, yet very important, topic in TPE, we propose an efficient combinatorial optimization algorithm for TPE. In this paper, we first generalize the categorical kernel with the numerical kernel in TPE, enabling us to introduce a distance structure to the categorical kernel. Then we discuss modifications for the newly developed kernel to handle a large combinatorial search space. These modifications reduce the time complexity of the kernel calculation with respect to the size of a combinatorial search space. In the experiments using synthetic problems, we verified that our proposed method identifies better solutions with fewer evaluations than the original TPE. Our algorithm is available in Optuna, an open-source framework for HPO.
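For readers unfamiliar with the baseline method, the core TPE idea can be sketched as follows: split past trials into a "good" and a "bad" group, fit a Parzen (kernel density) estimator to each, and evaluate next the candidate with the highest density ratio. This is a deliberately simplified one-dimensional, continuous sketch with a fixed bandwidth. It is not the paper's combinatorial algorithm, nor the Optuna or Hyperopt implementation, and all names and defaults (`gamma`, `n_candidates`, `n_startup`) are assumptions chosen for illustration.

```python
import math
import random


def parzen_density(x, centers, bandwidth):
    """Average of Gaussian kernels centred on past observations."""
    norm = bandwidth * math.sqrt(2.0 * math.pi) * len(centers)
    return sum(math.exp(-0.5 * ((x - c) / bandwidth) ** 2) for c in centers) / norm


def tpe_suggest(history, low, high, rng, gamma=0.25, n_candidates=24):
    """Suggest the next x to evaluate. `history` is a list of (x, loss) pairs."""
    ordered = sorted(history, key=lambda pair: pair[1])
    n_good = max(1, math.ceil(gamma * len(ordered)))
    good = [x for x, _ in ordered[:n_good]]   # lowest-loss fraction
    bad = [x for x, _ in ordered[n_good:]]    # everything else
    bandwidth = (high - low) / 10.0           # fixed bandwidth: a simplification

    # Draw candidates from the "good" density: pick a good point, add noise.
    candidates = [
        min(high, max(low, rng.gauss(rng.choice(good), bandwidth)))
        for _ in range(n_candidates)
    ]

    def density_ratio(x):
        l = parzen_density(x, good, bandwidth)
        g = parzen_density(x, bad, bandwidth) if bad else 1.0
        return l / (g + 1e-12)  # small constant avoids division by zero

    return max(candidates, key=density_ratio)


def minimise(objective, low, high, n_trials=30, n_startup=8, seed=0):
    """Random search for the first trials, then TPE-style suggestions."""
    rng = random.Random(seed)
    history = []
    for trial in range(n_trials):
        if trial < n_startup:
            x = rng.uniform(low, high)
        else:
            x = tpe_suggest(history, low, high, rng)
        history.append((x, objective(x)))
    return min(history, key=lambda pair: pair[1])


# Toy objective (hypothetical): a quadratic with its minimum at x = 2.
best_x, best_loss = minimise(lambda x: (x - 2.0) ** 2, low=-5.0, high=5.0)
```

The sketch handles only a numerical parameter. The abstract's contribution concerns the categorical case, where the kernel is given a distance structure so that "nearby" categories share density.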


Periodic Online Testing for Sparse Systolic Tensor Arrays

Peltekis, Christodoulos, Nicopoulos, Chrysostomos, Dimitrakopoulos, Giorgos

arXiv.org Artificial Intelligence

Modern Machine Learning (ML) applications often benefit from structured sparsity, a technique that efficiently reduces model complexity and simplifies handling of sparse data in hardware. Sparse systolic tensor arrays - specifically designed to accelerate these structured-sparse ML models - play a pivotal role in enabling efficient computations. As ML is increasingly integrated into safety-critical systems, it is of paramount importance to ensure the reliability of these systems. This paper introduces an online error-checking technique capable of detecting and locating permanent faults within sparse systolic tensor arrays before computation begins. The new technique relies on merely four test vectors and exploits the weight values already loaded within the systolic array to comprehensively test the system. Fault-injection campaigns within the gate-level netlist, while executing three well-established Convolutional Neural Networks (CNN), validate the efficiency of the proposed approach, which is shown to achieve very high fault coverage, while incurring minimal performance and area overheads.


Pellet-based 3D Printing of Soft Thermoplastic Elastomeric Membranes for Soft Robotic Applications

Willemstein, Nick, van der Kooij, Herman, Sadeghi, Ali

arXiv.org Artificial Intelligence

Additive Manufacturing (AM) is a promising solution for handling the complexity of fabricating soft robots. However, the AM of hyperelastic materials is still challenging with limited material types. Within this work, pellet-based 3D printing of very soft thermoplastic elastomers (TPEs) was explored. Our results show that TPEs can have similar engineering stress and maximum strain as Ecoflex OO-10. These TPEs were used to 3D-print airtight thin membranes (0.2-1.2 mm), which could inflate up to a stretch of 1320%. Combining the membrane's large expansion and softness with the 3D printing of hollow structures simplified the design of a bending actuator that can bend 180 degrees and reach a blocked force of 238 times its weight. In addition, by 3D printing TPE pellets and rigid filaments, the soft membrane could grasp objects by enveloping an object or as a sensorized sucker, which relied on the TPE's softness to conform to the object or act as a seal. In addition, the membrane of the sucker was utilized as a tactile sensor to detect an object before adhesion. These results suggest the feasibility of 3D printing soft robots by using soft TPEs and membranes as an interesting class of materials and sensorized actuators, respectively.


PoseSyn: Synthesizing Diverse 3D Pose Data from In-the-Wild 2D Data

Yang, ChangHee, Song, Hyeonseop, Choi, Seokhun, Lee, Seungwoo, Kim, Jaechul, Do, Hoseok

arXiv.org Artificial Intelligence

Despite considerable efforts to enhance the generalization of 3D pose estimators without costly 3D annotations, existing data augmentation methods struggle in real-world scenarios with diverse human appearances and complex poses. We propose PoseSyn, a novel data synthesis framework that transforms abundant in-the-wild 2D pose datasets into diverse 3D pose-image pairs. PoseSyn comprises two key components: an Error Extraction Module (EEM), which identifies challenging poses from the 2D pose datasets, and a Motion Synthesis Module (MSM), which synthesizes motion sequences around the challenging poses. Then, by generating realistic 3D training data via a human animation model aligned with challenging poses and appearances, PoseSyn boosts the accuracy of various 3D pose estimators by up to 14% across real-world benchmarks including various backgrounds and occlusions, challenging poses, and multi-view scenarios. Extensive experiments further confirm that PoseSyn is a scalable and effective approach for improving generalization without relying on expensive 3D annotations, regardless of the pose estimator's model size or design.


RETR: Multi-View Radar Detection Transformer for Indoor Perception

Yataka, Ryoma, Cardace, Adriano, Wang, Pu Perry, Boufounos, Petros, Takahashi, Ryuhei

arXiv.org Artificial Intelligence

Indoor radar perception has seen rising interest due to affordable costs driven by emerging automotive imaging radar developments and the benefits of reduced privacy concerns and reliability under hazardous conditions (e.g., fire and smoke). However, existing radar perception pipelines fail to account for distinctive characteristics of the multi-view radar setting. In this paper, we propose Radar dEtection TRansformer (RETR), an extension of the popular DETR architecture, tailored for multi-view radar perception. RETR inherits the advantages of DETR, eliminating the need for hand-crafted components for object detection and segmentation in the image plane. More importantly, RETR incorporates carefully designed modifications such as 1) depth-prioritized feature similarity via a tunable positional encoding (TPE); 2) a tri-plane loss from both radar and camera coordinates; and 3) a learnable radar-to-camera transformation via reparameterization, to account for the unique multi-view radar setting. Evaluated on two indoor radar perception datasets, our approach outperforms existing state-of-the-art methods by a margin of 15.38+ AP for object detection and 11.91+ IoU for instance segmentation, respectively.


Optimizing Deep Reinforcement Learning for Adaptive Robotic Arm Control

Shianifar, Jonaid, Schukat, Michael, Mason, Karl

arXiv.org Artificial Intelligence

In this paper, we explore the optimization of hyperparameters for the Soft Actor-Critic (SAC) and Proximal Policy Optimization (PPO) algorithms using the Tree-structured Parzen Estimator (TPE) in the context of robotic arm control with seven Degrees of Freedom (DOF). Our results demonstrate a significant enhancement in algorithm performance: TPE improves the success rate of SAC by 10.48 percentage points and PPO by 34.28 percentage points, with models trained for 50K episodes. Furthermore, TPE enables PPO to converge to a reward within 95% of the maximum reward 76% faster than without TPE, which translates to about 40K fewer episodes of training required for optimal performance. For SAC, the corresponding convergence is 80% faster than without TPE. This study underscores the impact of advanced hyperparameter optimization on the efficiency and success of deep reinforcement learning algorithms in complex robotic tasks.


Large Language Models to Enhance Bayesian Optimization

Liu, Tennison, Astorga, Nicolás, Seedat, Nabeel, van der Schaar, Mihaela

arXiv.org Artificial Intelligence

Bayesian optimization (BO) is a powerful approach for optimizing complex and expensive-to-evaluate black-box functions. Its importance is underscored in many applications, notably including hyperparameter tuning, but its efficacy depends on efficiently balancing exploration and exploitation. While there has been substantial progress in BO methods, striking this balance remains a delicate process. At a high level, we frame the BO problem in natural language terms, enabling LLMs to iteratively propose promising solutions conditioned on historical evaluations. More specifically, we explore how combining contextual understanding, few-shot learning proficiency, and domain knowledge of LLMs can enhance various components of model-based BO. Our findings illustrate that the proposed approach, LLAMBO, is effective at zero-shot warmstarting, and improves surrogate modeling and candidate sampling, especially in the early stages of search when observations are sparse. Our approach is performed in context and does not require LLM finetuning. Additionally, it is modular by design, allowing individual components to be integrated into existing BO frameworks, or function cohesively as an end-to-end method.

Expensive black-box functions are common in many disciplines and applications including robotics (11, 35), experimental design (25), drug discovery (32), interface design (8) and, in machine learning, hyperparameter tuning (6, 34, 49). Bayesian optimization (BO) is a widely adopted and efficient model-based approach for globally optimizing these functions (31, 33). BO's effectiveness lies in its ability to operate based on a limited set of observations without the need for direct access to the objective function or its gradients. Broadly, BO uses observed data to construct a surrogate model as an approximation to the objective function, and then iteratively generates potentially good points, from which the acquisition function selects the one with the highest utility. This chosen point undergoes evaluation, and the cycle continues. For BO, the name of the game is efficient search, but the efficiency of this search largely depends on the quality of the surrogate model and its capacity to quickly identify high-potential regions (16).
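The selection step described above, in which an acquisition function scores candidate points and the highest-utility one is evaluated next, can be illustrated with expected improvement (EI). EI is one widely used acquisition function; the text does not say which one LLAMBO uses, so this choice is an assumption. The surrogate predictions below (mean and standard deviation per candidate) are made-up numbers for a maximisation problem.

```python
import math


def expected_improvement(mean, std, best_so_far, xi=0.0):
    """Expected improvement over `best_so_far` for a maximisation problem.

    `mean` and `std` are the surrogate's Gaussian prediction at a candidate;
    `xi` is an optional exploration margin.
    """
    improvement = mean - best_so_far - xi
    if std <= 0.0:
        # No predictive uncertainty: improvement is deterministic.
        return max(improvement, 0.0)
    z = improvement / std
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return improvement * cdf + std * pdf


# Hypothetical surrogate predictions: candidate -> (mean, std).
predictions = {
    "a": (0.90, 0.01),  # close to the incumbent, very certain
    "b": (0.88, 0.10),  # slightly lower mean, much more uncertain
    "c": (0.80, 0.02),  # clearly worse and fairly certain
}
best_so_far = 0.91

scores = {
    name: expected_improvement(mean, std, best_so_far)
    for name, (mean, std) in predictions.items()
}
chosen = max(scores, key=scores.get)
```

With these numbers, candidate "b" is chosen even though "a" has the higher predicted mean: its larger uncertainty leaves more room to beat the incumbent, which is the exploration-exploitation trade-off the passage refers to.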