OpenLex3D: A New Evaluation Benchmark for Open-Vocabulary 3D Scene Representations
Kassab, Christina, Morin, Sacha, Büchner, Martin, Mattamala, Matías, Gupta, Kumaraditya, Valada, Abhinav, Paull, Liam, Fallon, Maurice
3D scene understanding has been transformed by open-vocabulary language models that enable interaction via natural language. However, the evaluation of these representations is limited to closed-set semantics that do not capture the richness of language. This work presents OpenLex3D, a dedicated benchmark for evaluating open-vocabulary 3D scene representations. OpenLex3D provides entirely new label annotations for 23 scenes from Replica, ScanNet++, and HM3D, which capture real-world linguistic variability by introducing synonymous object categories and additional nuanced descriptions. By introducing an open-set 3D semantic segmentation task and an object retrieval task, we provide insights into feature precision, segmentation, and downstream capabilities. We evaluate various existing 3D open-vocabulary methods on OpenLex3D, showcasing failure cases and avenues for improvement. The benchmark is publicly available at: https://openlex3d.github.io/.
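To make the graded-matching idea concrete, here is a minimal, hypothetical sketch of how a predicted open-vocabulary label could be scored against tiered ground-truth label sets (exact synonyms versus looser descriptions). The tier weights and example labels are illustrative assumptions, not the official OpenLex3D protocol:

```python
# Hypothetical scoring of an open-vocabulary prediction against tiered labels.
# Tier weights (1.0 / 0.5) are illustrative, not the benchmark's actual metric.

def score_prediction(predicted: str, synonyms: set[str], descriptions: set[str]) -> float:
    """Return a graded score instead of a binary closed-set hit."""
    label = predicted.strip().lower()
    if label in synonyms:      # synonymous category, e.g. "couch" for a sofa
        return 1.0
    if label in descriptions:  # nuanced description, e.g. "soft seat"
        return 0.5
    return 0.0

gt = {"synonyms": {"sofa", "couch"},
      "descriptions": {"soft seat", "lounge furniture"}}
print(score_prediction("Couch", gt["synonyms"], gt["descriptions"]))  # 1.0
```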
Domain-incremental White Blood Cell Classification with Privacy-aware Continual Learning
Kumari, Pratibha, Bozorgpour, Afshin, Reisenbüchler, Daniel, Jost, Edgar, Crysandt, Martina, Matek, Christian, Merhof, Dorit
White blood cell (WBC) classification plays a vital role in hematology for diagnosing various medical conditions. However, it faces significant challenges due to domain shifts caused by variations in sample sources (e.g., blood or bone marrow) and differing imaging conditions across hospitals. Traditional deep learning models often suffer from catastrophic forgetting in such dynamic environments, while foundation models, though generally robust, experience performance degradation when the distribution of inference data differs from that of the training data. To address these challenges, we propose a generative replay-based Continual Learning (CL) strategy designed to prevent forgetting in foundation models for WBC classification. Our method employs lightweight generators to mimic past data through synthetic latent representations, enabling privacy-preserving replay. To demonstrate its effectiveness, we carry out extensive experiments on four datasets with different task orderings and four backbone models, including ResNet50, RetCCL, CTransPath, and UNI. Experimental results demonstrate that conventional fine-tuning methods degrade performance on previously learned tasks and struggle with domain shifts. In contrast, our continual learning strategy effectively mitigates catastrophic forgetting, preserving model performance across varying domains. This work presents a practical solution for maintaining reliable WBC classification in real-world clinical settings, where data distributions frequently evolve.
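The core mechanism, latent generative replay on top of a frozen backbone, can be illustrated with a short sketch. This is not the paper's code: the generator architecture, dimensions, and loss weighting below are assumptions for illustration only.

```python
# A minimal sketch of latent generative replay, assuming a frozen backbone
# (e.g., a foundation model) that maps images to feature vectors.
import torch
import torch.nn as nn

FEAT_DIM, N_CLASSES, NOISE_DIM = 512, 5, 64

class LatentGenerator(nn.Module):
    """Lightweight class-conditional generator over backbone features."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(NOISE_DIM + N_CLASSES, 256), nn.ReLU(),
            nn.Linear(256, FEAT_DIM),
        )
    def forward(self, z, y_onehot):
        return self.net(torch.cat([z, y_onehot], dim=1))

classifier = nn.Linear(FEAT_DIM, N_CLASSES)
old_generator = LatentGenerator()   # trained on past domains, then frozen
opt = torch.optim.Adam(classifier.parameters(), lr=1e-3)
ce = nn.CrossEntropyLoss()

def training_step(new_feats, new_labels, replay_batch=32):
    # Replay synthetic latents from past domains: privacy-preserving because
    # no stored patient images or real features are required.
    z = torch.randn(replay_batch, NOISE_DIM)
    y_old = torch.randint(0, N_CLASSES, (replay_batch,))
    with torch.no_grad():
        old_feats = old_generator(z, torch.eye(N_CLASSES)[y_old])
    feats = torch.cat([new_feats, old_feats])
    labels = torch.cat([new_labels, y_old])
    loss = ce(classifier(feats), labels)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

print(training_step(torch.randn(32, FEAT_DIM), torch.randint(0, N_CLASSES, (32,))))
```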
Deep Learning for Speech Emotion Recognition: A CNN Approach Utilizing Mel Spectrograms
[Table: values taken from the SER Classifier notebook.]

Next, the model was tested on new audio recorded by myself, family, and friends. Surprisingly, it performed well, especially on negative emotions. For example, it correctly predicted male anger with over 90% accuracy, reliably distinguishing it from related classes such as male disgust, female anger, and male sadness. An interesting test involved a friend with Asperger's syndrome, who struggles with recognizing emotions. While the model's accuracy initially seemed low, further analysis revealed that her own perception of the emotions was misaligned with the model's predictions, which were actually more accurate. Finally, the model was tested on German and Swiss German audio, where it predicted anger, sadness, and disgust well, but made some errors with positive emotions. In all failure cases, the target emotion remained within the top five predicted classes, demonstrating the model's robustness.
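For readers unfamiliar with the front end named in the title, the following is a minimal sketch of the mel-spectrogram-to-CNN pipeline, assuming librosa and PyTorch; the layer sizes, clip length, and class count are illustrative assumptions, not the notebook's exact configuration:

```python
# Sketch: log-mel spectrogram input for a small emotion-classification CNN.
import librosa
import numpy as np
import torch
import torch.nn as nn

def mel_input(path: str, sr: int = 22050, n_mels: int = 128) -> torch.Tensor:
    y, sr = librosa.load(path, sr=sr, duration=3.0)      # fixed-length clip
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    mel_db = librosa.power_to_db(mel, ref=np.max)        # log scale for the CNN
    return torch.from_numpy(mel_db).float()[None, None]  # (1, 1, n_mels, frames)

cnn = nn.Sequential(
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
    nn.Flatten(), nn.Linear(32, 8),  # e.g., 8 gendered emotion classes (assumed)
)
# logits = cnn(mel_input("clip.wav"))
# top5 = logits.topk(5).indices  # the "top 5 predicted classes" discussed above
```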
Semi-SD: Semi-Supervised Metric Depth Estimation via Surrounding Cameras for Autonomous Driving
Xie, Yusen, Huang, Zhengmin, Shen, Shaojie, Ma, Jun
In this paper, we introduce Semi-SD, a novel metric depth estimation framework tailored for the surrounding cameras of autonomous driving platforms. The input consists of adjacent surrounding frames and camera parameters. We propose a unified spatial-temporal-semantic fusion module to construct fused visual features, with cross-attention components across surrounding cameras and adjacent frames that focus on metric scale refinement and temporal feature matching. Building on this, we propose a pose estimation framework that uses the surrounding cameras, their corresponding estimated depths, and the extrinsic parameters, effectively addressing the scale ambiguity of multi-camera setups. Moreover, a semantic world model and a monocular depth estimation world model are integrated to supervise the depth estimation, improving its quality. We evaluate our algorithm on the DDAD and nuScenes datasets, and the results demonstrate that our method achieves state-of-the-art surround-camera depth estimation quality. The source code will be available at https://github.com/xieyuser/Semi-SD.
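As a schematic illustration of the cross-attention idea between a reference frame and an adjacent (temporal or neighboring-camera) frame, consider the sketch below; the dimensions and single-block design are assumptions, not the paper's architecture:

```python
# Sketch: cross-attention fusing a reference view with an adjacent view.
import torch
import torch.nn as nn

class CrossFrameAttention(nn.Module):
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
    def forward(self, ref_tokens, src_tokens):
        # ref_tokens: (B, N, C) features of the reference camera/frame
        # src_tokens: (B, M, C) features of an adjacent frame or camera
        out, _ = self.attn(query=ref_tokens, key=src_tokens, value=src_tokens)
        return self.norm(ref_tokens + out)  # residual fusion

fused = CrossFrameAttention()(torch.randn(2, 100, 256), torch.randn(2, 100, 256))
print(fused.shape)  # torch.Size([2, 100, 256])
```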
A Systematic Review of EEG-based Machine Intelligence Algorithms for Depression Diagnosis and Monitoring
Nassibi, Amir, Papavassiliou, Christos, Rakhmatulin, Ildar, Mandic, Danilo, Atashzar, S. Farokh
Depressive disorder is a serious health condition that has affected the lives of millions of people around the world. Diagnosing depression is a challenging practice that relies heavily on subjective assessments and, in most cases, suffers from delayed findings. Electroencephalography (EEG) biomarkers have been suggested and investigated in recent years as a potentially transformative objective practice. In this article, for the first time, a detailed systematic review is conducted of EEG-based depression diagnosis approaches that employ advanced machine learning techniques and statistical analyses. For this, 938 potentially relevant articles (published since 1985) were initially retrieved and filtered down to 139 relevant articles following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) scheme. This article compares and discusses the selected articles and categorizes them according to the type of machine learning techniques and statistical analyses used. Algorithms, preprocessing techniques, extracted features, and data acquisition systems are discussed and summarized. This review explains the existing challenges of current algorithms and sheds light on the future direction of the field, outlining issues in machine intelligence for EEG-based depression diagnosis that can be addressed in future studies and possibly in future wearable technologies.
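One common pipeline family covered by reviews of this kind is spectral band-power features fed to a classical classifier. The sketch below is a generic illustration of that family, not a reproduction of any specific study; the band edges, sampling rate, and SVM choice are typical assumptions:

```python
# Illustrative EEG pipeline: Welch band powers -> SVM (toy data, assumed setup).
import numpy as np
from scipy.signal import welch
from sklearn.svm import SVC

BANDS = {"delta": (1, 4), "theta": (4, 8), "alpha": (8, 13), "beta": (13, 30)}

def band_powers(eeg: np.ndarray, fs: float = 256.0) -> np.ndarray:
    """eeg: (channels, samples) -> flat feature vector of per-band powers."""
    freqs, psd = welch(eeg, fs=fs, nperseg=int(fs * 2))
    feats = [psd[:, (freqs >= lo) & (freqs < hi)].mean(axis=1)
             for lo, hi in BANDS.values()]
    return np.concatenate(feats)

# Toy data: 40 subjects, 8 channels, 10 s at 256 Hz, binary labels.
X = np.stack([band_powers(np.random.randn(8, 2560)) for _ in range(40)])
y = np.random.randint(0, 2, 40)
clf = SVC().fit(X, y)
print(clf.score(X, y))
```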
DeClotH: Decomposable 3D Cloth and Human Body Reconstruction from a Single Image
Nam, Hyeongjin, Kim, Donghwan, Oh, Jeongtaek, Lee, Kyoung Mu
Most existing methods for 3D clothed human reconstruction from a single image treat the clothed human as a single object, without distinguishing between cloth and human body. In this regard, we present DeClotH, which separately reconstructs the 3D cloth and human body from a single image. This task remains largely unexplored due to the extreme occlusion between cloth and the human body, which makes it challenging to infer accurate geometries and textures. Moreover, while recent 3D human reconstruction methods have achieved impressive results using text-to-image diffusion models, directly applying such an approach to this problem often leads to incorrect guidance, particularly in reconstructing 3D cloth. To address these challenges, we propose two core designs in our framework. First, to alleviate the occlusion issue, we leverage 3D template models of cloth and human body as regularizations, which provide strong geometric priors that prevent erroneous reconstruction caused by occlusion. Second, we introduce a cloth diffusion model specifically designed to provide contextual information about cloth appearance, thereby enhancing the reconstruction of 3D cloth. Qualitative and quantitative experiments demonstrate that our proposed approach is highly effective in reconstructing both the 3D cloth and the human body. More qualitative results are provided at https://hygenie1228.github.io/DeClotH/.
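To give a flavor of how a template prior can act as a regularizer, here is a toy sketch that penalizes reconstructed cloth points drifting far from a 3D template via a symmetric Chamfer distance. This is an assumed stand-in, not the paper's actual regularization term:

```python
# Toy template-prior regularizer: Chamfer distance to a canonical template.
import torch

def chamfer(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """a: (N, 3) predicted points, b: (M, 3) template points."""
    d = torch.cdist(a, b)                        # (N, M) pairwise distances
    return d.min(dim=1).values.mean() + d.min(dim=0).values.mean()

pred_cloth = torch.randn(1000, 3, requires_grad=True)
template = torch.randn(800, 3)                   # e.g., a canonical garment mesh
loss_reg = chamfer(pred_cloth, template)         # added to the reconstruction loss
loss_reg.backward()
```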
Social Network User Profiling for Anomaly Detection Based on Graph Neural Networks
This study proposes a risk-oriented anomaly detection method based on social network user profiles and graph neural networks (GNNs), aiming to improve the identification of abnormal users in social network environments. Given the limitations of traditional methods in modeling social network data, this paper combines graph autoencoders (GAEs) and graph attention networks (GATs) to achieve accurate detection of abnormal users through dynamic aggregation of neighbor features and reconstruction-error evaluation. Experiments use the Facebook Page-Page Network dataset, with comparisons against VAE, GNN, Transformer, and GAE baselines. The results show that the proposed method achieves the best performance in AUC, F1-score, Precision, and Recall, verifying its effectiveness. In addition, this paper examines the model's computational efficiency on large-scale data and discusses combining self-supervised learning, federated learning, and other technologies in the future to improve the robustness and privacy protection of risk assessment. The research results can provide efficient anomaly detection solutions for financial risk control, social security management, and other fields.
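A condensed sketch of the GAE-plus-GAT scoring idea is shown below, assuming PyTorch Geometric. The layer sizes and the per-node aggregation rule are illustrative assumptions, not the paper's exact model:

```python
# Sketch: GAT encoder + inner-product decoder; score nodes by reconstruction error.
import torch
import torch.nn.functional as F
from torch_geometric.nn import GATConv

class GATEncoder(torch.nn.Module):
    def __init__(self, in_dim, hid=64, out=32):
        super().__init__()
        self.g1 = GATConv(in_dim, hid, heads=4, concat=False)
        self.g2 = GATConv(hid, out, heads=1)
    def forward(self, x, edge_index):
        return self.g2(F.elu(self.g1(x, edge_index)), edge_index)

def anomaly_scores(x, edge_index, encoder):
    z = encoder(x, edge_index)
    # Inner-product decoder: reconstruction error on observed edges,
    # aggregated per source node; high error -> candidate anomalous user.
    src, dst = edge_index
    edge_logits = (z[src] * z[dst]).sum(dim=1)
    edge_err = F.binary_cross_entropy_with_logits(
        edge_logits, torch.ones_like(edge_logits), reduction="none")
    return torch.zeros(x.size(0)).index_add_(0, src, edge_err)
```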
Interpretable Generative Models through Post-hoc Concept Bottlenecks
Kulkarni, Akshay, Yan, Ge, Sun, Chung-En, Oikarinen, Tuomas, Weng, Tsui-Wei
Concept bottleneck models (CBMs) aim to produce inherently interpretable models that rely on human-understandable concepts for their predictions. However, existing approaches to designing interpretable generative models based on CBMs are not yet efficient and scalable, as they require expensive generative model training from scratch as well as real images with labor-intensive concept supervision. To address these challenges, we present two novel, low-cost methods for building interpretable generative models through post-hoc techniques, which we name the concept-bottleneck autoencoder (CB-AE) and the concept controller (CC). Our proposed approaches enable efficient and scalable training without the need for real data and require only minimal to no concept supervision. Additionally, our methods generalize across modern generative model families, including generative adversarial networks and diffusion models. We demonstrate the superior interpretability and steerability of our methods on numerous standard datasets such as CelebA, CelebA-HQ, and CUB, with large improvements (average ~25%) over prior work, while being 4-15x faster to train. Finally, a large-scale user study validates the interpretability and steerability of our methods.
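The post-hoc bottleneck idea can be sketched as an autoencoder over a frozen generator's latents with an interpretable concept layer in the middle. The dimensions, concept names, and steering procedure below are illustrative assumptions, not the paper's CB-AE:

```python
# Simplified sketch: a concept bottleneck inserted post hoc into a frozen
# generator's latent space; edit concepts, decode, then regenerate the image.
import torch
import torch.nn as nn

LATENT, N_CONCEPTS = 512, 8  # e.g., "smiling", "eyeglasses", ... (assumed)

class ConceptBottleneckAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.encode = nn.Sequential(nn.Linear(LATENT, 256), nn.ReLU(),
                                    nn.Linear(256, N_CONCEPTS))
        self.decode = nn.Sequential(nn.Linear(N_CONCEPTS, 256), nn.ReLU(),
                                    nn.Linear(256, LATENT))
    def forward(self, w):
        c = torch.sigmoid(self.encode(w))   # concept activations in [0, 1]
        return self.decode(c), c

cbae = ConceptBottleneckAE()
w = torch.randn(4, LATENT)                  # latents from a frozen generator
w_rec, concepts = cbae(w)
# Steering: flip one concept, decode, and feed w_edit back to the generator.
edited = concepts.clone()
edited[:, 0] = 1.0
w_edit = cbae.decode(edited)
```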
Quality-focused Active Adversarial Policy for Safe Grasping in Human-Robot Interaction
Li, Chenghao, Beuran, Razvan, Chong, Nak Young
Vision-guided robot grasping methods based on Deep Neural Networks (DNNs) have achieved remarkable success in handling unknown objects, attributable to their powerful generalizability. However, this same generalizability leads these methods to recognize the human hand and its adjacent objects as graspable targets, compromising safety during Human-Robot Interaction (HRI). In this work, we propose the Quality-focused Active Adversarial Policy (QFAAP) to solve this problem. The first component is the Adversarial Quality Patch (AQP), for which we design an adversarial quality patch loss and leverage a grasp dataset to optimize a patch with high quality scores. Next, we construct Projected Quality Gradient Descent (PQGD) and integrate it with the AQP, restricting the patch to the hand region within each real-time frame and endowing the AQP with fast adaptability to the human hand shape. Through AQP and PQGD, the hand becomes actively adversarial toward the surrounding objects, lowering their quality scores. Therefore, further setting the quality score of the hand itself to zero reduces the grasping priority of both the hand and its adjacent objects, enabling the robot to grasp other objects away from the hand without emergency stops. We conduct extensive experiments on benchmark datasets and a cobot, showing the effectiveness of QFAAP. Our code and demo videos are available here: https://github.com/clee-jaist/QFAAP.
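A schematic projected-gradient loop for a hand-masked adversarial patch is sketched below, assuming a differentiable grasp-quality network `quality_net` mapping an RGB frame to a per-pixel quality map; the network, mask, step sizes, and projection radius are all assumptions, not the paper's PQGD:

```python
# Sketch: optimize a patch confined to the hand region to raise its quality score.
import torch

def optimize_patch(frame, hand_mask, quality_net, steps=50, lr=0.01):
    """frame: (3, H, W) in [0, 1]; hand_mask: (H, W) boolean hand region."""
    patch = torch.zeros_like(frame, requires_grad=True)
    for _ in range(steps):
        adv = (frame + patch * hand_mask).clamp(0, 1)  # patch lives on the hand only
        q = quality_net(adv)                           # (H, W) quality map
        loss = -q[hand_mask].mean()                    # raise quality inside the hand
        loss.backward()
        with torch.no_grad():
            patch -= lr * patch.grad.sign()            # signed-gradient step
            patch.clamp_(-0.2, 0.2)                    # project onto a small L_inf ball
            patch.grad.zero_()
    return patch.detach()

# At inference, the hand's own quality is then zeroed out so the planner
# prefers grasps away from the hand and its neighborhood.
```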
A novel forecasting framework combining virtual samples and enhanced Transformer models for tourism demand forecasting
Diao, Tingting, Wu, Xinzhang, Yang, Lina, Xiao, Ling, Dong, Yunxuan
Accurate tourism demand forecasting is hindered by limited historical data and complex spatiotemporal dependencies among tourist origins. A novel forecasting framework integrating virtual sample generation with an enhanced Transformer predictor addresses the constraints arising from restricted data availability. A spatiotemporal GAN produces realistic virtual samples by dynamically modeling spatial correlations through a graph convolutional network, while the enhanced Transformer captures local patterns with causal convolutions and long-term dependencies with self-attention, eliminating autoregressive decoding. A joint training strategy refines virtual sample generation based on predictor feedback to maintain robust performance under data-scarce conditions. Experimental evaluations on real-world daily and monthly tourism demand datasets show an 18.37% reduction in average MASE compared to conventional Transformer-based models, demonstrating improved forecasting accuracy. The integration of adaptive spatiotemporal sample augmentation with a specialized Transformer can effectively address limited-data forecasting scenarios in tourism management.
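The "causal convolutions for local patterns, self-attention for long-term dependencies" combination can be sketched as a single encoder block; the kernel size, dimensions, and block layout below are assumptions for illustration, not the paper's architecture:

```python
# Sketch: causal convolution + self-attention block for time-series encoding.
import torch
import torch.nn as nn

class CausalConvAttnBlock(nn.Module):
    def __init__(self, dim=64, kernel=3, heads=4):
        super().__init__()
        self.pad = kernel - 1                        # left-pad so conv sees only the past
        self.conv = nn.Conv1d(dim, dim, kernel)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1, self.norm2 = nn.LayerNorm(dim), nn.LayerNorm(dim)
    def forward(self, x):                            # x: (B, T, dim)
        h = nn.functional.pad(x.transpose(1, 2), (self.pad, 0))
        x = self.norm1(x + self.conv(h).transpose(1, 2))  # local causal patterns
        a, _ = self.attn(x, x, x)                         # long-term dependencies
        return self.norm2(x + a)

out = CausalConvAttnBlock()(torch.randn(8, 30, 64))  # e.g., 30 daily observations
print(out.shape)  # torch.Size([8, 30, 64])
```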