Performance Analysis
Causal Discovery in Multivariate Time Series through Mutual Information Featurization
Paldino, Gian Marco, Bontempi, Gianluca
Discovering causal relationships in complex multivariate time series is a fundamental scientific challenge. Traditional methods often falter, either by relying on restrictive linear assumptions or on conditional independence tests that become uninformative in the presence of intricate, non-linear dynamics. This paper proposes a new paradigm, shifting from statistical testing to pattern recognition. We hypothesize that a causal link creates a persistent and learnable asymmetry in the flow of information through a system's temporal graph, even when clear conditional independencies are obscured. We introduce Temporal Dependency to Causality (TD2C), a supervised learning framework that operationalizes this hypothesis. TD2C learns to recognize these complex causal signatures from a rich set of information-theoretic and statistical descriptors. Trained exclusively on a diverse collection of synthetic time series, TD2C demonstrates remarkable zero-shot generalization to unseen dynamics and established, realistic benchmarks. Our results show that TD2C achieves state-of-the-art performance, consistently outperforming established methods, particularly in high-dimensional and non-linear settings. By reframing the discovery problem, our work provides a robust and scalable new tool for uncovering causal structures in complex systems.
Inequalities for Optimization of Classification Algorithms: A Perspective Motivated by Diagnostic Testing
Patrone, Paul N., Kearsley, Anthony J.
Motivated by canonical problems in medical diagnostics, we propose and study properties of an objective function that uniformly bounds uncertainties in quantities of interest extracted from classifiers and related data analysis tools. We begin by adopting a set-theoretic perspective to show how two main tasks in diagnostics -- classification and prevalence estimation -- can be recast in terms of a variation on the confusion (or error) matrix ${\boldsymbol {\rm P}}$ typically considered in supervised learning. We then combine arguments from conditional probability with the Gershgorin circle theorem to demonstrate that the largest Gershgorin radius $\boldsymbol ฯ_m$ of the matrix $\mathbb I-\boldsymbol {\rm P}$ (where $\mathbb I$ is the identity) yields uniform error bounds for both classification and prevalence estimation. In a two-class setting, $\boldsymbol ฯ_m$ is minimized via a measure-theoretic ``water-leveling'' argument that optimizes an appropriately defined partition $U$ generating the matrix ${\boldsymbol {\rm P}}$. We also consider an example that illustrates the difficulty of generalizing the binary solution to a multi-class setting and deduce relevant properties of the confusion matrix.
Understanding the Essence: Delving into Annotator Prototype Learning for Multi-Class Annotation Aggregation
Chen, Ju, Feng, Jun, Zhang, Shenyu
Multi-class classification annotations have significantly advanced AI applications, with truth inference serving as a critical technique for aggregating noisy and biased annotations. Existing state-of-the-art methods typically model each annotator's expertise using a confusion matrix. However, these methods suffer from two widely recognized issues: 1) when most annotators label only a few tasks, or when classes are imbalanced, the estimated confusion matrices are unreliable, and 2) a single confusion matrix often remains inadequate for capturing each annotator's full expertise patterns across all tasks. To address these issues, we propose a novel confusion-matrix-based method, PTBCC (ProtoType learning-driven Bayesian Classifier Combination), to introduce a reliable and richer annotator estimation by prototype learning. Specifically, we assume that there exists a set $S$ of prototype confusion matrices, which capture the inherent expertise patterns of all annotators. Rather than a single confusion matrix, the expertise per annotator is extended as a Dirichlet prior distribution over these prototypes. This prototype learning-driven mechanism circumvents the data sparsity and class imbalance issues, ensuring a richer and more flexible characterization of annotators. Extensive experiments on 11 real-world datasets demonstrate that PTBCC achieves up to a 15% accuracy improvement in the best case, and a 3% higher average accuracy while reducing computational cost by over 90%.
Automatic Identification of Machine Learning-Specific Code Smells
Hamfelt, Peter, Britto, Ricardo, Rocha, Lincoln, Almendra, Camilo
Machine learning (ML) has rapidly grown in popularity, becoming vital to many industries. Currently, the research on code smells in ML applications lacks tools and studies that address the identification and validity of ML-specific code smells. This work investigates suitable methods and tools to design and develop a static code analysis tool (MLpylint) based on code smell criteria. This research employed the Design Science Methodology. In the problem identification phase, a literature review was conducted to identify ML-specific code smells. In solution design, a secondary literature review and consultations with experts were performed to select methods and tools for implementing the tool. We evaluated the tool on data from 160 open-source ML applications sourced from GitHub. We also conducted a static validation through an expert survey involving 15 ML professionals. The results indicate the effectiveness and usefulness of the MLpylint. We aim to extend our current approach by investigating ways to introduce MLpylint seamlessly into development workflows, fostering a more productive and innovative developer environment.
Multi-Class Human/Object Detection on Robot Manipulators using Proprioceptive Sensing
Hehli, Justin, Heiniger, Marco, Rezayati, Maryam, van de Venn, Hans Wernher
In physical human-robot collaboration (pHRC) settings, humans and robots collaborate directly in shared environments. Robots must analyze interactions with objects to ensure safety and facilitate meaningful workflows. One critical aspect is human/object detection, where the contacted object is identified. Past research introduced binary machine learning classifiers to distinguish between soft and hard objects. This study improves upon those results by evaluating three-class human/object detection models, offering more detailed contact analysis. A dataset was collected using the Franka Emika Panda robot manipulator, exploring preprocessing strategies for time-series analysis. Models including LSTM, GRU, and Transformers were trained on these datasets. The best-performing model achieved 91.11\% accuracy during real-time testing, demonstrating the feasibility of multi-class detection models. Additionally, a comparison of preprocessing strategies suggests a sliding window approach is optimal for this task.
Leveraging Machine Learning for Botnet Attack Detection in Edge-Computing Assisted IoT Networks
Rupanetti, Dulana, Kaabouch, Naima
The increase of IoT devices, driven by advancements in hardware technologies, has led to widespread deployment in large-scale networks that process massive amounts of data daily. However, the reliance on Edge Computing to manage these devices has introduced significant security vulnerabilities, as attackers can compromise entire networks by targeting a single IoT device. In light of escalating cybersecurity threats, particularly botnet attacks, this paper investigates the application of machine learning techniques to enhance security in Edge-Computing-Assisted IoT environments. Specifically, it presents a comparative analysis of Random Forest, XGBoost, and LightGBM -- three advanced ensemble learning algorithms -- to address the dynamic and complex nature of botnet threats. Utilizing a widely recognized IoT network traffic dataset comprising benign and malicious instances, the models were trained, tested, and evaluated for their accuracy in detecting and classifying botnet activities. Furthermore, the study explores the feasibility of deploying these models in resource-constrained edge and IoT devices, demonstrating their practical applicability in real-world scenarios. The results highlight the potential of machine learning to fortify IoT networks against emerging cybersecurity challenges.
Foundation Models for Bioacoustics -- a Comparative Review
Schwinger, Raphael, Zadeh, Paria Vali, Rauch, Lukas, Kurz, Mats, Hauschild, Tom, Lapp, Sam, Tomforde, Sven
Automated bioacoustic analysis is essential for biodiversity monitoring and conservation, requiring advanced deep learning models that can adapt to diverse bioacoustic tasks. This article presents a comprehensive review of large-scale pretrained bioacoustic foundation models and systematically investigates their transferability across multiple bioacoustic classification tasks. We overview bioacoustic representation learning including major pretraining data sources and benchmarks. On this basis, we review bioacoustic foundation models by thoroughly analysing design decisions such as model architecture, pretraining scheme, and training paradigm. Additionally, we evaluate selected foundation models on classification tasks from the BEANS and BirdSet benchmarks, comparing the generalisability of learned representations under both linear and attentive probing strategies. Our comprehensive experimental analysis reveals that BirdMAE, trained on large-scale bird song data with a self-supervised objective, achieves the best performance on the BirdSet benchmark. On BEANS, BEATs$_{NLM}$, the extracted encoder of the NatureLM-audio large audio model, is slightly better. Both transformer-based models require attentive probing to extract the full performance of their representations. ConvNext$_{BS}$ and Perch models trained with supervision on large-scale bird song data remain competitive for passive acoustic monitoring classification tasks of BirdSet in linear probing settings. Training a new linear classifier has clear advantages over evaluating these models without further training. While on BEANS, the baseline model BEATs trained with self-supervision on AudioSet outperforms bird-specific models when evaluated with attentive probing. These findings provide valuable guidance for practitioners selecting appropriate models to adapt them to new bioacoustic classification tasks via probing.
AutoSIGHT: Automatic Eye Tracking-based System for Immediate Grading of Human experTise
Dowling, Byron, Probcin, Jozef, Czajka, Adam
--Can we teach machines to assess the expertise of humans solving visual tasks automatically based on eye tracking features? This paper proposes AutoSIGHT, Automatic System for Immediate Grading of Human experTise, that classifies expert and non-expert performers, and builds upon an ensemble of features extracted from eye tracking data while the performers were solving a visual task. Results on the task of iris Presentation Attack Detection (PAD) used for this study show that with a small evaluation window of just 5 seconds, AutoSIGHT achieves an average average Area Under the ROC curve performance of 0.751 in subject-disjoint train-test regime, indicating that such detection is viable. Furthermore, when a larger evaluation window of up to 30 seconds is available, the Area Under the ROC curve (AUROC) increases to 0.8306, indicating the model is effectively leveraging more information at a cost of slightly delayed decisions. This work opens new areas of research on how to incorporate the automatic weighing of human and machine expertise into human-AI pairing setups, which need to react dynamically to nonstationary expertise distribution between the human and AI players ( e.g., when the experts need to be replaced, or the task at hand changes rapidly). Along with this paper, we offer the eye tracking data used in this study collected from 6 experts and 53 non-experts solving iris PAD visual task. As Artificial Intelligence (AI) systems become more commonplace in everyday tasks, companies and researchers alike understand that a lack of trust in a model or the validity of a model's decision is a major obstacle to wide-scale adoption [1]. This has led to the sub-field of Trustworthy Artificial Intelligence (T AI) that focuses on defining the core principles that AI systems should satisfy to increase trust and adoption. One such principle is that good models should generalize well to unseen data types (that is, operate well in an open set recognition regime). Another principle is that there should exist a seamless and effective collaboration between the AI and humans solving the tasks jointly, in which the capabilities of both sides are appropriately and automatically assessed, and incorporated into the decision-making process.
From Generator to Embedder: Harnessing Innate Abilities of Multimodal LLMs via Building Zero-Shot Discriminative Embedding Model
Ju, Yeong-Joon, Lee, Seong-Whan
--Multimodal Large Language Models (MLLMs) have emerged as a promising solution for universal embedding tasks, yet adapting their generative nature for discriminative representation learning remains a significant challenge. The dominant paradigm of large-scale contrastive pre-training suffers from critical inefficiencies, including prohibitive computational costs and a failure to leverage the intrinsic, instruction-following capabilities of MLLMs. T o overcome these limitations, we propose an efficient framework for universal multimodal embeddings, which bridges this gap by centering on two synergistic components. First, our hierarchical embedding prompt template employs a two-level instruction architecture that forces the model to produce discriminative representations. Building on this strong foundation, our second component, self-aware hard negative sampling, redefines the fine-tuning process by leveraging the model's own understanding to efficiently mine challenging negatives while actively filtering out potential false negatives. Our comprehensive experiments show that our hierarchical prompt achieves zero-shot performance competitive with contrastively trained baselines and enhances the fine-tuning process by lifting a simple in-batch negative baseline by 4.8 points on the MMEB benchmark. Our work presents an effective and efficient pathway to adapt MLLMs for universal embedding tasks, significantly reducing training time. Embeddings [1], [2] are fixed-dimensional representations of inputs, encoded as semantic information within a continuous vector space, underpinning various downstream tasks such as clustering [3], [4], retrieval [5]-[8], and classification [9]. Following the success of instruction-based multi-task training methods [10], [11], the focus of research has shifted toward achieving universal embeddings [12]-[14], where a single model provides robust representations across diverse tasks and domains. The rapid growth of multimedia applications has further driven the need for universal multimodal embed-dings [15]-[18] capable of supporting both uni-modal and cross-modal retrieval.
SmartDate: AI-Driven Precision Sorting and Quality Control in Date Fruits
Traditional machine learning met hods, such as support vector machines (SVM), artificial neural networks (ANN), and logistic regression, have been employed to classify dates based on morphological features like color, t exture, and shape. While effective, these approaches often lack the flexibility and comprehensive quality control needed in modern agricultural practices. To address these limitations, the SmartDate system represents a significant technological advancement by integrat ing deep learning with genetic algorithms and reinforcement learning. This AI-driven system not only excels in date fruit classification but also predicts expiration dates, filling a cruci al gap in existing solutions. SmartDate leverages multispectral and hyperspectral imaging, coupled with Visible-Near-Infrared (VisNIR) spectral sensors, to assess key quality indicators such as moisture c ontent, sugar levels, firmness, and internal defects. This allows for a more thorough evaluation of fruit quality compared to co nventional methods. Moreover, the inclusion of reinforcement learning e nables SmartDate to adapt in real-time to production envir onment changes, optimizing sorting accuracy and ensuring t hat only premium quality dates reach the market.