Slovakia
Penalty-based Methods for Simple Bilevel Optimization under Hölderian Error Bounds
This paper investigates simple bilevel optimization problems where we minimize an upper-level objective over the optimal solution set of a convex lower-level objective. Existing methods for such problems either only guarantee asymptotic convergence, have slow sublinear rates, or require strong assumptions. To address these challenges, we propose a penalization framework that delineates the relationship between approximate solutions of the original problem and its reformulated counterparts.
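The penalty idea behind such frameworks can be illustrated on a toy problem. The sketch below is a hypothetical example, not the authors' algorithm or their Hölderian-error-bound analysis. It uses the classical penalty reformulation, minimizing f(x) + gamma * g(x) by gradient descent. Here the lower-level objective g has the line x1 + x2 = 1 as its optimal set, and the upper-level objective f picks (0.5, 0.5) on that line. The functions `solve_penalized`, `grad_f` and `grad_g` are illustrative names.

```python
# Toy simple bilevel problem (hypothetical, for illustration only):
#   upper level  f(x) = x1^2 + x2^2
#   lower level  g(x) = (x1 + x2 - 1)^2, optimal set = line x1 + x2 = 1
# The bilevel solution is (0.5, 0.5).

def grad_f(x):
    return [2.0 * x[0], 2.0 * x[1]]

def grad_g(x):
    r = x[0] + x[1] - 1.0
    return [2.0 * r, 2.0 * r]

def solve_penalized(gamma, steps=5000):
    """Gradient descent on the penalized objective f(x) + gamma * g(x)."""
    # The Hessian of f + gamma*g has largest eigenvalue 2 + 4*gamma,
    # so 1/L with L = 2 + 4*gamma is a safe constant step size.
    step = 1.0 / (2.0 + 4.0 * gamma)
    x = [0.0, 0.0]
    for _ in range(steps):
        gf, gg = grad_f(x), grad_g(x)
        x = [x[i] - step * (gf[i] + gamma * gg[i]) for i in range(2)]
    return x

if __name__ == "__main__":
    for gamma in (1.0, 10.0, 100.0):
        x = solve_penalized(gamma)
        print(gamma, x, "lower-level residual:", x[0] + x[1] - 1.0)
```

For this toy problem the penalized minimizer is x1 = x2 = gamma / (1 + 2 * gamma). It approaches the bilevel solution as gamma grows, while the lower-level violation shrinks like 1 / gamma. Relating such approximate penalized solutions to approximate solutions of the original problem is exactly the kind of question the framework above addresses.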
MALT Powers Up Adversarial Attacks
Current adversarial attacks on multi-class classifiers choose the target class for a given input naively, based on the classifier's confidence levels for the various target classes. We present a novel adversarial targeting method, MALT (Mesoscopic Almost Linearity Targeting), based on medium-scale almost-linearity assumptions. Our attack outperforms the current state-of-the-art AutoAttack on the standard benchmark datasets CIFAR-100 and ImageNet for a variety of robust models. Moreover, our attack is five times faster than AutoAttack, while matching all of AutoAttack's successes and also attacking additional samples that were previously out of reach. We then prove formally and demonstrate empirically that our targeting method, although inspired by linear predictors, also applies to standard non-linear models.
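Confidence-based and linearity-based targeting can be contrasted on a purely linear classifier with logits z = W x + b. For such a model, the exact L2 distance from x to the boundary between the predicted class y and a target t is (z_y - z_t) / ||w_y - w_t||. The minimal sketch below assumes a linear model and is not the MALT algorithm itself, which the paper applies to non-linear networks. It ranks targets by this distance rather than by the target logit. All function names are hypothetical.

```python
import math

def logits(W, b, x):
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

def rank_targets_by_confidence(W, b, x):
    """Naive targeting: non-predicted classes ordered by decreasing logit."""
    z = logits(W, b, x)
    y = max(range(len(z)), key=z.__getitem__)
    return sorted((t for t in range(len(z)) if t != y), key=lambda t: -z[t])

def rank_targets_linear(W, b, x):
    """Order targets by the exact distance to the y-vs-t boundary of a linear model."""
    z = logits(W, b, x)
    y = max(range(len(z)), key=z.__getitem__)

    def dist(t):
        norm = math.sqrt(sum((wy - wt) ** 2 for wy, wt in zip(W[y], W[t])))
        return math.inf if norm == 0 else (z[y] - z[t]) / norm

    return sorted((t for t in range(len(z)) if t != y), key=dist)
```

With W = [[1, 0], [1, 0.5], [1, 10]], b = [0, -0.5, -1] and x = [2, 0], class 1 has the larger logit among the two targets. Its boundary, however, lies at distance 1.0, while the boundary to class 2 lies at distance 0.1. The two rankings therefore disagree.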
Latent Space Representation of Electricity Market Curves for Improved Prediction Efficiency
Výboh, Martin, Chladná, Zuzana, Grmanová, Gabriela, Lucká, Mária
This work presents a three-phase ML prediction framework designed to handle the high dimensionality and multivariate time-series character of electricity market curves. In the preprocessing phase, we transform the original data into a unified structure and mitigate the effect of possible outliers. Next, to address the high dimensionality, we test three dimensionality reduction techniques (PCA, kPCA, UMAP). Finally, we predict the supply and demand curves, once represented in a latent space, with a variety of machine learning methods (RF, LSTM, TSMixer). Our results on the MIBEL dataset show that the high-dimensional structure of the market curves is best handled by the nonlinear reduction technique UMAP. Regardless of the ML technique used for prediction, we achieved the lowest values of all considered error metrics with a UMAP latent space representation in only two or three dimensions, even when compared to PCA and kPCA with five or six dimensions. We further demonstrate that the most promising machine learning technique for handling the complex structure of electricity market curves is the novel TSMixer architecture. Lastly, we fill a gap in the literature on electricity market curve prediction: in addition to the standard analysis on the supply side, we apply the ML framework to predict demand curves as well, and we discuss the differences in the results achieved for these two types of curves.
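One common way to give market curves a unified structure is to evaluate each aggregated curve on a fixed price grid. This turns every hourly curve into a vector of equal length that a reduction method such as PCA or UMAP can consume. The sketch below assumes bids are (price, quantity) pairs. The aggregated supply at price p is the total quantity of sell bids priced at or below p, and the aggregated demand is the total quantity of buy bids priced at or above p. This is an illustration, not necessarily the paper's exact preprocessing, and the function names and data are hypothetical.

```python
def supply_curve(sell_bids, grid):
    """Cumulative quantity offered at price <= p, for each p in the price grid."""
    return [sum(q for price, q in sell_bids if price <= p) for p in grid]

def demand_curve(buy_bids, grid):
    """Cumulative quantity requested at price >= p, for each p in the price grid."""
    return [sum(q for price, q in buy_bids if price >= p) for p in grid]

# Hypothetical bids (price in EUR/MWh, quantity in MWh) and a coarse price grid.
sell = [(10.0, 50.0), (30.0, 20.0), (60.0, 30.0)]
buy = [(80.0, 40.0), (40.0, 25.0), (5.0, 35.0)]
grid = [0.0, 20.0, 40.0, 60.0, 80.0]
```

Here `supply_curve(sell, grid)` gives [0, 50, 70, 100, 100] and `demand_curve(buy, grid)` gives [100, 65, 65, 40, 40]. Both are fixed-length vectors regardless of how many bids were submitted.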
Guaranteeing Out-Of-Distribution Detection in Deep RL via Transition Estimation
Prashant, Mohit, Easwaran, Arvind, Das, Suman, Yuhas, Michael
An issue concerning the use of deep reinforcement learning (RL) agents is whether they can be trusted to perform reliably when deployed, as training environments may not reflect real-life environments. To anticipate situations outside their training scope, learning-enabled systems are often equipped with out-of-distribution (OOD) detectors that raise an alert when a trained system encounters a state it does not recognize or in which it exhibits uncertainty. Limited work has addressed OOD detection in RL, and prior studies have not reached a consensus on the definition of OOD execution in the context of RL. By framing the problem as a Markov Decision Process (MDP), we assume a transition distribution that maps each state-action pair to a next state with some probability. Based on this, we adopt the following definition of OOD execution in RL: a transition is OOD if its probability during real-life deployment differs from the transition distribution encountered during training. Accordingly, we use conditional variational autoencoders (CVAE) to approximate the transition dynamics of the training environment and implement a conformity-based detector using the reconstruction loss that guarantees OOD detection with a predetermined confidence level. We evaluate our detector on adapted existing benchmarks and compare it with existing OOD detection models for RL.
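Conformal prediction is the standard way to turn a score such as a reconstruction loss into a detector with a guaranteed confidence level. The minimal split-conformal sketch below assumes that calibration scores come from in-distribution transitions (for example, CVAE reconstruction losses on held-out training transitions) and that scores are exchangeable. Under those assumptions, flagging a new transition when its score exceeds the threshold yields a false-alarm rate of at most `alpha`. This illustrates the general technique, not the paper's exact detector, and the function names are hypothetical.

```python
import math

def conformal_threshold(calibration_scores, alpha):
    """Split-conformal threshold: the ceil((n+1)(1-alpha))-th smallest calibration score."""
    n = len(calibration_scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf  # too few calibration points for this alpha: never flag
    return sorted(calibration_scores)[k - 1]

def is_ood(score, threshold):
    """Flag a transition whose nonconformity score (e.g. reconstruction loss) is too high."""
    return score > threshold
```

With 19 calibration scores and alpha = 0.1, k = ceil(20 * 0.9) = 18, so the 18th smallest score becomes the threshold.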
Shaken, Not Stirred: A Novel Dataset for Visual Understanding of Glasses in Human-Robot Bartending Tasks
Gajdošech, Lukáš, Ali, Hassan, Habekost, Jan-Gerrit, Madaras, Martin, Kerzel, Matthias, Wermter, Stefan
Datasets for object detection often lack sufficient variety of glasses, due to their transparent and reflective properties. In particular, open-vocabulary object detectors, widely used in embodied robotic agents, fail to distinguish subclasses of glasses. This gap poses a problem for robotic applications, which suffer from errors accumulating across detection, planning, and action execution. The paper introduces a novel method for acquiring real-world data from RGB-D sensors that minimizes human effort. We propose an auto-labeling pipeline that generates labels for all acquired frames based on the depth measurements. We provide a novel real-world glass object dataset collected on the Neuro-Inspired COLlaborator (NICOL), a humanoid robot platform. The dataset consists of 7850 images recorded from five different cameras. We show that our trained baseline model outperforms state-of-the-art open-vocabulary approaches. In addition, we deploy our baseline model as part of an embodied agent on the NICOL platform, where it achieves a success rate of 81% in a human-robot bartending scenario.
A Comprehensive Survey of Fuzzy Implication Functions
Fuzzy implication functions are a key topic of study in fuzzy logic, extending the classical logical conditional to truth degrees in the interval $[0,1]$. While the existing literature often focuses on a limited number of families, many new families have been introduced in the last ten years, each defined by a specific construction method and having different key properties. This survey aims to provide a comprehensive and structured overview of the diverse families of fuzzy implication functions, emphasizing their motivations, properties, and potential applications. By organizing the information schematically, this document serves as a valuable resource both for theoretical researchers seeking to avoid redundancy and for practitioners looking to select appropriate operators for specific applications.
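The standard definition of a fuzzy implication function I: [0,1]^2 -> [0,1] requires that I be decreasing in its first argument and increasing in its second, with I(0,0) = I(1,1) = 1 and I(1,0) = 0. The sketch below checks these axioms for five classical implications (Łukasiewicz, Gödel, Goguen, Kleene-Dienes, Reichenbach) on a finite grid. It uses exact rational arithmetic to avoid floating-point artifacts. A grid check is only a sanity test, not a proof, and the dictionary and function names are illustrative.

```python
from fractions import Fraction

# Classical fuzzy implication functions (well-known members of the main families).
IMPLICATIONS = {
    "Lukasiewicz": lambda x, y: min(1, 1 - x + y),
    "Goedel": lambda x, y: 1 if x <= y else y,
    "Goguen": lambda x, y: 1 if x <= y else y / x,
    "Kleene-Dienes": lambda x, y: max(1 - x, y),
    "Reichenbach": lambda x, y: 1 - x + x * y,
}

def is_fuzzy_implication(I, steps=10):
    """Check the boundary conditions and monotonicity of I on a rational grid."""
    if not (I(0, 0) == 1 and I(1, 1) == 1 and I(1, 0) == 0):
        return False
    grid = [Fraction(i, steps) for i in range(steps + 1)]
    for a in grid:
        for b in grid:
            if a > b:
                continue
            for c in grid:
                if I(a, c) < I(b, c):  # must be decreasing in the first argument
                    return False
                if I(c, a) > I(c, b):  # must be increasing in the second argument
                    return False
    return True
```

For example, the minimum t-norm min(x, y) fails the check, because it gives 0 rather than 1 at (0, 0).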
Compact Rule-Based Classifier Learning via Gradient Descent
Fumanal-Idocin, Javier, Fernandez-Peralta, Raquel, Andreu-Perez, Javier
Rule-based models play a crucial role in scenarios that require transparency and accountable decision-making. However, they primarily consist of discrete parameters and structures, which presents challenges for scalability and optimization. In this work, we introduce a new rule-based classifier trained using gradient descent, in which the user can control the maximum number and length of the rules. For numerical features, the user can also control the fuzzy-set partitions used, which helps keep the number of partitions small. We perform a series of exhaustive experiments on $40$ datasets to show how this classifier performs in terms of accuracy and rule base size. We then compare our results with a genetic search that fits an equivalent classifier and with other explainable and non-explainable state-of-the-art classifiers. Our results show that our method obtains compact rule bases that use significantly fewer patterns than other rule-based methods and performs better than other explainable classifiers.
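The sketch below shows how a compact fuzzy rule base of this kind can classify a sample at inference time. It assumes triangular fuzzy sets over features scaled to [0, 1], the product t-norm for combining antecedents, and max aggregation per class. These are common choices, but they are assumptions rather than the paper's specification. The product t-norm is differentiable, which is one reason such models lend themselves to gradient-based training. The gradient-descent learning itself is not shown. `PARTITION`, `RULES` and the function names are hypothetical.

```python
# Three triangular fuzzy sets (a, b, c) = (left foot, peak, right foot) over [0, 1].
PARTITION = {
    "low": (-0.5, 0.0, 0.5),
    "medium": (0.0, 0.5, 1.0),
    "high": (0.5, 1.0, 1.5),
}

# Hypothetical rule base: ({feature index: fuzzy label}, predicted class).
RULES = [
    ({0: "low", 1: "high"}, "A"),
    ({0: "high"}, "B"),
]

def triangular(x, a, b, c):
    """Membership of x in a triangular fuzzy set with support (a, c) and peak b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def firing_strength(antecedents, x):
    """Product t-norm over the memberships of all antecedent conditions."""
    strength = 1.0
    for feature, label in antecedents.items():
        strength *= triangular(x[feature], *PARTITION[label])
    return strength

def classify(rules, x, max_rules=4, max_len=2):
    """Return the class of the strongest rule, or None if no rule fires."""
    if len(rules) > max_rules or any(len(a) > max_len for a, _ in rules):
        raise ValueError("rule base exceeds the user-defined size limits")
    scores = {}
    for antecedents, label in rules:
        scores[label] = max(scores.get(label, 0.0), firing_strength(antecedents, x))
    best = max(scores, key=scores.get)
    return best if scores[best] > 0.0 else None
```

For x = [0.1, 0.9], the first rule fires with strength 0.8 * 0.8 = 0.64 and the second rule does not fire, so the sample is assigned to class "A".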
Memorization Inheritance in Sequence-Level Knowledge Distillation for Neural Machine Translation
In this work, we explore how instance-level memorization in the teacher Neural Machine Translation (NMT) model is inherited by the student model in sequence-level knowledge distillation (SeqKD). We find that, despite not directly seeing the original training data, students memorize more than baseline models (models of the same size, trained on the original data): 3.4% more for exact matches and 57% more for extractive memorization. Students also show increased hallucination rates. Further, under this SeqKD setting, we characterize how students behave on specific training data subgroups, such as low-quality subgroups and subgroups with specific counterfactual memorization (CM) scores, and find that students exhibit amplified denoising on low-quality subgroups. Finally, we propose a modification to SeqKD, named Adaptive-SeqKD, which intervenes in SeqKD to reduce memorization and hallucinations. Overall, we recommend caution when applying SeqKD: students inherit both their teachers' superior performance and their failure modes, and therefore require active monitoring.
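Exact-match memorization compares a model's translation of a training source with that source's training target. The minimal sketch below computes an exact-match memorization rate, so that a student and a same-size baseline can be compared on the same training sources. It assumes outputs and references are aligned lists of strings and ignores only leading and trailing whitespace. Extractive memorization uses a more involved protocol that is not reproduced here. The function name and data are hypothetical.

```python
def exact_match_rate(outputs, references):
    """Fraction of training sources whose model output equals the training target exactly."""
    if len(outputs) != len(references) or not references:
        raise ValueError("outputs and references must be non-empty and aligned")
    hits = sum(o.strip() == r.strip() for o, r in zip(outputs, references))
    return hits / len(references)

# Hypothetical training targets and model outputs for the same four sources.
references = ["a b", "c d", "e f", "g h"]
student_outputs = ["a b", "c d ", "x", "y"]
baseline_outputs = ["a b", "z", "x", "y"]
```

In this toy data the student reproduces two of four training targets verbatim (rate 0.5), and the baseline reproduces one (rate 0.25).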
RoBo6: Standardized MMT Light Curve Dataset for Rocket Body Classification
Kyselica, Daniel, Šuppa, Marek, Šilha, Jiří, Ďurikovič, Roman
Space debris presents a critical challenge for the sustainability of future space missions, emphasizing the need for robust and standardized identification methods. However, a comprehensive benchmark for rocket body classification is still lacking. This paper addresses this gap by introducing the RoBo6 dataset for rocket body classification based on light curves. The dataset, derived from the Mini Mega Tortora database, includes light curves for six rocket body classes: CZ-3B, Atlas 5 Centaur, Falcon 9, H-2A, Ariane 5, and Delta 4. It contains 5,676 training and 1,404 test samples, and data inconsistencies are addressed using resampling, normalization, and filtering techniques. Several machine learning models were evaluated, including CNN and transformer-based approaches, with Astroconformer achieving the best performance. The dataset establishes a common benchmark for future comparisons and advancements in rocket body classification tasks.
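Resampling and normalization of this kind typically map irregularly sampled light curves to fixed-length, comparable inputs for CNN or transformer models. The minimal sketch below assumes linear interpolation onto an evenly spaced time grid followed by min-max scaling. The abstract does not state the actual sequence length, interpolation scheme, or filtering rules used for RoBo6, so these are assumptions, and the function names are hypothetical.

```python
def resample(times, values, n):
    """Linearly interpolate a light curve (sorted times) onto n evenly spaced times."""
    if len(times) < 2 or len(times) != len(values) or n < 2:
        raise ValueError("need at least two aligned samples and n >= 2")
    t0, t1 = times[0], times[-1]
    grid = [t0 + (t1 - t0) * i / (n - 1) for i in range(n)]
    out, j = [], 0
    for t in grid:
        # Advance to the segment [times[j], times[j + 1]] that contains t.
        while j < len(times) - 2 and times[j + 1] < t:
            j += 1
        ta, tb = times[j], times[j + 1]
        w = 0.0 if tb == ta else (t - ta) / (tb - ta)
        out.append(values[j] + w * (values[j + 1] - values[j]))
    return out

def min_max(values):
    """Scale values to [0, 1]; a constant curve maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]
```

For example, `resample([0.0, 1.0, 3.0], [0.0, 2.0, 6.0], 4)` returns [0.0, 2.0, 4.0, 6.0], and `min_max` then scales that curve to [0, 1/3, 2/3, 1].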
Intent Classification for Bank Chatbots through LLM Fine-Tuning
Lajčinová, Bibiána, Valábek, Patrik, Spišiak, Michal
The advent of digital technologies has significantly influenced customer service methodologies, with a notable shift towards integrating chatbots for handling customer support inquiries. This trend is primarily observed on business websites, where chatbots help answer customer queries pertinent to the business's domain. These virtual assistants are instrumental in providing essential information to customers, thereby reducing the workload traditionally handled by human customer support agents. In chatbot development, recent years have witnessed a surge in the use of generative artificial intelligence technologies to craft customized responses. Despite this technological advancement, certain enterprises continue to favor a more structured approach to chatbot interactions. In this approach, the content of responses is predetermined rather than generated on the fly, ensuring accuracy of information and adherence to the business's branding style. Deploying these chatbots typically involves defining specific categories known as intents. Each intent corresponds to a particular customer inquiry and guides the chatbot to deliver an appropriate response. Consequently, a pivotal challenge in such a system is accurately identifying the user's intent from their textual input to the chatbot.
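The structured approach described above maps each recognized intent to a predetermined response and falls back to a default reply when the classifier is not confident. In the sketch below, a toy keyword scorer stands in for the intent classifier, which in the paper's setting would be a fine-tuned LLM. The intents, keywords, responses, and confidence threshold are all hypothetical.

```python
import re

# Hypothetical intents with predetermined (not generated) responses.
RESPONSES = {
    "card_blocked": "To unblock your card, please contact customer support.",
    "opening_hours": "Our branches are open on working days.",
}
FALLBACK = "Sorry, I did not understand. Could you rephrase your question?"

# Toy stand-in for a fine-tuned intent classifier: keyword overlap per intent.
KEYWORDS = {
    "card_blocked": {"card", "blocked"},
    "opening_hours": {"opening", "hours"},
}

def intent_scores(text):
    """Score each intent by the fraction of its keywords present in the text."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    return {intent: len(kw & tokens) / len(kw) for intent, kw in KEYWORDS.items()}

def reply(text, threshold=0.5):
    """Return the predetermined response for the top intent, or a fallback."""
    intent, score = max(intent_scores(text).items(), key=lambda kv: kv[1])
    return RESPONSES[intent] if score >= threshold else FALLBACK
```

Because every answer comes from `RESPONSES`, the classifier only decides which approved text is shown. Misclassification therefore produces a wrong but on-brand answer rather than a fabricated one, which is why intent accuracy becomes the central challenge.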