Toward Autonomous and Efficient Cybersecurity: A Multi-Objective AutoML-based Intrusion Detection System
With increasingly sophisticated cybersecurity threats and rising demand for network automation, autonomous cybersecurity mechanisms are becoming critical for securing modern networks. The rapid expansion of Internet of Things (IoT) systems amplifies these challenges, as resource-constrained IoT devices demand scalable and efficient security solutions. In this work, an innovative Intrusion Detection System (IDS) utilizing Automated Machine Learning (AutoML) and Multi-Objective Optimization (MOO) is proposed for autonomous and optimized cyber-attack detection in modern networking environments. The proposed IDS framework integrates two primary innovative techniques: Optimized Importance and Percentage-based Automated Feature Selection (OIP-AutoFS) and Optimized Performance, Confidence, and Efficiency-based Combined Algorithm Selection and Hyperparameter Optimization (OPCE-CASH). These components optimize feature selection and model learning processes to strike a balance between intrusion detection effectiveness and computational efficiency. This work presents the first IDS framework that integrates all four AutoML stages and employs multi-objective optimization to jointly optimize detection effectiveness, efficiency, and confidence for deployment in resource-constrained systems. Experimental evaluations over two benchmark cybersecurity datasets demonstrate that the proposed MOO-AutoML IDS outperforms state-of-the-art IDSs, establishing a new benchmark for autonomous, efficient, and optimized security for networks. Designed to support IoT and edge environments with resource constraints, the proposed framework is applicable to a variety of autonomous cybersecurity applications across diverse networked environments.
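The abstract does not spell out how OIP-AutoFS works, so the following is only a minimal sketch of one common reading of "importance and percentage-based" feature selection: rank features by an importance score and keep the smallest top-ranked set whose cumulative share of total importance reaches a chosen percentage. The function name, the feature names, the scores, and the 90% threshold are all hypothetical and are not taken from the paper.

```python
def select_by_cumulative_importance(importances, threshold=0.9):
    """Keep the top-ranked features whose normalized importances
    sum to at least `threshold` (an assumed selection rule)."""
    total = sum(importances.values())
    if total <= 0:
        raise ValueError("importances must sum to a positive value")
    # Rank features from most to least important.
    ranked = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)
    selected, cumulative = [], 0.0
    for name, score in ranked:
        selected.append(name)
        cumulative += score / total
        if cumulative >= threshold:
            break
    return selected


# Hypothetical importance scores for four flow features.
scores = {"flow_duration": 50, "pkt_count": 30, "byte_rate": 15, "ttl": 5}
selected = select_by_cumulative_importance(scores, threshold=0.9)
print(selected)
```

In a multi-objective setting, the threshold itself would presumably be one of the tuned parameters, since a lower percentage trades detection quality for fewer features and faster inference.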
Revealing the Hidden Third Dimension of Point Defects in Two-Dimensional MXenes
Guinan, Grace, Smeaton, Michelle A., Wyatt, Brian C., Goldy, Steven, Egan, Hilary, Glaws, Andrew, Tucker, Garritt J., Anasori, Babak, Spurgeon, Steven R.
Point defects govern many important functional properties of two-dimensional (2D) materials. However, resolving the three-dimensional (3D) arrangement of these defects in multi-layer 2D materials remains a fundamental challenge, hindering rational defect engineering. Our approach reconstructs the 3D coordinates of vacancies across hundreds of thousands of lattice sites, generating robust statistical insight into their distribution that can be correlated with specific synthesis pathways. This large-scale data enables us to classify a hierarchy of defect structures -- from isolated vacancies to nanopores -- revealing their preferred formation and interaction mechanisms, as corroborated by molecular dynamics simulations. This work provides a generalizable framework for understanding and ultimately controlling point defects across large volumes, paving the way for the rational design of defect-engineered functional 2D materials. Keywords: 2D materials, point defects, autonomous materials science, electron microscopy, machine learning. Two-dimensional (2D) materials have become a major field of modern research in materials science after the discovery of graphene in 2004. The challenge of characterizing point defects is significantly amplified in few-layered 2D materials. For instance, MXenes -- a class of 2D transition metal carbides, carbonitrides, and nitrides -- consist of nanosheets containing two to five layers of metal atoms, which complicates defect analysis compared to single-layer materials.
Revisiting Network Traffic Analysis: Compatible network flows for ML models
Vitorino, João, Pinto, Daniela, Maia, Eva, Amorim, Ivone, Praça, Isabel
To ensure that Machine Learning (ML) models can perform a robust detection and classification of cyberattacks, it is essential to train them with high-quality datasets with relevant features. However, it can be difficult to accurately represent the complex traffic patterns of an attack, especially in Internet-of-Things (IoT) networks. This paper studies the impact that seemingly similar features created by different network traffic flow exporters can have on the generalization and robustness of ML models. In addition to the original CSV files of the Bot-IoT, IoT-23, and CICIoT23 datasets, the raw network packets of their PCAP files were analysed with the HERA tool, generating new labelled flows and extracting consistent features for new CSV versions. To assess the usefulness of these new flows for intrusion detection, they were compared with the original versions and were used to fine-tune multiple models. Overall, the results indicate that directly analysing and preprocessing PCAP files, instead of just using the commonly available CSV files, enables the computation of more relevant features to train bagging and gradient boosting decision tree ensembles. It is important to continue improving feature extraction and feature selection processes to make different datasets more compatible and enable a trustworthy evaluation and comparison of the ML models used in cybersecurity solutions.
ParliaBench: An Evaluation and Benchmarking Framework for LLM-Generated Parliamentary Speech
Koniaris, Marios, Tsipi, Argyro, Tsanakas, Panayiotis
Parliamentary speech generation presents specific challenges for large language models beyond standard text generation tasks. Unlike general text generation, parliamentary speeches require not only linguistic quality but also political authenticity and ideological consistency. Current language models lack specialized training for parliamentary contexts, and existing evaluation methods focus on standard NLP metrics rather than political authenticity. To address this, we present ParliaBench, a benchmark for parliamentary speech generation. We constructed a dataset of speeches from UK Parliament to enable systematic model training. We introduce an evaluation framework combining computational metrics with LLM-as-a-judge assessments for measuring generation quality across three dimensions: linguistic quality, semantic coherence, and political authenticity. We propose two novel embedding-based metrics, Political Spectrum Alignment and Party Alignment, to quantify ideological positioning. We fine-tuned five large language models (LLMs), generated 28k speeches, and evaluated them using our framework, comparing baseline and fine-tuned models. Results show that fine-tuning produces statistically significant improvements across the majority of metrics and our novel metrics demonstrate strong discriminative power for political dimensions.
Good flavor search in $SU(5)$: a machine learning approach
Abu-Ajamieh, Fayez, Kawai, Shinsuke, Okada, Nobuchika
We revisit the fermion mass problem of the $SU(5)$ grand unified theory using machine learning techniques. The original $SU(5)$ model proposed by Georgi and Glashow is incompatible with the observed fermion mass spectrum. Two remedies are known to resolve this discrepancy: one introduces a new interaction via a 45-dimensional field, and the other via a 24-dimensional field. We investigate which modification is more natural, defining naturalness as proximity to the original Georgi-Glashow $SU(5)$ model. Our analysis shows that, in both supersymmetric and non-supersymmetric scenarios, the model incorporating the interaction with the 24-dimensional field is more natural under this criterion. We then generalise these models by introducing a continuous parameter $y$, which takes the value 3 for the 45-dimensional field and 1.5 for the 24-dimensional field. Numerical optimisation reveals that $y \approx 0.8$ yields the closest match to the original $SU(5)$ model, indicating that this value corresponds to the most natural model according to our definition.
A robust methodology for long-term sustainability evaluation of Machine Learning models
Paz-Ruza, Jorge, Gama, João, Alonso-Betanzos, Amparo, Guijarro-Berdiñas, Bertha
Among the many desirable properties of Artificial Intelligence systems, sustainability and efficiency have become increasingly important in the context of worsening climate change, massive water use in data centres, and the need for simpler, faster models in IoT settings. Consequently, there have been many attempts to both promote and regulate the sustainability of Machine Learning models: the EU's AI Act indicates that the sustainability of AI, in terms of its environmental and social footprint, should be considered when developing and deploying AI pipelines [1], and manifestos such as UNESCO's highlight sustainability as one of the core principles of the broader Responsible AI paradigm [2]. However, this apparent consensus on the importance of sustainability and efficiency for real-world AI systems, together with the accompanying social and regulatory efforts, contrasts sharply with the practical applicability of such regulations: the AI Act itself defines the requirement for sustainability, but does not indicate what metrics and evaluation pipelines should be considered for a robust, reliable, and practically relevant assessment of the environmental impact of a model. We argue that this lack of comprehensiveness in sustainability recommendations for AI systems does not stem from a careless construction of the regulations themselves, but rather from an actual absence of suitable evaluation protocols that are formal, model-agnostic, reproducible, and grounded in real-life usage protocols for the ML lifecycle. The authors of this preprint are aware of a single regulatory standard for measuring AI sustainability, namely UNE 0086 [3], which limits evaluation to the epoch-batch training paradigm of supervised learning systems, rendering it useless for any task or type of learning that deviates from that standard.
Although many researchers and companies have made it a habit to report efficiency figures and comparisons (e.g., in terms of emitted CO
StableMorph: High-Quality Face Morph Generation with Stable Diffusion
Kabbani, Wassim, Raja, Kiran, Ramachandra, Raghavendra, Busch, Christoph
Face morphing attacks threaten the integrity of biometric identity systems by enabling multiple individuals to share a single identity. To develop and evaluate effective morphing attack detection (MAD) systems, we need access to high-quality, realistic morphed images that reflect the challenges posed in real-world scenarios. However, existing morph generation methods often produce images that are blurry, riddled with artifacts, or poorly constructed--making them easy to detect and not representative of the most dangerous attacks. In this work, we introduce StableMorph, a novel approach that generates highly realistic, artifact-free morphed face images using modern diffusion-based image synthesis. Unlike prior methods, StableMorph produces full-head images with sharp details, avoids common visual flaws, and offers unmatched control over visual attributes. Through extensive evaluation, we show that StableMorph images not only rival or exceed the quality of genuine face images, but also maintain a strong ability to fool face recognition systems--posing a greater challenge to existing MAD solutions and setting a new standard for morph quality in research and operational testing. StableMorph improves the evaluation of biometric security by creating more realistic and effective attacks and supports the development of more robust detection systems.
MSCR: Exploring the Vulnerability of LLMs' Mathematical Reasoning Abilities Using Multi-Source Candidate Replacement
Sun, Zhishen, Dai, Guang, Ye, Haishan
LLMs demonstrate performance comparable to human abilities in complex tasks such as mathematical reasoning, but their robustness in mathematical reasoning under minor input perturbations still lacks systematic investigation. Existing methods generally suffer from limited scalability, weak semantic preservation, and high costs. Therefore, we propose MSCR, an automated adversarial attack method based on multi-source candidate replacement. By combining three information sources including cosine similarity in the embedding space of LLMs, the WordNet dictionary, and contextual predictions from a masked language model, we generate for each word in the input question a set of semantically similar candidates, which are then filtered and substituted one by one to carry out the attack. We conduct large-scale experiments on LLMs using the GSM8K and MATH500 benchmarks. The results show that even a slight perturbation involving only a single word can significantly reduce the accuracy of all models, with the maximum drop reaching 49.89% on GSM8K and 35.40% on MATH500, while preserving the high semantic consistency of the perturbed questions. Further analysis reveals that perturbations not only lead to incorrect outputs but also substantially increase the average response length, which results in more redundant reasoning paths and higher computational resource consumption.
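The candidate-generation and one-word substitution loop described above can be sketched roughly as follows. This is an illustrative sketch, not the authors' implementation: the three lookup tables are toy stand-ins for the embedding-similarity, WordNet, and masked-language-model sources, the `keep` filter is a hypothetical placeholder for the paper's filtering step, and skipping numeric tokens is an assumption made so that the arithmetic of the question stays intact.

```python
def table_source(table):
    """Wrap a lookup table as a candidate source (toy stand-in)."""
    return lambda word: table.get(word.lower(), [])


def merge_candidates(word, sources):
    """Union of candidates from all sources, in first-seen order,
    excluding the original word and duplicates."""
    seen, merged = set(), []
    for source in sources:
        for cand in source(word):
            key = cand.lower()
            if key != word.lower() and key not in seen:
                seen.add(key)
                merged.append(cand)
    return merged


def single_word_perturbations(question, sources, keep=lambda word, cand: True):
    """Yield copies of `question` with exactly one word replaced."""
    tokens = question.split()
    for i, tok in enumerate(tokens):
        if any(ch.isdigit() for ch in tok):
            continue  # assumption: leave numeric tokens untouched
        for cand in merge_candidates(tok, sources):
            if keep(tok, cand):
                yield " ".join(tokens[:i] + [cand] + tokens[i + 1:])


# Hypothetical candidate tables standing in for the three sources.
embedding_neighbors = {"buys": ["purchases"], "apples": ["fruits"]}
wordnet_synonyms = {"buys": ["purchases", "acquires"]}
mlm_predictions = {"apples": ["oranges"]}
sources = [table_source(t) for t in
           (embedding_neighbors, wordnet_synonyms, mlm_predictions)]

perturbed = list(single_word_perturbations("Tom buys 3 apples", sources))
for question in perturbed:
    print(question)
```

In an actual attack each perturbed question would be sent to the target model, and a substitution would count as successful when the model's answer changes from correct to incorrect.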
State of the Art in Text Classification for South Slavic Languages: Fine-Tuning or Prompting?
Pungeršek, Taja Kuzman, Rupnik, Peter, Porupski, Ivan, Dinić, Vuk, Ljubešić, Nikola
Until recently, fine-tuned BERT-like models provided state-of-the-art performance on text classification tasks. With the rise of instruction-tuned decoder-only models, commonly known as large language models (LLMs), the field has increasingly moved toward zero-shot and few-shot prompting. However, the performance of LLMs on text classification, particularly on less-resourced languages, remains under-explored. In this paper, we evaluate the performance of current language models on text classification tasks across several South Slavic languages. We compare openly available fine-tuned BERT-like models with a selection of open-source and closed-source LLMs across three tasks in three domains: sentiment classification in parliamentary speeches, topic classification in news articles and parliamentary speeches, and genre identification in web texts. Our results show that LLMs demonstrate strong zero-shot performance, often matching or surpassing fine-tuned BERT-like models. Moreover, when used in a zero-shot setup, LLMs perform comparably in South Slavic languages and English. However, we also point out key drawbacks of LLMs, including less predictable outputs, significantly slower inference, and higher computational costs. Due to these limitations, fine-tuned BERT-like models remain a more practical choice for large-scale automatic text annotation.
Continual Unlearning for Text-to-Image Diffusion Models: A Regularization Perspective
Lee, Justin, Mai, Zheda, Yoo, Jinsu, Fan, Chongyu, Zhang, Cheng, Chao, Wei-Lun
Machine unlearning--the ability to remove designated concepts from a pre-trained model--has advanced rapidly, particularly for text-to-image diffusion models. However, existing methods typically assume that unlearning requests arrive all at once, whereas in practice they often arrive sequentially. We present the first systematic study of continual unlearning in text-to-image diffusion models and show that popular unlearning methods suffer from rapid utility collapse: after only a few requests, models forget retained knowledge and generate degraded images. We trace this failure to cumulative parameter drift from the pre-training weights and argue that regularization is crucial to addressing it. To this end, we study a suite of add-on regularizers that (1) mitigate drift and (2) remain compatible with existing unlearning methods. Beyond generic regularizers, we show that semantic awareness is essential for preserving concepts close to the unlearning target, and propose a gradient-projection method that constrains parameter drift orthogonal to their subspace. This substantially improves continual unlearning performance and is complementary to other regularizers for further gains. Taken together, our study establishes continual unlearning as a fundamental challenge in text-to-image generation and provides insights, baselines, and open directions for advancing safe and accountable generative AI.
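The idea of constraining parameter drift to be orthogonal to a protected subspace can be illustrated with a generic gradient-projection step. The sketch below is not the paper's method: it assumes the concepts to preserve are represented by a handful of direction vectors in parameter space (how those directions are obtained is not stated in the abstract), orthonormalizes them with Gram-Schmidt, and removes the corresponding components from an update. All names and the toy vectors are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def orthonormal_basis(vectors, eps=1e-12):
    """Gram-Schmidt: orthonormal basis for the span of `vectors`."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            coeff = dot(w, b)
            w = [wi - coeff * bi for wi, bi in zip(w, b)]
        norm = dot(w, w) ** 0.5
        if norm > eps:  # skip directions already in the span
            basis.append([wi / norm for wi in w])
    return basis


def project_out(gradient, protected_directions):
    """Remove from `gradient` every component lying in the span of
    `protected_directions`, so the update is orthogonal to it."""
    g = list(gradient)
    for b in orthonormal_basis(protected_directions):
        coeff = dot(g, b)
        g = [gi - coeff * bi for gi, bi in zip(g, b)]
    return g


# Toy example: protect the plane spanned by two directions in R^3.
protected = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]]
gradient = [2.0, 3.0, 4.0]
projected = project_out(gradient, protected)
print(projected)
```

Applying the projected update instead of the raw gradient leaves the model unchanged, to first order, along the protected directions, which is the sense in which drift is kept orthogonal to the subspace of nearby concepts.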