Goto

Collaborating Authors

 dos


DOS: Dependency-Oriented Sampler for Masked Diffusion Language Models

arXiv.org Machine Learning

Masked diffusion language models (MDLMs) have recently emerged as a new paradigm in language modeling, offering flexible generation dynamics and enabling efficient parallel decoding. However, existing decoding strategies for pre-trained MDLMs predominantly rely on token-level uncertainty criteria, while largely overlooking sequence-level information and inter-token dependencies. To address this limitation, we propose Dependency-Oriented Sampler (DOS), a training-free decoding strategy that leverages inter-token dependencies to inform token updates during generation. Specifically, DOS exploits attention matrices from transformer blocks to approximate inter-token dependencies, emphasizing information from unmasked tokens when updating masked positions. Empirical results demonstrate that DOS consistently achieves superior performance on both code generation and mathematical reasoning tasks. Moreover, DOS can be seamlessly integrated with existing parallel sampling methods, leading to improved generation efficiency without sacrificing generation quality.


Density of States Prediction of Crystalline Materials via Prompt-guided Multi-Modal Transformer Namkyeong Lee

Neural Information Processing Systems

That is, DOS is not solely determined by the crystalline material but also by the energy levels, which has been neglected in previous works. In this paper, we propose to integrate heterogeneous information obtained from the crystalline materials and the energies via a multi-modal transformer, thereby modeling the complex relationships between the atoms in the crystalline materials and various energy levels for DOS prediction. Moreover, we propose to utilize prompts to guide the model to learn the crystal structural system-specific interactions between crystalline materials and energies. Extensive experiments on two types of DOS, i.e., Phonon DOS and Electron DOS, with various real-world scenarios demonstrate the superiority of DOST ransformer .


ThinkTrap: Denial-of-Service Attacks against Black-box LLM Services via Infinite Thinking

arXiv.org Artificial Intelligence

Large Language Models (LLMs) have become foundational components in a wide range of applications, including natural language understanding and generation, embodied intelligence, and scientific discovery. As their computational requirements continue to grow, these models are increasingly deployed as cloud-based services, allowing users to access powerful LLMs via the Internet. However, this deployment model introduces a new class of threat: denial-of-service (DoS) attacks via unbounded reasoning, where adversaries craft specially designed inputs that cause the model to enter excessively long or infinite generation loops. These attacks can exhaust backend compute resources, degrading or denying service to legitimate users. To mitigate such risks, many LLM providers adopt a closed-source, black-box setting to obscure model internals. In this paper, we propose ThinkTrap, a novel input-space optimization framework for DoS attacks against LLM services even in black-box environments. The core idea of ThinkTrap is to first map discrete tokens into a continuous embedding space, then undertake efficient black-box optimization in a low-dimensional subspace exploiting input sparsity. The goal of this optimization is to identify adversarial prompts that induce extended or non-terminating generation across several state-of-the-art LLMs, achieving DoS with minimal token overhead. We evaluate the proposed attack across multiple commercial, closed-source LLM services. Our results demonstrate that, even far under the restrictive request frequency limits commonly enforced by these platforms, typically capped at ten requests per minute (10 RPM), the attack can degrade service throughput to as low as 1% of its original capacity, and in some cases, induce complete service failure.


Bayesian Optimization of Multi-Bit Pulse Encoding in In2O3/Al2O3 Thin-film Transistors for Temporal Data Processing

arXiv.org Artificial Intelligence

Utilizing the intrinsic history-dependence and nonlinearity of hardware, physical reservoir computing is a promising neuromorphic approach to encode time-series data for in-sensor computing. The accuracy of this encoding critically depends on the distinguishability of multi-state outputs, which is often limited by suboptimal and empirically chosen reservoir operation conditions. In this work, we demonstrate a machine learning approach, Bayesian optimization, to improve the encoding fidelity of solution-processed Al2O3/In2O3 thin-film transistors (TFTs). We show high-fidelity 6-bit temporal encoding by exploring five key pulse parameters and using the normalized degree of separation (nDoS) as the metric of output state separability. Additionally, we show that a model trained on simpler 4-bit data can effectively guide optimization of more complex 6-bit encoding tasks, reducing experimental cost. Specifically, for the encoding and reconstruction of binary-patterned images of a moving car across 6 sequential frames, we demonstrate that the encoding is more accurate when operating the TFT using optimized pulse parameters and the 4-bit optimized operating condition performs almost as well as the 6-bit optimized condition. Finally, interpretability analysis via Shapley Additive Explanations (SHAP) reveals that gate pulse amplitude and drain voltage are the most influential parameters in achieving higher state separation. This work presents the first systematic method to identify optimal operating conditions for reservoir devices, and the approach can be extended to other physical reservoir implementations across different material platforms.



An Efficient Intrusion Detection System for Safeguarding Radiation Detection Systems

arXiv.org Artificial Intelligence

Radiation Detection Systems (RDSs) are used to measure and detect abnormal levels of radioactive material in the environment. These systems are used in many applications to mitigate threats posed by high levels of radioactive material. However, these systems lack protection against malicious external attacks to modify the data. The novelty of applying Intrusion Detection Systems (IDS) in RDSs is a crucial element in safeguarding these critical infrastructures. While IDSs are widely used in networking environments to safeguard against various attacks, their application in RDSs is novel. A common attack on RDSs is Denial of Service (DoS), where the attacker aims to overwhelm the system, causing malfunctioning RDSs. This paper proposes an efficient Machine Learning (ML)-based IDS to detect anomalies in radiation data, focusing on DoS attacks. This work explores the use of sampling methods to create a simulated DoS attack based on a real radiation dataset, followed by an evaluation of various ML algorithms, including Random Forest, Support Vector Machine (SVM), logistic regression, and Light Gradient-Boosting Machine (LightGBM), to detect DoS attacks on RDSs. LightGBM is emphasized for its superior accuracy and low computational resource consumption, making it particularly suitable for real-time intrusion detection. Additionally, model optimization and TinyML techniques, including feature selection, parallel execution, and random search methods, are used to improve the efficiency of the proposed IDS. Finally, an optimized and efficient LightGBM-based IDS is developed to achieve accurate intrusion detection for RDSs.



Enhancing ML Model Interpretability: Leveraging Fine-Tuned Large Language Models for Better Understanding of AI

arXiv.org Artificial Intelligence

Across various sectors applications of eXplainableAI (XAI) gained momentum as the increasing black-boxedness of prevailing Machine Learning (ML) models became apparent. In parallel, Large Language Models (LLMs) significantly developed in their abilities to understand human language and complex patterns. By combining both, this paper presents a novel reference architecture for the interpretation of XAI through an interactive chatbot powered by a fine-tuned LLM. We instantiate the reference architecture in the context of State-of-Health (SoH) prediction for batteries and validate its design in multiple evaluation and demonstration rounds. The evaluation indicates that the implemented prototype enhances the human interpretability of ML, especially for users with less experience with XAI.


Interpretable, multi-dimensional Evaluation Framework for Causal Discovery from observational i.i.d. Data

arXiv.org Machine Learning

Nonlinear causal discovery from observational data imposes strict identifiability assumptions on the formulation of structural equations utilized in the data generating process. The evaluation of structure learning methods under assumption violations requires a rigorous and interpretable approach, which quantifies both the structural similarity of the estimation with the ground truth and the capacity of the discovered graphs to be used for causal inference. Motivated by the lack of unified performance assessment framework, we introduce an interpretable, six-dimensional evaluation metric, i.e., distance to optimal solution (DOS), which is specifically tailored to the field of causal discovery. Furthermore, this is the first research to assess the performance of structure learning algorithms from seven different families on increasing percentage of non-identifiable, nonlinear causal patterns, inspired by real-world processes. Our large-scale simulation study, which incorporates seven experimental factors, shows that besides causal order-based methods, amortized causal discovery delivers results with comparatively high proximity to the optimal solution.


MYCROFT: Towards Effective and Efficient External Data Augmentation

arXiv.org Artificial Intelligence

Machine learning (ML) models often require large amounts of data to perform well. When the available data is limited, model trainers may need to acquire more data from external sources. Often, useful data is held by private entities who are hesitant to share their data due to propriety and privacy concerns. This makes it challenging and expensive for model trainers to acquire the data they need to improve model performance. To address this challenge, we propose Mycroft, a data-efficient method that enables model trainers to evaluate the relative utility of different data sources while working with a constrained data-sharing budget. By leveraging feature space distances and gradient matching, Mycroft identifies small but informative data subsets from each owner, allowing model trainers to maximize performance with minimal data exposure. Experimental results across four tasks in two domains show that Mycroft converges rapidly to the performance of the full-information baseline, where all data is shared. Moreover, Mycroft is robust to noise and can effectively rank data owners by utility. Mycroft can pave the way for democratized training of high performance ML models.