


SCALAR: Self-Calibrating Adaptive Latent Attention Representation Learning

Abbas, Farwa, Ahmad, Hussain, Szabo, Claudia

arXiv.org Artificial Intelligence

High-dimensional, heterogeneous data with complex feature interactions pose significant challenges for traditional predictive modeling approaches. While Projection to Latent Structures (PLS) remains a popular technique, it struggles to model complex non-linear relationships, especially in multivariate systems with high-dimensional correlation structures. This challenge is further compounded by simultaneous interactions across multiple scales, where local processing fails to capture cross-group dependencies. Additionally, static feature weighting limits adaptability to contextual variations, as it ignores sample-specific relevance. To address these limitations, we propose a method that improves predictive performance through new architectural components. Our architecture introduces an adaptive kernel-based attention mechanism that processes distinct feature groups separately before integrating them, capturing local patterns while preserving global relationships. Experimental results show substantial improvements in performance metrics compared to state-of-the-art methods across diverse datasets.
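The abstract does not specify the kernel, the grouping, or how groups are integrated, so the following is only a minimal sketch of the general idea of group-wise, sample-specific kernel attention. It assumes an RBF kernel, a hypothetical prototype vector per feature group, and a softmax across groups; none of these names or choices come from the paper.

```python
import math

def rbf(u, v, gamma):
    """Gaussian (RBF) kernel between two equal-length vectors."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def group_kernel_attention(x, groups, prototypes, gamma=1.0):
    """Reweight the feature groups of a single sample x.

    groups: list of index lists, one per feature group (hypothetical split).
    prototypes: one reference vector per group (hypothetical learned parameters).
    Returns (group weights, reweighted features in the original order).
    """
    # 1) Process each group separately: kernel similarity to its own prototype.
    scores = [rbf([x[i] for i in idx], proto, gamma)
              for idx, proto in zip(groups, prototypes)]
    # 2) Sample-specific weights: softmax across groups instead of a static weight.
    weights = softmax(scores)
    # 3) Integrate: scale every feature by the weight of its group.
    out = list(x)
    for w, idx in zip(weights, groups):
        for i in idx:
            out[i] = w * x[i]
    return weights, out

weights, out = group_kernel_attention(
    [1.0, 0.0, 0.5, 0.5], groups=[[0, 1], [2, 3]],
    prototypes=[[1.0, 0.0], [0.0, 0.0]])
```

Here the first group matches its prototype exactly, so it receives the larger weight for this particular sample; a different sample would yield different weights, which is the contrast with static feature weighting drawn above.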



GUARD: Glocal Uncertainty-Aware Robust Decoding for Effective and Efficient Open-Ended Text Generation

Ding, Yuanhao, Arias, Esteban Garces, Li, Meimingwei, Rodemann, Julian, Aßenmacher, Matthias, Chen, Danlu, Fan, Gaojuan, Heumann, Christian, Zhang, Chongsheng

arXiv.org Artificial Intelligence

Open-ended text generation faces a critical challenge: balancing coherence with diversity in LLM outputs. While contrastive search-based decoding strategies have emerged to address this trade-off, their practical utility is often limited by hyperparameter dependence and high computational costs. We introduce GUARD, a self-adaptive decoding method that effectively balances these competing objectives through a novel "Glocal" uncertainty-driven framework. GUARD combines global entropy estimates with local entropy deviations to integrate both long-term and short-term uncertainty signals. We demonstrate that our proposed global entropy formulation effectively mitigates abrupt variations in uncertainty, such as sudden overconfidence or high entropy spikes, and provides theoretical guarantees of unbiasedness and consistency. To reduce computational overhead, we incorporate a simple yet effective token-count-based penalty into GUARD. Experimental results demonstrate that GUARD achieves a good balance between text diversity and coherence, while exhibiting substantial improvements in generation speed. In a more nuanced comparison study across different dimensions of text quality, both human and LLM evaluators validated its remarkable performance. Our code is available at https://github.com/YecanLee/GUARD.
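To make the "global entropy plus local deviation" idea and the token-count penalty concrete, here is a minimal sketch under stated assumptions: the global estimate is taken to be a running mean of per-step entropies, the local signal is the current entropy minus that mean, and the penalty subtracts `alpha` times a token's previous occurrence count from its log-probability. The class and function names and `alpha` are hypothetical; GUARD's actual estimator and how it maps these signals to the penalty strength are defined in the paper and not reproduced here.

```python
import math
from collections import Counter

def entropy(probs):
    """Shannon entropy (in nats) of a next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

class GlocalTracker:
    """Tracks a running-mean (global) entropy and the local deviation from it."""

    def __init__(self):
        self.total = 0.0
        self.steps = 0

    def update(self, probs):
        h = entropy(probs)
        self.total += h
        self.steps += 1
        global_h = self.total / self.steps  # long-term uncertainty signal (assumed running mean)
        local_dev = h - global_h            # short-term deviation from it
        return global_h, local_dev

def penalized_scores(logprobs, generated, alpha):
    """Subtract alpha * (times the token was already generated) from each log-probability.

    logprobs: dict mapping candidate token -> log-probability.
    generated: list of tokens produced so far.
    """
    counts = Counter(generated)
    return {tok: lp - alpha * counts[tok] for tok, lp in logprobs.items()}

tracker = GlocalTracker()
g1, d1 = tracker.update([0.5, 0.5])  # uniform over two tokens
g2, d2 = tracker.update([1.0])       # fully confident step
scores = penalized_scores({"a": -0.1, "b": -0.5}, ["a", "a"], alpha=0.3)
next_token = max(scores, key=scores.get)
```

A counting penalty only needs a dictionary lookup per candidate, which illustrates why a count-based term can be cheaper than similarity-based degeneration penalties.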


Predicting fermionic densities using a Projected Quantum Kernel method

Perciavalle, Francesco, Plastina, Francesco, Pisarra, Michele, Gullo, Nicola Lo

arXiv.org Artificial Intelligence

We use a support vector regressor based on a projected quantum kernel method to predict the density structure of 1D fermionic systems of interest in quantum chemistry and quantum matter. The kernel is built from the observables of a quantum reservoir that can be implemented with interacting Rydberg atoms. Training and test data for the fermionic system are generated using a Density Functional Theory approach. We test the performance of the method for several Hamiltonian parameters and find a common general behavior of the error as a function of the measurement time. At sufficiently large measurement times, the method outperforms the classical linear kernel method and can be competitive with the radial basis function (RBF) kernel method.
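As a rough illustration, a projected kernel compares inputs through measured observables rather than through full quantum states. The sketch below assumes each input has already been mapped to a vector of reservoir expectation values (for example local occupations or magnetizations read out at a fixed measurement time) and applies a Gaussian kernel to those vectors. The feature values and `gamma` are hypothetical placeholders, not data from the paper.

```python
import math

def projected_kernel(obs_a, obs_b, gamma):
    """Gaussian kernel on vectors of measured reservoir observables.

    obs_a, obs_b: expectation values for two inputs (assumed already measured).
    """
    sq = sum((a - b) ** 2 for a, b in zip(obs_a, obs_b))
    return math.exp(-gamma * sq)

def gram_matrix(rows, cols, gamma):
    """Kernel matrix between two lists of observable vectors."""
    return [[projected_kernel(r, c, gamma) for c in cols] for r in rows]

# Hypothetical observable vectors for three training inputs.
X = [[0.1, 0.2], [0.3, -0.1], [0.0, 0.5]]
K = gram_matrix(X, X, gamma=2.0)
```

A precomputed Gram matrix of this kind can be passed to a support vector regressor that accepts custom kernels, such as scikit-learn's `SVR(kernel="precomputed")`; the test-time matrix is then `gram_matrix(test_rows, X, gamma)`.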


Learning quadratic neural networks in high dimensions: SGD dynamics and scaling laws

Arous, Gérard Ben, Erdogdu, Murat A., Vural, N. Mert, Wu, Denny

arXiv.org Machine Learning

We study the optimization and sample complexity of gradient-based training of a two-layer neural network with quadratic activation function in the high-dimensional regime, where the data is generated as $y \propto \sum_{j=1}^{r} \lambda_j \sigma(\langle \boldsymbol{\theta}_j, \boldsymbol{x} \rangle)$ with $\boldsymbol{x} \sim N(0, \boldsymbol{I}_d)$, $\sigma$ is the second Hermite polynomial, and $\{\boldsymbol{\theta}_j\}_{j=1}^{r} \subset \mathbb{R}^d$ are orthonormal signal directions. We consider the extensive-width regime $r \asymp d^{\beta}$ for $\beta \in [0, 1)$, and assume a power-law decay on the (non-negative) second-layer coefficients, $\lambda_j \asymp j^{-\alpha}$ for $\alpha \geq 0$. We present a sharp analysis of the SGD dynamics in the feature learning regime, for both the population limit and the finite-sample (online) discretization, and derive scaling laws for the prediction risk that highlight the power-law dependencies on the optimization time, sample size, and model width. Our analysis combines a precise characterization of the associated matrix Riccati differential equation with novel matrix monotonicity arguments to establish convergence guarantees for the infinite-dimensional effective dynamics.
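The data model above can be sampled directly. This sketch assumes the probabilists' second Hermite polynomial $\sigma(z) = z^2 - 1$, sets the proportionality constant to 1, takes $\lambda_j = j^{-\alpha}$ exactly, rounds $r = d^{\beta}$ to an integer, and uses the standard basis vectors as the orthonormal directions $\boldsymbol{\theta}_j$ (any orthonormal set gives the same distribution, since $N(0, \boldsymbol{I}_d)$ is rotation-invariant). Function names are illustrative.

```python
import random

def he2(z):
    """Second (probabilists') Hermite polynomial: He_2(z) = z^2 - 1."""
    return z * z - 1.0

def make_teacher(d, beta, alpha):
    """Width r ~ d^beta (at least 1) and power-law coefficients lambda_j = j^(-alpha)."""
    r = max(1, int(round(d ** beta)))
    lambdas = [j ** (-alpha) for j in range(1, r + 1)]
    return r, lambdas

def sample(d, r, lambdas, rng):
    """Draw x ~ N(0, I_d) and y = sum_j lambda_j * He_2(<theta_j, x>)."""
    x = [rng.gauss(0.0, 1.0) for _ in range(d)]
    # With theta_j = e_j (standard basis), <theta_j, x> is simply x[j].
    y = sum(lam * he2(x[j]) for j, lam in enumerate(lambdas))
    return x, y

rng = random.Random(0)
r, lambdas = make_teacher(d=100, beta=0.5, alpha=1.0)  # r = 10
x, y = sample(100, r, lambdas, rng)
```

With $\alpha = 0$ all directions carry equal weight; larger $\alpha$ concentrates the signal on the first few directions, which is the regime in which the power-law scaling of the risk becomes visible.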


Achieving Tokenizer Flexibility in Language Models through Heuristic Adaptation and Supertoken Learning

Sharthak, Shaurya, Pahalwan, Vinayak, Kamath, Adithya, Shirawalmath, Adarsh

arXiv.org Artificial Intelligence

Pretrained large language models (LLMs) are often constrained by their fixed tokenization schemes, leading to inefficiencies and performance limitations, particularly in multilingual or specialized applications. This tokenizer lock-in presents significant challenges, and standard methods to overcome it often require prohibitive computational resources. Although tokenizer replacement with heuristic initialization aims to reduce this burden, existing methods often require exhaustive residual fine-tuning and may still fail to fully preserve semantic nuances or adequately address the underlying compression inefficiencies. Our framework introduces two innovations: first, TokenAdapt, a model-agnostic tokenizer transplantation method, and second, novel pre-tokenization learning of multi-word Supertokens to enhance compression and reduce fragmentation. TokenAdapt initializes the embeddings of new unique tokens via a hybrid heuristic that combines two estimates: a local estimate based on subword decomposition using the old tokenizer, and a global estimate utilizing the top-k semantically similar tokens from the original vocabulary. This methodology aims to preserve semantics while significantly reducing retraining requirements. Empirical investigations validate both contributions: the transplantation heuristic successfully initializes unique tokens, markedly outperforming conventional baselines and sophisticated methods including TransTokenizer and ReTok, while our Supertokens achieve notable compression gains. Our zero-shot perplexity results show that TokenAdapt's hybrid initialization consistently yields lower perplexity ratios than both the ReTok and TransTokenizer baselines across different base models and newly trained target tokenizers. TokenAdapt typically reduced the overall perplexity ratio significantly relative to ReTok, yielding at least a 2-fold improvement in these aggregate scores.
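The hybrid initialization can be sketched as a blend of two vectors. In this minimal version, assumed rather than taken from the paper, the local estimate is the uniform mean of the old embeddings of the pieces the old tokenizer produces, the global estimate is a softmax-weighted average of the old embeddings of the top-k most similar old tokens (similarity measured in a hypothetical auxiliary embedding space), and `w_local` is a hypothetical mixing weight.

```python
import math

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def mean_vec(vectors):
    """Element-wise mean of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def local_estimate(token, old_tokenize, old_emb):
    """Average the old embeddings of the pieces the OLD tokenizer splits `token` into."""
    return mean_vec([old_emb[p] for p in old_tokenize(token)])

def global_estimate(query_vec, aux_vecs, old_emb, k, temperature=1.0):
    """Softmax-weighted average of old embeddings of the top-k most similar old tokens.

    query_vec and aux_vecs live in an auxiliary semantic space (hypothetical).
    """
    sims = sorted(((cosine(query_vec, v), t) for t, v in aux_vecs.items()),
                  reverse=True)[:k]
    m = max(s for s, _ in sims)
    ws = [math.exp((s - m) / temperature) for s, _ in sims]
    z = sum(ws)
    dim = len(next(iter(old_emb.values())))
    out = [0.0] * dim
    for w, (_, t) in zip(ws, sims):
        for i, val in enumerate(old_emb[t]):
            out[i] += (w / z) * val
    return out

def hybrid_init(local, global_, w_local=0.5):
    """Convex combination of the local and global estimates."""
    return [w_local * a + (1 - w_local) * b for a, b in zip(local, global_)]

# Toy vocabulary: a character-level "old tokenizer" and 2-D embeddings.
old_emb = {"a": [1.0, 0.0], "b": [0.0, 1.0]}
loc = local_estimate("ab", list, old_emb)
glob_ = global_estimate([1.0, 0.0], {"a": [1.0, 0.0], "b": [0.0, 1.0]}, old_emb, k=1)
new_vec = hybrid_init(loc, glob_, w_local=0.5)
```

The local term keeps the new token close to what the model already "reads" for its pieces, while the global term pulls it toward semantically related whole tokens; the mixing weight trades the two off.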


Fed-Joint: Joint Modeling of Nonlinear Degradation Signals and Failure Events for Remaining Useful Life Prediction using Federated Learning

Jeong, Cheoljoon, Yue, Xubo, Chung, Seokhyun

arXiv.org Machine Learning

Many failure mechanisms of machinery are closely related to the behavior of condition monitoring (CM) signals. To achieve a cost-effective preventive maintenance strategy, accurate remaining useful life (RUL) prediction based on these signals is of paramount importance. However, CM signals are often recorded at different factories and production lines, each with limited amounts of data. Unfortunately, these datasets have rarely been shared between sites due to data confidentiality and ownership issues, a lack of computing and storage power, and the high communication costs of transferring data between sites and a data center. Another challenge in real applications is that the functional form of the CM signals is often not specified a priori, so existing methods, which usually assume a parametric form, may not be applicable. To address these challenges, we propose a new prognostic framework for RUL prediction using the joint modeling of nonlinear degradation signals and time-to-failure data within a federated learning scheme. The proposed method constructs a nonparametric degradation model using a federated multi-output Gaussian process and then employs a federated survival model to predict failure times and probabilities for in-service machinery. The superiority of the proposed method over other alternatives is demonstrated through comprehensive simulation studies and a case study using turbofan engine degradation signal data that include run-to-failure events.
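One common way to link a degradation path to failure probabilities in joint models is a proportional-hazards term driven by the current signal level, followed by conditional survival for an in-service unit. The sketch below assumes that structure, a Weibull baseline hazard with shape at least 1, and a hypothetical degradation path `m(t)` standing in for the Gaussian-process prediction; it is not the paper's federated survival model, and the federated training (sites sharing model information instead of raw data) is not shown.

```python
import math

def hazard(t, m, gamma, shape, scale):
    """h(t) = h0(t) * exp(gamma * m(t)) with a Weibull baseline h0 (assumed; shape >= 1)."""
    h0 = (shape / scale) * (t / scale) ** (shape - 1)
    return h0 * math.exp(gamma * m(t))

def cumulative_hazard(t, m, gamma, shape, scale, steps=1000):
    """Trapezoidal approximation of the integral of the hazard from 0 to t."""
    if t <= 0:
        return 0.0
    dt = t / steps
    vals = [hazard(i * dt, m, gamma, shape, scale) for i in range(steps + 1)]
    return dt * (sum(vals) - 0.5 * (vals[0] + vals[-1]))

def conditional_survival(t, delta, m, gamma, shape, scale):
    """P(T > t + delta | T > t) = S(t + delta) / S(t) = exp(-(H(t + delta) - H(t)))."""
    h_t = cumulative_hazard(t, m, gamma, shape, scale)
    h_td = cumulative_hazard(t + delta, m, gamma, shape, scale)
    return math.exp(-(h_td - h_t))

flat = lambda t: 0.0         # no degradation effect
rising = lambda t: 0.1 * t   # hypothetical increasing degradation path
p_flat = conditional_survival(5.0, 2.0, flat, 0.0, 1.0, 10.0)
p_rising = conditional_survival(5.0, 2.0, rising, 1.0, 1.0, 10.0)
```

With no degradation effect and shape 1 the hazard is constant (0.1 per unit time), so surviving two more units has probability $e^{-0.2}$; a rising degradation path raises the hazard and lowers that probability, which is how the signal informs RUL.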


Towards Robust Interpretable Surrogates for Optimization

Goerigk, Marc, Hartisch, Michael, Merten, Sebastian

arXiv.org Artificial Intelligence

An important factor in the practical implementation of optimization models is their acceptance by the intended users. This acceptance is influenced, among other factors, by the interpretability of the solution process. Decision rules that meet this requirement can be generated using the framework for inherently interpretable optimization models. In practice, there is often uncertainty about the parameters of an optimization problem, and robust optimization is an established way to deal with this challenge. The goal of our work is to combine both concepts: to create decision trees that act as surrogates for the optimization process, are more robust to perturbations, and remain inherently interpretable. For this purpose, we present suitable models based on different ways of modeling uncertainty, together with solution methods. Furthermore, we evaluate the applicability of heuristic methods to this task. Both approaches are compared with the existing framework for inherently interpretable optimization models.
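A toy example shows how robustness can change an interpretable surrogate. The sketch below is a hypothetical one-parameter problem, not one of the paper's models: a depth-1 decision tree picks a solution from the observed parameter `c`, the chosen solution is then evaluated under the worst case of a perturbation `c +/- eps`, and the split threshold is chosen by brute force to minimize the average worst-case cost.

```python
def cost(x, c):
    """Toy problem (hypothetical): choose x in {0, 1}; x = 1 costs c, x = 0 costs 0.5."""
    return c if x == 1 else 0.5

def rule(threshold, c):
    """Depth-1 decision tree: choose x = 1 iff c <= threshold."""
    return 1 if c <= threshold else 0

def robust_score(threshold, instances, eps):
    """Average worst-case cost: the tree decides on the nominal c, then c may shift by +/- eps.

    The toy cost is linear in c, so the worst case lies at an endpoint of [c - eps, c + eps].
    """
    total = 0.0
    for c in instances:
        x = rule(threshold, c)
        total += max(cost(x, c - eps), cost(x, c + eps))
    return total / len(instances)

def fit_threshold(instances, candidates, eps):
    """Brute-force search over candidate split thresholds."""
    return min(candidates, key=lambda t: robust_score(t, instances, eps))

instances = [0.1, 0.3, 0.45, 0.6, 0.9]
candidates = [0.0, 0.2, 0.4, 0.5, 1.0]
nominal_tree = fit_threshold(instances, candidates, eps=0.0)
robust_tree = fit_threshold(instances, candidates, eps=0.1)
```

Without uncertainty the best split is at 0.5, but with perturbations of size 0.1 the instance at 0.45 is better served by the safe choice, so the robust tree splits at 0.4 while staying just as easy to read.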


Variational Bayesian Bow tie Neural Networks with Shrinkage

Sheinkman, Alisa, Wade, Sara

arXiv.org Machine Learning

Despite the dominant role of deep models in machine learning, limitations persist, including overconfident predictions, susceptibility to adversarial attacks, and underestimation of variability in predictions. The Bayesian paradigm provides a natural framework to overcome such issues and has become the gold standard for uncertainty estimation with deep models, also providing improved accuracy and a framework for tuning critical hyperparameters. However, exact Bayesian inference is challenging, and approximate inference typically relies on variational algorithms that impose strong independence and distributional assumptions. Moreover, existing methods are sensitive to the architectural choice of the network. We address these issues by constructing a relaxed version of the standard feed-forward rectified neural network and employing Pólya-Gamma data augmentation to render the model conditionally linear and Gaussian. Additionally, we use sparsity-promoting priors on the weights of the neural network for data-driven architectural design. To approximate the posterior, we derive a variational inference algorithm that avoids distributional assumptions and independence across layers and is a faster alternative to the usual Markov chain Monte Carlo schemes.
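For reference, Pólya-Gamma augmentation rests on the following standard identity (Polson, Scott and Windle), stated for a logistic-type term with linear predictor $\psi$ and $b > 0$; how the paper applies it to its relaxed rectified units is not reproduced here.

$$\frac{(e^{\psi})^{a}}{(1 + e^{\psi})^{b}} = 2^{-b} e^{\kappa \psi} \int_0^{\infty} e^{-\omega \psi^{2}/2} \, p(\omega \mid b, 0) \, d\omega, \qquad \kappa = a - \frac{b}{2},$$

where $p(\omega \mid b, 0)$ is the density of a $\mathrm{PG}(b, 0)$ random variable. Conditional on $\omega$, the integrand $e^{\kappa\psi - \omega\psi^{2}/2}$ is a Gaussian kernel in $\psi$, so a model that is linear in $\psi$ becomes conditionally linear and Gaussian, and the conditional distribution of $\omega$ given $\psi$ is $\mathrm{PG}(b, \psi)$.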