glob
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Research Report > Experimental Study (1.00)
- Research Report > New Finding (0.93)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.67)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)
- Information Technology > Data Science > Data Mining > Big Data (0.65)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.45)
SCALAR: Self-Calibrating Adaptive Latent Attention Representation Learning
Abbas, Farwa, Ahmad, Hussain, Szabo, Claudia
High-dimensional, heterogeneous data with complex feature interactions pose significant challenges for traditional predictive modeling approaches. While Projection to Latent Structures (PLS) remains a popular technique, it struggles to model complex non-linear relationships, especially in multivariate systems with high-dimensional correlation structures. This challenge is further compounded by simultaneous interactions across multiple scales, where local processing fails to capture crossgroup dependencies. Additionally, static feature weighting limits adaptability to contextual variations, as it ignores sample-specific relevance. To address these limitations, we propose a novel method that enhances predictive performance through novel architectural innovations. Our architecture introduces an adaptive kernel-based attention mechanism that processes distinct feature groups separately before integration, enabling capture of local patterns while preserving global relationships. Experimental results show substantial improvements in performance metrics, compared to the state-of-the-art methods across diverse datasets.
- Oceania > Australia > South Australia > Adelaide (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Europe > United Kingdom > England > Greater London > London (0.04)
- Research Report > New Finding (0.66)
- Research Report > Promising Solution (0.54)
- Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
- Health & Medicine > Therapeutic Area > Neurology (0.68)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Research Report > Experimental Study (1.00)
- Research Report > New Finding (0.92)
- Information Technology > Data Science (0.69)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.67)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.45)
GUARD: Glocal Uncertainty-Aware Robust Decoding for Effective and Efficient Open-Ended Text Generation
Ding, Yuanhao, Arias, Esteban Garces, Li, Meimingwei, Rodemann, Julian, Aßenmacher, Matthias, Chen, Danlu, Fan, Gaojuan, Heumann, Christian, Zhang, Chongsheng
Open-ended text generation faces a critical challenge: balancing coherence with diversity in LLM outputs. While contrastive search-based decoding strategies have emerged to address this trade-off, their practical utility is often limited by hyperparameter dependence and high computational costs. We introduce GUARD, a self-adaptive decoding method that effectively balances these competing objectives through a novel "Glocal" uncertainty-driven framework. GUARD combines global entropy estimates with local entropy deviations to integrate both long-term and short-term uncertainty signals. We demonstrate that our proposed global entropy formulation effectively mitigates abrupt variations in uncertainty, such as sudden overconfidence or high entropy spikes, and provides theoretical guarantees of unbiasedness and consistency. To reduce computational overhead, we incorporate a simple yet effective token-count-based penalty into GUARD. Experimental results demonstrate that GUARD achieves a good balance between text diversity and coherence, while exhibiting substantial improvements in generation speed. In a more nuanced comparison study across different dimensions of text quality, both human and LLM evaluators validated its remarkable performance. Our code is available at https://github.com/YecanLee/GUARD.
- Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
- Oceania > Australia (0.04)
- Asia > China > Henan Province (0.04)
- (5 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
Predicting fermionic densities using a Projected Quantum Kernel method
Perciavalle, Francesco, Plastina, Francesco, Pisarra, Michele, Gullo, Nicola Lo
We use a support vector regressor based on a projected quantum kernel method to predict the density structure of 1D fermionic systems of interest in quantum chemistry and quantum matter. The kernel is built on with the observables of a quantum reservoir implementable with interacting Rydberg atoms. Training and test data of the fermionic system are generated using a Density Functional Theory approach. We test the performance of the method for several Hamiltonian parameters, finding a general common behavior of the error as a function of measurement time. At sufficiently large measurement times, we find that the method outperforms the classical linear kernel method and can be competitive with the radial basis function method.
- Europe > Italy > Calabria (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Monaco (0.04)
Learning quadratic neural networks in high dimensions: SGD dynamics and scaling laws
Arous, Gérard Ben, Erdogdu, Murat A., Vural, N. Mert, Wu, Denny
We study the optimization and sample complexity of gradient-based training of a two-layer neural network with quadratic activation function in the high-dimensional regime, where the data is generated as $y \propto \sum_{j=1}^{r}λ_j σ\left(\langle \boldsymbol{θ_j}, \boldsymbol{x}\rangle\right), \boldsymbol{x} \sim N(0,\boldsymbol{I}_d)$, $σ$ is the 2nd Hermite polynomial, and $\lbrace\boldsymbolθ_j \rbrace_{j=1}^{r} \subset \mathbb{R}^d$ are orthonormal signal directions. We consider the extensive-width regime $r \asymp d^β$ for $β\in [0, 1)$, and assume a power-law decay on the (non-negative) second-layer coefficients $λ_j\asymp j^{-α}$ for $α\geq 0$. We present a sharp analysis of the SGD dynamics in the feature learning regime, for both the population limit and the finite-sample (online) discretization, and derive scaling laws for the prediction risk that highlight the power-law dependencies on the optimization time, sample size, and model width. Our analysis combines a precise characterization of the associated matrix Riccati differential equation with novel matrix monotonicity arguments to establish convergence guarantees for the infinite-dimensional effective dynamics.
- North America > Canada > Ontario > Toronto (0.14)
- Africa > Middle East > Tunisia > Ben Arous Governorate > Ben Arous (0.04)
- North America > United States > New York (0.04)
- Asia > Middle East > Jordan (0.04)
Achieving Tokenizer Flexibility in Language Models through Heuristic Adaptation and Supertoken Learning
Sharthak, Shaurya, Pahalwan, Vinayak, Kamath, Adithya, Shirawalmath, Adarsh
Pretrained language models (LLMs) are often constrained by their fixed tokenization schemes, leading to inefficiencies and performance limitations, particularly for multilingual or specialized applications. This tokenizer lock-in presents significant challenges. standard methods to overcome this often require prohibitive computational resources. Although tokenizer replacement with heuristic initialization aims to reduce this burden, existing methods often require exhaustive residual fine-tuning and still may not fully preserve semantic nuances or adequately address the underlying compression inefficiencies. Our framework introduces two innovations: first, Tokenadapt, a model-agnostic tokenizer transplantation method, and second, novel pre-tokenization learning for multi-word Supertokens to enhance compression and reduce fragmentation. Tokenadapt initializes new unique token embeddings via a hybrid heuristic that combines two methods: a local estimate based on subword decomposition using the old tokenizer, and a global estimate utilizing the top-k semantically similar tokens from the original vocabulary. This methodology aims to preserve semantics while significantly minimizing retraining requirements. Empirical investigations validate both contributions: the transplantation heuristic successfully initializes unique tokens, markedly outperforming conventional baselines and sophisticated methods including Transtokenizer and ReTok, while our Supertokens achieve notable compression gains. Our zero-shot perplexity results demonstrate that the TokenAdapt hybrid initialization consistently yields lower perplexity ratios compared to both ReTok and TransTokenizer baselines across different base models and newly trained target tokenizers. TokenAdapt typically reduced the overall perplexity ratio significantly compared to ReTok, yielding at least a 2-fold improvement in these aggregate scores.
- North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
- Europe > Germany > Berlin (0.04)
Fed-Joint: Joint Modeling of Nonlinear Degradation Signals and Failure Events for Remaining Useful Life Prediction using Federated Learning
Jeong, Cheoljoon, Yue, Xubo, Chung, Seokhyun
Many failure mechanisms of machinery are closely related to the behavior of condition monitoring (CM) signals. To achieve a cost-effective preventive maintenance strategy, accurate remaining useful life (RUL) prediction based on the signals is of paramount importance. However, the CM signals are often recorded at different factories and production lines, with limited amounts of data. Unfortunately, these datasets have rarely been shared between the sites due to data confidentiality and ownership issues, a lack of computing and storage power, and high communication costs associated with data transfer between sites and a data center. Another challenge in real applications is that the CM signals are often not explicitly specified \textit{a priori}, meaning that existing methods, which often usually a parametric form, may not be applicable. To address these challenges, we propose a new prognostic framework for RUL prediction using the joint modeling of nonlinear degradation signals and time-to-failure data within a federated learning scheme. The proposed method constructs a nonparametric degradation model using a federated multi-output Gaussian process and then employs a federated survival model to predict failure times and probabilities for in-service machinery. The superiority of the proposed method over other alternatives is demonstrated through comprehensive simulation studies and a case study using turbofan engine degradation signal data that include run-to-failure events.
- North America > United States > Virginia (0.04)
- North America > United States > Michigan (0.04)
- Asia > China (0.04)
Towards Robust Interpretable Surrogates for Optimization
Goerigk, Marc, Hartisch, Michael, Merten, Sebastian
An important factor in the practical implementation of optimization models is the acceptance by the intended users. This is influenced among other factors by the interpretability of the solution process. Decision rules that meet this requirement can be generated using the framework for inherently interpretable optimization models. In practice, there is often uncertainty about the parameters of an optimization problem. An established way to deal with this challenge is the concept of robust optimization. The goal of our work is to combine both concepts: to create decision trees as surrogates for the optimization process that are more robust to perturbations and still inherently interpretable. For this purpose we present suitable models based on different variants to model uncertainty, and solution methods. Furthermore, the applicability of heuristic methods to perform this task is evaluated. Both approaches are compared with the existing framework for inherently interpretable optimization models.
- North America > United States > California > San Francisco County > San Francisco (0.04)
- Europe > Germany (0.04)
Variational Bayesian Bow tie Neural Networks with Shrinkage
Despite the dominant role of deep models in machine learning, limitations persist, including overconfident predictions, susceptibility to adversarial attacks, and underestimation of variability in predictions. The Bayesian paradigm provides a natural framework to overcome such issues and has become the gold standard for uncertainty estimation with deep models, also providing improved accuracy and a framework for tuning critical hyperparameters. However, exact Bayesian inference is challenging, typically involving variational algorithms that impose strong independence and distributional assumptions. Moreover, existing methods are sensitive to the architectural choice of the network. We address these issues by constructing a relaxed version of the standard feed-forward rectified neural network, and employing Polya-Gamma data augmentation tricks to render a conditionally linear and Gaussian model. Additionally, we use sparsity-promoting priors on the weights of the neural network for data-driven architectural design. To approximate the posterior, we derive a variational inference algorithm that avoids distributional assumptions and independence across layers and is a faster alternative to the usual Markov Chain Monte Carlo schemes.
- Asia > Middle East > Jordan (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > California > Santa Cruz County > Santa Cruz (0.04)
- (3 more...)
- Health & Medicine (1.00)
- Information Technology > Security & Privacy (0.87)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)