Where Do Large Learning Rates Lead Us?
It is generally accepted that starting neural network training with large learning rates (LRs) improves generalization. Following a line of research devoted to understanding this effect, we conduct an empirical study in a controlled setting, focusing on two questions: 1) how large an initial LR is required to obtain optimal quality, and 2) what are the key differences between models trained with different LRs? We discover that only a narrow range of initial LRs, slightly above the convergence threshold, leads to optimal results after fine-tuning with a small LR or weight averaging. By studying the local geometry of the reached minima, we observe that using LRs from this optimal range allows the optimization to locate a basin that contains only high-quality minima. Additionally, we show that these initial LRs result in a sparse set of learned features, with a clear focus on those most relevant to the task. In contrast, starting training with too small an LR leads to unstable minima and attempts to learn all features simultaneously, resulting in poor generalization. Conversely, an initial LR that is too large prevents the optimization from locating a basin of good solutions and from extracting meaningful patterns from the data.
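The training protocol studied here (a large initial LR followed by fine-tuning with a small LR) can be sketched as a minimal two-phase schedule. The switch step and both LR values below are placeholders for illustration, not values reported in the paper:

```python
# Hypothetical two-phase schedule: a large initial LR, then fine-tuning with
# a small LR. All numeric values are illustrative assumptions.
def two_phase_lr(step, switch_step=1000, lr_large=0.5, lr_small=0.01):
    """Return the learning rate to use at a given training step."""
    return lr_large if step < switch_step else lr_small

print(two_phase_lr(10))    # large-LR phase -> 0.5
print(two_phase_lr(5000))  # small-LR fine-tuning phase -> 0.01
```

In a real training loop this function would be queried once per step to set the optimizer's learning rate; the paper's finding is that `lr_large` must sit in a narrow band just above the convergence threshold for the fine-tuned result to be optimal.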
Tensor decompositions of higher-order correlations by nonlinear Hebbian plasticity
Biological synaptic plasticity exhibits nonlinearities that are not accounted for by classic Hebbian learning rules. Here, we introduce a simple family of generalized nonlinear Hebbian learning rules. We study the computations implemented by their dynamics in the simple setting of a neuron receiving feedforward inputs. These nonlinear Hebbian rules allow a neuron to learn tensor decompositions of its higher-order input correlations. The particular input correlation decomposed and the form of the decomposition depend on the location of nonlinearities in the plasticity rule.
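A rule of this family can be sketched in a few lines of NumPy. This is an illustrative toy, not the authors' exact formulation: the output-nonlinearity exponent `a`, the learning rate `eta`, and the explicit weight normalization are assumptions chosen for the sketch. With a = 1 the update reduces to a classic Hebbian rule, while a > 1 makes the update sensitive to higher-order input correlations, as the abstract describes.

```python
import numpy as np

# Sketch of a generalized nonlinear Hebbian update for a single neuron with
# feedforward weights w and input x (hypothetical parameterization).
def nonlinear_hebbian_step(w, x, eta=0.01, a=2):
    y = w @ x                      # linear neuronal response
    w = w + eta * (y ** a) * x     # Hebbian update with output nonlinearity
    return w / np.linalg.norm(w)   # normalization keeps the weights bounded

rng = np.random.default_rng(0)
w = rng.normal(size=5)
w /= np.linalg.norm(w)
for _ in range(100):
    x = rng.normal(size=5)
    w = nonlinear_hebbian_step(w, x)
print(round(np.linalg.norm(w), 6))  # -> 1.0, norm preserved by design
```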
ByteStorm: a multi-step data-driven approach for Tropical Cyclones detection and tracking
Donno, Davide, Elia, Donatello, Accarino, Gabriele, De Carlo, Marco, Scoccimarro, Enrico, Gualdi, Silvio
Accurate tropical cyclones (TCs) tracking represents a critical challenge in the context of weather and climate science. Traditional tracking schemes mainly rely on subjective thresholds, which may introduce biases in their skills on the geographical region of application. We present ByteStorm, an efficient data-driven framework for reconstructing TC tracks without threshold tuning. It leverages deep learning networks to detect TC centers (via classification and localization), using only relative vorticity (850 mb) and mean sea-level pressure. Then, detected centers are linked into TC tracks through the BYTE algorithm. ByteStorm is evaluated against state-of-the-art deterministic trackers in the East- and West-North Pacific basins (ENP and WNP). The proposed framework achieves superior performance in terms of Probability of Detection ($85.05\%$ ENP, $79.48\%$ WNP), False Alarm Rate ($23.26\%$ ENP, $16.14\%$ WNP), and high Inter-Annual Variability correlations ($0.75$ ENP and $0.69$ WNP). These results highlight the potential of integrating deep learning and computer vision for fast and accurate TC tracking, offering a robust alternative to traditional approaches.
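The linking step can be illustrated with a toy tracker. The sketch below is NOT the BYTE algorithm ByteStorm uses, only a simplified greedy stand-in showing how per-timestep detected centers (here plain (lon, lat) pairs) can be chained into tracks when consecutive centers fall within a distance threshold; `max_dist` in degrees is a placeholder value:

```python
import math

# Greedy nearest-neighbor linker over detected storm centers (illustrative
# stand-in for an association-based tracker; not the BYTE algorithm).
def link_centers(frames, max_dist=5.0):
    tracks = []
    for centers in frames:
        unmatched = set(range(len(centers)))
        for track in tracks:
            best, best_d = None, max_dist
            for i in unmatched:
                d = math.dist(track[-1], centers[i])  # Euclidean, for the toy
                if d < best_d:
                    best, best_d = i, d
            if best is not None:
                track.append(centers[best])   # extend the closest track
                unmatched.discard(best)
        for i in unmatched:                   # unmatched centers start tracks
            tracks.append([centers[i]])
    return tracks

frames = [[(130.0, 15.0)], [(131.0, 15.5)], [(132.0, 16.0)]]
print(len(link_centers(frames)))  # -> 1: the three centers form one track
```

A production tracker would additionally use great-circle distances, handle track termination after missed detections, and exploit detection confidence scores, which is where BYTE's two-stage association differs from this toy.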
How to Tame Your LLM: Semantic Collapse in Continuous Systems
We develop a general theory of semantic dynamics for large language models by formalizing them as Continuous State Machines (CSMs): smooth dynamical systems whose latent manifolds evolve under probabilistic transition operators. The associated transfer operator $P: L^2(M,\mu) \to L^2(M,\mu)$ encodes the propagation of semantic mass. Under mild regularity assumptions (compactness, ergodicity, bounded Jacobian), $P$ is compact with discrete spectrum. Within this setting, we prove the Semantic Characterization Theorem (SCT): the leading eigenfunctions of $P$ induce finitely many spectral basins of invariant meaning, each definable in an o-minimal structure over $\mathbb{R}$. Thus spectral lumpability and logical tameness coincide. This explains how discrete symbolic semantics can emerge from continuous computation: the continuous activation manifold collapses into a finite, logically interpretable ontology. We further extend the SCT to stochastic and adiabatic (time-inhomogeneous) settings, showing that slowly drifting kernels preserve compactness, spectral coherence, and basin structure.
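A finite-state caricature of this basin structure: discretize the transfer operator as a row-stochastic matrix with two weakly coupled blocks, and the sign pattern of the second eigenvector recovers the two spectral basins. The matrix and coupling strength below are invented for illustration; the paper's $P$ acts on $L^2(M,\mu)$, not on four states.

```python
import numpy as np

# Toy discretized transfer operator: two blocks coupled with strength eps.
eps = 0.01
P = np.array([
    [0.5 - eps/2, 0.5 - eps/2, eps/2,       eps/2],
    [0.5 - eps/2, 0.5 - eps/2, eps/2,       eps/2],
    [eps/2,       eps/2,       0.5 - eps/2, 0.5 - eps/2],
    [eps/2,       eps/2,       0.5 - eps/2, 0.5 - eps/2],
])
# P is symmetric here, so eigh applies; eigenvalues come back ascending.
vals, vecs = np.linalg.eigh(P)
second = vecs[:, -2]        # eigenvector for the second-largest eigenvalue
basins = second > 0         # its sign pattern splits states into two basins
# States {0,1} land in one basin and {2,3} in the other (sign is arbitrary).
print(basins[0] == basins[1] and basins[2] == basins[3]
      and basins[0] != basins[2])  # -> True
```

The spectral gap between the leading eigenvalue 1 and the second eigenvalue (here $1 - 2\epsilon$) shrinks as the coupling grows, which is the finite-dimensional shadow of the lumpability the SCT formalizes.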
Supplementary Material and Datasheet: Off to new Shores: A Dataset & Benchmark for (near-)coastal Flood Inundation Forecasting
This supplementary document follows the Datasheets for Datasets template of (8) to document the Global Flood Forecasting (GFF) dataset and its creation. Further resources are provided in the accompanying publication (https://arxiv.org/abs/2409.18591) and in the GitHub repository (https://github.com/Multihuntr/GFF).
From Global to Local Correlation: Geometric Decomposition of Statistical Inference
Understanding feature-outcome associations in high-dimensional data remains challenging when relationships vary across subpopulations: standard methods that assume global associations miss context-dependent patterns, reducing statistical power and interpretability. We develop a geometric decomposition framework offering two strategies for partitioning inference problems into regional analyses on data-derived Riemannian graphs. Gradient flow decomposition uses path-monotonicity-validated discrete Morse theory to partition samples into gradient flow cells where outcomes exhibit monotonic behavior. Co-monotonicity decomposition uses vertex-level coefficients that provide context-dependent versions of the classical Pearson correlation: these coefficients measure edge-based directional concordance between the outcome and features, or between feature pairs, defining embeddings of samples into association space. These embeddings induce Riemannian k-NN graphs on which biclustering identifies co-monotonicity cells (coherent regions) and feature modules. The approach extends naturally to multi-modal integration across multiple feature sets. Both strategies apply independently or jointly, with Bayesian posterior sampling providing credible intervals.
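A hedged sketch of an edge-based concordance coefficient in the spirit of the co-monotonicity coefficients described above (the paper's exact definition may differ): for each vertex, average the sign agreement of outcome differences and feature differences along its graph edges, yielding a local, context-dependent analogue of correlation in [-1, 1]. The chain graph below is a hand-built stand-in for a data-derived k-NN graph.

```python
import numpy as np

# Vertex-level concordance of a feature x and outcome y along graph edges
# (illustrative definition, not the paper's formal coefficient).
def comonotonicity(x, y, neighbors):
    coef = np.zeros(len(x))
    for i, nbrs in enumerate(neighbors):
        # sign of the product is +1 when x and y move together on an edge
        signs = [np.sign((x[j] - x[i]) * (y[j] - y[i])) for j in nbrs]
        coef[i] = np.mean(signs)
    return coef

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([0.0, 1.0, 1.0, 0.0])      # rises with x, then falls
neighbors = [[1], [0, 2], [1, 3], [2]]  # chain graph standing in for k-NN
print(comonotonicity(x, y, neighbors))  # -> [ 1.   0.5 -0.5 -1. ]
```

The output flips sign along the chain, which is exactly the context-dependent behavior a single global Pearson correlation (near zero for this x, y) would average away; regions of constant sign correspond to the coherent co-monotonicity cells the framework then biclusters.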