Momentum Further Constrains Sharpness at the Edge of Stochastic Stability
Andreyev, Arseniy, Ananthkumar, Advikar, Walden, Marc, Poggio, Tomaso, Beneventano, Pierfrancesco
Recent work suggests that (stochastic) gradient descent self-organizes near an instability boundary, shaping both optimization and the solutions found. Momentum and mini-batch gradients are widely used in practical deep learning optimization, but it remains unclear whether they operate in a comparable regime of instability. We demonstrate that SGD with momentum exhibits an Edge of Stochastic Stability (EoSS)-like regime with batch-size-dependent behavior that cannot be explained by a single momentum-adjusted stability threshold. Batch Sharpness (the expected directional mini-batch curvature) stabilizes in two distinct regimes: at small batch sizes it converges to a lower plateau $2(1-β)/η$, reflecting amplification of stochastic fluctuations by momentum and favoring flatter regions than vanilla SGD; at large batch sizes it converges to a higher plateau $2(1+β)/η$, where momentum recovers its classical stabilizing effect and favors sharper regions consistent with full-batch dynamics. We further show that this aligns with linear stability thresholds and discuss the implications for hyperparameter tuning and coupling.
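As a rough illustration of the two plateaus quoted above, the sketch below computes $2(1-β)/η$ and $2(1+β)/η$ and checks the larger one against the classical linear stability condition for heavy-ball momentum on a one-dimensional quadratic. The update rule `x_next = x - lr * curvature * x + beta * (x - x_prev)`, the function names, and the values `lr = 0.01`, `beta = 0.9` are assumptions made for this example; the abstract does not specify the parametrization of momentum, and the check covers only the full-batch (deterministic) threshold, not the small-batch plateau.

```python
import numpy as np


def plateaus(lr, beta):
    """Return the two plateau values quoted in the abstract:
    (2(1 - beta)/lr, 2(1 + beta)/lr)."""
    return 2.0 * (1.0 - beta) / lr, 2.0 * (1.0 + beta) / lr


def heavy_ball_spectral_radius(curvature, lr, beta):
    """Spectral radius of heavy-ball momentum on f(x) = curvature * x**2 / 2.

    Assumed update (hypothetical parametrization, not taken from the paper):
        x_next = x - lr * curvature * x + beta * (x - x_prev)
    Written as a linear map on the pair (x, x_prev), the iteration is
    stable exactly when this radius is below 1.
    """
    companion = np.array([
        [1.0 + beta - lr * curvature, -beta],
        [1.0, 0.0],
    ])
    return float(np.max(np.abs(np.linalg.eigvals(companion))))


lr, beta = 0.01, 0.9  # illustrative values only
low, high = plateaus(lr, beta)
print("small-batch plateau 2(1-beta)/lr:", low)
print("large-batch plateau 2(1+beta)/lr:", high)

# Just below the larger plateau the deterministic iteration contracts,
# just above it the iteration diverges.
print("stable below:", heavy_ball_spectral_radius(0.99 * high, lr, beta) < 1.0)
print("unstable above:", heavy_ball_spectral_radius(1.01 * high, lr, beta) > 1.0)
```

Under this assumed parametrization, setting `beta = 0` makes both plateaus collapse to the familiar $2/η$ threshold of plain gradient descent.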
Identifiability of Potentially Degenerate Gaussian Mixture Models With Piecewise Affine Mixing
Xu, Danru, Lachapelle, Sébastien, Magliacane, Sara
Causal representation learning (CRL) aims to identify the underlying latent variables from high-dimensional observations, even when the variables are dependent on each other. We study this problem for latent variables that follow a potentially degenerate Gaussian mixture distribution and that are only observed through a piecewise affine mixing function. We provide a series of progressively stronger identifiability results for this challenging setting, in which the probability density functions are ill-defined because of the potential degeneracy. For identifiability up to permutation and scaling, we leverage a sparsity regularization on the learned representation. Based on our theoretical results, we propose a two-stage method to estimate the latent variables by enforcing sparsity and Gaussianity in the learned representations. Experiments on synthetic and image data highlight our method's effectiveness in recovering the ground-truth latent variables.
A short proof of near-linear convergence of adaptive gradient descent under fourth-order growth and convexity
Davis, Damek, Drusvyatskiy, Dmitriy
Davis, Drusvyatskiy, and Jiang showed that gradient descent with an adaptive stepsize converges locally at a nearly-linear rate for smooth functions that grow at least quartically away from their minimizers. The argument is intricate, relying on monitoring the performance of the algorithm relative to a certain manifold of slow growth, called the ravine. In this work, we provide a direct Lyapunov-based argument that bypasses these difficulties when the objective is, in addition, convex and has a unique minimizer. As a byproduct of the argument, we obtain a more adaptive variant than the original algorithm with encouraging numerical performance.
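To give a feel for why an adaptive stepsize matters under quartic growth, here is a minimal sketch on the toy objective $f(x) = x^4$. It uses the Polyak stepsize, a well-known adaptive rule that requires the minimal value of $f$, as a stand-in; it is not claimed to be the algorithm analyzed in the abstract, and the function names, the starting point, and the constant stepsize `lr = 0.1` are assumptions chosen for illustration.

```python
def f(x):
    """Toy convex objective with quartic growth and unique minimizer x = 0."""
    return x ** 4


def grad(x):
    """Derivative of f."""
    return 4.0 * x ** 3


def polyak_gd(x0, steps, f_min=0.0):
    """Gradient descent with the Polyak stepsize (f(x) - f_min) / grad(x)**2.

    On f(x) = x**4 each step maps x to 0.75 * x, i.e. linear convergence.
    """
    x = x0
    for _ in range(steps):
        g = grad(x)
        if g == 0.0:  # already at the minimizer
            break
        x -= (f(x) - f_min) / (g * g) * g
    return x


def constant_step_gd(x0, steps, lr):
    """Plain gradient descent with a fixed stepsize, for comparison."""
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x


x_adaptive = polyak_gd(1.0, steps=50)
x_constant = constant_step_gd(1.0, steps=50, lr=0.1)
print("Polyak stepsize after 50 steps:  ", x_adaptive)
print("constant stepsize after 50 steps:", x_constant)
```

In this one-dimensional toy case the adaptive iterate shrinks geometrically, while the fixed-step iterate slows down as the gradient flattens near the minimizer.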
Grayson Perry Has Seen the Future review – some of these insights into AI are just mindblowing
Intelligent, egoless: the artist in Grayson Perry Has Seen the Future. From people marrying digital companions to CEOs excited about how people whose jobs are replaced can 'adapt', this is terrifying watching. There is a fun game you can play while watching Grayson Perry Has Seen the Future, the two-part documentary presented by the artist on the subject of artificial intelligence, its uses and its possible ramifications. Gather a group of friends, press play, and see which of you loses your mind first.
Visit a WWII destroyer without leaving your sofa
The USS Cassin Young is one of the last of the war's Fletcher-class destroyers. The USS Cassin Young is one of four remaining Fletcher-class destroyers still afloat. Although its name may not sound immediately familiar, the over 360-foot-long ship's recognizable silhouette remains a hallmark example of World War II imagery.
Monkeys walk around a virtual world using only their thoughts
Researchers hope the experiments will pave the way for people with paralysis to explore virtual worlds or more intuitively control electric wheelchairs in this one. Peter Janssen at KU Leuven in Belgium and colleagues implanted three rhesus macaque (Macaca mulatta) monkeys with BCIs. Crucially, each animal got three implants, each consisting of 96 electrodes, positioned in the primary motor cortex and the dorsal and ventral premotor cortex. The first area is commonly used in BCI research and relates to physical movement, but the latter two are thought to be involved in planning movement in a higher, more abstract way. Electrical signals from the implants were then interpreted by an AI model and used to control VR avatars as the monkeys watched a 3D monitor.
50,000 illegal shark fins found inside fake car part boxes
The poached ingredients worth $1.3 million were seized in a nationwide hunt. Shark fins remain a prized delicacy despite conservation efforts and education. The United States Fish and Wildlife Service (FWS) recently exposed a major international smuggling operation orchestrated across at least three cities around the country.
Discrete Flow Maps
Potaptchik, Peter, Yim, Jason, Saravanan, Adhi, Holderrieth, Peter, Vanden-Eijnden, Eric, Albergo, Michael S.
The sequential nature of autoregressive next-token prediction imposes a fundamental speed limit on large language models. While continuous flow models offer a path to parallel generation, they traditionally demand expensive iterative integration. Flow Maps bypass this bottleneck by compressing generative trajectories into single-step mappings, theoretically enabling the generation of full text sequences from noise in a single forward pass. However, standard formulations rely on Euclidean regression losses that are geometrically ill-suited for discrete data. In this work, we resolve this conflict with Discrete Flow Maps, a framework that reconciles trajectory compression with the geometry of the probability simplex. We recast standard flow map training for the discrete domain, aligning the training dynamics with the discrete nature of language. Empirically, this strict geometric alignment allows our method to surpass previous state-of-the-art results in discrete flow modeling.
A Large-Scale Comparative Analysis of Imputation Methods for Single-Cell RNA Sequencing Data
Iwashita, Yuichiro, Abbasi, Ahtisham Fazeel, Kise, Koichi, Dengel, Andreas, Asim, Muhammad Nabeel
Background: Single-cell RNA sequencing (scRNA-seq) enables gene expression profiling at cellular resolution but is inherently affected by sparsity caused by dropout events, where expressed genes are recorded as zeros due to technical limitations. These artifacts distort gene expression distributions and compromise downstream analyses. Numerous imputation methods have been proposed to recover latent transcriptional signals. These methods range from traditional statistical models to deep learning (DL)-based methods. However, their comparative performance remains unclear, as existing benchmarks evaluate only a limited subset of methods, datasets, and downstream analyses. Results: We present a comprehensive benchmark of 15 scRNA-seq imputation methods spanning 7 methodological categories, including traditional and DL-based methods. Methods are evaluated across 30 datasets from 10 experimental protocols on 6 downstream analyses. Results show that traditional methods, such as model-based, smoothing-based, and low-rank matrix-based methods, generally outperform DL-based methods, including diffusion-based, GAN-based, GNN-based, and autoencoder-based methods. In addition, strong performance in numerical gene expression recovery does not necessarily translate into improved biological interpretability in downstream analyses, including cell clustering, differential expression analysis, marker gene analysis, trajectory analysis, and cell type annotation. Furthermore, method performance varies substantially across datasets, protocols, and downstream analyses, with no single method consistently outperforming others. Conclusions: Our findings provide practical guidance for selecting imputation methods tailored to specific analytical objectives and underscore the importance of task-specific evaluation when assessing imputation performance in scRNA-seq data analysis.
Offline-Online Reinforcement Learning for Linear Mixture MDPs
Zhang, Zhongjun, Sinclair, Sean R.
We study offline-online reinforcement learning in linear mixture Markov decision processes (MDPs) under environment shift. In the offline phase, data are collected by an unknown behavior policy and may come from a mismatched environment, while in the online phase the learner interacts with the target environment. We propose an algorithm that adaptively leverages offline data. When the offline data are informative, either due to sufficient coverage or small environment shift, the algorithm provably improves over purely online learning. When the offline data are uninformative, it safely ignores them and matches the online-only performance. We establish regret upper bounds that explicitly characterize when offline data are beneficial, together with nearly matching lower bounds. Numerical experiments further corroborate our theoretical findings.