The Interplay of Statistics and Noisy Optimization: Learning Linear Predictors with Random Data Weights
Clara, Gabriel, Mash'al, Yazan
We analyze gradient descent with randomly weighted data points in a linear regression model, under a generic weighting distribution. This covers various forms of stochastic gradient descent and importance sampling, but also extends to weighting distributions with arbitrary continuous values, providing a unified framework for analyzing the impact of various kinds of noise on the training trajectory. We characterize the implicit regularization induced by the random weighting, connect it with weighted linear regression, and derive non-asymptotic bounds for convergence in first and second moments. Leveraging geometric moment contraction, we also investigate the stationary distribution induced by the added noise. Based on these results, we discuss how specific choices of weighting distribution influence both the underlying optimization problem and the statistical properties of the resulting estimator, and give examples in which weightings that lead to fast convergence cause poor statistical performance.
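The setting in the abstract above can be sketched in a few lines. This is a minimal illustration, not the paper's method: it assumes squared loss and i.i.d. per-point weights redrawn at every step, and all names (`randomly_weighted_gd`, `weight_sampler`) are invented for the example.

```python
import numpy as np

def randomly_weighted_gd(X, y, weight_sampler, lr=0.1, steps=500, seed=0):
    """Gradient descent on least squares where, at every step, each data
    point is reweighted by a fresh random draw from weight_sampler."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    beta = np.zeros(d)
    for _ in range(steps):
        w = weight_sampler(rng, n)              # one random weight per point
        resid = X @ beta - y
        beta -= lr * X.T @ (w * resid) / n      # weighted least-squares gradient
    return beta
```

Choosing `weight_sampler = lambda rng, n: rng.exponential(1.0, n)` gives continuous unit-mean weights (so the gradient stays unbiased), while Bernoulli masks scaled by the inverse sampling probability recover minibatch SGD as a special case.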
What Can Be Recovered Under Sparse Adversarial Corruption? Assumption-Free Theory for Linear Measurements
Halder, Vishal, Reiffers-Masson, Alexandre, Aïssa-El-Bey, Abdeldjalil, Thoppe, Gugan
Let $A \in \mathbb{R}^{m \times n}$ be an arbitrary, known matrix and $e$ a $q$-sparse adversarial vector. Given $y = A x^\star + e$ and $q$, we seek the smallest set containing $x^\star$ -- hence the one conveying maximal information about $x^\star$ -- that is uniformly recoverable from $y$ without knowing $e$. While exact recovery of $x^\star$ via strong (and often impractical) structural assumptions on $A$ or $x^\star$ (e.g., restricted isometry, sparsity) is well studied, recoverability for arbitrary $A$ and $x^\star$ remains open. Our main result shows that the best that one can hope to recover is $x^\star + \ker(U)$, where $U$ is the unique projection matrix onto the intersection of rowspaces of all possible submatrices of $A$ obtained by deleting $2q$ rows. Moreover, we prove that every $x$ that minimizes the $\ell_0$-norm of $y - A x$ lies in $x^\star + \ker(U)$, which then gives a constructive approach to recover this set.
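The projection $U$ described above can be computed directly in small instances, using the fact that the rowspace of each submatrix $A_S$ is the orthogonal complement of $\ker(A_S)$, so the intersection of rowspaces is the orthogonal complement of the sum of the kernels. The sketch below is an illustrative brute-force construction (all function names are invented here), exponential in the number of row subsets and so only viable for small $m$:

```python
import itertools
import numpy as np

def _null_basis(M, tol=1e-10):
    """Orthonormal basis of ker(M), returned as columns."""
    _, s, Vt = np.linalg.svd(M)
    rank = int((s > tol * s[0]).sum()) if s.size and s[0] > 0 else 0
    return Vt[rank:].T

def recoverable_projection(A, q, tol=1e-10):
    """Projection U onto the intersection of the rowspaces of all
    submatrices of A obtained by deleting 2q rows.  Since
    rowspace(A_S) = ker(A_S)^perp, the intersection equals the
    orthogonal complement of the sum of the kernels, i.e. ker(U)
    is exactly that sum."""
    m, n = A.shape
    kernels = []
    for keep in itertools.combinations(range(m), m - 2 * q):
        K = _null_basis(A[list(keep), :], tol)
        if K.size:
            kernels.append(K)
    if not kernels:                      # every submatrix has full column rank
        return np.eye(n)
    B = np.hstack(kernels)               # columns span ker(U)
    Ub, s, _ = np.linalg.svd(B, full_matrices=False)
    r = int((s > tol * s[0]).sum())
    P = Ub[:, :r] @ Ub[:, :r].T          # orthogonal projector onto ker(U)
    return np.eye(n) - P
```

For a generic $6 \times 3$ matrix with $q = 1$, every $4 \times 3$ submatrix has full column rank, so $U = I$ and $x^\star$ is exactly recoverable; for a generic square $3 \times 3$ matrix with $q = 1$, the kernels of the $1 \times 3$ submatrices already span $\mathbb{R}^3$, so $U = 0$ and nothing is recoverable.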
Why High-rank Neural Networks Generalize?: An Algebraic Framework with RKHSs
Hashimoto, Yuka, Sonoda, Sho, Ishikawa, Isao, Ikeda, Masahiro
We derive a new Rademacher complexity bound for deep neural networks using Koopman operators, group representations, and reproducing kernel Hilbert spaces (RKHSs). The proposed bound explains why models with high-rank weight matrices generalize well. Although existing bounds attempt to describe this phenomenon, they apply only to limited types of models. We introduce an algebraic representation of neural networks and a kernel function to construct an RKHS, yielding a bound for a wider range of realistic models. This work paves the way for Koopman-based theory of Rademacher complexity bounds to be valid in more practical situations.
Control Disturbance Rejection in Neural ODEs
Bayram, Erkan, Belabbas, Mohamed-Ali, Başar, Tamer
In this paper, we propose an iterative training algorithm for Neural ODEs that yields models resilient to control (parameter) disturbances. The method builds on our earlier work, Tuning without Forgetting: like that work, it introduces training points sequentially and updates the parameters on new data within the space of parameters that do not decrease performance on the previously learned training points. The key difference is that, inspired by the concept of flat minima, we solve a minimax problem for a non-convex non-concave functional over an infinite-dimensional control space. We develop a projected gradient descent algorithm on the space of parameters, which admits the structure of an infinite-dimensional Banach subspace. We show through simulations that this formulation enables the model to effectively learn new data points and gain robustness against control disturbances.
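The core projection step has a simple finite-dimensional analogue. Assuming `J` stacks the Jacobians of the model outputs at previously learned training points (an assumption of this sketch, not the paper's infinite-dimensional formulation), removing the row-space component of an update leaves those outputs unchanged to first order:

```python
import numpy as np

def project_update(grad, J, tol=1e-10):
    """Project a parameter update onto ker(J), so that, to first order,
    outputs on previously learned points are preserved."""
    if J.size == 0:
        return grad
    _, s, Vt = np.linalg.svd(J, full_matrices=False)
    r = int((s > tol * s[0]).sum())
    V = Vt[:r].T                        # basis of the row space of J
    return grad - V @ (V.T @ grad)      # strip components that change outputs
```

For example, with a single constraint direction `J = [[1, 0, 0]]`, the update `[1, 2, 3]` is projected to `[0, 2, 3]`.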
The Measure of Deception: An Analysis of Data Forging in Machine Unlearning
Dixit, Rishabh, Hui, Yuan, Saab, Rayan
Motivated by privacy regulations and the need to mitigate the effects of harmful data, machine unlearning seeks to modify trained models so that they effectively ``forget'' designated data. A key challenge in verifying unlearning is forging -- adversarially crafting data that mimics the gradient of a target point, thereby creating the appearance of unlearning without actually removing information. To capture this phenomenon, we consider the collection of data points whose gradients approximate a target gradient within tolerance $\varepsilon$ -- which we call an $\varepsilon$-forging set -- and develop a framework for its analysis. For linear regression and one-layer neural networks, we show that the Lebesgue measure of this set is small: it scales on the order of $\varepsilon$, and, when $\varepsilon$ is small enough, on the order of $\varepsilon^d$. More generally, under mild regularity assumptions, we prove that the forging set measure decays as $\varepsilon^{(d-r)/2}$, where $d$ is the data dimension and $r$
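The $\varepsilon$-forging set is easy to probe numerically in the linear-regression case. The sketch below (an illustration under assumptions of this example, not the paper's analysis) estimates, by Monte Carlo over a bounded data cube, the fraction of points $(x, y)$ whose squared-loss gradient at a fixed $w$ lies within $\varepsilon$ of a target gradient; the fraction shrinks as $\varepsilon \to 0$, consistent with the stated scaling.

```python
import numpy as np

def forging_fraction(w, target_grad, eps, n_samples=100_000, seed=0):
    """Monte Carlo estimate of the fraction of the cube [-1,1]^{d+1}
    of data points (x, y) whose gradient of (w.x - y)^2 with respect
    to w lies within eps of target_grad."""
    rng = np.random.default_rng(seed)
    d = w.shape[0]
    x = rng.uniform(-1.0, 1.0, size=(n_samples, d))
    y = rng.uniform(-1.0, 1.0, size=n_samples)
    grads = 2.0 * (x @ w - y)[:, None] * x   # per-point squared-loss gradient
    dist = np.linalg.norm(grads - target_grad, axis=1)
    return (dist <= eps).mean()
```

With the same random samples, the estimated fraction is monotone in $\varepsilon$ by construction, so sweeping $\varepsilon$ gives a crude empirical decay curve.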
A Harmonic Space Proofs
We prove only one direction. Since this is a constant vector, the two nodes are not separable in the diffusion limit. We state the following result without proof (see Exercise 4.1 in Bishop [6]). If the sheaf has a trivial global section, all features converge to zero in the diffusion limit. The result follows from applying the trigonometric identity cos(π/2 + x) = −sin x. Idea: we can use rotation matrices to align the harmonic features of the classes with the coordinate axes, as in Figure 6a. It can be checked that these matrices respect the properties outlined above.
Zero-Direction Probing: A Linear-Algebraic Framework for Deep Analysis of Large-Language-Model Drift
We present Zero-Direction Probing (ZDP), a theory-only framework for detecting model drift from null directions of transformer activations without task labels or output evaluations. Under assumptions A1--A6, we prove: (i) the Variance--Leak Theorem, (ii) Fisher Null-Conservation, (iii) a Rank--Leak bound for low-rank updates, and (iv) a logarithmic-regret guarantee for online null-space trackers. We derive a Spectral Null-Leakage (SNL) metric with non-asymptotic tail bounds and a concentration inequality, yielding a priori thresholds for drift under a Gaussian null model. These results show that monitoring right/left null spaces of layer activations and their Fisher geometry provides concrete, testable guarantees on representational change.
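The abstract does not spell out the SNL metric, but a plausible minimal version of the underlying idea can be sketched: take the right null directions of a reference activation matrix and measure how much energy of later activations leaks into them. Everything below (the function name and the exact normalization) is an assumption of this illustration, not the paper's definition.

```python
import numpy as np

def spectral_null_leakage(H_ref, H_new, tol=1e-10):
    """Fraction of the Frobenius energy of new activations H_new lying
    in the right null space of reference activations H_ref.
    Rows are samples, columns are activation dimensions."""
    _, s, Vt = np.linalg.svd(H_ref, full_matrices=True)
    rank = int((s > tol * s[0]).sum()) if s.size and s[0] > 0 else 0
    V_null = Vt[rank:].T                 # right null directions of H_ref
    if V_null.shape[1] == 0:
        return 0.0                       # full-rank reference: nothing to leak into
    leak = np.linalg.norm(H_new @ V_null, ord="fro") ** 2
    total = np.linalg.norm(H_new, ord="fro") ** 2
    return leak / total if total > 0 else 0.0
```

If the new activations stay in the reference subspace the metric is 0; if they move entirely into the old null directions it is 1, so a rising value signals representational drift.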