When do Random Forests work?
Revelas, C., Boldea, O., Werker, B. J. M.
We study the effectiveness of randomizing split-directions in random forests. Prior literature has shown that, on the one hand, randomization can reduce variance through decorrelation, and, on the other hand, randomization regularizes and works in low signal-to-noise ratio (SNR) environments. First, we bring together and revisit decorrelation and regularization by presenting a systematic analysis of out-of-sample mean-squared error (MSE) for different SNR scenarios based on commonly used data-generating processes. We find that variance reduction tends to increase with the SNR, and that forests outperform bagging when the SNR is low because, in low-SNR cases, variance dominates bias for both methods. Second, we show that the effectiveness of randomization is a question that goes beyond the SNR. We present a simulation study with fixed and moderate SNR, in which we examine the effectiveness of randomization for other data characteristics. In particular, we find that (i) randomization can increase bias in the presence of fat tails in the distribution of covariates; (ii) in the presence of irrelevant covariates, randomization is ineffective because bias dominates variance; and (iii) when covariates are mutually correlated, randomization tends to be effective because variance dominates bias. Beyond randomization, we find that, for both bagging and random forests, bias can be significantly reduced in the presence of correlated covariates. This last finding goes beyond the prevailing view that averaging mostly works by variance reduction. Given that in practice covariates are often correlated, our findings on correlated covariates could open the way for a better understanding of why random forests work well in many applications.
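The bagging-versus-forest comparison above can be reproduced in miniature with scikit-learn, where max_features=1.0 recovers bagging (no split-direction randomization) and a fractional max_features adds it. The sketch below is our own stand-in, not one of the paper's designs; the data-generating process and SNR level are illustrative choices.

```python
# A minimal sketch, assuming a synthetic additive-noise DGP of our own
# choosing (not one of the paper's designs). max_features=1.0 emulates
# bagging; a fractional max_features randomizes split-directions.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n, p, snr = 2000, 10, 0.5  # low signal-to-noise ratio
X = rng.standard_normal((n, p))
signal = X[:, 0] + np.sin(X[:, 1])
y = signal + rng.normal(scale=np.sqrt(signal.var() / snr), size=n)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
for label, mf in [("bagging", 1.0), ("random forest", 1 / 3)]:
    model = RandomForestRegressor(n_estimators=300, max_features=mf,
                                  random_state=0).fit(X_tr, y_tr)
    print(label, mean_squared_error(y_te, model.predict(X_te)))
```

Varying snr, the tail behavior of the covariates, or their correlation structure probes, in spirit, the same axes the paper studies.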
Spectral Algorithms under Covariate Shift
Fan, Jun, Guo, Zheng-Chu, Shi, Lei
Spectral algorithms leverage spectral regularization techniques to analyze and process data, providing a flexible framework for addressing supervised learning problems. To deepen our understanding of their performance in real-world scenarios where the distributions of training and test data may differ, we conduct a rigorous investigation into the convergence behavior of spectral algorithms under distribution shifts, specifically within the framework of reproducing kernel Hilbert spaces. Our study focuses on the case of covariate shift. In this scenario, the marginal distributions of the input data differ between the training and test datasets, while the conditional distribution of the output given the input remains unchanged. Under this setting, we analyze the generalization error of spectral algorithms and show that they achieve minimax optimality when the density ratios between the training and test distributions are uniformly bounded. However, we also identify a critical limitation: when the density ratios are unbounded, the spectral algorithms may become suboptimal. To address this limitation, we propose a weighted spectral algorithm that incorporates density ratio information into the learning process. Our theoretical analysis shows that this weighted approach achieves optimal capacity-independent convergence rates. Furthermore, by introducing a weight clipping technique, we demonstrate that the convergence rates of the weighted spectral algorithm can approach the optimal capacity-dependent convergence rates arbitrarily closely. This improvement resolves the suboptimality issue in unbounded density ratio scenarios and advances the state-of-the-art by refining existing theoretical results.
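For intuition, kernel ridge regression is a canonical spectral algorithm (Tikhonov regularization as the spectral filter), and the density-ratio weighting with clipping described above can be sketched as follows. The function names, the Gaussian kernel, and the toy shift are our illustrative choices, not the paper's notation.

```python
# A minimal sketch, assuming a Gaussian kernel and ridge (Tikhonov)
# regularization as the spectral filter; weighted_krr and the clipping
# rule are our illustrative choices, not the paper's construction.
import numpy as np

def gaussian_kernel(A, B, bandwidth=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * bandwidth ** 2))

def weighted_krr(X, y, ratios, lam=1e-2, clip=None):
    w = np.clip(ratios, None, clip) if clip is not None else ratios  # weight clipping
    n = len(y)
    K = gaussian_kernel(X, X)
    W = np.diag(w)
    # Weighted Tikhonov step: solve (W K + lam * n * I) alpha = W y.
    alpha = np.linalg.solve(W @ K + lam * n * np.eye(n), W @ y)
    return lambda Xnew: gaussian_kernel(Xnew, X) @ alpha

rng = np.random.default_rng(0)
X_tr = rng.normal(0.0, 1.0, size=(200, 1))      # train inputs ~ N(0, 1)
y_tr = np.sin(X_tr[:, 0]) + 0.1 * rng.standard_normal(200)
ratios = np.exp(X_tr[:, 0] - 0.5)               # exact N(1,1)/N(0,1) density ratio
f = weighted_krr(X_tr, y_tr, ratios, clip=5.0)  # predictor targeting test ~ N(1, 1)
```

Here the density ratio is known in closed form; in practice it would be estimated, and the clip level trades off bias against the variance caused by unbounded ratios.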
Propagation of Chaos in One-hidden-layer Neural Networks beyond Logarithmic Time
Glasgow, Margalit, Wu, Denny, Bruna, Joan
We study the approximation gap between the dynamics of a polynomial-width neural network and its infinite-width counterpart, both trained using projected gradient descent in the mean-field scaling regime. We demonstrate how to tightly bound this approximation gap through a differential equation governed by the mean-field dynamics. A key factor influencing the growth of the solution to this ODE is the local Hessian of each particle, defined as the derivative of the particle's velocity in the mean-field dynamics with respect to its position. We apply our results to the canonical feature learning problem of estimating a well-specified single-index model; we permit the information exponent to be arbitrarily large, leading to convergence times that grow polynomially in the ambient dimension $d$. We show that, due to a certain "self-concordance" property in these problems, where the local Hessian of a particle is bounded by a constant times the particle's velocity, polynomially many neurons are sufficient to closely approximate the mean-field dynamics throughout training.
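To make the setting concrete, here is a minimal sketch of the training procedure the result concerns: projected gradient descent on a width-m one-hidden-layer network in the mean-field scaling, fit to a single-index target. ReLU, fresh online batches, and the specific target are our simplifications, not the paper's assumptions.

```python
# A minimal sketch (our simplifications: ReLU, online batches, an easy
# single-index target) of projected gradient descent in the mean-field
# scaling, with each neuron projected back to the unit sphere.
import numpy as np

rng = np.random.default_rng(0)
d, m, lr, steps = 20, 512, 0.5, 2000
w_star = np.eye(d)[0]                          # single-index direction
relu = lambda z: np.maximum(z, 0.0)

W = rng.standard_normal((m, d))
W /= np.linalg.norm(W, axis=1, keepdims=True)  # particles start on the sphere

for _ in range(steps):
    X = rng.standard_normal((256, d))
    y = relu(X @ w_star)                       # well-specified single-index model
    pred = relu(X @ W.T).mean(axis=1)          # 1/m output scaling (mean-field)
    resid = pred - y
    # Per-particle gradient of the squared loss under the 1/m scaling.
    grad = ((resid[:, None] * (X @ W.T > 0)).T @ X) / (m * len(X))
    W -= lr * m * grad                         # mean-field time rescaling ~ m
    W /= np.linalg.norm(W, axis=1, keepdims=True)  # projection step
```

The paper's question is how large m must be for trajectories like W above to track their infinite-width limit over such (polynomially long) horizons.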
Variance-Reduced Fast Operator Splitting Methods for Stochastic Generalized Equations
We develop two classes of variance-reduced fast operator splitting methods to approximate solutions of both finite-sum and stochastic generalized equations. Our approach integrates recent advances in accelerated fixed-point methods, co-hypomonotonicity, and variance reduction. First, we introduce a class of variance-reduced estimators and establish their variance-reduction bounds. This class covers both unbiased and biased instances and comprises common estimators as special cases, including SVRG, SAGA, SARAH, and Hybrid-SGD. Next, we design a novel accelerated variance-reduced forward-backward splitting (FBS) algorithm using these estimators to solve finite-sum and stochastic generalized equations. Our method achieves both $\mathcal{O}(1/k^2)$ and $o(1/k^2)$ convergence rates on the expected squared norm $\mathbb{E}[\|G_{\lambda}x^k\|^2]$ of the FBS residual $G_{\lambda}$, where $k$ is the iteration counter. Additionally, we establish, for the first time, almost sure convergence rates and almost sure convergence of iterates to a solution in stochastic accelerated methods. Unlike existing stochastic fixed-point algorithms, our methods accommodate co-hypomonotone operators, which potentially include nonmonotone problems arising from recent applications. We further specialize our method to derive an appropriate variant for each stochastic estimator (SVRG, SAGA, SARAH, and Hybrid-SGD), demonstrating that each achieves the best-known complexity without relying on enhancement techniques. Alternatively, we propose an accelerated variance-reduced backward-forward splitting (BFS) method, which attains convergence rates and oracle complexity similar to our FBS method. Finally, we validate our results through several numerical experiments and compare the performance of the proposed methods.
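For concreteness, a plain (non-accelerated) forward-backward splitting iteration with an SVRG-type estimator, one of the estimators covered above, can be sketched as follows. The box constraint, step size, and problem instance are our illustrative choices; the paper's acceleration and the other estimators are omitted.

```python
# A minimal sketch, assuming a finite-sum inclusion 0 in F(x) + T(x) with
# F(x) = (1/n) sum_i F_i(x) and T the normal cone of a box, whose resolvent
# is a projection. Only the SVRG estimator is shown; no acceleration.
import numpy as np

def vr_fbs(F_list, resolvent, x0, lam=0.02, epochs=50, seed=0):
    rng = np.random.default_rng(seed)
    n, x = len(F_list), x0.copy()
    for _ in range(epochs):
        snapshot = x.copy()
        full = sum(F(snapshot) for F in F_list) / n        # full operator pass
        for _ in range(n):
            i = rng.integers(n)
            v = F_list[i](x) - F_list[i](snapshot) + full  # SVRG estimator
            x = resolvent(x - lam * v)                     # forward-backward step
    return x

# Toy instance: box-constrained strongly monotone affine components.
rng = np.random.default_rng(1)
mats, rhs = [], []
for _ in range(10):
    a = rng.standard_normal((3, 3))
    mats.append(a.T @ a + np.eye(3))           # strongly monotone F_i
    rhs.append(rng.standard_normal(3))
F_list = [lambda x, A=A, b=b: A @ x - b for A, b in zip(mats, rhs)]
box = lambda z: np.clip(z, -1.0, 1.0)          # resolvent of the box normal cone
print(vr_fbs(F_list, box, np.zeros(3)))
```

Swapping the estimator line for a SAGA, SARAH, or Hybrid-SGD update yields the corresponding variants the abstract refers to.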
An Empirically Grounded Identifiability Theory Will Accelerate Self-Supervised Learning Research
Reizinger, Patrik, Balestriero, Randall, Klindt, David, Brendel, Wieland
Self-Supervised Learning (SSL) powers many current AI systems. As research interest and investment grow, the SSL design space continues to expand. The Platonic view of SSL, following the Platonic Representation Hypothesis (PRH), suggests that despite different methods and engineering approaches, all representations converge to the same Platonic ideal. However, this phenomenon lacks precise theoretical explanation. By synthesizing evidence from Identifiability Theory (IT), we show that the PRH can emerge in SSL. However, current IT cannot explain SSL's empirical success. To bridge the gap between theory and practice, we propose expanding IT into what we term Singular Identifiability Theory (SITh), a broader theoretical framework encompassing the entire SSL pipeline. SITh would allow deeper insights into the implicit data assumptions in SSL and advance the field towards learning more interpretable and generalizable representations. We highlight three critical directions for future research: 1) training dynamics and convergence properties of SSL; 2) the impact of finite samples, batch size, and data diversity; and 3) the role of inductive biases in architecture, augmentations, initialization schemes, and optimizers.
Cluster weighted models with multivariate skewed distributions for functional data
Anton, Cristina, Shreshtth, Roy Shivam Ram
We propose a clustering method, funWeightClustSkew, based on mixtures of functional linear regression models and three skewed multivariate distributions: the variance-gamma distribution, the skew-t distribution, and the normal-inverse Gaussian distribution. Our approach follows the framework of the functional high dimensional data clustering (funHDDC) method, and we extend to functional data the cluster weighted models based on skewed distributions used for finite dimensional multivariate data. We consider several parsimonious models, and to estimate the parameters we construct an expectation maximization (EM) algorithm. We illustrate the performance of funWeightClustSkew for simulated data and for the Air Quality dataset.

Keywords: Cluster weighted models, Functional linear regression, EM algorithm, Skewed distributions, Multivariate functional principal component analysis

1 Introduction

Smart devices and other modern technologies record huge amounts of data measured continuously in time. These data are better represented as curves instead of finite-dimensional vectors, and they are analyzed using statistical methods specific to functional data (Ramsay and Silverman, 2006; Ferraty and Vieu, 2006; Horváth and Kokoszka, 2012). Many times more than one curve is collected for one individual, e.g.
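As a schematic of the EM machinery involved, the following simplified sketch implements EM for a basic cluster weighted model, where each component models both the covariate density and the conditional regression of y on x. Gaussian components stand in for the skewed families above, and a scalar covariate stands in for functional principal component scores; all names are ours.

```python
# A minimal sketch, assuming Gaussian components (not the skewed families
# in the paper) and a scalar covariate: EM for a basic cluster weighted
# model. No safeguards for degenerate components; illustration only.
import numpy as np
from scipy.stats import norm

def cwm_em(x, y, K=2, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    n = len(x)
    pi = np.full(K, 1 / K)
    mu, s = rng.choice(x, K), np.full(K, x.std())   # covariate density parameters
    a, b, se = np.zeros(K), rng.standard_normal(K), np.full(K, y.std())
    for _ in range(iters):
        # E-step: responsibilities from the joint density f(x, y | component k).
        logp = (np.log(pi)
                + norm.logpdf(x[:, None], mu, s)
                + norm.logpdf(y[:, None], a + b * x[:, None], se))
        r = np.exp(logp - logp.max(1, keepdims=True))
        r /= r.sum(1, keepdims=True)
        # M-step: responsibility-weighted parameter updates.
        nk = r.sum(0)
        pi = nk / n
        mu = (r * x[:, None]).sum(0) / nk
        s = np.sqrt((r * (x[:, None] - mu) ** 2).sum(0) / nk)
        for k in range(K):
            b[k], a[k] = np.polyfit(x, y, 1, w=np.sqrt(r[:, k]))
            se[k] = np.sqrt((r[:, k] * (y - a[k] - b[k] * x) ** 2).sum() / nk[k])
    return pi, mu, s, a, b, se
```

The paper's method replaces the Gaussian densities with variance-gamma, skew-t, or normal-inverse Gaussian components and the scalar regression with functional linear regressions, which complicates the M-step but leaves this overall E/M structure intact.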
ALT: A Python Package for Lightweight Feature Representation in Time Series Classification
Halmos, Balázs P., Hajós, Balázs, Molnár, Vince Á., Kurbucz, Marcell T., Jakovác, Antal
We introduce ALT, an open-source Python package created for efficient and accurate time series classification (TSC). The package implements the adaptive law-based transformation (ALT) algorithm, which transforms raw time series data into a linearly separable feature space using variable-length shifted time windows. This adaptive approach enhances its predecessor, the linear law-based transformation (LLT), by effectively capturing patterns of varying temporal scales. The software is implemented for scalability, interpretability, and ease of use, achieving state-of-the-art performance with minimal computational overhead. Extensive benchmarking on real-world datasets demonstrates the utility of ALT for diverse TSC tasks in physics and related domains.
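The package's actual interface is documented in its repository. Purely to illustrate the idea of variable-length shifted time windows feeding a classifier, a generic sketch might look like the following; this is not the ALT API, and the law-fitting step that gives ALT its name is not shown.

```python
# A minimal sketch of multi-scale shifted windows only; NOT the ALT
# package's API, and the law-based transformation itself is omitted.
import numpy as np

def window_features(series, lengths=(8, 16, 32), stride=4):
    """Pool simple statistics over shifted windows at several lengths."""
    feats = []
    for L in lengths:
        starts = range(0, len(series) - L + 1, stride)
        windows = np.array([series[s:s + L] for s in starts])
        feats += [windows.mean(), windows.std(),
                  np.abs(np.diff(windows, axis=1)).mean()]
    return np.array(feats)

rng = np.random.default_rng(0)
x = np.sin(np.linspace(0, 10, 128)) + 0.1 * rng.standard_normal(128)
print(window_features(x))  # one fixed-length feature vector per series
```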
DeepSeek poses 'profound' security threat, U.S. House panel claims
Chinese artificial intelligence firm DeepSeek is a "profound threat" to U.S. national security, a bipartisan House committee said Wednesday, urging Nvidia to hand over information on sales of chips that the startup may have used to develop its breakthrough chatbot model. The House Select Committee on China alleged in a report Wednesday that DeepSeek's ties to Chinese government interests "are significant," citing corporate filings obtained by the panel. Lawmakers claimed that DeepSeek's founder, Liang Wenfeng, controls the firm alongside the High-Flyer Quant hedge fund in an "integrated ecosystem" linked to state-linked hardware distributors and Chinese research institute Zhejiang Lab. "Although it presents itself as just another AI chatbot, offering users a way to generate text and answer questions, closer inspection reveals that the app siphons data back to the People's Republic of China (PRC), creates security vulnerabilities for its users, and relies on a model that covertly censors and manipulates information pursuant to Chinese law," the report states.
The latest ChatGPT trend? People are using it to turn their pets into humans.
Since ChatGPT's AI image generator launched to free users a couple of weeks ago, people online have been toying around with its possibilities. The latest trend involves people anthropomorphizing their pets. To be clear, we all kind of act like our pets are humans -- my perfect dog, Henry, is a good boy who loves deeply like a real person. But folks are now using ChatGPT's image generator to imagine what their pets would actually look like as people. If you search around other social platforms, you'll find more examples of people who said they used ChatGPT to transform their pets into humans.
Jimmi Simpson on Cast Away's unexpected influence in Black Mirror's 'USS Callister: Into Infinity'
With Season 7 of Black Mirror, series superfan Jimmi Simpson returned to reprise the roles of video game billionaire James Walton and his less vicious in-game clone in "USS Callister: Into Infinity." Mashable Entertainment Editor Kristy Puchko sat down with Simpson over Zoom for an interview about what it was like to come back to Black Mirror. And beyond that, what was it like to go full-on Cast Away with a touch of Oldboy? Sure, Clone Walton was thought to be dead after his heroic self-sacrifice at the end of "USS Callister." However, "Into Infinity" reveals that he'd been respawned on an unpopulated planet in the online MMORPG Infinity.