 efp


Shallow ReLU$^s$ Networks in $L^p$-Type and Sobolev Spaces: Approximation and Path-Norm Controlled Generalization

arXiv.org Machine Learning

Deep learning has shown remarkable effectiveness in high-dimensional approximation problems, particularly in scientific computing, inverse problems, and operator learning (Han et al., 2018; Adcock et al., 2022; Beck et al., 2023). In many such settings, the ReLU$^s$ activation $\sigma_s(t) = \max\{0,t\}^s$ ($s \in \mathbb{N}_0$) is especially relevant because it yields piecewise-polynomial representations that are well suited to smooth targets and derivative-sensitive tasks (Yang and Zhou, 2025; He et al., 2024).
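As a concrete illustration of the activation defined above, the following is a minimal NumPy sketch. The function name `relu_s` is hypothetical, and the handling of $s = 0$ (a step function that is 0 at $t \le 0$) is an assumed convention, since $0^0$ is not fixed by the formula itself.

```python
import numpy as np

def relu_s(t, s):
    """Evaluate sigma_s(t) = max(0, t)**s elementwise for a non-negative integer s."""
    if int(s) != s or s < 0:
        raise ValueError("s must be a non-negative integer")
    t = np.asarray(t, dtype=float)
    if s == 0:
        # Assumed convention: sigma_0 is the step function (1 for t > 0, else 0).
        # Without this branch, NumPy would evaluate 0.0**0 as 1 for all t <= 0.
        return (t > 0).astype(float)
    return np.maximum(0.0, t) ** s
```

For $s = 1$ this is the ordinary ReLU; for $s \ge 2$ the function is a piecewise polynomial of degree $s$ with $s - 1$ continuous derivatives at the origin.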


Scaling Laws in Jet Classification

arXiv.org Artificial Intelligence

We demonstrate the emergence of scaling laws in the benchmark top versus QCD jet classification problem in collider physics. Six distinct physically-motivated classifiers exhibit power-law scaling of the binary cross-entropy test loss as a function of training set size, with distinct power-law indices. This result highlights the importance of comparing classifiers as a function of dataset size rather than for a fixed training set, as the optimal classifier may change considerably as the dataset is scaled up. We speculate on the interpretation of our results in terms of previous models of scaling laws observed in natural language and image datasets.
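To make the notion of a power-law index concrete, here is a minimal sketch of how one might estimate it from (training set size, test loss) pairs. It assumes a pure power law, loss ≈ A · N^(−α), with no irreducible-loss offset; the abstract does not state the exact functional form the authors fit. The function name `fit_power_law` and the numbers are illustrative, not data from the paper.

```python
import numpy as np

def fit_power_law(n_train, test_loss):
    """Fit test_loss ~ A * n_train**(-alpha) by least squares in log-log space.

    Returns (alpha, A). Assumes strictly positive inputs and no constant offset.
    """
    log_n = np.log(np.asarray(n_train, dtype=float))
    log_loss = np.log(np.asarray(test_loss, dtype=float))
    # A straight line in log-log space: log(loss) = log(A) - alpha * log(n)
    slope, intercept = np.polyfit(log_n, log_loss, 1)
    return -slope, np.exp(intercept)

# Synthetic, noise-free illustration (hypothetical values).
n = np.array([1e3, 1e4, 1e5, 1e6])
loss = 2.0 * n ** -0.3
alpha, amplitude = fit_power_law(n, loss)
```

Two classifiers with different fitted indices have loss curves that can cross as the dataset grows, which is why a ranking at one fixed training set size may not hold at another.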


Primal and Dual Analysis of Entropic Fictitious Play for Finite-sum Problems

arXiv.org Artificial Intelligence

The entropic fictitious play (EFP) is a recently proposed algorithm that minimizes the sum of a convex functional and entropy in the space of measures -- such an objective naturally arises in the optimization of a two-layer neural network in the mean-field regime. In this work, we provide a concise primal-dual analysis of EFP in the setting where the learning problem exhibits a finite-sum structure. We establish quantitative global convergence guarantees for both the continuous-time and discrete-time dynamics based on properties of a proximal Gibbs measure introduced in Nitanda et al. (2022). Furthermore, our primal-dual framework entails a memory-efficient particle-based implementation of the EFP update, and also suggests a connection to gradient boosting methods. We illustrate the efficiency of our novel implementation in experiments including neural network optimization and image synthesis.


Feature Selection with Distance Correlation

arXiv.org Artificial Intelligence

Choosing which properties of the data to use as input to multivariate decision algorithms -- a.k.a. feature selection -- is an important step in solving any problem with machine learning. While there is a clear trend towards training sophisticated deep networks on large numbers of relatively unprocessed inputs (so-called automated feature engineering), for many tasks in physics, sets of theoretically well-motivated and well-understood features already exist. Working with such features can bring many benefits, including greater interpretability, reduced training and run time, and enhanced stability and robustness. We develop a new feature selection method based on Distance Correlation (DisCo), and demonstrate its effectiveness on the tasks of boosted top- and $W$-tagging. Using our method to select features from a set of over 7,000 energy flow polynomials, we show that we can match the performance of much deeper architectures, by using only ten features and two orders-of-magnitude fewer model parameters.
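For readers unfamiliar with the statistic, the following is a minimal sketch of the sample distance correlation (in the sense of Székely et al.) between two variables. It only computes the dependence measure; it is not the authors' feature selection procedure, which the abstract does not describe in detail. The function names are hypothetical.

```python
import numpy as np

def _centered_distances(x):
    """Pairwise Euclidean distance matrix of x, double-centered."""
    x = np.asarray(x, dtype=float)
    x = x.reshape(len(x), -1)  # treat 1-D input as n samples of dimension 1
    d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    # Subtract row and column means, then add back the grand mean.
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()

def distance_correlation(x, y):
    """Sample distance correlation between x and y (same number of samples)."""
    a = _centered_distances(x)
    b = _centered_distances(y)
    dcov2 = (a * b).mean()    # squared sample distance covariance
    dvar_x = (a * a).mean()   # squared sample distance variance of x
    dvar_y = (b * b).mean()
    denom = np.sqrt(dvar_x * dvar_y)
    if denom == 0.0:
        return 0.0            # one variable is constant
    return float(np.sqrt(max(dcov2, 0.0) / denom))
```

Unlike the Pearson coefficient, this quantity can detect non-linear dependence: for `x` symmetric about zero and `y = x**2`, the Pearson correlation vanishes while the distance correlation stays clearly positive.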


A multiple testing framework for diagnostic accuracy studies with co-primary endpoints

arXiv.org Machine Learning

This is indicated, among others, by several review and overview publications (Ching et al., 2018; Jiang et al., 2017; Litjens et al., 2017; Miotto, Wang, Wang, Jiang, & Dudley, 2017). In particular, the capabilities of end-to-end deep learning approaches on such supervised learning tasks are highly promising. For instance, vast advances have been reported in the literature regarding cancer diagnosis with deep neural networks (Hu et al., 2018). End-to-end deep learning refers to a trend involving deep (neural network) model architectures which are able to learn highly complex relationships between predictors and the target variable while having fewer parameters than traditional (more shallow) models with comparable performance (Goodfellow, Bengio, & Courville, 2016). In the training process, highly complex features are derived automatically by the learning algorithm (LeCun, Bengio, & Hinton, 2015). This framework contrasts with the traditional pipeline of domain-specific data preprocessing and handcrafted features in combination with simpler prediction models. Despite all the recent success of machine learning, there are still challenges regarding over-optimistic conclusions drawn from finite datasets, which may to a large extent be attributed to the following two (broad) categories: 1. Study design and reporting: The most popular recommendation to split data for training, selection and evaluation is frequently employed in practice (Friedman, Hastie, & Tibshirani, 2009; Géron, 2017; Goodfellow et al., 2016; Japkowicz & Shah, 2011; Kuhn & Johnson, 2013; Zheng, 2015). In the ML community, the corresponding datasets are commonly denoted as training, validation and test set.
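The three-way split mentioned above can be sketched as follows. This is a generic illustration, not the procedure of the cited study; the function name `three_way_split` and the default fractions are assumptions chosen for the example.

```python
import numpy as np

def three_way_split(n_samples, val_frac=0.2, test_frac=0.2, seed=0):
    """Return disjoint index arrays (train, val, test) covering range(n_samples)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)       # shuffle once, then slice
    n_test = int(round(test_frac * n_samples))
    n_val = int(round(val_frac * n_samples))
    test = idx[:n_test]                    # held out for the final evaluation
    val = idx[n_test:n_test + n_val]       # used for model selection
    train = idx[n_test + n_val:]           # used for fitting
    return train, val, test

train_idx, val_idx, test_idx = three_way_split(100)
```

Keeping the test indices untouched until a single final evaluation is what separates model selection from performance assessment; reusing the test set to choose among several models is one source of the over-optimism discussed in the text.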