The Splendors and Miseries of Heavisidisation
Machine Learning (ML) is applicable to scientific problems, i.e. to those which have a well-defined answer, only if this answer can be brought to a peculiar form ${\cal G}: X\longrightarrow Z$ with ${\cal G}(\vec x)$ expressed as a combination of iterated Heaviside functions. At present it is far from obvious if and when such representations exist, what the obstacles are and, if they are absent, how the known formulas can be converted into this form. This gives rise to a program of reformulating ordinary science in such terms, which sounds like a strong enhancement of the constructive mathematics approach, only this time it concerns all natural sciences. We describe the first steps on this long way.
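To make the target representation concrete, here is a minimal sketch (ours, not the paper's) of how an ordinary function such as $\sin(x)$ can be approximated on an interval by a finite combination of Heaviside functions: a staircase of weighted, shifted step functions. The grid size and target function are illustrative choices.

```python
import numpy as np

# Staircase approximation of f on [a, b] as a finite combination of
# Heaviside functions: f(x) ≈ f(x_0) + sum_k (f(x_k) - f(x_{k-1})) H(x - x_k).
def heaviside_approx(f, a, b, n_steps):
    xs = np.linspace(a, b, n_steps + 1)
    vals = f(xs)
    jumps = np.diff(vals)                        # height of each step
    def g(x):
        x = np.asarray(x, dtype=float)
        # np.heaviside(t, 1.0) is 0 for t < 0 and 1 for t >= 0
        steps = np.heaviside(x[..., None] - xs[1:], 1.0)
        return vals[0] + steps @ jumps
    return g

g = heaviside_approx(np.sin, 0.0, np.pi, n_steps=100)
x = np.linspace(0.0, np.pi, 7)
print(np.max(np.abs(g(x) - np.sin(x))))          # error shrinks as n_steps grows
```

Such staircases converge as the grid is refined; the question the abstract raises is the harder one of when exact or efficient Heaviside forms exist for known scientific formulas.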
SCAR: A Characterization Scheme for Multi-Modal Dataset
Su, Ri, Chen, Zhao, Cao, Caleb Chen, Tang, Nan, Chen, Lei
Foundation models exhibit remarkable generalization across diverse tasks, largely driven by the characteristics of their training data. Recent data-centric methods like pruning and compression aim to optimize training but offer limited theoretical insight into how data properties affect generalization, especially how data characteristics behave under sample scaling. Traditional perspectives further constrain progress by focusing predominantly on data quantity and training efficiency, often overlooking structural aspects of data quality. In this study, we introduce SCAR, a principled scheme for characterizing the intrinsic structural properties of datasets across four key measures: Scale, Coverage, Authenticity, and Richness. Unlike prior data-centric measures, SCAR captures stable characteristics that remain invariant under dataset scaling, providing a robust and general foundation for data understanding. Leveraging these structural properties, we introduce Foundation Data: a minimal subset that preserves the generalization behavior of the full dataset without requiring model-specific retraining. We model single-modality tasks as step functions and estimate the distribution of the foundation data size to capture step-wise generalization bias across modalities in the target multi-modal dataset. Finally, we develop a SCAR-guided data completion strategy based on this generalization bias, which enables efficient, modality-aware expansion of modality-specific characteristics in multi-modal datasets. Experiments across diverse multi-modal datasets and model architectures validate the effectiveness of SCAR in predicting data utility and guiding data acquisition. Code is available at https://github.com/McAloma/SCAR.
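As a toy illustration of the step-function view of generalization versus data size (our sketch, not the paper's algorithm; the function name `foundation_data_size` and the tolerance parameter are hypothetical), one can estimate a foundation-data size as the smallest sample size whose accuracy already reaches the full-dataset plateau:

```python
import numpy as np

# Hypothetical illustration: model accuracy as a step function of
# training-set size and take the "foundation data size" to be the smallest
# size whose accuracy is within `tol` of the full-dataset plateau.
def foundation_data_size(sizes, accs, tol=0.01):
    sizes, accs = np.asarray(sizes), np.asarray(accs)
    order = np.argsort(sizes)
    sizes, accs = sizes[order], accs[order]
    plateau = accs[-1]                      # accuracy of the full dataset
    ok = accs >= plateau - tol              # sizes that already generalize
    return sizes[np.argmax(ok)]             # first size reaching the plateau

# Toy accuracy-vs-scale curve with a step-wise rise.
sizes = [1_000, 5_000, 10_000, 50_000, 100_000]
accs  = [0.55,  0.71,  0.80,   0.81,   0.81]
print(foundation_data_size(sizes, accs))    # -> 10000
```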
Three Things to Know about Deep Metric Learning
Patel, Yash, Tolias, Giorgos, Matas, Jiri
This paper addresses supervised deep metric learning for open-set image retrieval, focusing on three key aspects: the loss function, mixup regularization, and model initialization. In deep metric learning, optimizing the retrieval evaluation metric, recall@k, via gradient descent is desirable but challenging due to its non-differentiable nature. To overcome this, we propose a differentiable surrogate loss that is computed on large batches, nearly equivalent to the entire training set. This computationally intensive process is made feasible through an implementation that bypasses GPU memory limitations. Additionally, we introduce an efficient mixup regularization technique that operates on pairwise scalar similarities, effectively increasing the batch size even further. The training process is further enhanced by initializing the vision encoder with foundation models pre-trained on large-scale datasets. Through a systematic study of these components, we demonstrate that their synergy enables large models to nearly solve popular benchmarks.
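As a rough sketch of the general idea behind a differentiable recall@k surrogate (a sigmoid relaxation of the hard ranking indicator; the paper's exact formulation, temperatures, and batching scheme may differ):

```python
import torch

# Minimal sketch: replace the hard ranking step with a sigmoid so that
# gradients flow through the rank computation.
def smooth_recall_at_k(sim, pos_mask, k, temp=0.1):
    # diff[b, i, j] = sim[b, i] - sim[b, j]
    diff = sim.unsqueeze(2) - sim.unsqueeze(1)
    # Soft rank of item i: soft count of items j scoring at least as high.
    rank = torch.sigmoid(-diff / temp).sum(dim=2)
    in_top_k = torch.sigmoid((k - rank) / temp)      # soft [rank <= k]
    # Recall@k: fraction of queries with >= 1 positive among the top k.
    hit = (in_top_k * pos_mask).amax(dim=1)
    return hit.mean()

sim = torch.randn(8, 100, requires_grad=True)        # 8 queries x 100 items
pos_mask = (torch.rand(8, 100) > 0.9).float()        # toy relevance labels
loss = 1.0 - smooth_recall_at_k(sim, pos_mask, k=10)
loss.backward()                                      # gradients reach `sim`
```

When the batch approaches the full training set, the O(N^2) similarity matrix no longer fits in memory at once, so a practical implementation would compute it in chunks; that is the kind of memory workaround the abstract alludes to.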
Bayesian Dyadic Trees and Histograms for Regression
Stéphanie van der Pas, Veronika Rockova
Many machine learning tools for regression are based on recursive partitioning of the covariate space into smaller regions, where the regression function can be estimated locally. Among these, regression trees and their ensembles have demonstrated impressive empirical performance. In this work, we shed light on the machinery behind Bayesian variants of these methods. In particular, we study Bayesian regression histograms, such as Bayesian dyadic trees, in the simple regression case with just one predictor. We focus on the reconstruction of regression surfaces that are piecewise constant, where the number of jumps is unknown.
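To convey the flavor of such a Bayesian regression histogram in the one-predictor case, here is an illustrative sketch (ours, not the paper's construction): a dyadic histogram on $[0,1)$ with a conjugate normal model per bin and a prior on the partition depth, whose posterior adapts to the unknown number of jumps.

```python
import numpy as np

# Illustrative sketch: depth d gives 2^d equal bins, each bin mean has a
# N(0, tau^2) prior, the noise sd `sigma` is known, and the depth prior is
# p(d) ∝ 2^(-d). The posterior over d adapts to the number of jumps.
def log_marginal(x, y, d, sigma=0.1, tau=1.0):
    logm = 0.0
    for k in range(2 ** d):
        yk = y[(x >= k / 2**d) & (x < (k + 1) / 2**d)]
        n = len(yk)
        if n == 0:
            continue
        s, q = yk.sum(), (yk ** 2).sum()
        # log N(y_k; 0, sigma^2 I + tau^2 11^T) in closed form
        quad = q / sigma**2 - tau**2 * s**2 / (sigma**2 * (sigma**2 + n * tau**2))
        logdet = (n - 1) * np.log(sigma**2) + np.log(sigma**2 + n * tau**2)
        logm += -0.5 * (n * np.log(2 * np.pi) + logdet + quad)
    return logm

rng = np.random.default_rng(0)
x = rng.uniform(size=500)
# Piecewise-constant truth with jumps at 0.25 and 0.5 (dyadic at depth 2).
y = np.where(x < 0.25, 1.0, np.where(x < 0.5, -1.0, 0.5)) + 0.1 * rng.normal(size=500)

depths = np.arange(6)
scores = np.array([log_marginal(x, y, d) for d in depths]) - depths * np.log(2)
post = np.exp(scores - scores.max()); post /= post.sum()
print(dict(zip(depths.tolist(), post.round(3))))   # mass concentrates at d = 2
```

The marginal likelihood automatically penalizes partitions finer than the truth, which is the adaptivity phenomenon the abstract studies.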
Bayes' Power for Explaining In-Context Learning Generalizations
Müller, Samuel, Hollmann, Noah, Hutter, Frank
Traditionally, neural network training has been primarily viewed as an approximation of maximum likelihood estimation (MLE). This interpretation originated in a time when training for multiple epochs on small datasets was common and performance was data-bound; but it falls short in the era of large-scale single-epoch training ushered in by large self-supervised setups, like language models. In this new setup, performance is compute-bound, but data is readily available. As models became more powerful, in-context learning (ICL), i.e., learning in a single forward pass based on the context, emerged as one of the dominant paradigms. In this paper, we argue that a more useful interpretation of neural network behavior in this era is as an approximation of the true posterior, as defined by the data-generating process. We demonstrate this interpretation's power for ICL and its usefulness in predicting generalization to previously unseen tasks. We show how models become robust in-context learners by effectively composing knowledge from their training data. We illustrate this with experiments that reveal surprising generalizations, all explicable through the exact posterior. Finally, we show the inherent constraints of the generalization capabilities of posteriors and the limitations of neural networks in approximating these posteriors.
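A toy example makes the posterior view concrete (our illustration, not the paper's experimental setup): for a simple data-generating process the exact posterior predictive can be computed in closed form, and it is this quantity that an in-context learner trained on samples from the process should approximate in a single forward pass.

```python
import numpy as np

# Toy data-generating process: a latent task theta ~ Uniform{0.2, 0.5, 0.8}
# generates i.i.d. coin flips. The exact posterior predictive below is the
# target an in-context learner trained on this process should approximate.
thetas = np.array([0.2, 0.5, 0.8])
prior = np.full(3, 1 / 3)

def posterior_predictive(context):
    """P(next flip = 1 | context), computed exactly via Bayes' rule."""
    k, n = sum(context), len(context)
    lik = thetas**k * (1 - thetas) ** (n - k)   # likelihood of the context
    post = prior * lik
    post /= post.sum()                          # posterior over latent tasks
    return float(post @ thetas)                 # marginalize the next flip

print(posterior_predictive([1, 1, 1, 0, 1]))    # -> ~0.71, pulled toward 0.8
```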