AITopics | Mollenhauer, Mattes

Collaborating Authors

Mollenhauer, Mattes

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Choose Your Model Size: Any Compression by a Single Gradient Descent

Genzel, Martin, Putzky, Patrick, Zhao, Pengfei, Schulze, Sebastian, Mollenhauer, Mattes, Seidel, Robert, Dietzel, Stefan, Wollmann, Thomas

arXiv.org Artificial IntelligenceFeb-3-2025

The adoption of Foundation Models in resource-constrained environments remains challenging due to their large size and inference costs. A promising way to overcome these limitations is post-training compression, which aims to balance reduced model size against performance degradation. This work presents Any Compression via Iterative Pruning (ACIP), a novel algorithmic approach to determine a compression-performance trade-off from a single stochastic gradient descent run. To ensure parameter efficiency, we use an SVD-reparametrization of linear layers and iteratively prune their singular values with a sparsity-inducing penalty. The resulting pruning order gives rise to a global parameter ranking that allows us to materialize models of any target size. Importantly, the compressed models exhibit strong predictive downstream performance without the need for costly fine-tuning. We evaluate ACIP on a large selection of open-weight LLMs and tasks, and demonstrate state-of-the-art results compared to existing factorisation-based compression methods. We also show that ACIP seamlessly complements common quantization-based compression techniques.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2502.01717

Country: Europe > Germany (0.46)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.90)

Add feedback

Optimal Rates for Vector-Valued Spectral Regularization Learning Algorithms

Meunier, Dimitri, Shen, Zikai, Mollenhauer, Mattes, Gretton, Arthur, Li, Zhu

arXiv.org Machine LearningMay-23-2024

We study theoretical properties of a broad class of regularized algorithms with vector-valued output. These spectral algorithms include kernel ridge regression, kernel principal component regression, various implementations of gradient descent and many more. Our contributions are twofold. First, we rigorously confirm the so-called saturation effect for ridge regression with vector-valued output by deriving a novel lower bound on learning rates; this bound is shown to be suboptimal when the smoothness of the regression function exceeds a certain level. Second, we present the upper bound for the finite sample risk general vector-valued spectral algorithms, applicable to both well-specified and misspecified scenarios (where the true regression function lies outside of the hypothesis space) which is minimax optimal in various regimes. All of our results explicitly allow the case of infinite-dimensional output variables, proving consistency of recent practical applications.

artificial intelligence, machine learning, probability, (16 more...)

arXiv.org Machine Learning

2405.14778

Genre: Research Report (0.70)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.34)

Add feedback

Towards Optimal Sobolev Norm Rates for the Vector-Valued Regularized Least-Squares Algorithm

Li, Zhu, Meunier, Dimitri, Mollenhauer, Mattes, Gretton, Arthur

arXiv.org Machine LearningDec-13-2023

We present the first optimal rates for infinite-dimensional vector-valued ridge regression on a continuous scale of norms that interpolate between $L_2$ and the hypothesis space, which we consider as a vector-valued reproducing kernel Hilbert space. These rates allow to treat the misspecified case in which the true regression function is not contained in the hypothesis space. We combine standard assumptions on the capacity of the hypothesis space with a novel tensor product construction of vector-valued interpolation spaces in order to characterize the smoothness of the regression function. Our upper bound not only attains the same rate as real-valued kernel ridge regression, but also removes the assumption that the target regression function is bounded. For the lower bound, we reduce the problem to the scalar setting using a projection argument. We show that these rates are optimal in most cases and independent of the dimension of the output space. We illustrate our results for the special case of vector-valued Sobolev spaces.

artificial intelligence, fischer and steinwart, machine learning, (16 more...)

arXiv.org Machine Learning

2312.07186

Genre: Research Report > New Finding (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Add feedback

Optimal Rates for Regularized Conditional Mean Embedding Learning

Li, Zhu, Meunier, Dimitri, Mollenhauer, Mattes, Gretton, Arthur

arXiv.org Machine LearningDec-12-2023

We address the consistency of a kernel ridge regression estimate of the conditional mean embedding (CME), which is an embedding of the conditional distribution of $Y$ given $X$ into a target reproducing kernel Hilbert space $\mathcal{H}_Y$. The CME allows us to take conditional expectations of target RKHS functions, and has been employed in nonparametric causal and Bayesian inference. We address the misspecified setting, where the target CME is in the space of Hilbert-Schmidt operators acting from an input interpolation space between $\mathcal{H}_X$ and $L_2$, to $\mathcal{H}_Y$. This space of operators is shown to be isomorphic to a newly defined vector-valued interpolation space. Using this isomorphism, we derive a novel and adaptive statistical learning rate for the empirical CME estimator under the misspecified setting. Our analysis reveals that our rates match the optimal $O(\log n / n)$ rates without assuming $\mathcal{H}_Y$ to be finite dimensional. We further establish a lower bound on the learning rate, which shows that the obtained upper bound is optimal.

artificial intelligence, bayesian inference, machine learning, (19 more...)

arXiv.org Machine Learning

2208.01711

Country: North America > United States > Virginia (0.14)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.34)

Add feedback

Learning linear operators: Infinite-dimensional regression as a well-behaved non-compact inverse problem

Mollenhauer, Mattes, Mücke, Nicole, Sullivan, T. J.

arXiv.org Machine LearningMar-6-2023

We consider the problem of learning a linear operator $\theta$ between two Hilbert spaces from empirical observations, which we interpret as least squares regression in infinite dimensions. We show that this goal can be reformulated as an inverse problem for $\theta$ with the undesirable feature that its forward operator is generally non-compact (even if $\theta$ is assumed to be compact or of $p$-Schatten class). However, we prove that, in terms of spectral properties and regularisation theory, this inverse problem is equivalent to the known compact inverse problem associated with scalar response regression. Our framework allows for the elegant derivation of dimension-free rates for generic learning algorithms under H\"older-type source conditions. The proofs rely on the combination of techniques from kernel regression with recent results on concentration of measure for sub-exponential Hilbertian random variables. The obtained rates hold for a variety of practically-relevant scenarios in functional regression as well as nonlinear regression with operator-valued kernels and match those of classical kernel regression with scalar response.

artificial intelligence, machine learning, operator, (19 more...)

arXiv.org Machine Learning

2211.08875

Country: Europe > United Kingdom > England (0.14)

Genre: Research Report (0.81)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.34)

Add feedback

Nonparametric approximation of conditional expectation operators

Mollenhauer, Mattes, Koltai, Péter

arXiv.org Machine LearningJan-7-2021

Given the joint distribution of two random variables $X,Y$ on some second countable locally compact Hausdorff space, we investigate the statistical approximation of the $L^2$-operator defined by $[Pf](x) := \mathbb{E}[ f(Y) \mid X = x ]$ under minimal assumptions. By modifying its domain, we prove that $P$ can be arbitrarily well approximated in operator norm by Hilbert--Schmidt operators acting on a reproducing kernel Hilbert space. This fact allows to estimate $P$ uniformly by finite-rank operators over a dense subspace even when $P$ is not compact. In terms of modes of convergence, we thereby obtain the superiority of kernel-based techniques over classically used parametric projection approaches such as Galerkin methods. This also provides a novel perspective on which limiting object the nonparametric estimate of $P$ converges to. As an application, we show that these results are particularly important for a large family of spectral analysis techniques for Markov transition operators. Our investigation also gives a new asymptotic perspective on the so-called kernel conditional mean embedding, which is the theoretical foundation of a wide variety of techniques in kernel-based nonparametric inference.

artificial intelligence, machine learning, operator, (17 more...)

arXiv.org Machine Learning

2012.12917

Country: North America > United States (0.46)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.67)

Add feedback

Kernel Conditional Density Operators

Schuster, Ingmar, Mollenhauer, Mattes, Klus, Stefan, Muandet, Krikamol

arXiv.org Machine LearningMay-27-2019

We introduce a conditional density estimation model termed the conditional density operator. It naturally captures multivariate, multimodal output densities and is competitive with recent neural conditional density models and Gaussian processes. To derive the model, we propose a novel approach to the reconstruction of probability densities from their kernel mean embeddings by drawing connections to estimation of Radon-Nikodym derivatives in the reproducing kernel Hilbert space (RKHS). We prove finite sample error bounds which are independent of problem dimensionality. Furthermore, the resulting conditional density model is applied to real-world data and we demonstrate its versatility and competitive performance.

artificial intelligence, neural network, operator, (16 more...)

arXiv.org Machine Learning

1905.11255

Country:

North America > United States (0.28)
Asia (0.28)
Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Data Science (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.89)

Add feedback

Kernel canonical correlation analysis approximates operators for the detection of coherent structures in dynamical data

Klus, Stefan, Husic, Brooke E., Mollenhauer, Mattes

arXiv.org Machine LearningApr-16-2019

We illustrate relationships between classical kernel-based dimensionality reduction techniques and eigendecompositions of empirical estimates of reproducing kernel Hilbert space (RKHS) operators associated with dynamical systems. In particular, we show that kernel canonical correlation analysis (CCA) can be interpreted in terms of kernel transfer operators and that coherent sets of particle trajectories can be computed by applying kernel CCA to Lagrangian data. We demonstrate the efficiency of this approach with several examples, namely the well-known Bickley jet, ocean drifter data, and a molecular dynamics problem with a time-dependent potential. Furthermore, we propose a straightforward generalization of dynamic mode decomposition (DMD) called coherent mode decomposition (CMD).

artificial intelligence, machine learning, operator, (18 more...)

arXiv.org Machine Learning

1904.07752

Country: North America > United States > California > Santa Clara County (0.14)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Data Science (0.86)

Add feedback

On Hyperparameter Search in Cluster Ensembles

Helfmann, Luzie, von Lindheim, Johannes, Mollenhauer, Mattes, Banisch, Ralf

arXiv.org Machine LearningMar-29-2018

Quality assessments of models in unsupervised learning and clustering verification in particular have been a long-standing problem in the machine learning research. The lack of robust and universally applicable cluster validity scores often makes the algorithm selection and hyperparameter evaluation a tough guess. In this paper, we show that cluster ensemble aggregation techniques such as consensus clustering may be used to evaluate clusterings and their hyperparameter configurations. We use normalized mutual information to compare individual objects of a clustering ensemble to the constructed consensus of the whole ensemble and show, that the resulting score can serve as an overall quality measure for clustering problems. This method is capable of highlighting the standout clustering and hyperparameter configuration in the ensemble even in the case of a distorted consensus. We apply this very general framework to various data sets and give possible directions for future research.

artificial intelligence, consensus, health & medicine, (16 more...)

arXiv.org Machine Learning

1803.11008

Country: North America > United States (0.14)

Genre: Research Report (0.50)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback