Lederer, Johannes
Regularization can make diffusion models more efficient
Taheri, Mahsa, Lederer, Johannes
Diffusion models are one of the key architectures of generative AI. Their main drawback, however, is their computational cost. This study indicates that sparsity, a concept well known in statistics, can provide a pathway to more efficient diffusion pipelines. Our mathematical guarantees prove that sparsity can reduce the input dimension's influence on the computational complexity to that of a much smaller intrinsic dimension of the data. Our empirical findings confirm that inducing sparsity can indeed lead to better samples at a lower cost.
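A minimal sketch of the idea of inducing sparsity in a diffusion model's score network, here via an $\ell_1$ penalty on the weights; the architecture, noise schedule, and penalty level are illustrative assumptions, not the paper's actual pipeline:

```python
# Minimal sketch: inducing sparsity in a diffusion model's score network
# via an l1 penalty on the weights. The paper's exact regularizer and
# architecture are not reproduced here; this only illustrates the idea.
import torch
import torch.nn as nn

score_net = nn.Sequential(nn.Linear(2, 128), nn.SiLU(), nn.Linear(128, 2))
opt = torch.optim.Adam(score_net.parameters(), lr=1e-3)
lam = 1e-4  # sparsity level (hypothetical choice)

x0 = torch.randn(256, 2)                       # a batch of clean samples
t = torch.rand(256, 1)                         # diffusion times in (0, 1)
noise = torch.randn_like(x0)
xt = torch.sqrt(1 - t) * x0 + torch.sqrt(t) * noise  # simple forward noising

pred = score_net(xt)                           # predict the injected noise
fit = ((pred - noise) ** 2).mean()             # denoising score-matching loss
l1 = sum(p.abs().sum() for p in score_net.parameters())
loss = fit + lam * l1                          # sparsity-inducing objective

opt.zero_grad()
loss.backward()
opt.step()
```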
How many samples are needed to train a deep neural network?
Golestaneh, Pegah, Taheri, Mahsa, Lederer, Johannes
Neural networks have become standard tools in many areas, yet many important statistical questions remain open. This paper studies the question of how much data are needed to train a ReLU feed-forward neural network. Our theoretical and empirical results suggest that the generalization error of ReLU feed-forward neural networks scales at the rate $1/\sqrt{n}$ in the sample size $n$ rather than the usual "parametric rate" $1/n$. Thus, broadly speaking, our results underpin the common belief that neural networks need "many" training samples.

Neural networks have ubiquitous applications in science and business (Goodfellow et al., 2016; Graves et al., 2013; LeCun et al., 2015; Badrinarayanan et al., 2017). However, our understanding of their statistical properties remains incomplete. For example, a basic yet very important open question is: how many training samples are needed to train a (non-linear) neural network?
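An illustrative way to probe such a rate empirically, assuming a synthetic teacher-student setup that is not the paper's experimental design: fit networks for growing $n$ and read off the slope in log-log space, where a slope near $-0.5$ corresponds to a $1/\sqrt{n}$ rate.

```python
# Illustrative sketch: estimate how the generalization error of a small
# ReLU network scales with the sample size n on synthetic data, by fitting
# a slope in log-log space (a slope near -0.5 suggests a 1/sqrt(n) rate).
import numpy as np
import torch
import torch.nn as nn

def fit_and_test(n, d=10, n_test=2000, epochs=500):
    torch.manual_seed(0)
    w = torch.randn(d, 1)                            # a simple teacher
    X, Xte = torch.randn(n, d), torch.randn(n_test, d)
    y, yte = torch.relu(X @ w), torch.relu(Xte @ w)
    net = nn.Sequential(nn.Linear(d, 64), nn.ReLU(), nn.Linear(64, 1))
    opt = torch.optim.Adam(net.parameters(), lr=1e-2)
    for _ in range(epochs):
        opt.zero_grad()
        loss = ((net(X) - y) ** 2).mean()
        loss.backward()
        opt.step()
    return ((net(Xte) - yte) ** 2).mean().item()     # test error

ns = [100, 200, 400, 800, 1600]
errs = [fit_and_test(n) for n in ns]
slope = np.polyfit(np.log(ns), np.log(errs), 1)[0]
print(f"empirical rate exponent: {slope:.2f}")
```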
Benchmarking the Fairness of Image Upsampling Methods
Laszkiewicz, Mike, Daunhawer, Imant, Vogt, Julia E., Fischer, Asja, Lederer, Johannes
Recent years have witnessed a rapid development of deep generative models for creating synthetic media, such as images and videos. While the practical applications of these models in everyday tasks are enticing, it is crucial to assess the inherent risks regarding their fairness. In this work, we introduce a comprehensive framework for benchmarking the performance and fairness of conditional generative models. We develop a set of metrics, inspired by their supervised fairness counterparts, to evaluate the models on their fairness and diversity. Focusing on the specific application of image upsampling, we create a benchmark covering a wide variety of modern upsampling methods. As part of the benchmark, we introduce UnfairFace, a subset of FairFace that replicates the racial distribution of common large-scale face datasets. Our empirical study highlights the importance of using an unbiased training set and reveals variations in how the algorithms respond to dataset imbalances. Alarmingly, we find that none of the considered methods produces statistically fair and diverse results.
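A hypothetical sketch of one fairness-style check in this spirit, with made-up counts; the benchmark's actual metrics are richer than this single test:

```python
# Hypothetical sketch of one fairness-style check: test whether the group
# distribution among upsampled outputs deviates from a uniform reference.
# The counts below are made up for illustration.
import numpy as np
from scipy.stats import chisquare

# counts of predicted group labels among generated samples (made-up numbers)
observed = np.array([310, 180, 260, 250])
expected = np.full(4, observed.sum() / 4)   # uniform reference distribution

stat, p_value = chisquare(observed, expected)
print(f"chi-square = {stat:.1f}, p = {p_value:.3g}")
if p_value < 0.05:
    print("the output distribution deviates significantly from parity")
```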
Single-Model Attribution of Generative Models Through Final-Layer Inversion
Laszkiewicz, Mike, Ricker, Jonas, Lederer, Johannes, Fischer, Asja
Recent breakthroughs in generative modeling have sparked interest in practical single-model attribution. Such methods predict whether a sample was generated by a specific generator or not, for instance, to prove intellectual property theft. However, previous works are either limited to the closed-world setting or require undesirable changes to the generative model. We address these shortcomings by, first, viewing single-model attribution through the lens of anomaly detection. Arising from this change of perspective, we propose FLIPAD, a new approach for single-model attribution in the open-world setting based on final-layer inversion and anomaly detection. We show that the utilized final-layer inversion can be reduced to a convex lasso optimization problem, making our approach theoretically sound and computationally efficient. The theoretical findings are accompanied by an experimental study demonstrating the effectiveness of our approach and its flexibility to various domains.
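A minimal sketch of the lasso view of final-layer inversion, with made-up shapes and penalty level: given an observed output and the generator's final linear map, a sparse latent activation is recovered by an $\ell_1$-penalized least-squares fit.

```python
# Sketch of final-layer inversion as a lasso problem: given an output x and
# the generator's final linear map W, recover a sparse latent activation h
# by l1-penalized least squares. Shapes and alpha are illustrative.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
d_out, d_hidden = 64, 256
W = rng.normal(size=(d_out, d_hidden))         # final-layer weights
h_true = np.zeros(d_hidden)
h_true[rng.choice(d_hidden, 10, replace=False)] = rng.normal(size=10)
x = W @ h_true                                 # observed sample (noise-free)

lasso = Lasso(alpha=0.01, max_iter=10000)
lasso.fit(W, x)                                # treats W as the design matrix
h_hat = lasso.coef_                            # recovered sparse activation
print("nonzeros recovered:", np.sum(np.abs(h_hat) > 1e-6))
# h_hat would then be fed to an anomaly detector to decide attribution.
```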
Affine Invariance in Continuous-Domain Convolutional Neural Networks
Mohaddes, Ali, Lederer, Johannes
The notion of group invariance helps neural networks recognize patterns and features under geometric transformations. Indeed, it has been shown that group invariance can substantially improve deep-learning performance in practice, where such transformations are very common. This research studies affine invariance in continuous-domain convolutional neural networks. While other research considers isometric or similarity invariance, we focus on the full structure of affine transforms generated by the general linear group $\mathrm{GL}_2(\mathbb{R})$. We introduce a new criterion to assess the similarity of two input signals under affine transformations. Then, unlike conventional methods that involve solving complex optimization problems on the Lie group $G_2$, we analyze the convolution of lifted signals and compute the corresponding integration over $G_2$. In sum, our research could eventually extend the scope of geometric transformations that practical deep-learning pipelines can handle.
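For orientation, the generic lifted-convolution construction on such a group takes the following standard form, assuming $G_2$ denotes the affine group $\mathbb{R}^2 \rtimes \mathrm{GL}_2(\mathbb{R})$; the paper's specific similarity criterion is not reproduced here.

```latex
% Standard lifted group convolution over the affine group; the group acts
% on the plane by g . x = Ax + b.
\[
  g = (A, b), \quad A \in \mathrm{GL}_2(\mathbb{R}), \; b \in \mathbb{R}^2,
  \qquad g \cdot x = Ax + b .
\]
% Lifting layer, mapping an image f : R^2 -> R to a function on the group:
\[
  (f \star \psi)(g) = \int_{\mathbb{R}^2} f(x)\,
    \psi\!\left(g^{-1} \cdot x\right) dx ,
\]
% followed by convolutions on the group with respect to its Haar measure:
\[
  (F \ast \Psi)(g) = \int_{G_2} F(h)\, \Psi\!\left(h^{-1} g\right) d\mu(h).
\]
```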
Set-Membership Inference Attacks using Data Watermarking
Laszkiewicz, Mike, Lukovnikov, Denis, Lederer, Johannes, Fischer, Asja
In this work, we propose a set-membership inference attack for generative models using deep image watermarking techniques. In particular, we demonstrate how conditional sampling from a generative model can reveal the watermark that was injected into parts of the training data. Our empirical results demonstrate that the proposed watermarking technique is a principled approach for detecting the non-consensual use of image data in training generative models.
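A hypothetical sketch of the detection step, with a placeholder standing in for a trained deep watermark decoder (which is not shown): decode candidate watermark bits from generated samples and test their accuracy against chance.

```python
# Hypothetical sketch of the detection step: decode a candidate watermark
# from generated samples and test the bit accuracy against chance (0.5).
# `decode_watermark` is a placeholder for a real trained decoder.
import numpy as np
from scipy.stats import binomtest

def decode_watermark(image):           # placeholder for a trained decoder
    return np.random.default_rng(0).integers(0, 2, size=32)

key = np.random.default_rng(1).integers(0, 2, size=32)   # embedded bits
samples = [np.zeros((64, 64)) for _ in range(10)]        # generated images

matches = sum(int(np.sum(decode_watermark(s) == key)) for s in samples)
total = 32 * len(samples)
test = binomtest(matches, total, p=0.5, alternative="greater")
print(f"bit accuracy = {matches/total:.2f}, p = {test.pvalue:.3g}")
# a significantly high bit accuracy would indicate that the watermarked
# data was used in training
```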
Lag selection and estimation of stable parameters for multiple autoregressive processes through convex programming
Chakraborty, Somnath, Lederer, Johannes, von Sachs, Rainer
Motivated by a variety of applications, high-dimensional time series have become an active topic of research. In particular, several methods and finite-sample theories for individual stable autoregressive processes with known lag have become available very recently. We, instead, consider multiple stable autoregressive processes that share an unknown lag. We use information across the different processes to simultaneously select the lag and estimate the parameters. We prove that the estimated process is stable, and we establish rates for the forecasting error that can outmatch the known rate in our setting. Our insights on the lag selection and the stability are also of interest for the case of individual autoregressive processes.
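A simplified sketch of sharing information across processes, swapping the paper's convex program for a plain pooled information criterion: fit each candidate lag to every series by least squares and select the lag that minimizes the summed BIC.

```python
# Illustrative sketch of pooling information across processes for lag
# selection: fit AR(p) to each series by least squares and pick the lag
# minimizing a pooled BIC. The paper instead selects the lag through a
# convex program; this simpler criterion only conveys the pooling idea.
import numpy as np

def ar_design(y, p):
    X = np.column_stack([y[p - k - 1 : len(y) - k - 1] for k in range(p)])
    return X, y[p:]

def pooled_bic(series, p):
    bic = 0.0
    for y in series:
        X, t = ar_design(y, p)
        beta, *_ = np.linalg.lstsq(X, t, rcond=None)
        rss = np.sum((t - X @ beta) ** 2)
        n = len(t)
        bic += n * np.log(rss / n) + p * np.log(n)
    return bic

rng = np.random.default_rng(0)
series = []
for _ in range(5):                     # five stable AR(2) processes, shared lag
    y = np.zeros(500)
    for t in range(2, 500):
        y[t] = 0.5 * y[t - 1] - 0.3 * y[t - 2] + rng.normal()
    series.append(y)

best = min(range(1, 9), key=lambda p: pooled_bic(series, p))
print("selected lag:", best)
```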
The DeepCAR Method: Forecasting Time-Series Data That Have Change Points
Jungbluth, Ayla, Lederer, Johannes
Many methods for time-series forecasting are known in classical statistics, such as autoregression, moving averages, and exponential smoothing. The DeepAR framework is a recent approach to time-series forecasting based on deep learning and has already shown very promising results. However, time series often have change points, which can degrade DeepAR's prediction performance substantially. This paper extends the DeepAR framework by detecting and including those change points. We show that our method performs as well as standard DeepAR when there are no change points and considerably better when there are change points. More generally, we show that the batch size provides an effective and surprisingly simple way to deal with change points in DeepAR, Transformers, and other modern forecasting models.
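A minimal sketch of the kind of change-point detection such a pipeline could build on, here a CUSUM-type statistic for a single mean shift; DeepCAR's actual detector and batching scheme are not reproduced.

```python
# Minimal sketch: detect a single mean-shift change point with a CUSUM-type
# statistic, the kind of preprocessing a DeepCAR-style pipeline could use
# before forming training batches.
import numpy as np

rng = np.random.default_rng(0)
y = np.concatenate([rng.normal(0, 1, 300), rng.normal(3, 1, 200)])

n = len(y)
stats = np.array([
    abs(y[:t].mean() - y[t:].mean()) * np.sqrt(t * (n - t) / n)
    for t in range(10, n - 10)
])
tau = 10 + int(np.argmax(stats))        # estimated change-point location
print("estimated change point:", tau)   # close to the true value 300
# the segments y[:tau] and y[tau:] would then be handled separately (or the
# batch size adjusted) when training the forecasting model
```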
Statistical guarantees for sparse deep learning
Lederer, Johannes
Neural networks are becoming increasingly popular in applications, but our mathematical understanding of their potential and limitations remains incomplete. In this paper, we further this understanding by developing statistical guarantees for sparse deep learning. In contrast to previous work, we consider different types of sparsity, such as few active connections, few active nodes, and other norm-based types of sparsity. Moreover, our theories cover important aspects that previous theories have neglected, such as multiple outputs, regularization, and $\ell_2$-loss. The guarantees have a mild dependence on network widths and depths, which means that they support the application of sparse but wide and deep networks from a statistical perspective. Some of the concepts and tools that we use in our derivations are uncommon in deep learning and, hence, might be of additional interest.
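A brief sketch of two of the sparsity notions in question, with illustrative penalty levels: connection sparsity via an $\ell_1$ penalty on all weights, and node sparsity via a group penalty on each hidden node's incoming weights.

```python
# Sketch of two sparsity notions: connection sparsity (l1 on all weights)
# and node sparsity (group norm over each node's incoming weights).
# Penalty levels are illustrative assumptions.
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(20, 100), nn.ReLU(), nn.Linear(100, 1))
first = net[0]

def connection_penalty(model):
    # few active connections: sum of |w| over all parameters
    return sum(p.abs().sum() for p in model.parameters())

def node_penalty(layer):
    # few active nodes: l2 norm of each node's incoming weights, summed
    return layer.weight.norm(dim=1).sum()

x, y = torch.randn(64, 20), torch.randn(64, 1)
loss = ((net(x) - y) ** 2).mean() \
       + 1e-4 * connection_penalty(net) + 1e-3 * node_penalty(first)
loss.backward()
```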
Marginal Tail-Adaptive Normalizing Flows
Laszkiewicz, Mike, Lederer, Johannes, Fischer, Asja
Learning the tail behavior of a distribution is a notoriously difficult problem. By definition, the number of samples from the tail is small, and deep generative models, such as normalizing flows, tend to concentrate on learning the body of the distribution. In this paper, we focus on improving the ability of normalizing flows to correctly capture the tail behavior and, thus, form more accurate models. We prove that the marginal tailedness of an autoregressive flow can be controlled via the tailedness of the marginals of its base distribution. This theoretical insight leads us to a novel type of flow based on flexible base distributions and data-driven linear layers. An empirical analysis shows that the proposed method improves accuracy, especially in the tails of the distribution, and is able to generate heavy-tailed data. We demonstrate its application to a weather and climate example, in which capturing the tail behavior is essential.
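A minimal sketch of the core mechanism, omitting the flow's transform layers and the paper's data-driven linear layers: a base distribution with learnable per-dimension tail weight, here a Student-t with one degrees-of-freedom parameter per coordinate.

```python
# Sketch of a tail-adaptive base distribution: a Student-t with a learnable
# degrees-of-freedom parameter per dimension, so each marginal's tailedness
# can be fit to the data. The flow's transform layers are omitted.
import torch
from torch.distributions import StudentT

d = 3
log_df = torch.nn.Parameter(torch.zeros(d))   # learnable log degrees of freedom

def base_log_prob(z):
    df = log_df.exp() + 1.0                   # keep df > 1 for stability
    return StudentT(df).log_prob(z).sum(-1)   # independent heavy-tailed marginals

z = torch.randn(8, d)                         # stand-in for flow-mapped data
nll = -base_log_prob(z).mean()                # would be combined with the
nll.backward()                                # flow's log-determinant term
print(log_df.grad)                            # gradients adapt the tail weight
```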