Breaking the Likelihood-Quality Trade-off in Diffusion Models by Merging Pretrained Experts
Esfandiari, Yasin, Bauer, Stefan, Stich, Sebastian U., Dittadi, Andrea
Diffusion models for image generation often exhibit a trade-off between perceptual sample quality and data likelihood: training objectives emphasizing high-noise denoising steps yield realistic images but poor likelihoods, whereas likelihood-oriented training overweights low-noise steps and harms visual fidelity. We introduce a simple plug-and-play sampling method that combines two pre-trained diffusion experts by switching between them along the denoising trajectory. Specifically, we apply an image-quality expert at high noise levels to shape global structure, then switch to a likelihood expert at low noise levels to refine pixel statistics. The approach requires no retraining or fine-tuning, only the choice of an intermediate switching step. On CIFAR-10 and ImageNet32, the merged model consistently matches or outperforms its base components, improving or preserving both likelihood and sample quality relative to each expert alone. These results demonstrate that expert switching across noise levels is an effective way to break the likelihood-quality trade-off in image diffusion models.

Diffusion models are a class of probabilistic generative models that learn to approximate a data distribution by reversing a forward noising process through a learned denoising procedure (Sohl-Dickstein et al., 2015; Ho et al., 2020; Nichol & Dhariwal, 2021).
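The switching rule itself is simple enough to sketch. Below is a minimal, hypothetical PyTorch implementation of ancestral DDPM sampling with two eps-prediction experts; the names (quality_expert, likelihood_expert, t_switch) are illustrative assumptions, and the paper's exact sampler and parameterization may differ.

```python
import torch

@torch.no_grad()
def merged_ddpm_sample(quality_expert, likelihood_expert, shape, alphas_cumprod,
                       t_switch, device="cpu"):
    """Ancestral DDPM sampling that switches experts at step t_switch.

    quality_expert / likelihood_expert: eps-prediction networks (x_t, t) -> eps.
    alphas_cumprod: 1-D tensor of cumulative alpha-bar products, length T.
    """
    T = alphas_cumprod.shape[0]
    x = torch.randn(shape, device=device)
    for t in reversed(range(T)):
        # High noise levels: shape global structure with the quality expert;
        # low noise levels: refine pixel statistics with the likelihood expert.
        model = quality_expert if t >= t_switch else likelihood_expert
        eps = model(x, torch.full((shape[0],), t, device=device, dtype=torch.long))

        a_t = alphas_cumprod[t]
        a_prev = alphas_cumprod[t - 1] if t > 0 else torch.tensor(1.0, device=device)
        alpha_t = a_t / a_prev                       # per-step alpha
        beta_t = 1.0 - alpha_t

        # Standard DDPM posterior mean and variance.
        mean = (x - beta_t / torch.sqrt(1.0 - a_t) * eps) / torch.sqrt(alpha_t)
        if t > 0:
            sigma = torch.sqrt(beta_t * (1.0 - a_prev) / (1.0 - a_t))
            x = mean + sigma * torch.randn_like(x)
        else:
            x = mean
    return x
```

The only new hyperparameter is t_switch; setting it to 0 or T recovers each expert's own sampler.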
A Derivations
In this appendix, we provide the derivations for i-DenseNets: the Lipschitz constant of the concatenation in DenseNets and a bound on the Lipschitz constant of the activation functions.

A.1 Derivation of Lipschitz constant K for the concatenation
We know that a function $f$ is $K$-Lipschitz if for all points $v$ and $w$ the following holds: $d_Y(f(v), f(w)) \le K \, d_X(v, w)$.

A.2 Derivation of the Lipschitz bound for the Concatenated ReLU
We define the function $\phi: \mathbb{R} \to \mathbb{R}^2$ as $\phi(x) = [\mathrm{ReLU}(x), \mathrm{ReLU}(-x)]$. We have four different situations that can happen, depending on the signs of the points $v$ and $w$.

Regarding parameter budgets: for CIFAR10, the full i-DenseNets utilize 24.9M parameters, compared to the 25.2M of Residual Flows; for ImageNet32, i-DenseNet utilizes 47.0M parameters, compared to the 47.1M of the Residual Flow.
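As a reconstruction of the concatenation argument in A.1 (a sketch assuming the Euclidean norm, not necessarily the paper's exact derivation): for $h(x) = [f_1(x), f_2(x)]$ with $K_1$- and $K_2$-Lipschitz components,
\[
\|h(v) - h(w)\|_2^2 = \|f_1(v) - f_1(w)\|_2^2 + \|f_2(v) - f_2(w)\|_2^2
\le \left(K_1^2 + K_2^2\right)\|v - w\|_2^2,
\]
so the concatenation is $\sqrt{K_1^2 + K_2^2}$-Lipschitz; for two 1-Lipschitz components this gives $K = \sqrt{2}$.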
OOD Detection with Immature Models
Montazeran, Behrooz, Köthe, Ullrich
Likelihood-based deep generative models (DGMs) have gained significant attention for their ability to approximate the distributions of high-dimensional data. However, these models lack a performance guarantee in assigning higher likelihood values to in-distribution (ID) inputs, the data the models are trained on, compared to out-of-distribution (OOD) inputs. This counter-intuitive behaviour is particularly pronounced when ID inputs are more complex than OOD data points. One potential approach to address this challenge involves leveraging the gradient of a data point with respect to the parameters of the DGM. A recent OOD detection framework proposed estimating the joint density of layer-wise gradient norms for a given data point as a model-agnostic method, demonstrating superior performance compared to the Typicality Test across likelihood-based DGMs and image dataset pairs. However, most existing methods presuppose access to fully converged models, the training of which is both time-intensive and computationally demanding. In this work, we demonstrate that using immature models, stopped at early stages of training, can mostly achieve equivalent or even superior results on this downstream task compared to mature models capable of generating high-quality samples that closely resemble ID data. This novel finding enhances our understanding of how DGMs learn the distribution of ID data and highlights the potential of leveraging partially trained models for downstream tasks. Furthermore, we offer a possible explanation for this unexpected behaviour through the concept of support overlap.
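To make the gradient-norm idea concrete, here is a minimal sketch of computing layer-wise gradient norms for a likelihood-based model; nll_fn and the density fit are illustrative assumptions, not the cited framework's exact procedure.

```python
import torch

def layerwise_grad_norms(model, nll_fn, x):
    """L2 norm of d(-log p(x))/d(theta) for each parameter tensor ("layer").

    model: a likelihood-based DGM; nll_fn(model, x) returns a scalar NLL.
    """
    model.zero_grad(set_to_none=True)
    nll = nll_fn(model, x)                      # scalar negative log-likelihood
    params = [p for p in model.parameters() if p.requires_grad]
    grads = torch.autograd.grad(nll, params)
    return torch.stack([g.norm() for g in grads])

# Score a test point by the density of its (log) gradient-norm vector under a
# simple joint model fitted on ID training data (e.g. a multivariate Gaussian);
# low density => flag as OOD.
```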
On How Iterative Magnitude Pruning Discovers Local Receptive Fields in Fully Connected Neural Networks
Redman, William T., Wang, Zhangyang, Ingrosso, Alessandro, Goldt, Sebastian
Iterative magnitude pruning (IMP) [1] has emerged as a powerful tool for identifying sparse subnetworks ("winning tickets") that can be trained to perform as well as the dense model they are extracted from [2, 3]. That IMP, despite its simplicity, is more robust in discovering such winning tickets than other, more complex pruning schemes [4] suggests that its iterative coarse-graining [5] is especially capable of extracting and maintaining strong inductive biases. This perspective is strengthened by observations that winning tickets discovered by IMP: 1) have properties that make them transferable across related tasks [6-13] and architectures [14]; 2) can outperform dense models on classes with limited data [15]; 3) have less overconfident predictions [16]. The first direct evidence for IMP discovering good inductive biases came from studying the winning tickets extracted by IMP in fully connected neural networks (FCNs) [17]. Pellegrini and Biroli (2022) [17] found that the sparse subnetworks identified by IMP had local receptive field (RF) structure (Figure 1A), an architectural feature found in visual cortex [18] and convolutional neural networks (CNNs) [19]. Comparing IMP-derived winning tickets with the sparse subnetworks found by one-shot pruning (Figure 1B), Pellegrini and Biroli (2022) [17] argued that the iterative nature of IMP was essential for refining the local RF structure. However, to date, how IMP, a pruning method based purely on the magnitude of the network parameters, is able to "sift out" non-localized weights has remained unknown. Resolving this will not only shed light on the effect of IMP on FCNs, but will also provide new insight into the success of IMP more broadly.
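For background, one common formulation of IMP with weight rewinding can be sketched as follows; train_fn, the global pruning threshold, and pruning all parameter tensors are simplifying assumptions rather than the exact protocol of the cited works.

```python
import copy
import torch

def iterative_magnitude_prune(model, train_fn, prune_frac=0.2, rounds=10):
    """IMP sketch: train, prune the smallest-magnitude surviving weights
    globally, rewind remaining weights to initialization, repeat."""
    init_state = copy.deepcopy(model.state_dict())       # theta_0 for rewinding
    masks = {n: torch.ones_like(p) for n, p in model.named_parameters()}
    for _ in range(rounds):
        train_fn(model, masks)                           # train with masks applied
        # Global magnitude threshold over surviving weights only.
        alive = torch.cat([p.abs().flatten()[masks[n].flatten() > 0]
                           for n, p in model.named_parameters()])
        thresh = alive.quantile(prune_frac)
        with torch.no_grad():
            for n, p in model.named_parameters():
                masks[n][(p.abs() < thresh) & (masks[n] > 0)] = 0.0
            model.load_state_dict(init_state)            # rewind to initialization
            for n, p in model.named_parameters():
                p.mul_(masks[n])                         # re-apply the mask
    return masks
```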
Self-Guided Diffusion Models
Hu, Vincent Tao, Zhang, David W, Asano, Yuki M., Burghouts, Gertjan J., Snoek, Cees G. M.
Diffusion models have demonstrated remarkable progress in image generation quality, especially when guidance is used to control the generative process. However, guidance requires a large amount of image-annotation pairs for training and is thus dependent on their availability, correctness, and unbiasedness. In this paper, we eliminate the need for such annotation by instead leveraging the flexibility of self-supervision signals to design a framework for self-guided diffusion models. By leveraging a feature extraction function and a self-annotation function, our method provides guidance signals at various image granularities: from the level of holistic images to object boxes and even segmentation masks. Our experiments on single-label and multi-label image datasets demonstrate that self-labeled guidance always outperforms diffusion models without guidance and may even surpass guidance based on ground-truth labels, especially on unbalanced data. When equipped with self-supervised box or mask proposals, our method further generates visually diverse yet semantically consistent images, without the need for any class, box, or segment label annotation. Self-guided diffusion is simple, flexible, and expected to profit from deployment at scale. Source code will be available at: https://taohu.me/sgdm/
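A minimal sketch of the holistic-image variant, assuming a frozen self-supervised feature extractor and k-means pseudo-labels as the self-annotation function; all names below are hypothetical, and the guidance step follows the usual classifier-free form rather than the paper's exact recipe.

```python
import torch
from sklearn.cluster import KMeans

def self_annotate(images, feature_fn, k=100):
    """Cluster self-supervised features into k pseudo-classes
    (hypothetical stand-in for the paper's self-annotation function)."""
    with torch.no_grad():
        feats = feature_fn(images)              # e.g. a DINO/SimCLR-style encoder
    labels = KMeans(n_clusters=k).fit_predict(feats.cpu().numpy())
    return torch.as_tensor(labels)

def guided_eps(model, x_t, t, y, null_y, w=2.0):
    """Classifier-free-style guidance with pseudo-labels y; null_y is the
    'no condition' token used during conditional training."""
    eps_cond = model(x_t, t, y)
    eps_uncond = model(x_t, t, null_y)
    return eps_uncond + w * (eps_cond - eps_uncond)
```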
The Trade-off between Universality and Label Efficiency of Representations from Contrastive Learning
Shi, Zhenmei, Chen, Jiefeng, Li, Kunyang, Raghuram, Jayaram, Wu, Xi, Liang, Yingyu, Jha, Somesh
Pre-training representations (a.k.a. foundation models) has recently become a prevalent learning paradigm, where one first pre-trains a representation using large-scale unlabeled data, and then learns simple predictors on top of the representation using small labeled data from the downstream tasks. There are two key desiderata for the representation: label efficiency (the ability to learn an accurate classifier on top of the representation with a small amount of labeled data) and universality (usefulness across a wide range of downstream tasks). In this paper, we focus on one of the most popular instantiations of this paradigm: contrastive learning with linear probing, i.e., learning a linear predictor on the representation pre-trained by contrastive learning. We show that there exists a trade-off between the two desiderata, so that one may not be able to achieve both simultaneously. Specifically, we provide analysis using a theoretical data model and show that, while more diverse pre-training data result in more diverse features for different tasks (improving universality), they put less emphasis on task-specific features, giving rise to larger sample complexity for downstream supervised tasks, and thus worse prediction performance. Guided by this analysis, we propose a contrastive regularization method to improve the trade-off. We validate our analysis and method empirically with systematic experiments using real-world datasets and foundation models.
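For concreteness, linear probing just trains a linear classifier on frozen pre-trained features; a minimal PyTorch sketch (encoder interface and hyperparameters assumed):

```python
import torch
import torch.nn as nn

def linear_probe(encoder, loader, num_classes, feat_dim, epochs=10, lr=1e-3):
    """Fit a linear head on top of a frozen pre-trained representation."""
    encoder.eval()
    head = nn.Linear(feat_dim, num_classes)
    opt = torch.optim.Adam(head.parameters(), lr=lr)
    ce = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in loader:
            with torch.no_grad():
                z = encoder(x)                  # frozen features
            loss = ce(head(z), y)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return head
```

Label efficiency is then measured by how well this head performs when loader contains only a small labeled sample.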
Transfer Learning with Kernel Methods
Radhakrishnan, Adityanarayanan, Luyten, Max Ruiz, Prasad, Neha, Uhler, Caroline
Transfer learning refers to the machine learning problem of utilizing knowledge from a source task to improve performance on a target task. Recent approaches to transfer learning have achieved tremendous empirical success in many applications, including computer vision [17, 45], natural language processing [16, 40, 43], and the biomedical field [15, 19]. Since transfer learning approaches generally rely on complex deep neural networks, it can be difficult to characterize when and why they work [44]. Kernel methods [46] are conceptually and computationally simple machine learning models that have been found to be competitive with neural networks on a variety of tasks, including image classification [3, 29, 42] and drug screening [42]. Their simplicity stems from the fact that training a kernel method involves performing linear regression after transforming the data. There has been renewed interest in kernels due to a recently established equivalence between wide neural networks and kernel methods [2, 25], which has led to the development of modern neural tangent kernels (NTKs) that are competitive with neural networks. Given their simplicity and effectiveness, kernel methods could provide a powerful approach for transfer learning and also help characterize when transfer learning between a source and target task would be beneficial. However, developing an algorithm for transfer learning with kernel methods for general source and target tasks has been an open problem. In particular, while there is a standard transfer learning approach for neural networks that involves replacing and re-training the last layer of a pre-trained network, there is no known corresponding operation for kernels.
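To unpack "linear regression after transforming the data": kernel ridge regression fits coefficients by solving a single linear system in the kernel feature space. A minimal NumPy sketch with an RBF kernel (an illustration of kernel methods generally, not the paper's transfer algorithm):

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """k(a, b) = exp(-gamma * ||a - b||^2), computed pairwise."""
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * sq)

def kernel_ridge_fit(X, y, lam=1e-3, gamma=1.0):
    """Solve (K + lam*I) alpha = y: linear regression in kernel space."""
    K = rbf_kernel(X, X, gamma)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)

def kernel_ridge_predict(X_train, alpha, X_test, gamma=1.0):
    return rbf_kernel(X_test, X_train, gamma) @ alpha
```

The open problem the paragraph describes is precisely that there is no obvious analogue, in terms of K and alpha, of swapping out a network's last layer.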
Hybrid Generative-Contrastive Representation Learning
Kim, Saehoon, Kim, Sungwoong, Lee, Juho
Unsupervised representation learning has recently received significant interest due to its powerful generalizability through effectively leveraging large-scale unlabeled data. There are two prevalent approaches for this: contrastive learning and generative pre-training, where the former learns representations from instance-wise discrimination tasks and the latter learns them by estimating the likelihood. These seemingly orthogonal approaches have their own strengths and weaknesses. Contrastive learning tends to extract semantic information and discard details irrelevant for classifying objects, making the representations effective for discriminative tasks while degrading robustness to out-of-distribution data. On the other hand, generative pre-training directly estimates the data distribution, so the representations tend to be robust but not optimal for discriminative tasks. In this paper, we show that we can achieve the best of both worlds through a hybrid training scheme. Specifically, we demonstrate that a transformer-based encoder-decoder architecture trained with both contrastive and generative losses can learn highly discriminative and robust representations without hurting the generative performance. We extensively validate our approach on various tasks.
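The hybrid scheme can be sketched as a weighted sum of an instance-discrimination loss and a likelihood term; the interfaces below (notably decoder.neg_log_likelihood) are hypothetical stand-ins for the paper's transformer-based architecture.

```python
import torch
import torch.nn.functional as F

def info_nce(z1, z2, tau=0.1):
    """InfoNCE between two augmented views, using in-batch negatives."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.T / tau
    targets = torch.arange(z1.shape[0], device=z1.device)
    return F.cross_entropy(logits, targets)

def hybrid_loss(encoder, decoder, x1, x2, lam=1.0):
    """Joint objective: discriminate instances AND model the likelihood."""
    z1, z2 = encoder(x1), encoder(x2)
    nll = decoder.neg_log_likelihood(x1, z1)    # hypothetical generative head
    return info_nce(z1, z2) + lam * nll
```

The weight lam trades off discriminative sharpness against likelihood quality, which is the knob the hybrid scheme tunes.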
Invertible DenseNets with Concatenated LipSwish
Perugachi-Diaz, Yura, Tomczak, Jakub M., Bhulai, Sandjai
We introduce Invertible Dense Networks (i-DenseNets), a more parameter-efficient alternative to Residual Flows. The method relies on an analysis of the Lipschitz continuity of the concatenation in DenseNets, where we enforce invertibility of the network by satisfying the Lipschitz constant. We extend this method by proposing a learnable concatenation, which not only improves the model performance but also indicates the importance of the concatenated representation. Additionally, we introduce the Concatenated LipSwish as an activation function, for which we show how to enforce the Lipschitz condition and which boosts performance. The new architecture, i-DenseNet, outperforms Residual Flows and other flow-based models on density estimation evaluated in bits per dimension, where we utilize an equal parameter budget. Moreover, we show that the proposed model outperforms Residual Flows when trained as a hybrid model, where the model is both a generative and a discriminative model.
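A sketch of the Concatenated LipSwish idea, under the assumption that the activation concatenates Swish(x) and Swish(-x) channel-wise and rescales by a numerically estimated Lipschitz bound; the paper's exact normalization constant may differ.

```python
import torch

def swish(x, beta=1.0):
    return x * torch.sigmoid(beta * x)

def swish_prime(x, beta=1.0):
    """Analytic derivative of Swish: s(bx) + b*x*s(bx)*(1 - s(bx))."""
    s = torch.sigmoid(beta * x)
    return s + beta * x * s * (1 - s)

def concat_lip_bound(beta=1.0):
    """Grid estimate of sup_x sqrt(s'(x)^2 + s'(-x)^2) for the concatenation."""
    x = torch.linspace(-20.0, 20.0, 40001)
    return torch.sqrt(swish_prime(x, beta) ** 2 + swish_prime(-x, beta) ** 2).max()

class CLipSwish(torch.nn.Module):
    """Concatenated LipSwish sketch: channel-wise concat of Swish(x) and
    Swish(-x), rescaled so the activation is (at most) 1-Lipschitz."""
    def __init__(self, beta=1.0):
        super().__init__()
        self.beta = beta
        self.bound = concat_lip_bound(beta).item()

    def forward(self, x):                       # assumes NCHW input
        return torch.cat([swish(x, self.beta),
                          swish(-x, self.beta)], dim=1) / self.bound
```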
Super-resolution Variational Auto-Encoders
Gatopoulos, Ioannis, Stol, Maarten, Tomczak, Jakub M.
The framework of variational autoencoders (VAEs) provides a principled method for jointly learning latent-variable models and corresponding inference models. However, the main drawback of this approach is the blurriness of the generated images. Some studies link this effect to the objective function, namely, the (negative) log-likelihood. Here, we propose to enhance VAEs by adding a random variable that is a downscaled version of the original image, while still using the log-likelihood function as the learning objective. Further, by providing the downscaled image as an input to the decoder, it can be used in a manner similar to super-resolution. We show empirically that the proposed approach performs comparably to VAEs in terms of the negative log-likelihood, while obtaining a better FID score in data synthesis.
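A minimal sketch of the construction, with hypothetical encoder/decoder interfaces: the downscaled image y is produced by average pooling and fed to the decoder alongside the latent z (the term for modeling y itself is omitted here, and a Bernoulli likelihood over pixels in [0, 1] is assumed).

```python
import torch
import torch.nn.functional as F

def srvae_loss(encoder, decoder, x, scale=2):
    """Negative ELBO for a VAE whose decoder also sees a downscaled image."""
    y = F.avg_pool2d(x, kernel_size=scale)      # downscaled version of x
    mu, logvar = encoder(x, y)                  # amortized posterior q(z | x, y)
    z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
    x_logits = decoder(z, y)                    # decoder conditioned on y, like SR
    recon = F.binary_cross_entropy_with_logits(x_logits, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl
```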