Majumder, Orchid
Efficient Scaling of Diffusion Transformers for Text-to-Image Generation
Li, Hao, Lal, Shamit, Li, Zhiheng, Xie, Yusheng, Wang, Ying, Zou, Yang, Majumder, Orchid, Manmatha, R., Tu, Zhuowen, Ermon, Stefano, Soatto, Stefano, Swaminathan, Ashwin
Figure 1: Examples of high-resolution images generated by a 2.3B U-ViT 1K model.

We empirically study the scaling properties of various Diffusion Transformers (DiTs) for text-to-image generation by performing extensive and rigorous ablations, including training scaled DiTs ranging from 0.3B up to 8B parameters on datasets of up to 600M images. We find that U-ViT, a pure self-attention-based DiT model, provides a simpler design and scales more effectively than cross-attention-based DiT variants, which allows straightforward extension to extra conditions and other modalities. We identify a 2.3B U-ViT model that achieves better performance than SDXL's UNet and other DiT variants in a controlled setting. On the data scaling side, we investigate how increasing dataset size and enhanced long captions improve text-image alignment performance and learning efficiency.

The straightforward design and efficient scaling of the Transformer (Vaswani et al., 2017) have driven significant advancements in large language models (LLMs) (Kaplan et al., 2020). Its inherent simplicity and ease of parallelization make it well suited for hardware acceleration. Despite the rapid evolution of DiT models, a comprehensive comparison between the various DiT architectures and UNet-based models for text-to-image (T2I) generation is still lacking. Furthermore, the optimal scaling strategy for transformer models in T2I tasks, compared with the UNet, is yet to be determined. Establishing a fair comparison is further complicated by the variation in training settings and the significant computational resources required to train these models.
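To make the architectural distinction concrete, here is a minimal PyTorch-style sketch of the two conditioning mechanisms the abstract contrasts. The class names, dimensions, and token counts below are illustrative assumptions, not the paper's implementation: a U-ViT-style block concatenates conditioning tokens with image tokens and uses plain self-attention, while a cross-attention-style block keeps the conditioning tokens in a separate stream.

```python
import torch
import torch.nn as nn

class SelfAttentionBlock(nn.Module):
    """U-ViT-style block (sketch): text/time tokens are concatenated with image
    patch tokens, so one self-attention layer handles conditioning and content."""
    def __init__(self, dim: int, heads: int = 8):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, img_tokens, cond_tokens):
        # Single sequence: [conditioning tokens ; image patch tokens].
        x = torch.cat([cond_tokens, img_tokens], dim=1)
        h = self.norm(x)
        out, _ = self.attn(h, h, h)
        x = x + out
        # Return only the image-token part for the next block.
        return x[:, cond_tokens.shape[1]:]

class CrossAttentionBlock(nn.Module):
    """Cross-attention DiT-style block (sketch): image tokens attend to a
    separate stream of conditioning tokens."""
    def __init__(self, dim: int, heads: int = 8):
        super().__init__()
        self.norm_q = nn.LayerNorm(dim)
        self.norm_kv = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, img_tokens, cond_tokens):
        q = self.norm_q(img_tokens)
        kv = self.norm_kv(cond_tokens)
        out, _ = self.attn(q, kv, kv)
        return img_tokens + out

# Toy usage: 256 image patches, 77 text tokens, width 512 (assumed sizes).
img = torch.randn(2, 256, 512)
txt = torch.randn(2, 77, 512)
print(SelfAttentionBlock(512)(img, txt).shape)   # torch.Size([2, 256, 512])
print(CrossAttentionBlock(512)(img, txt).shape)  # torch.Size([2, 256, 512])
```

In the concatenation design, any extra condition or modality can simply be appended as more tokens, which is the "straightforward extension" the abstract refers to.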
On the Scalability of Diffusion-based Text-to-Image Generation
Li, Hao, Zou, Yang, Wang, Ying, Majumder, Orchid, Xie, Yusheng, Manmatha, R., Swaminathan, Ashwin, Tu, Zhuowen, Ermon, Stefano, Soatto, Stefano
Scaling up model and data size has been quite successful for the evolution of LLMs, but the scaling laws for diffusion-based text-to-image (T2I) models are not fully explored. It is also unclear how to efficiently scale such models for better performance at reduced cost, and the varied training settings and expensive training runs make a fair model comparison extremely difficult. In this work, we empirically study the scaling properties of diffusion-based T2I models by performing extensive and rigorous ablations on scaling both the denoising backbone and the training set, including training scaled UNet and Transformer variants ranging from 0.4B to 4B parameters on datasets of up to 600M images. For model scaling, we find that the location and amount of cross-attention distinguishes the performance of existing UNet designs, and that increasing the number of transformer blocks is more parameter-efficient for improving text-image alignment than increasing channel counts. We then identify an efficient UNet variant that is 45% smaller and 28% faster than SDXL's UNet. On the data scaling side, we show that the quality and diversity of the training set matter more than dataset size alone: increasing caption density and diversity improves text-image alignment performance and learning efficiency. Finally, we provide scaling functions that predict text-image alignment performance as functions of model size, compute, and dataset size.
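The scaling functions mentioned at the end of the abstract are, in spirit, saturating power-law fits of an alignment metric against model size, compute, or data. Below is a minimal sketch of how such a fit could be performed; the functional form, synthetic numbers, and parameter values are assumptions for illustration, not the paper's reported fits.

```python
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical (model size, alignment score) pairs; not the paper's data.
params_b = np.array([0.4, 0.9, 1.8, 2.6, 4.0])       # parameters, in billions
score    = np.array([0.70, 0.74, 0.77, 0.785, 0.80])  # text-image alignment metric

def saturating_power_law(n, a, b, c):
    # score(N) = c - a * N^(-b): improves with model size and saturates at c.
    return c - a * np.power(n, -b)

(a, b, c), _ = curve_fit(saturating_power_law, params_b, score, p0=[0.1, 0.5, 0.85])
print(f"fit: score(N) = {c:.3f} - {a:.3f} * N^(-{b:.3f})")
print("extrapolated score at 8B params:", saturating_power_law(8.0, a, b, c))
```

The same recipe applies with training compute or dataset size on the x-axis; the fitted curve is then used to extrapolate performance at scales that were not trained.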
Estimating informativeness of samples with Smooth Unique Information
Harutyunyan, Hrayr, Achille, Alessandro, Paolini, Giovanni, Majumder, Orchid, Ravichandran, Avinash, Bhotika, Rahul, Soatto, Stefano
We define a notion of information that an individual sample provides to the training of a neural network, and we specialize it to measure both how much a sample informs the final weights and how much it informs the function computed by the weights. Though related, we show that these quantities have a qualitatively different behavior. We give efficient approximations of these quantities using a linearized network and demonstrate empirically that the approximation is accurate for real-world architectures, such as pre-trained ResNets. We apply these measures to several problems, such as dataset summarization, analysis of under-sampled classes, comparison of informativeness of different data sources, and detection of adversarial and corrupted examples. Our work generalizes existing frameworks but enjoys better computational properties for heavily overparametrized models, which makes it possible to apply it to real-world networks.

Training a deep neural network (DNN) entails extracting information from samples in a dataset and storing it in the weights of the network, so that it may be used in future inference or prediction. But how much information does a particular sample contribute to the trained model? The answer can be used to provide strong generalization bounds (if no information is used, the network is not memorizing the sample), privacy bounds (how much information the network can leak about a particular sample), and enable better interpretation of the training process and its outcome. To determine the information content of samples, we need to define and compute information. In the classical sense, information is a property of random variables, which may be degenerate for the deterministic process of computing the output of a trained DNN in response to a given input (inference). So, even posing the problem presents some technical challenges.
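The linearized-network approximation lends itself to a small closed-form illustration. The sketch below measures how much the function computed by a linear(ized) model changes when one training sample is left out, using ridge regression on stand-in features; the features, dimensions, and regularizer are made up, and this is only a rough proxy for the general idea, not the paper's smooth unique information estimator.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in features: think of each row as the Jacobian features of a
# linearized network at one training sample (hypothetical setup).
n, d = 200, 50
Phi = rng.normal(size=(n, d))
y = Phi @ rng.normal(size=d) + 0.1 * rng.normal(size=n)
lam = 1e-2  # ridge regularizer, playing the role of a smoothing term

def ridge_weights(Phi, y, lam):
    d = Phi.shape[1]
    return np.linalg.solve(Phi.T @ Phi + lam * np.eye(d), Phi.T @ y)

w_full = ridge_weights(Phi, y, lam)

def function_change_without(i, Phi_test):
    """How much the linearized model's test predictions move when sample i
    is removed from training: a proxy for that sample's informativeness."""
    mask = np.arange(n) != i
    w_loo = ridge_weights(Phi[mask], y[mask], lam)
    return np.linalg.norm(Phi_test @ (w_full - w_loo))

Phi_test = rng.normal(size=(20, d))
print([round(function_change_without(i, Phi_test), 4) for i in range(5)])
```

Samples whose removal barely moves the predictions carry little unique information about the learned function; samples with large scores are the ones the model would memorize or rely on most.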
Scheduling the Learning Rate via Hypergradients: New Insights and a New Algorithm
Donini, Michele, Franceschi, Luca, Pontil, Massimiliano, Majumder, Orchid, Frasconi, Paolo
We study the problem of fitting task-specific learning rate schedules from the perspective of hyperparameter optimization (HPO). This allows us to explicitly search for schedules that achieve good generalization. We describe the structure of the gradient of the validation error w.r.t. the learning rate (the hypergradient) and, based on this, introduce a novel online algorithm.

Research in this direction is vast (see Hutter et al. (2019) for an overview) and includes model-based (Snoek et al., 2012; Hutter et al., 2015), model-free (Bergstra & Bengio, 2012; Hansen, 2016), and gradient-based (Domke, 2012; Maclaurin et al., 2015) approaches. Problem (1) can, in principle, be solved by any HPO technique.
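As a concrete illustration of a hypergradient w.r.t. the learning rate, the sketch below adapts the learning rate online on a toy quadratic using the one-step identity d E_val(w - eta*g)/d eta = -grad_val(w - eta*g) . g. The objectives, the hyper-learning-rate beta, and the update rule are assumptions chosen for illustration; this is not the algorithm proposed in the paper.

```python
import numpy as np

# Toy quadratic objectives standing in for training and validation losses.
A_tr = np.diag([1.0, 10.0])
A_val = np.diag([1.2, 9.0])

def grad_tr(w):
    return A_tr @ w

def grad_val(w):
    return A_val @ w

w = np.array([1.0, 1.0])
eta, beta = 0.01, 1e-4  # learning rate and hyper-learning-rate (assumed values)

for t in range(100):
    g = grad_tr(w)
    w_new = w - eta * g
    # One-step hypergradient: d E_val(w - eta*g) / d eta = -grad_val(w_new) . g
    hypergrad = -grad_val(w_new) @ g
    # Gradient step on the learning rate itself, kept non-negative.
    eta = max(eta - beta * hypergrad, 0.0)
    w = w_new

print("final eta:", eta, "final val loss:", 0.5 * w @ A_val @ w)
```

When successive training and validation gradients are aligned, the hypergradient is negative and the learning rate grows; when they conflict, it shrinks, which is the basic signal an online hypergradient scheduler exploits.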