Collaborating Authors

 Mallya, Arun


Edify Image: High-Quality Image Generation with Pixel Space Laplacian Diffusion Models

arXiv.org Artificial Intelligence

We introduce Edify Image, a family of diffusion models capable of generating photorealistic image content with pixel-perfect accuracy. Edify Image utilizes cascaded pixel-space diffusion models trained using a novel Laplacian diffusion process, in which image signals at different frequency bands are attenuated at varying rates. Edify Image supports a wide range of applications, including text-to-image synthesis, 4K upsampling, ControlNets, 360 HDR panorama generation, and finetuning for image customization.
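The frequency-dependent attenuation is the core idea of the Laplacian diffusion process. Below is a minimal NumPy sketch of that idea, assuming a simple average-pool Laplacian pyramid and exponential per-band decay rates (`band_decay`); these are illustrative choices, not the Edify Image formulation:

```python
import numpy as np

def build_laplacian_pyramid(img, levels=3):
    # Decompose an (H, W) image into frequency bands: each level keeps the detail
    # lost by 2x average-pool downsampling; the last entry is the low-pass residual.
    bands, cur = [], img
    for _ in range(levels):
        h, w = cur.shape
        low = cur.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))  # 2x downsample
        up = np.repeat(np.repeat(low, 2, axis=0), 2, axis=1)       # 2x upsample
        bands.append(cur - up)                                      # high-frequency detail
        cur = low
    bands.append(cur)                                               # coarsest band
    return bands

def reconstruct(bands):
    cur = bands[-1]
    for detail in reversed(bands[:-1]):
        cur = np.repeat(np.repeat(cur, 2, axis=0), 2, axis=1) + detail
    return cur

def laplacian_forward_diffusion(img, t, band_decay=(4.0, 2.0, 1.0, 0.5), rng=None):
    # Hypothetical forward process: each band is attenuated at its own rate before
    # Gaussian noise is mixed in, so high frequencies disappear earlier in time t.
    rng = np.random.default_rng() if rng is None else rng
    bands = build_laplacian_pyramid(img, levels=len(band_decay) - 1)
    noisy = []
    for band, decay in zip(bands, band_decay):
        alpha = np.exp(-decay * t)                    # band-specific attenuation, t in [0, 1]
        noise = rng.standard_normal(band.shape)
        noisy.append(alpha * band + np.sqrt(1.0 - alpha**2) * noise)
    return reconstruct(noisy)
```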


Movie Gen: A Cast of Media Foundation Models

arXiv.org Artificial Intelligence

We present Movie Gen, a cast of foundation models that generates high-quality, 1080p HD videos with different aspect ratios and synchronized audio. We also show additional capabilities such as precise instruction-based video editing and generation of personalized videos based on a user's image. Our models set a new state-of-the-art on multiple tasks: text-to-video synthesis, video personalization, video editing, video-to-audio generation, and text-to-audio generation. Our largest video generation model is a 30B parameter transformer trained with a maximum context length of 73K video tokens, corresponding to a generated video of 16 seconds at 16 frames-per-second. We show multiple technical innovations and simplifications on the architecture, latent spaces, training objectives and recipes, data curation, evaluation protocols, parallelization techniques, and inference optimizations that allow us to reap the benefits of scaling pre-training data, model size, and training compute for training large scale media generation models. We hope this paper helps the research community to accelerate progress and innovation in media generation models. All videos from this paper are available at https://go.fb.me/MovieGenResearchVideos.
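As a quick sanity check on the reported numbers, a 16-second clip at 16 frames per second spans 256 frames, so the 73K-token context corresponds to roughly 285 tokens per frame if tokens were split evenly across frames (the paper's actual tokenization may allocate them differently):

```python
# Back-of-the-envelope relation between clip length and context size
# (assumes an even token split across frames, an illustrative simplification).
seconds, fps = 16, 16
frames = seconds * fps                   # 256 frames in the longest generated clip
context_tokens = 73_000                  # maximum training context length
tokens_per_frame = context_tokens / frames
print(frames, round(tokens_per_frame))   # -> 256 frames, ~285 tokens per frame
```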


Dreaming to Distill: Data-free Knowledge Transfer via DeepInversion

arXiv.org Machine Learning

We introduce DeepInversion, a new method for synthesizing images from the image distribution used to train a deep neural network. We 'invert' a trained network (teacher) to synthesize class-conditional input images starting from random noise, without using any additional information about the training dataset. Keeping the teacher fixed, our method optimizes the input while regularizing the distribution of intermediate feature maps using information stored in the batch normalization layers of the teacher. Further, we improve the diversity of synthesized images using Adaptive DeepInversion, which maximizes the Jensen-Shannon divergence between the teacher and student network logits. The resulting synthesized images from networks trained on the CIFAR-10 and ImageNet datasets demonstrate high fidelity and degree of realism, and help enable a new breed of data-free applications - ones that do not require any real images or labeled data. We demonstrate the applicability of our proposed method to three tasks of immense practical importance -- (i) data-free network pruning, (ii) data-free knowledge transfer, and (iii) data-free continual learning.
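To make the two losses above concrete, here is a rough PyTorch-style sketch (not the released NVlabs code) of one DeepInversion optimization step: the batch statistics of the synthesized images are pulled toward the teacher's stored BatchNorm running statistics, and the optional Adaptive DeepInversion term subtracts the teacher/student Jensen-Shannon divergence so that images on which the two networks disagree are favored. Function and argument names are placeholders:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BNStatsHook:
    # Forward hook: measures how far the synthesized batch's statistics drift
    # from the running mean/variance stored in one teacher BatchNorm layer.
    def __init__(self):
        self.loss = 0.0
    def __call__(self, module, inputs, output):
        x = inputs[0]
        mean = x.mean(dim=(0, 2, 3))
        var = x.var(dim=(0, 2, 3), unbiased=False)
        self.loss = (F.mse_loss(mean, module.running_mean)
                     + F.mse_loss(var, module.running_var))

def deep_inversion_loss(teacher, images, targets, student=None, js_weight=1.0):
    # Loss for one step of optimizing the synthesized batch `images`
    # (a leaf tensor with requires_grad=True); the teacher stays frozen.
    hooks, handles = [], []
    for m in teacher.modules():
        if isinstance(m, nn.BatchNorm2d):
            h = BNStatsHook()
            hooks.append(h)
            handles.append(m.register_forward_hook(h))
    logits_t = teacher(images)
    loss = F.cross_entropy(logits_t, targets) + sum(h.loss for h in hooks)
    if student is not None:
        # Adaptive DeepInversion: maximize teacher/student Jensen-Shannon divergence.
        p = F.softmax(logits_t, dim=1)
        q = F.softmax(student(images), dim=1)
        m = 0.5 * (p + q)
        js = 0.5 * (F.kl_div(m.log(), p, reduction='batchmean')
                    + F.kl_div(m.log(), q, reduction='batchmean'))
        loss = loss - js_weight * js
    for handle in handles:
        handle.remove()
    return loss
```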


Importance Estimation for Neural Network Pruning

arXiv.org Machine Learning

Structural pruning of neural network parameters reduces computation, energy, and memory transfer costs during inference. We propose a novel method that estimates the contribution of a neuron (filter) to the final loss and iteratively removes those with smaller scores. We describe two variations of our method using the first- and second-order Taylor expansions to approximate a filter's contribution. Both methods scale consistently across any network layer without requiring per-layer sensitivity analysis and can be applied to any kind of layer, including skip connections. For modern networks trained on ImageNet, we experimentally measured a high (>93%) correlation between the contribution computed by our methods and a reliable estimate of the true importance. Pruning with the proposed methods leads to an improvement over the state of the art in terms of accuracy, FLOPs, and parameter reduction. On ResNet-101, we achieve a 40% FLOPs reduction by removing 30% of the parameters, with a loss of 0.02% in top-1 accuracy on ImageNet. Code is available at https://github.com/NVlabs/Taylor_pruning.
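The first-order variant of the importance score can be sketched in a few lines of PyTorch (illustrative, not the code at the repository above): for each convolutional filter, the product of its weights and their gradients is summed and squared, approximating the change in loss if that filter were removed:

```python
import torch
import torch.nn as nn

def taylor_importance(model, data_loader, criterion, device="cpu"):
    # First-order Taylor importance sketch: accumulate, per conv filter,
    # ((gradient * weight) summed over the filter)^2; small scores mark filters
    # whose removal is expected to barely change the loss.
    scores = {name: torch.zeros(m.out_channels)
              for name, m in model.named_modules() if isinstance(m, nn.Conv2d)}
    model.to(device).train()
    for inputs, targets in data_loader:
        model.zero_grad()
        loss = criterion(model(inputs.to(device)), targets.to(device))
        loss.backward()
        for name, m in model.named_modules():
            if isinstance(m, nn.Conv2d):
                contrib = (m.weight.grad * m.weight.data).sum(dim=(1, 2, 3))
                scores[name] += contrib.pow(2).cpu()
    return scores  # prune the filters with the smallest accumulated scores
```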


Few-Shot Unsupervised Image-to-Image Translation

arXiv.org Machine Learning

Unsupervised image-to-image translation methods learn to map images in a given class to an analogous image in a different class, drawing on unstructured (non-registered) datasets of images. While remarkably successful, current methods require access to many images in both source and destination classes at training time. We argue this greatly limits their use. Drawing inspiration from the human capability of picking up the essence of a novel object from a small number of examples and generalizing from there, we seek a few-shot, unsupervised image-to-image translation algorithm that works on previously unseen target classes that are specified, at test time, only by a few example images. Our model achieves this few-shot generation capability by coupling an adversarial training scheme with a novel network design.

The Few-shot Unsupervised Image-to-Image Translation (FUNIT) framework aims at learning an image-to-image translation model for mapping an image of a source class to an analogous image of a target class by leveraging a few images of the target class given at test time. The model is never shown images of the target class during training but is asked to generate some of them at test time. To proceed, we first hypothesize that the few-shot generation capability of humans develops from their past visual experiences: a person can better imagine views of a new object if the person has seen many more different object classes in the past. Based on this hypothesis, we train our FUNIT model using a dataset containing images of many different object classes to simulate these past visual experiences. Specifically, we train the model to translate images from one class to another by leveraging a few example images of that other class.
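A hedged sketch of what few-shot inference looks like under this formulation, with placeholder module names (`content_encoder`, `class_encoder`, `decoder`) rather than the released FUNIT API: the structure of the source image is kept, while the class appearance is taken from the mean embedding of the few target-class example images:

```python
import torch

def few_shot_translate(content_encoder, class_encoder, decoder, content_img, target_imgs):
    # Translate `content_img` into the class defined by a handful of `target_imgs`.
    # Modules and the averaging of class codes are illustrative assumptions.
    with torch.no_grad():
        content_code = content_encoder(content_img.unsqueeze(0))   # what to draw (structure)
        class_codes = class_encoder(torch.stack(target_imgs))      # K target-class examples
        class_code = class_codes.mean(dim=0, keepdim=True)         # average class appearance
        return decoder(content_code, class_code)                   # translated image
```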