Pérez, Juan C.
MarDini: Masked Autoregressive Diffusion for Video Generation at Scale
Liu, Haozhe, Liu, Shikun, Zhou, Zijian, Xu, Mengmeng, Xie, Yanping, Han, Xiao, Pérez, Juan C., Liu, Ding, Kahatapitiya, Kumara, Jia, Menglin, Wu, Jui-Chieh, He, Sen, Xiang, Tao, Schmidhuber, Jürgen, Pérez-Rúa, Juan-Manuel
We introduce MarDini, a new family of video diffusion models that integrate the advantages of masked auto-regression (MAR) into a unified diffusion model (DM) framework. Here, MAR handles temporal planning, while DM focuses on spatial generation in an asymmetric network design: i) a MAR-based planning model containing most of the parameters generates planning signals for each masked frame using low-resolution input; ii) a lightweight generation model uses these signals to produce high-resolution frames via diffusion denoising. MarDini's MAR enables video generation conditioned on any number of masked frames at any frame position: a single model can handle video interpolation (e.g., masking middle frames), image-to-video generation (e.g., masking from the second frame onward), and video expansion (e.g., masking half the frames). The efficient design allocates most of the computational resources to the low-resolution planning model, making computationally expensive but important spatio-temporal attention feasible at scale. MarDini sets a new state of the art for video interpolation; meanwhile, within a few inference steps, it efficiently generates videos on par with those of much more expensive, advanced image-to-video models.
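The asymmetric design can be pictured as a heavy low-resolution planner paired with a light high-resolution denoiser. Below is a minimal PyTorch sketch; `HeavyPlanner`, `LightGenerator`, the token dimensions, and the broadcast conditioning are illustrative assumptions of ours, not the released architecture.

```python
import torch
import torch.nn as nn

class HeavyPlanner(nn.Module):
    """MAR-style planning model: holds most of the parameters, runs on
    low-resolution inputs, and emits one signal per (masked) frame."""
    def __init__(self, dim=256):
        super().__init__()
        layer = nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=6)
        self.mask_token = nn.Parameter(torch.zeros(1, 1, dim))

    def forward(self, frame_tokens, mask):
        # frame_tokens: (B, T, dim) low-res frame embeddings
        # mask: (B, T) bool, True where a frame is unknown and to be generated
        x = torch.where(mask[..., None], self.mask_token, frame_tokens)
        return self.encoder(x)  # (B, T, dim) per-frame planning signals

class LightGenerator(nn.Module):
    """Lightweight diffusion model: denoises a high-resolution frame,
    conditioned on the planner's signal for that frame."""
    def __init__(self, dim=256, ch=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(ch + dim, 64, 3, padding=1), nn.SiLU(),
            nn.Conv2d(64, ch, 3, padding=1))

    def forward(self, noisy_frame, signal):
        # noisy_frame: (B, C, H, W); signal: (B, dim), broadcast spatially
        cond = signal[:, :, None, None].expand(-1, -1, *noisy_frame.shape[-2:])
        return self.net(torch.cat([noisy_frame, cond], dim=1))  # predicted noise
```

Because the mask is arbitrary, the same two modules cover interpolation (mask the middle frames), image-to-video (mask all but the first frame), and expansion (mask every other frame).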
Adaptive Guidance: Training-free Acceleration of Conditional Diffusion Models
Castillo, Angela, Kohler, Jonas, Pérez, Juan C., Pérez, Juan Pablo, Pumarola, Albert, Ghanem, Bernard, Arbeláez, Pablo, Thabet, Ali
This paper presents a comprehensive study on the role of Classifier-Free Guidance (CFG) in text-conditioned diffusion models from the perspective of inference efficiency. In particular, we relax the default choice of applying CFG in all diffusion steps and instead search for efficient guidance policies. We formulate the discovery of such policies in the differentiable Neural Architecture Search framework. Our findings suggest that the denoising steps proposed by CFG become increasingly aligned with simple conditional steps, which renders the extra neural network evaluation of CFG redundant, especially in the second half of the denoising process. Building upon this insight, we propose "Adaptive Guidance" (AG), an efficient variant of CFG that adaptively omits network evaluations when the denoising process displays convergence. Our experiments demonstrate that AG preserves CFG's image quality while reducing computation by 25%. Thus, AG constitutes a plug-and-play alternative to Guidance Distillation, achieving 50% of the speed-ups of the latter while being training-free and retaining the capacity to handle negative prompts. Finally, we uncover further redundancies of CFG in the first half of the diffusion process, showing that entire neural function evaluations can be replaced by simple affine transformations of past score estimates. This method, termed LinearAG, offers even cheaper inference at the cost of deviating from the baseline model. Our findings provide insights into the efficiency of the conditional denoising process that contribute to more practical and swift deployment of text-conditioned diffusion models.
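One way to read AG is as a per-step policy that falls back to a single conditional evaluation once the guided and conditional scores agree. A minimal sketch, assuming a noise-prediction `model(x, t, c)`, a caller-supplied scheduler update `step_fn`, and a cosine-similarity convergence test (the paper instead discovers the policy via differentiable NAS):

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def sample_with_adaptive_guidance(model, step_fn, x, cond, uncond,
                                  timesteps, scale=7.5, sim_threshold=0.99):
    """Run full CFG (two network evaluations) only while the guided and
    conditional score estimates still disagree; afterwards, take plain
    conditional steps (one evaluation), saving compute."""
    use_cfg = True
    for t in timesteps:
        eps_c = model(x, t, cond)                  # conditional estimate
        if use_cfg:
            eps_u = model(x, t, uncond)            # extra unconditional pass
            eps = eps_u + scale * (eps_c - eps_u)  # standard CFG combination
            sim = F.cosine_similarity(eps.flatten(1), eps_c.flatten(1)).mean()
            if sim > sim_threshold:                # estimates have aligned
                use_cfg = False                    # omit CFG from here on
        else:
            eps = eps_c                            # CFG now redundant
        x = step_fn(x, eps, t)                     # scheduler update (assumed)
    return x
```

The LinearAG variant goes further in the early steps, replacing entire network evaluations with affine combinations of past score estimates rather than fresh calls.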
Revisiting Test Time Adaptation under Online Evaluation
Alfarra, Motasem, Itani, Hani, Pardo, Alejandro, Alhuwaider, Shyma, Ramazanova, Merey, Pérez, Juan C., Cai, Zhipeng, Müller, Matthias, Ghanem, Bernard
This paper proposes a novel online evaluation protocol for Test Time Adaptation (TTA) methods, which penalizes slower methods by providing them with fewer samples for adaptation. TTA methods leverage unlabeled data at test time to adapt to distribution shifts. Though many effective methods have been proposed, their impressive performance usually comes at the cost of significantly increased computation budgets. Current evaluation protocols overlook the effect of this extra computation cost, affecting their real-world applicability. To address this issue, we propose a more realistic evaluation protocol for TTA methods, where data is received in an online fashion from a constant-speed data stream, thereby accounting for the method's adaptation speed. We apply our proposed protocol to benchmark several TTA methods on multiple datasets and scenarios. Extensive experiments show that, when accounting for inference speed, simple and fast approaches can outperform more sophisticated but slower methods. For example, SHOT from 2020 outperforms the state-of-the-art method SAR from 2023 under our online setting. Our online evaluation protocol emphasizes the need for developing TTA methods that are efficient and applicable in realistic settings.
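The protocol can be summarized in a few lines: with a constant-speed stream, a method that is k times slower than the stream rate only gets to adapt on every k-th batch, and must predict the batches it misses with its current, stale model. A minimal sketch, where `relative_speed`, `adapt_and_predict`, and `predict` are an assumed interface rather than the benchmark's actual API:

```python
import math

def online_eval(method, stream, relative_speed):
    """relative_speed = (method's per-batch adaptation time) /
    (stream's per-batch period); values > 1 mean new batches arrive
    while the method is still busy adapting to an earlier one."""
    skip = math.ceil(relative_speed)  # method adapts on every skip-th batch
    correct = total = 0
    for i, (x, y) in enumerate(stream):
        if i % skip == 0:
            preds = method.adapt_and_predict(x)  # full test-time adaptation
        else:
            preds = method.predict(x)            # stale model, no adaptation
        correct += (preds == y).sum().item()
        total += y.numel()
    return correct / total
```

Under this accounting, a fast method with relative_speed close to 1 adapts on nearly every batch, while a slow one is evaluated mostly with an under-adapted model, which is how SHOT can overtake SAR.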
On the Robustness of Quality Measures for GANs
Alfarra, Motasem, Pérez, Juan C., Frühstück, Anna, Torr, Philip H. S., Wonka, Peter, Ghanem, Bernard
This work evaluates the robustness of quality measures of generative models such as Inception Score (IS) and Fréchet Inception Distance (FID). Analogous to the vulnerability of deep models against a variety of adversarial attacks, we show that such metrics can also be manipulated by additive pixel perturbations. Our experiments indicate that one can generate a distribution of images with very high scores but low perceptual quality. Conversely, one can optimize for small imperceptible perturbations that, when added to real-world images, deteriorate their scores. We further extend our evaluation to generative models themselves, including the state-of-the-art network StyleGANv2. We show the vulnerability of both the generative model and the FID against additive perturbations in the latent space. Finally, we show that the FID can be robustified by simply replacing the standard Inception with a robust Inception. We validate the effectiveness of the robustified metric through extensive experiments, showing it is more robust against manipulation.
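As one concrete proxy for the attack on real images, a sign-PGD perturbation that maximizes drift in Inception-style features will degrade feature-distance metrics such as FID while staying within an imperceptible L-infinity budget. A minimal sketch; `feature_extractor`, the budget, and the drift loss are our illustrative assumptions, not the paper's exact objective:

```python
import torch

def perturb_to_hurt_fid(images, feature_extractor, eps=4/255,
                        steps=10, lr=1/255):
    """Find an additive, L_inf-bounded perturbation of real images that
    pushes their features away from the originals, deteriorating
    feature-based scores such as FID."""
    delta = torch.zeros_like(images, requires_grad=True)
    clean_feats = feature_extractor(images).detach()
    for _ in range(steps):
        feats = feature_extractor((images + delta).clamp(0, 1))
        loss = -(feats - clean_feats).pow(2).sum(dim=1).mean()  # maximize drift
        loss.backward()
        with torch.no_grad():
            delta -= lr * delta.grad.sign()  # sign-PGD descent on the loss
            delta.clamp_(-eps, eps)          # keep perturbation imperceptible
            delta.grad.zero_()
    return (images + delta).clamp(0, 1).detach()
```

The robustified metric mentioned above counters exactly this failure mode: a robust Inception's features move far less under such bounded pixel perturbations.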
ClusTR: Clustering Training for Robustness
Alfarra, Motasem, Pérez, Juan C., Bibi, Adel, Thabet, Ali, Arbeláez, Pablo, Ghanem, Bernard
This paper studies how encouraging semantically-aligned features during deep neural network training can increase network robustness. Recent works observed that Adversarial Training leads to robust models, whose learnt features appear to correlate with human perception. Inspired by this connection from robustness to semantics, we study the complementary connection: from semantics to robustness. To do so, we provide a tight robustness certificate for distance-based classification models (clustering-based classifiers), which we leverage to propose ClusTR (Clustering Training for Robustness), a clustering-based and adversary-free training framework to learn robust models. Interestingly, ClusTR outperforms adversarially-trained networks by up to 4% under strong PGD attacks. Moreover, it can be equipped with simple and fast adversarial training to improve the current state-of-the-art in robustness by 16%-29% on CIFAR10, SVHN, and CIFAR100.
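The certificate behind this approach is easy to state for a nearest-centroid classifier: if c1 is the closest centroid to x and c2 the runner-up, no L2 perturbation of norm below (||x - c2|| - ||x - c1||) / 2 can change the prediction, since a perturbation of norm r moves each distance by at most r. A minimal sketch with illustrative centroids:

```python
import torch

def certified_radius(x, centroids):
    """x: (d,) input; centroids: (K, d) class centroids (e.g., learnt by a
    ClusTR-style clustering objective). Returns (predicted class, radius)."""
    dists = torch.cdist(x[None], centroids)[0]  # (K,) L2 distances to centroids
    d_sorted, idx = torch.sort(dists)
    radius = (d_sorted[1] - d_sorted[0]) / 2    # certified L2 radius
    return idx[0].item(), radius.item()

# Example: two centroids on a line; a point at (1, 0) is distance 1 from
# class 0 and distance 3 from class 1, so it certifies a radius of 1.0.
centroids = torch.tensor([[0., 0.], [4., 0.]])
print(certified_radius(torch.tensor([1., 0.]), centroids))  # (0, 1.0)
```

Training the feature space so that same-class points cluster tightly around their centroid directly enlarges this margin, which is the semantics-to-robustness direction the paper pursues.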