AITopics | Abati, Davide

Collaborating Authors

Abati, Davide

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Object-Centric Diffusion for Efficient Video Editing

Kahatapitiya, Kumara, Karjauv, Adil, Abati, Davide, Porikli, Fatih, Asano, Yuki M., Habibian, Amirhossein

arXiv.org Artificial IntelligenceJan-11-2024

Diffusion-based video editing have reached impressive quality and can transform either the global style, local structure, and attributes of given video inputs, following textual edit prompts. However, such solutions typically incur heavy memory and computational costs to generate temporally-coherent frames, either in the form of diffusion inversion and/or cross-frame attention. In this paper, we conduct an analysis of such inefficiencies, and suggest simple yet effective modifications that allow significant speed-ups whilst maintaining quality. Moreover, we introduce Object-Centric Diffusion, coined as OCD, to further reduce latency by allocating computations more towards foreground edited regions that are arguably more important for perceptual quality. We achieve this by two novel proposals: i) Object-Centric Sampling, decoupling the diffusion steps spent on salient regions or background, allocating most of the model capacity to the former, and ii) Object-Centric 3D Token Merging, which reduces cost of cross-frame attention by fusing redundant tokens in unimportant background regions. Both techniques are readily applicable to a given video editing model \textit{without} retraining, and can drastically reduce its memory and computational cost. We evaluate our proposals on inversion-based and control-signal-based editing pipelines, and show a latency reduction up to 10x for a comparable synthesis quality.

latency, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2401.05735

Country: Atlantic Ocean (0.14)

Genre: Research Report (1.00)

Industry:

Transportation > Ground > Road (0.47)
Leisure & Entertainment > Sports (0.47)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Add feedback

Region-of-Interest Based Neural Video Compression

Perugachi-Diaz, Yura, Sautière, Guillaume, Abati, Davide, Yang, Yang, Habibian, Amirhossein, Cohen, Taco S

arXiv.org Artificial IntelligenceNov-2-2022

Humans do not perceive all parts of a scene with the same resolution, but rather focus on few regions of interest (ROIs). Traditional Object-Based codecs take advantage of this biological intuition, and are capable of non-uniform allocation of bits in favor of salient regions, at the expense of increased distortion the remaining areas: such a strategy allows a boost in perceptual quality under low rate constraints. Recently, several neural codecs have been introduced for video compression, yet they operate uniformly over all spatial locations, lacking the capability of ROI-based processing. In this paper, we introduce two models for ROI-based neural video coding. First, we propose an implicit model that is fed with a binary ROI mask and it is trained by de-emphasizing the distortion of the background. Secondly, we design an explicit latent scaling method, that allows control over the quantization binwidth for different spatial regions of latent variables, conditioned on the ROI mask. By extensive experiments, we show that our methods outperform all our baselines in terms of Rate-Distortion (R-D) performance in the ROI. Moreover, they can generalize to different datasets and to any arbitrary ROI at inference time. Finally, they do not require expensive pixel-level annotations during training, as synthetic ROI masks can be used with little to no degradation in performance. To the best of our knowledge, our proposals are the first solutions that integrate ROI-based capabilities into neural video compression models.

artificial intelligence, machine learning, roi ssf, (15 more...)

arXiv.org Artificial Intelligence

2203.01978

Country: Europe (0.67)

Genre: Research Report (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Add feedback

Dark Experience for General Continual Learning: a Strong, Simple Baseline

Buzzega, Pietro, Boschini, Matteo, Porrello, Angelo, Abati, Davide, Calderara, Simone

arXiv.org Machine LearningOct-22-2020

Continual Learning has inspired a plethora of approaches and evaluation settings; however, the majority of them overlooks the properties of a practical scenario, where the data stream cannot be shaped as a sequence of tasks and offline training is not viable. We work towards General Continual Learning (GCL), where task boundaries blur and the domain and class distributions shift either gradually or suddenly. We address it through mixing rehearsal with knowledge distillation and regularization; our simple baseline, Dark Experience Replay, matches the network's logits sampled throughout the optimization trajectory, thus promoting consistency with its past. By conducting an extensive analysis on both standard benchmarks and a novel GCL evaluation setting (MNIST-360), we show that such a seemingly simple baseline outperforms consolidated approaches and leverages limited resources. We further explore the generalization capabilities of our objective, showing its regularization being beneficial beyond mere performance.

artificial intelligence, der, neural network, (18 more...)

arXiv.org Machine Learning

2004.07211

Country:

North America > United States (0.14)
North America > Canada (0.14)
Europe > Italy (0.14)

Genre: Research Report (1.00)

Industry: Education (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Classifying Signals on Irregular Domains via Convolutional Cluster Pooling

Porrello, Angelo, Abati, Davide, Calderara, Simone, Cucchiara, Rita

arXiv.org Machine LearningFeb-13-2019

We present a novel and hierarchical approach for supervised classification of signals spanning over a fixed graph, reflecting shared properties of the dataset. To this end, we introduce a Convolutional Cluster Pooling layer exploiting a multi-scale clustering in order to highlight, at different resolutions, locally connected regions on the input graph. Our proposal generalises well-established neural models such as Convolutional Neural Networks (CNNs) on irregular and complex domains, by means of the exploitation of the weight sharing property in a graph-oriented architecture. In this work, such property is based on the centrality of each vertex within its soft-assigned cluster. Extensive experiments on NTU RGB+D, CIFAR-10 and 20NEWS demonstrate the effectiveness of the proposed technique in capturing both local and global patterns in graph-structured data out of different domains.

deep learning, graph, neural network, (18 more...)

arXiv.org Machine Learning

1902.0485

Country: North America > United States > Louisiana (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.89)

Add feedback