Cavazza, Jacopo
Subspace Clustering for Action Recognition with Covariance Representations and Temporal Pruning
Paoletti, Giancarlo, Cavazza, Jacopo, Beyan, Cigdem, Del Bue, Alessio
Given a trimmed sequence, in which a single action or activity is assumed to be present, the final goal of HAR is to correctly classify it. Although significant progress has been made in recent years, accurate action recognition in videos is still a challenging task because of the complexity of the visual data, e.g., due to varying camera viewpoints, occlusions and abrupt changes in lighting conditions. Despite the fact that subspace clustering has become a powerful technique for problems such as face clustering or digit recognition, its applicability to problems like skeleton-based HAR has been explored by only a limited number of works [7], [8], [9]. This is due to many operative limitations, including how to handle the temporal dimension, the inherent noise present in the skeletal data and the related computational cost.
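For context on the covariance representations named in the title, the snippet below is a minimal, hypothetical sketch of one standard way to summarize a skeleton sequence with a covariance descriptor: per-frame joint coordinates are flattened into feature vectors, and their covariance over time gives a fixed-size representation of the sequence. The function name and the choice of raw 3D joint coordinates as features are assumptions made here for illustration; the paper's exact descriptor and its temporal-pruning step may differ.

    import numpy as np

    def covariance_descriptor(skeleton):
        # skeleton: (T, J, 3) array of T frames, J joints, 3D coordinates.
        # Flatten each frame into a 3J-dimensional feature vector and take
        # the covariance over time as a fixed-size sequence descriptor.
        T, J, _ = skeleton.shape
        frames = skeleton.reshape(T, 3 * J)
        return np.cov(frames, rowvar=False)      # (3J, 3J), symmetric PSD

    rng = np.random.default_rng(0)
    seq = rng.standard_normal((50, 25, 3))       # e.g., 50 frames, 25 joints
    print(covariance_descriptor(seq).shape)      # (75, 75)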
Dropout as a Low-Rank Regularizer for Matrix Factorization
Cavazza, Jacopo, Morerio, Pietro, Haeffele, Benjamin, Lane, Connor, Murino, Vittorio, Vidal, Rene
Regularization for matrix factorization (MF) and approximation problems has been carried out in many different ways. Due to its popularity in deep learning, dropout has also been applied to this class of problems. Despite its solid empirical performance, the theoretical properties of dropout as a regularizer remain quite elusive for this class of problems. In this paper, we present a theoretical analysis of dropout for MF, where Bernoulli random variables are used to drop columns of the factors. We demonstrate the equivalence between dropout and a fully deterministic model for MF in which the factors are regularized by the sum of the products of the squared Euclidean norms of the columns. Additionally, we inspect the case of a variable-sized factorization and prove that dropout achieves the global minimum of a convex approximation problem with (squared) nuclear norm regularization. As a result, we conclude that dropout can be used as a low-rank regularizer with data-dependent singular-value thresholding.
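To make the claimed equivalence concrete, the following toy check (not code from the paper) compares a Monte Carlo estimate of the dropout objective, in which each column pair of the factors is retained with probability theta and rescaled by 1/theta, against the deterministic objective with the sum-of-products-of-squared-column-norms penalty. The (1 - theta)/theta weighting assumes the usual inverted-dropout rescaling convention, which the abstract does not spell out.

    import numpy as np

    rng = np.random.default_rng(0)
    n, m, d, theta = 8, 6, 4, 0.7               # data size, number of columns, retain prob.

    X = rng.standard_normal((n, m))
    U = rng.standard_normal((n, d))             # columns u_k
    V = rng.standard_normal((m, d))             # columns v_k

    # Monte Carlo estimate of  E_r || X - (1/theta) U diag(r) V^T ||_F^2,
    # with r_k ~ Bernoulli(theta) dropping whole columns of the factors.
    trials = 100000
    vals = np.empty(trials)
    for t in range(trials):
        r = rng.binomial(1, theta, size=d)
        vals[t] = np.linalg.norm(X - (U * r / theta) @ V.T, 'fro') ** 2

    # Deterministic counterpart: reconstruction error plus the penalty
    # sum_k ||u_k||^2 * ||v_k||^2, weighted by (1 - theta) / theta.
    penalty = np.sum(np.sum(U ** 2, axis=0) * np.sum(V ** 2, axis=0))
    det = np.linalg.norm(X - U @ V.T, 'fro') ** 2 + (1 - theta) / theta * penalty

    print(vals.mean(), det)                     # should agree up to Monte Carlo error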
An Analysis of Dropout for Matrix Factorization
Cavazza, Jacopo, Lane, Connor, Haeffele, Benjamin D., Murino, Vittorio, Vidal, René
Dropout is a simple yet effective algorithm for regularizing neural networks by randomly dropping out units through Bernoulli multiplicative noise. For some restricted problem classes, such as linear or logistic regression, several theoretical studies have demonstrated the equivalence between dropout and a fully deterministic optimization problem with data-dependent Tikhonov regularization. This work presents a theoretical analysis of dropout for matrix factorization, where Bernoulli random variables are used to drop columns of the factors, thereby attempting to control the size of the factorization. While recent work has demonstrated the empirical effectiveness of dropout for matrix factorization, a theoretical understanding of the regularization properties of dropout in this context remains elusive. This work demonstrates the equivalence between dropout and a fully deterministic model for matrix factorization in which the factors are regularized by the sum of the products of the norms of the columns. While the resulting regularizer is closely related to a variational form of the nuclear norm, suggesting that dropout may limit the size of the factorization, we show that it is possible to trivially lower the objective value by doubling the size of the factorization. We show that this problem is caused by the use of a fixed dropout rate, which motivates the use of a rate that increases with the size of the factorization. Synthetic experiments validate our theoretical findings.
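The "trivially lower the objective by doubling the size of the factorization" observation has a short numeric illustration. Assuming the induced penalty takes the squared-norm-product form sum_k ||u_k||^2 ||v_k||^2 (the abstract does not give the exact expression), splitting every column pair (u_k, v_k) into two copies scaled by 1/sqrt(2) leaves the product U V^T unchanged while halving the penalty:

    import numpy as np

    def penalty(U, V):
        # sum over columns of ||u_k||^2 * ||v_k||^2
        return np.sum(np.sum(U ** 2, axis=0) * np.sum(V ** 2, axis=0))

    rng = np.random.default_rng(1)
    U = rng.standard_normal((8, 3))
    V = rng.standard_normal((6, 3))

    # Double the factorization: each column pair (u_k, v_k) becomes two
    # copies (u_k / sqrt(2), v_k / sqrt(2)), so U V^T is unchanged.
    U2 = np.hstack([U, U]) / np.sqrt(2)
    V2 = np.hstack([V, V]) / np.sqrt(2)

    print(np.allclose(U @ V.T, U2 @ V2.T))      # True: same reconstruction
    print(penalty(U, V), penalty(U2, V2))       # the penalty is halved

Repeating the split keeps shrinking the penalty with no change to the reconstruction, which is why the abstract argues for a dropout rate that increases with the size of the factorization.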
Curriculum Dropout
Morerio, Pietro, Cavazza, Jacopo, Volpi, Riccardo, Vidal, Rene, Murino, Vittorio
Dropout is a very effective way of regularizing neural networks. Stochastically "dropping out" units with a certain probability discourages over-specific co-adaptations of feature detectors, preventing overfitting and improving network generalization. Moreover, dropout can be interpreted as an approximate model aggregation technique, in which an exponential number of smaller networks are averaged in order to obtain a more powerful ensemble. In this paper, we show that using a fixed dropout probability during training is a suboptimal choice. We thus propose a time schedule for the probability of retaining neurons in the network. This induces an adaptive regularization scheme that smoothly increases the difficulty of the optimization problem. This idea of "starting easy" and adaptively increasing the difficulty of the learning problem has its roots in curriculum learning and allows one to train better models. Indeed, we prove that our optimization strategy implements a very general curriculum scheme, by gradually adding noise to both the input and intermediate feature representations within the network architecture. Experiments on seven image classification datasets and different network architectures show that our method, named Curriculum Dropout, frequently yields better generalization and, at worst, performs just as well as the standard dropout method.
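A minimal sketch of the scheduling idea follows; the specific exponential decay of the retention probability from 1 toward a target value, and the parameter names, are illustrative assumptions rather than a transcription of the paper's schedule.

    import numpy as np

    def retain_prob(step, p_final=0.5, gamma=1e-3):
        # Curriculum-style schedule: start by keeping every unit (p = 1),
        # then decay smoothly toward the target retention probability.
        return (1.0 - p_final) * np.exp(-gamma * step) + p_final

    def curriculum_dropout(activations, step, rng, p_final=0.5, gamma=1e-3):
        # Inverted dropout with a time-dependent retention probability:
        # early in training almost nothing is dropped, so the problem
        # "starts easy" and gradually becomes noisier.
        p = retain_prob(step, p_final, gamma)
        mask = rng.binomial(1, p, size=activations.shape)
        return activations * mask / p

    rng = np.random.default_rng(0)
    x = rng.standard_normal((4, 16))
    for step in (0, 1000, 10000):
        print(step, round(retain_prob(step), 3), curriculum_dropout(x, step, rng).shape)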