Advancing Video Anomaly Detection: A Concise Review and a New Dataset
Arjun Raj
Video Anomaly Detection (VAD) finds widespread applications in security surveillance, traffic monitoring, industrial monitoring, and healthcare. Despite extensive research efforts, there remains a lack of concise reviews that provide insightful guidance for researchers. Such reviews would serve as quick references to grasp current challenges, research trends, and future directions.
Appendix A.1 Shower shape variables
We extend the list of shower shape variables described earlier with the deepest layer in the shower that has a non-zero energy deposit. Figure 1 shows the average events for different variations of the SUPA datasets (see also Figs. 1-12). PointFlow [Yang et al., 2019] is a flow-based model with a PointNet-like encoder; the overall architecture has 2.1M parameters, and we train all models with 100K training examples. Figs. 13-18 show the histograms of various shower shape variables for SUPAv1 and the corresponding samples.
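As a rough illustration of how such a variable can be computed, the following minimal Python sketch derives the deepest layer with a non-zero energy deposit from per-layer energies; the array layout (one row per shower, one column per calorimeter layer) and the function name are assumptions made for the example, not the paper's implementation.

```python
import numpy as np

def deepest_nonzero_layer(layer_energies, threshold=0.0):
    """Index of the deepest calorimeter layer with energy above `threshold`.

    layer_energies: array of shape (n_showers, n_layers), layer 0 = front.
    Returns -1 for showers with no deposit above the threshold.
    """
    mask = layer_energies > threshold                      # layers with a deposit
    has_deposit = mask.any(axis=1)
    # argmax on the reversed mask finds the last layer with a deposit
    last = layer_energies.shape[1] - 1 - mask[:, ::-1].argmax(axis=1)
    return np.where(has_deposit, last, -1)

# Example: three showers with 5 layers each
showers = np.array([[1.0, 0.5, 0.0, 0.0, 0.0],
                    [0.2, 0.3, 0.1, 0.0, 0.4],
                    [0.0, 0.0, 0.0, 0.0, 0.0]])
print(deepest_nonzero_layer(showers))  # [ 1  4 -1]
```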
Coordinating Distributed Example Orders for Provably Accelerated Training
Recent research on online Gradient Balancing (GraB) has revealed that there exist permutation-based example orderings for SGD that are guaranteed to outperform random reshuffling (RR). Whereas RR arbitrarily permutes training examples, GraB leverages stale gradients from prior epochs to order examples -- achieving a provably faster convergence rate than RR. However, GraB is limited by design: while it demonstrates an impressive ability to scale up training on centralized data, it does not naturally extend to modern distributed ML workloads. We therefore propose Coordinated Distributed GraB (CD-GraB), which uses insights from prior work on kernel thinning to translate the benefits of provably faster permutation-based example ordering to distributed settings. With negligible overhead, CD-GraB exhibits a linear speedup in convergence rate over centralized GraB and outperforms distributed RR on a variety of benchmark tasks.
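To make the ordering idea concrete, here is a minimal, illustrative sketch of the herding-style greedy sign balancing that GraB-type orderings build on, assuming the stale per-example gradients from the previous epoch are available as rows of a NumPy array; this is a simplified sketch of the core reordering step, not the CD-GraB implementation.

```python
import numpy as np

def balanced_order(stale_grads):
    """Greedy sign-balancing reorder (herding-style), as used by GraB-type methods.

    stale_grads: (n_examples, dim) gradients recorded in the previous epoch.
    Returns a permutation of example indices for the next epoch.
    """
    centered = stale_grads - stale_grads.mean(axis=0)   # balance around the mean
    running = np.zeros(stale_grads.shape[1])
    front, back = [], []
    for i, g in enumerate(centered):
        # choose the sign that keeps the running signed sum small
        if np.linalg.norm(running + g) <= np.linalg.norm(running - g):
            running += g
            front.append(i)          # "+" examples go to the front of the new order
        else:
            running -= g
            back.append(i)           # "-" examples go to the back, in reverse
    return np.array(front + back[::-1])

# Example: reorder 8 examples with 4-dimensional stale gradients
rng = np.random.default_rng(0)
print(balanced_order(rng.normal(size=(8, 4))))
```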
Unity by Diversity: Improved Representation Learning for Multimodal VAEs
Variational Autoencoders for multimodal data hold promise for many tasks in data analysis, such as representation learning, conditional generation, and imputation. Current architectures either share the encoder output, decoder input, or both across modalities to learn a shared representation. Such architectures impose hard constraints on the model. In this work, we show that a better latent representation can be obtained by replacing these hard constraints with a soft constraint. We propose a new mixture-of-experts prior, softly guiding each modality's latent representation towards a shared aggregate posterior. This approach results in a superior latent representation and allows each encoding to better preserve information from its original, uncompressed features. In extensive experiments on multiple benchmark datasets and two challenging real-world datasets, we show improved learned latent representations and imputation of missing data modalities compared to existing methods.
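As a hedged sketch of the general idea of softly aligning per-modality posteriors with a mixture aggregate, the following PyTorch snippet estimates, by Monte Carlo, the average KL divergence from each modality's Gaussian posterior to the uniform mixture of all posteriors; the tensor shapes, uniform mixture weights, and single-sample estimator are illustrative assumptions rather than the paper's exact objective.

```python
import torch
from torch.distributions import Normal, Categorical, MixtureSameFamily, Independent

def soft_alignment_loss(mus, logvars):
    """Softly pull each modality's posterior toward the mixture of all posteriors.

    mus, logvars: lists of (batch, latent_dim) tensors, one entry per modality.
    Returns a scalar Monte Carlo estimate of the average KL(q_m || mixture).
    """
    stds = [(0.5 * lv).exp() for lv in logvars]
    comps = Independent(Normal(torch.stack(mus, dim=1), torch.stack(stds, dim=1)), 1)
    weights = Categorical(logits=torch.zeros(mus[0].shape[0], len(mus)))  # uniform mixture
    mixture = MixtureSameFamily(weights, comps)

    loss = 0.0
    for mu, std in zip(mus, stds):
        q_m = Independent(Normal(mu, std), 1)
        z = q_m.rsample()                               # one-sample MC estimate
        loss = loss + (q_m.log_prob(z) - mixture.log_prob(z)).mean()
    return loss / len(mus)

# Example with two modalities, batch of 4, latent dimension 8
mus = [torch.randn(4, 8), torch.randn(4, 8)]
logvars = [torch.zeros(4, 8), torch.zeros(4, 8)]
print(soft_alignment_loss(mus, logvars))
```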
CoFie: Learning Compact Neural Surface Representations with Coordinate Fields
This paper introduces CoFie, a novel local geometry-aware neural surface representation. CoFie is motivated by a theoretical analysis of local SDFs under a quadratic approximation. We find that local shapes are highly compressible in an aligned coordinate frame defined by the normal and tangent directions of the local shape. Accordingly, we introduce the Coordinate Field, a composition of the coordinate frames of all local shapes. The Coordinate Field is optimizable and is used to transform local shapes from the world coordinate frame to the aligned shape coordinate frame. This largely reduces the complexity of local shapes and benefits the learning of MLP-based implicit representations.
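For intuition, the following small NumPy sketch transforms world-space points into a local frame aligned with a patch's normal and tangent directions; the tangent construction, the function names, and the absence of any learned or optimizable component are assumptions made purely for illustration.

```python
import numpy as np

def aligned_frame(normal):
    """Build an orthonormal frame (tangent1, tangent2, normal) from a normal vector."""
    n = normal / np.linalg.norm(normal)
    # pick a helper axis that is not (nearly) parallel to the normal
    helper = np.array([1.0, 0.0, 0.0]) if abs(n[0]) < 0.9 else np.array([0.0, 1.0, 0.0])
    t1 = np.cross(n, helper)
    t1 /= np.linalg.norm(t1)
    t2 = np.cross(n, t1)
    return np.stack([t1, t2, n])          # rows form the rotation world -> local

def world_to_local(points, center, normal):
    """Express world-space points in the local frame centered at `center`."""
    R = aligned_frame(normal)
    return (points - center) @ R.T

# Example: a point directly above the patch center along its normal
pts = np.array([[0.0, 0.0, 1.0]])
print(world_to_local(pts, center=np.zeros(3), normal=np.array([0.0, 0.0, 1.0])))  # [[0. 0. 1.]]
```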
A Constrained sampling via post-processed denoiser
In this section, we provide more details on the apparatus necessary to perform a posteriori conditional sampling in the presence of a linear constraint. Eq. (6) suggests that the SDE drift corresponding to the score may be broken down into three steps. However, in practice this modification creates a "discontinuity" between the constrained and unconstrained components, leading to erroneous correlations between them in the generated samples. This is the post-processed denoiser function of Eq. (7) in the main text, and it needs to be tuned empirically. The correction in Eq. (16) is equivalent to imposing a Gaussian likelihood on x. It is worth noting that both the mean and the variance here correspond directly to estimates of statistical moments obtained from Tweedie's formulas [29]. However, the Gaussian assumption is good at early stages of denoising, when the signal-to-noise ratio (SNR) is low.

Remark 2. The post-processing presented in this section is similar to [17], who propose to apply a proportional correction. In practice, we found that including this scaling contributes greatly to the numerical stability and efficiency of continuous-time sampling.

B.1 Training
The training of our denoiser-based diffusion models largely follows the methodology proposed in [45]. In this section, we present the most relevant components for completeness and better reproducibility.
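As orientation only, here is a minimal, generic sketch of a single training step for a denoiser-based diffusion model; the log-uniform noise-level distribution, the 1/sigma^2 loss weighting, and the net(x_noisy, sigma) interface are assumptions for the example and are not the specific configuration of [45].

```python
import math
import torch

def denoiser_training_step(net, x0, optimizer, sigma_min=0.002, sigma_max=80.0):
    """One generic training step for a denoiser that predicts the clean sample x0.

    Illustrative only: the noise-level distribution, loss weighting, and network
    interface are assumptions, not the configuration used in the paper.
    """
    batch = x0.shape[0]
    # sample one noise level per example, log-uniform in [sigma_min, sigma_max]
    log_sigma = torch.empty(batch, device=x0.device).uniform_(
        math.log(sigma_min), math.log(sigma_max))
    sigma = log_sigma.exp().view(batch, *([1] * (x0.ndim - 1)))

    x_noisy = x0 + sigma * torch.randn_like(x0)           # corrupt the clean sample
    x_denoised = net(x_noisy, sigma.flatten())            # denoiser prediction
    loss = ((x_denoised - x0) ** 2 / sigma ** 2).mean()   # weighted denoising loss

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```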