Goto

Collaborating Authors

 Data Mining


On the Properties of Kullback-Leibler Divergence Between Multivariate Gaussian Distributions

Neural Information Processing Systems

Kullback-Leibler (KL) divergence is one of the most important measures to calculate the difference between probability distributions. In this paper, we theoretically study several properties of KL divergence between multivariate Gaussian distributions.


MEMTO: Memory-guided Transformer for Multivariate Time Series Anomaly Detection Junho Song 1 Keonwoo Kim Jeonglyul Oh1

Neural Information Processing Systems

Detecting anomalies in real-world multivariate time series data is challenging due to complex temporal dependencies and inter-variable correlations. Recently, reconstruction-based deep models have been widely used to solve the problem. However, these methods still suffer from an over-generalization issue and fail to deliver consistently high performance. To address this issue, we propose the MEMTO, a memory-guided Transformer using a reconstruction-based approach. It is designed to incorporate a novel memory module that can learn the degree to which each memory item should be updated in response to the input data. To stabilize the training procedure, we use a two-phase training paradigm which involves using K-means clustering for initializing memory items. Additionally, we introduce a bi-dimensional deviation-based detection criterion that calculates anomaly scores considering both input space and latent space. We evaluate our proposed method on five real-world datasets from diverse domains, and it achieves an average anomaly detection F1-score of 95.74%, significantly outperforming the previous state-of-the-art methods. We also conduct extensive experiments to empirically validate the effectiveness of our proposed model's key components.


Efficient Batched Algorithm for Contextual Linear Bandits with Large Action Space via Soft Elimination

Neural Information Processing Systems

In this paper, we provide the first efficient batched algorithm for contextual linear bandits with large action spaces. Unlike existing batched algorithms that rely on action elimination, which are not implementable for large action sets, our algorithm only uses a linear optimization oracle over the action set to design the policy. The proposed algorithm achieves a regret upper bound ร•( T) with high probability, and uses O(log log T) batches, matching the lower bound on the number of batches [13].



Local Anti-Concentration Class: Logarithmic Regret for Greedy Linear Contextual Bandit

Neural Information Processing Systems

We study the performance guarantees of exploration-free greedy algorithms for the linear contextual bandit problem. We introduce a novel condition, named the Local Anti-Concentration (LAC) condition, which enables a greedy bandit algorithm to achieve provable efficiency. We show that the LAC condition is satisfied by a broad class of distributions, including Gaussian, exponential, uniform, Cauchy, and Student's t distributions, along with other exponential family distributions and their truncated variants. This significantly expands the class of distributions under which greedy algorithms can perform efficiently. Under our proposed LAC condition, we prove that the cumulative expected regret of the greedy algorithm for the linear contextual bandit is bounded by O(poly log T). Our results establish the widest range of distributions known to date that allow a sublinear regret bound for greedy algorithms, further achieving a sharp poly-logarithmic regret.


Locality Sensitive Hashing in Fourier Frequency Domain For Soft Set Containment Search

Neural Information Processing Systems

In many search applications related to passage retrieval, text entailment, and subgraph search, the query and each'document' is a set of elements, with a document being relevant if it contains the query. These elements are not represented by atomic IDs, but by embedded representations, thereby extending set containment to soft set containment. Recent applications address soft set containment by encoding sets into fixed-size vectors and checking for elementwise vector dominance. This 0/1 property can be relaxed to an asymmetric hinge distance for scoring and ranking candidate documents. Here we focus on data-sensitive, trainable indices for fast retrieval of relevant documents. Existing LSH methods are designed for mostly symmetric or few simple asymmetric distance functions, which are not suitable for hinge distance. Instead, we transform hinge distance into a proposed dominance similarity measure, to which we then apply a Fourier transform, thereby expressing dominance similarity as an expectation of inner products of functions in the frequency domain.



A Batch-to-Online Transformation under Random-Order Model

Neural Information Processing Systems

We introduce a transformation framework that can be utilized to develop online algorithms with low ฯต-approximate regret in the random-order model from offline approximation algorithms. We first give a general reduction theorem that transforms an offline approximation algorithm with low average sensitivity to an online algorithm with low ฯต-approximate regret. We then demonstrate that offline approximation algorithms can be transformed into a low-sensitivity version using a coreset construction method. To showcase the versatility of our approach, we apply it to various problems, including online (k, z)-clustering, online matrix approximation, and online regression, and successfully achieve polylogarithmic ฯต-approximate regret for each problem. Moreover, we show that in all three cases, our algorithm also enjoys low inconsistency, which may be desired in some online applications.