Goto

Collaborating Authors

 mean shift


Unsupervised Representation Learning from Pre-trained Diffusion Probabilistic Models

Neural Information Processing Systems

Diffusion Probabilistic Models (DPMs) have shown a powerful capacity of generating high-quality image samples. Recently, diffusion autoencoders (Diff-AE) have been proposed to explore DPMs for representation learning via autoencoding. Their key idea is to jointly train an encoder for discovering meaningful representations from images and a conditional DPM as the decoder for reconstructing images.


Stochastic Mean-Shift Clustering

Lapidot, Itshak, Sepulcre, Yann, Trigano, Tom

arXiv.org Artificial Intelligence

Numerous algorithms have been proposed and investigated, among which the k means [1], Spectral clustering [2, 3], DB-SCAN [4], and the well-known Mean-shift (MS) clustering algorithm. MS is an effective non-parametric iterative algorithm [5], which is versatile for clustering, tracking, and smoothing tasks. A well-known and used variant of MS is the blurring mean-shift (BMS) [6]. Both MS and BMS algorithms can be coined "deterministic" iterative procedures aiming to find local maximiz-ers of an objective function, since they do not involve any random selection of points to perform their update rule. Both MS and BMS algorithms have been applied to a variety of domains, and several variations around their original formulation have been proposed: see [7] for BMS with a Gaussian kernel (known as Gaussian blurring mean-shift); for BMS applied to high-dimensional data clustering see [8].


High-Dimensional Statistical Process Control via Manifold Fitting and Learning

Tas, Burak I., del Castillo, Enrique

arXiv.org Machine Learning

We address the Statistical Process Control (SPC) of high-dimensional, dynamic industrial processes from two complementary perspectives: manifold fitting and manifold learning, both of which assume data lies on an underlying nonlinear, lower dimensional space. We propose two distinct monitoring frameworks for online or 'phase II' Statistical Process Control (SPC). The first method leverages state-of-the-art techniques in manifold fitting to accurately approximate the manifold where the data resides within the ambient high-dimensional space. It then monitors deviations from this manifold using a novel scalar distribution-free control chart. In contrast, the second method adopts a more traditional approach, akin to those used in linear dimensionality reduction SPC techniques, by first embedding the data into a lower-dimensional space before monitoring the embedded observations. We prove how both methods provide a controllable Type I error probability, after which they are contrasted for their corresponding fault detection ability. Extensive numerical experiments on a synthetic process and on a replicated Tennessee Eastman Process show that the conceptually simpler manifold-fitting approach achieves performance competitive with, and sometimes superior to, the more classical lower-dimensional manifold monitoring methods. In addition, we demonstrate the practical applicability of the proposed manifold-fitting approach by successfully detecting surface anomalies in a real image dataset of electrical commutators.



Causality-informed Anomaly Detection in Partially Observable Sensor Networks: Moving beyond Correlations

Xiao, Xiaofeng, Shen, Bo, Yue, Xubo

arXiv.org Artificial Intelligence

Nowadays, as AI-driven manufacturing becomes increasingly popular, the volume of data streams requiring real-time monitoring continues to grow. However, due to limited resources, it is impractical to place sensors at every location to detect unexpected shifts. Therefore, it is necessary to develop an optimal sensor placement strategy that enables partial observability of the system while detecting anomalies as quickly as possible. Numerous approaches have been proposed to address this challenge; however, most existing methods consider only variable correlations and neglect a crucial factor: Causality. Moreover, although a few techniques incorporate causal analysis, they rely on interventions-artificially creating anomalies-to identify causal effects, which is impractical and might lead to catastrophic losses. In this paper, we introduce a causality-informed deep Q-network (Causal DQ) approach for partially observable sensor placement in anomaly detection. By integrating causal information at each stage of Q-network training, our method achieves faster convergence and tighter theoretical error bounds. Furthermore, the trained causal-informed Q-network significantly reduces the detection time for anomalies under various settings, demonstrating its effectiveness for sensor placement in large-scale, real-world data streams. Beyond the current implementation, our technique's fundamental insights can be applied to various reinforcement learning problems, opening up new possibilities for real-world causality-informed machine learning methods in engineering applications.


Convergence of Mean Shift Algorithms for Large Bandwidths and Simultaneous Accurate Clustering

Pal, Susovan, Vepakomma, Praneeth

arXiv.org Machine Learning

The mean shift (MS) is a non-parametric, density-based, iterative algorithm that has prominent usage in clustering and image segmentation. A rigorous proof for its convergence in full generality remains unknown. Two significant steps in this direction were taken in the paper \cite{Gh1}, which proved that for \textit{sufficiently large bandwidth}, the MS algorithm with the Gaussian kernel always converges in any dimension, and also by the same author in \cite{Gh2}, proved that MS always converges in one dimension for kernels with differentiable, strictly decreasing, convex profiles. In the more recent paper \cite{YT}, they have proved the convergence in more generality,\textit{ without any restriction on the bandwidth}, with the assumption that the KDE $f$ has a continuous Lipschitz gradient on the closure of the convex hull of the trajectory of the iterated sequence of the mode estimate, and also satisfies the Łojasiewicz property there. The main theoretical result of this paper is a generalization of those of \cite{Gh1}, where we show that (1) for\textit{ sufficiently large bandwidth} convergence is guaranteed in any dimension with \textit{any radially symmetric and strictly positive definite kernels}. The proof uses two alternate characterizations of radially symmetric positive definite smooth kernels by Schoenberg and Bernstein \cite{Fass}, and borrows some steps from the proofs in \cite{Gh1}. Although the authors acknowledge that the result in that paper is more restrictive than that of \cite{YT} due to the lower bandwidth limit, it uses a different set of assumptions than \cite{YT}, and the proof technique is different.


On the Impact of Performative Risk Minimization for Binary Random Variables

Tsoy, Nikita, Kirev, Ivan, Rahimiyazdi, Negin, Konstantinov, Nikola

arXiv.org Machine Learning

Performativity, the phenomenon where outcomes are influenced by predictions, is particularly prevalent in social contexts where individuals strategically respond to a deployed model. In order to preserve the high accuracy of machine learning models under distribution shifts caused by performativity, Perdomo et al. (2020) introduced the concept of performative risk minimization (PRM). While this framework ensures model accuracy, it overlooks the impact of the PRM on the underlying distributions and the predictions of the model. In this paper, we initiate the analysis of the impact of PRM, by studying performativity for a sequential performative risk minimization problem with binary random variables and linear performative shifts. We formulate two natural measures of impact. In the case of full information, where the distribution dynamics are known, we derive explicit formulas for the PRM solution and our impact measures. In the case of partial information, we provide performative-aware statistical estimators, as well as simulations. Our analysis contrasts PRM to alternatives that do not model data shift and indicates that PRM can have amplified side effects compared to such methods.


Unsupervised Representation Learning from Pre-trained Diffusion Probabilistic Models

Neural Information Processing Systems

Diffusion Probabilistic Models (DPMs) have shown a powerful capacity of generating high-quality image samples. Recently, diffusion autoencoders (Diff-AE) have been proposed to explore DPMs for representation learning via autoencoding. Their key idea is to jointly train an encoder for discovering meaningful representations from images and a conditional DPM as the decoder for reconstructing images. Specifically, we find that the reason that pre-trained DPMs fail to reconstruct an image from its latent variables is due to the information loss of forward process, which causes a gap between their predicted posterior mean and the true one. From this perspective, the classifier-guided sampling method can be explained as computing an extra mean shift to fill the gap, reconstructing the lost class information in samples.


On the Trajectory Regularity of ODE-based Diffusion Sampling

Chen, Defang, Zhou, Zhenyu, Wang, Can, Shen, Chunhua, Lyu, Siwei

arXiv.org Artificial Intelligence

Diffusion-based generative models use stochastic differential equations (SDEs) and their equivalent ordinary differential equations (ODEs) to establish a smooth connection between a complex data distribution and a tractable prior distribution. In this paper, we identify several intriguing trajectory properties in the ODE-based sampling process of diffusion models. We characterize an implicit denoising trajectory and discuss its vital role in forming the coupled sampling trajectory with a strong shape regularity, regardless of the generated content. We also describe a dynamic programming-based scheme to make the time schedule in sampling better fit the underlying trajectory structure. This simple strategy requires minimal modification to any given ODE-based numerical solvers and incurs negligible computational cost, while delivering superior performance in image generation, especially in $5\sim 10$ function evaluations.


f9b902fc3289af4dd08de5d1de54f68f-Reviews.html

Neural Information Processing Systems

This paper proposes a discriminative clustering algorithm inspired by mean shift and the idea of finding local maxima of the density ratio (ratio of the densities of positive and negative points). The work is motivated by recent approaches of [4,8,16] aimed at discovering distinctive mid-level parts or patches for various recognition tasks. In the authors' own words from the Intro, "The idea is to search for clusters of image patches that are both 1) representative... and 2) visually discriminative. Unfortunately, finding patches that fit these criteria remain rather ad-hoc and poorly understood. While most current algorithms use a discriminative clustering-like procedure, they generally don't optimize elements for these criteria, or do so in an indirect, procedural way that is difficult to analyze. Hence, our goal in this work is to quantify the terms'representative' and'discriminative', and show that that a generalization of the well-known, well-understood Mean-Shift algorithm can produce visual elements that are more representative and discriminative than those of previous approaches."