AITopics

doi: 10.48786/EDBT.2022.21

2203.00812

Country:

North America > United States > California > Orange County > Irvine (0.04)
North America > United States > New York > Richmond County > New York City (0.04)
North America > United States > New York > Queens County > New York City (0.04)
(7 more...)

Genre: Research Report (1.00)

Industry: Information Technology > Smart Houses & Appliances (0.34)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Juodelyte, Dovile, Cheplygina, Veronika, Graversen, Therese, Bonnet, Philippe

Predicting Bearings' Degradation Stages for Predictive Maintenance in the Pharmaceutical Industry

arXiv.org Artificial Intelligence Mar-7-2022

In the pharmaceutical industry, the maintenance of production machines must be audited by the regulator. In this context, the problem of predictive maintenance is not when to maintain a machine, but which parts to maintain at a given point in time. The focus shifts from the entire machine to its component parts, and prediction becomes a classification problem. In this paper, we focus on rolling-element bearings and propose a framework for predicting their degradation stages automatically. Our main contribution is a k-means bearing lifetime segmentation method based on high-frequency bearing vibration signals embedded in a latent low-dimensional subspace using an AutoEncoder. Given high-frequency vibration data, our framework generates a labeled dataset that is used to train a supervised model for bearing degradation stage detection. Our experimental results, based on the FEMTO Bearing dataset, show that our framework is scalable and that it provides reliable and actionable predictions for a range of different bearings.
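The segmentation step can be illustrated with a minimal Python sketch. It assumes the AutoEncoder has already produced a time-ordered, low-dimensional embedding of the vibration signal; here that embedding is replaced by synthetic 2-D points. The function name `segment_lifetime`, the choice of three stages, and the time-ordered initialization of the centers are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def segment_lifetime(latent, n_stages, n_iter=100):
    """Cluster time-ordered latent vectors into n_stages groups with k-means.

    latent has shape (n_samples, latent_dim), ordered by time.
    Returns (labels, centers).
    """
    # Assumption: start the centers at evenly spaced time indices so that
    # label 0 tends to be the earliest stage and the run is deterministic.
    start = np.linspace(0, len(latent) - 1, n_stages).astype(int)
    centers = latent[start].astype(float)
    for _ in range(n_iter):
        # Assign every sample to its nearest center.
        dists = np.linalg.norm(latent[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of its samples (keep it if empty).
        new_centers = np.array([
            latent[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(n_stages)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


# Synthetic stand-in for an AutoEncoder embedding: three well-separated stages.
rng = np.random.default_rng(0)
stage_means = np.array([[0.0, 0.0], [5.0, 0.0], [10.0, 0.0]])
latent = np.concatenate(
    [mean + 0.3 * rng.standard_normal((100, 2)) for mean in stage_means]
)
labels, centers = segment_lifetime(latent, n_stages=3)
print(labels[::50])
```

The resulting stage labels could then serve as targets for a supervised classifier, which is the role the abstract assigns to the generated labeled dataset.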

artificial intelligence, bearing, machine learning, (18 more...)

doi: 10.1145/3534678.3539057

2203.03259

Country:

North America > United States (0.14)
Europe > Denmark > Capital Region > Copenhagen (0.04)

Genre: Research Report (0.64)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Min, Rui, Garnier, Christelle, Septier, François, Klein, John

State space partitioning based on constrained spectral clustering for block particle filtering

arXiv.org Machine Learning Mar-7-2022

The particle filter (PF) is a powerful inference tool widely used to estimate the filtering distribution in non-linear and/or non-Gaussian problems. To overcome the curse of dimensionality of the PF, the block PF (BPF) inserts a blocking step that partitions the state space into several subspaces, or blocks, of smaller dimension so that the correction and resampling steps can be performed independently on each subspace. Using small blocks reduces the variance of the filtering distribution estimate, but in turn the correlation between blocks is broken and a bias is introduced. When the dependence relationships between state variables are unknown, it is not obvious how to split the state space into blocks, and a poor choice of partitioning may cause a significant error overhead. In this paper, we formulate the partitioning problem in the BPF as a clustering problem and propose a state space partitioning method based on spectral clustering (SC). We design a generalized BPF algorithm that contains two new steps: (i) estimation of the state vector correlation matrix from predicted particles, and (ii) SC using this estimate as the similarity matrix to determine an appropriate partition. In addition, a constraint is imposed on the maximal cluster size to prevent SC from producing blocks that are too large. We show that the proposed method can bring the most correlated state variables together in the same blocks while successfully escaping the curse of dimensionality.
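The two new steps can be sketched in Python under simplifying assumptions. The sketch estimates a correlation matrix from a synthetic particle cloud and splits the state variables into two blocks using the sign of the second eigenvector of the normalized graph Laplacian. Using the absolute correlation as the similarity, restricting to two blocks, and the helper names are assumptions made for illustration; the maximal cluster size constraint described in the abstract is not implemented here.

```python
import numpy as np


def correlation_similarity(particles):
    """Absolute correlation between state variables, estimated from particles.

    particles has shape (n_particles, state_dim).
    """
    return np.abs(np.corrcoef(particles, rowvar=False))


def spectral_bipartition(similarity):
    """Split the variables into two blocks with a basic spectral cut."""
    degree = similarity.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(degree)
    # Symmetric normalized Laplacian: I - D^{-1/2} W D^{-1/2}.
    normalized = d_inv_sqrt[:, None] * similarity * d_inv_sqrt[None, :]
    laplacian = np.eye(len(similarity)) - normalized
    # eigh returns eigenvalues in ascending order; column 1 is the
    # eigenvector of the second-smallest eigenvalue.
    _, eigvecs = np.linalg.eigh(laplacian)
    return (eigvecs[:, 1] > 0).astype(int)


# Synthetic particle cloud: variables (0, 1) and (2, 3) are strongly
# correlated within each pair and only weakly coupled across pairs.
rng = np.random.default_rng(0)
n = 2000
common = rng.standard_normal(n)
z1, z2 = rng.standard_normal(n), rng.standard_normal(n)
particles = np.column_stack([
    z1 + 0.3 * rng.standard_normal(n) + 0.3 * common,
    z1 + 0.3 * rng.standard_normal(n) + 0.3 * common,
    z2 + 0.3 * rng.standard_normal(n) + 0.3 * common,
    z2 + 0.3 * rng.standard_normal(n) + 0.3 * common,
])
blocks = spectral_bipartition(correlation_similarity(particles))
print(blocks)
```

Which block receives label 0 depends on the arbitrary sign of the eigenvector, so only the grouping of the variables is meaningful, not the label values.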

bpf, particle, partition, (16 more...)

2203.03475

Country: Europe (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Gupta, Shubham, Dukkipati, Ambedkar

On consistency of constrained spectral clustering under representation-aware stochastic block model

arXiv.org Machine Learning Mar-3-2022

Spectral clustering is widely used in practice due to its flexibility, computational efficiency, and well-understood theoretical performance guarantees. Recently, spectral clustering has been studied to find balanced clusters under population-level constraints. These constraints are specified by additional information available in the form of auxiliary categorical node attributes. In this paper, we consider a scenario where these attributes may not be observable, but manifest as latent features of an auxiliary graph. Motivated by this, we study constrained spectral clustering with the aim of finding balanced clusters in a given similarity graph $\mathcal{G}$, such that each individual is adequately represented with respect to an auxiliary graph $\mathcal{R}$ (which we refer to as the representation graph). We propose an individual-level balancing constraint that formalizes this idea. Our work leads to an interesting stochastic block model that not only plants the given partitions in $\mathcal{G}$ but also plants the auxiliary information encoded in the representation graph $\mathcal{R}$. We develop unnormalized and normalized variants of spectral clustering in this setting. These algorithms use $\mathcal{R}$ to find clusters in $\mathcal{G}$ that approximately satisfy the proposed constraint. We also establish the first statistical consistency result for constrained spectral clustering under individual-level constraints for graphs sampled from the above-mentioned variant of the stochastic block model. Our experimental results corroborate our theoretical findings.

algorithm, constraint, spectral, (16 more...)

2203.02005

Country:

Europe > Middle East > Cyprus > Nicosia > Nicosia (0.04)
Asia > India > Karnataka > Bengaluru (0.04)

Genre:

Instructional Material (0.46)
Research Report (0.40)

Industry: Information Technology (0.45)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Antonietti, P. F., Dassi, F., Manuzzi, E.

Machine Learning based refinement strategies for polyhedral grids with applications to Virtual Element and polyhedral Discontinuous Galerkin methods

arXiv.org Artificial Intelligence Mar-1-2022

We propose two new strategies based on Machine Learning techniques to handle polyhedral grid refinement, to be possibly employed within an adaptive framework. The first one employs the k-means clustering algorithm to partition the points of the polyhedron to be refined. This strategy is a variation of the well-known Centroidal Voronoi Tessellation. The second one employs Convolutional Neural Networks to classify the "shape" of an element so that "ad-hoc" refinement criteria can be defined. This strategy can be used to enhance existing refinement strategies, including the k-means strategy, at a low online computational cost. We test the proposed algorithms considering two families of finite element methods that support arbitrarily shaped polyhedral elements, namely the Virtual Element Method (VEM) and the Polygonal Discontinuous Galerkin (PolyDG) method. We demonstrate that these strategies do preserve the structure and the quality of the underlying grids, reducing the overall computational cost and mesh complexity.

artificial intelligence, deep learning, machine learning, (19 more...)

doi: 10.1016/j.jcp.2022.111531

2202.12654

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)

arXiv.org Machine Learning Feb-27-2022

Strong Consistency for a Class of Adaptive Clustering Procedures

Jaffe, Adam Quinn

We introduce a class of clustering procedures which includes $k$-means and $k$-medians, as well as variants of these where the domain of the cluster centers can be chosen adaptively (for example, $k$-medoids) and where the number of cluster centers can be chosen adaptively (for example, according to the elbow method). In the non-parametric setting and assuming only the finiteness of certain moments, we show that all clustering procedures in this class are strongly consistent under IID samples. Our method of proof is to directly study the continuity of various deterministic maps associated with these clustering procedures, and to show that strong consistency simply descends from analogous strong consistency of the empirical measures. In the adaptive setting, our work provides a strong consistency result that is the first of its kind. In the non-adaptive setting, our work strengthens Pollard's classical result by dispensing with various unnecessary technical hypotheses, by upgrading the particular notion of strong consistency, and by using the same methods to prove further limit theorems.
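For orientation, the procedures named above are commonly written as minimizers of one empirical objective. The formulation below is the standard textbook one, given as a sketch; the symbols $W_n$, $d$, and $p$ are notation chosen here, and the paper's exact class and notation may differ.

```latex
W_n(c_1, \dots, c_k) \;=\; \frac{1}{n} \sum_{i=1}^{n} \min_{1 \le j \le k} d(X_i, c_j)^{p}
```

With Euclidean $d$ and $p = 2$ this is the $k$-means objective, and $p = 1$ gives $k$-medians. Restricting each center $c_j$ to the sample points $\{X_1, \dots, X_n\}$ gives $k$-medoids, which is one way the domain of the centers can depend on the data.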

cluster center, convergence, procedure, (13 more...)

2202.13423

Country:

North America > United States > California > Alameda County > Berkeley (0.14)
Europe > Czechia > Prague (0.04)
North America > United States > New York > New York County > New York City (0.04)
(3 more...)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Dinari, Or, Freifeld, Oren

Sampling in Dirichlet Process Mixture Models for Clustering Streaming Data

arXiv.org Machine Learning Feb-27-2022

Practical tools for clustering streaming data must be fast enough to handle the arrival rate of the observations. Typically, they also must adapt on the fly to possible lack of stationarity; i.e., the data statistics may be time-dependent due to various forms of drifts, changes in the number of clusters, etc. The Dirichlet Process Mixture Model (DPMM), whose Bayesian nonparametric nature allows it to adapt its complexity to the data, seems a natural choice for the streaming-data case. In its classical formulation, however, the DPMM cannot capture common types of drifts in the data statistics. Moreover, and regardless of that limitation, existing methods for online DPMM inference are too slow to handle rapid data streams. In this work we propose adapting both the DPMM and a known DPMM sampling-based non-streaming inference method for streaming-data clustering. We demonstrate the utility of the proposed method on several challenging settings, where it obtains state-of-the-art results while being on par with other methods in terms of speed.
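The way a Dirichlet process lets the number of clusters grow with the data can be seen in the Chinese restaurant process, the standard description of the partition prior behind a DPMM. The Python sketch below draws assignments from that prior only. It does not reproduce the paper's sampler or its streaming adaptation, and the function name and the concentration value `alpha=1.0` are illustrative assumptions.

```python
import random


def sample_crp(n_points, alpha, seed=0):
    """Draw cluster assignments from a Chinese restaurant process prior.

    Returns (assignments, counts), where counts[k] is the size of cluster k.
    """
    rng = random.Random(seed)
    assignments = []
    counts = []
    for _ in range(n_points):
        # An existing cluster k is chosen with weight counts[k];
        # a brand-new cluster is opened with weight alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)
        else:
            counts[k] += 1
        assignments.append(k)
    return assignments, counts


assignments, counts = sample_crp(n_points=20, alpha=1.0)
print(assignments)
print(counts)
```

Larger values of `alpha` make new clusters more likely, which is the sense in which the model adapts its complexity to the data.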

dbstream mb-kmean, mb-kmean, sampler, (14 more...)

2202.13312

Country:

North America > United States > Massachusetts (0.04)
Asia > Middle East > Israel (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Spain > Valencian Community > Valencia Province > Valencia (0.04)

Genre: Research Report (1.00)

Industry: Education (0.46)

Technology:

Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Data Science > Data Mining (0.93)

arXiv.org Artificial Intelligence Feb-22-2022

Temporal Subtyping of Alzheimer's Disease Using Medical Conditions Preceding Alzheimer's Disease Onset in Electronic Health Records

He, Zhe, Tian, Shubo, Erdengasileng, Arslan, Charness, Neil, Bian, Jiang

Subtyping of Alzheimer's disease (AD) can facilitate diagnosis, treatment, prognosis and disease management. It can also support the testing of new prevention and treatment strategies through clinical trials. In this study, we employed spectral clustering to cluster 29,922 AD patients in the OneFlorida Data Trust using their longitudinal EHR data of diagnosis and conditions into four subtypes. In addition, according to the results of various statistical tests, these subtypes are also significantly different with respect to demographics, mortality, and prescription medications after the AD diagnosis. This study could potentially facilitate early detection and personalized treatment of AD as well as data-driven generalizability assessment of clinical trials for AD.

Introduction: Alzheimer's disease (AD) is a progressive neurodegenerative disorder that affects an estimated 6.2 million Americans age 65 and older in 2021. This number is likely to reach 13.8 million by 2060.

artificial intelligence, machine learning, natural language, (19 more...)

2202.10991

Country:

North America > United States > Florida > Alachua County > Gainesville (0.14)
North America > United States > Alaska (0.05)
North America > United States > Florida > Leon County > Tallahassee (0.04)
(4 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Health & Medicine > Therapeutic Area > Neurology > Alzheimer's Disease (1.00)

Technology:

Information Technology > Information Management (0.93)
Information Technology > Artificial Intelligence > Natural Language (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.69)

Nguyen, Tin D., Trippe, Brian L., Broderick, Tamara

Many processors, little time: MCMC for partitions via optimal transport couplings

arXiv.org Machine Learning Feb-22-2022

Markov chain Monte Carlo (MCMC) methods are often used in clustering since they guarantee asymptotically exact expectations in the infinite-time limit. In finite time, though, slow mixing often leads to poor performance. Modern computing environments offer massive parallelism, but naive implementations of parallel MCMC can exhibit substantial bias. In MCMC samplers of continuous random variables, Markov chain couplings can overcome bias. But these approaches depend crucially on paired chains meeting after a small number of transitions. We show that straightforward applications of existing coupling ideas to discrete clustering variables fail to meet quickly. This failure arises from the "label-switching problem": semantically equivalent cluster relabelings impede fast meeting of coupled chains. We instead consider chains as exploring the space of partitions rather than partitions' (arbitrary) labelings. Using a metric on the partition space, we formulate a practical algorithm using optimal transport couplings. Our theory confirms our method is accurate and efficient. In experiments ranging from clustering of genes or seeds to graph colorings, we show the benefits of our coupling in the highly parallel, time-limited regime.
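The label-switching problem can be made concrete with a small Python sketch: two label vectors that differ at every position can still describe the same partition. The helper names are hypothetical, and the pair-counting distance is one simple choice of distance between partitions used for illustration; it is not claimed to be the metric used in the paper.

```python
from itertools import combinations


def to_partition(labels):
    """Map a label vector to the partition it induces (a set of index blocks)."""
    blocks = {}
    for index, label in enumerate(labels):
        blocks.setdefault(label, set()).add(index)
    return frozenset(frozenset(block) for block in blocks.values())


def pair_disagreements(labels_a, labels_b):
    """Count index pairs co-clustered in one labeling but not in the other."""
    return sum(
        (labels_a[i] == labels_a[j]) != (labels_b[i] == labels_b[j])
        for i, j in combinations(range(len(labels_a)), 2)
    )


chain_a = [0, 0, 1, 1, 2]
chain_b = [2, 2, 0, 0, 1]  # same grouping as chain_a, labels permuted
chain_c = [0, 0, 0, 1, 2]  # item 2 has moved to a different block

# Compared label by label, chain_a and chain_b differ everywhere ...
print(sum(a != b for a, b in zip(chain_a, chain_b)))
# ... yet they induce the same partition.
print(to_partition(chain_a) == to_partition(chain_b))
print(pair_disagreements(chain_a, chain_b))
print(pair_disagreements(chain_a, chain_c))
```

Two coupled chains in the states `chain_a` and `chain_b` have already met in partition space, even though a label-wise comparison treats them as maximally far apart.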

coupling, equation, partition, (15 more...)

2202.11258

Country:

North America > Canada > Ontario > Toronto (0.14)
Asia > Middle East > Jordan (0.04)
Oceania > Australia > Tasmania (0.04)
(3 more...)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.67)

Arias-Castro, Ery, Qiao, Wanli

Clustering by Hill-Climbing: Consistency Results

arXiv.org Machine Learning Feb-18-2022

We consider several hill-climbing approaches to clustering as formulated by Fukunaga and Hostetler in the 1970s. We study both continuous-space and discrete-space (i.e., medoid) variants and establish their consistency.
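A minimal Python sketch of continuous-space hill-climbing is given below, using mean shift with a Gaussian kernel, the best-known method from that line of work. Treating mean shift as the representative variant is an assumption, as are the bandwidth, the merge radius, and the helper names; the medoid variants studied in the paper are not shown.

```python
import numpy as np


def mean_shift(points, bandwidth, n_iter=100, tol=1e-6):
    """Move a copy of each point uphill on a Gaussian kernel density estimate."""
    modes = points.copy()
    for _ in range(n_iter):
        # Gaussian weights between current positions and the original sample.
        sq_dists = ((modes[:, None, :] - points[None, :, :]) ** 2).sum(axis=2)
        weights = np.exp(-sq_dists / (2.0 * bandwidth ** 2))
        # Each position moves to the weighted mean of the sample.
        shifted = weights @ points / weights.sum(axis=1, keepdims=True)
        converged = np.abs(shifted - modes).max() < tol
        modes = shifted
        if converged:
            break
    return modes


def group_modes(modes, merge_radius):
    """Give the same label to points whose end positions nearly coincide."""
    labels = np.empty(len(modes), dtype=int)
    centers = []
    for i, mode in enumerate(modes):
        for j, center in enumerate(centers):
            if np.linalg.norm(mode - center) < merge_radius:
                labels[i] = j
                break
        else:
            centers.append(mode)
            labels[i] = len(centers) - 1
    return labels


# Two well-separated synthetic blobs.
rng = np.random.default_rng(0)
points = np.concatenate([
    0.5 * rng.standard_normal((50, 2)),
    0.5 * rng.standard_normal((50, 2)) + np.array([6.0, 0.0]),
])
labels = group_modes(mean_shift(points, bandwidth=1.0), merge_radius=1.0)
print(np.unique(labels))
```

Each point is assigned to the density mode it climbs to, so the number of clusters is determined by the density estimate and the bandwidth rather than fixed in advance.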

algorithm, converge, critical point, (17 more...)

2202.09023

Country:

Oceania > New Zealand (0.04)
North America > United States > Virginia > Fairfax County > Fairfax (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.67)