AITopics

2011.10254

Country:

North America > United States > Florida > Alachua County > Gainesville (0.14)
Asia > China > Hubei Province > Wuhan (0.04)
North America > United States > New York > New York County > New York City (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report > New Finding (0.48)

Industry: Information Technology > Security & Privacy (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.88)

Namazi, Reza, Zolanvari, Arsham, Sani, Mahdi, Ghahramani, Seyed Amir Ali Ghafourian

GL-Coarsener: A Graph representation learning framework to construct coarse grid hierarchy for AMG solvers

arXiv.org Artificial IntelligenceNov-19-2020

In many numerical schemes, the computational complexity scales non-linearly with the problem size. Solving a linear system of equations using direct methods or most iterative methods is a typical example. Algebraic multi-grid (AMG) methods are numerical methods used to solve large linear systems of equations efficiently. One of the main differences between AMG methods is how the coarser grid is constructed from a given fine grid. There are two main classes of AMG methods; graph and aggregation based coarsening methods. Here we propose an aggregation-based coarsening framework leveraging graph representation learning and clustering algorithms. Our method introduces the power of machine learning into the AMG research field and opens a new perspective for future researches. The proposed method uses graph representation learning techniques to learn latent features of the graph obtained from the underlying matrix of coefficients. Using these extracted features, we generated a coarser grid from the fine grid. The proposed method is highly capable of parallel computations. Our experiments show that the proposed method's efficiency in solving large systems is closely comparable with other aggregation-based methods, demonstrating the high capability of graph representation learning in designing multi-grid solvers.

graph, grid, node, (17 more...)

2011.09994

Country:

North America > United States > Wyoming (0.04)
North America > United States > Virginia (0.04)
Europe > Switzerland > Basel-City > Basel (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

arXiv.org Artificial IntelligenceNov-18-2020

Introduction to Core-sets: an Updated Survey

Feldman, Dan

In optimization or machine learning problems we are given a set of items, usually points in some metric space, and the goal is to minimize or maximize an objective function over some space of candidate solutions. For example, in clustering problems, the input is a set of points in some metric space, and a common goal is to compute a set of centers in some other space (points, lines) that will minimize the sum of distances to these points. In database queries, we may need to compute such a some for a specific query set of k centers. However, traditional algorithms cannot handle modern systems that require parallel real-time computations of infinite distributed streams from sensors such as GPS, audio or video that arrive to a cloud, or networks of weaker devices such as smartphones or robots. Core-set is a "small data" summarization of the input "big data", where every possible query has approximately the same answer on both data sets. Generic techniques enable efficient coreset maintenance of streaming, distributed and dynamic data. Traditional algorithms can then be applied on these coresets to maintain the approximated optimal solutions. The challenge is to design coresets with provable tradeoff between their size and approximation error. This survey summarizes such constructions in a retrospective way, that aims to unified and simplify the state-of-the-art. Bringing big data to the enterprise, 2012) are generated by cheap and numerous information-sensing mobile devices, remote sensing, software logs, cameras, microphones, RFID readers and wireless sensor networks (Segaran & Hammerbacher, 2009; Hellerstein, 2008; Funke & Laue, 2007). These require clustering algorithms that, unlike traditional algorithms, (a) learn unbounded streaming data that cannot fit into main memory, (b) run in parallel on distributed data among thousands of machines, (c) use low communication between the machines (d) apply real-time computations on the device, (e) handle privacy and security issues. A common approach is to reinvent computer science for handling these new computational models, and develop new algorithms "from scratch" independently of existing solutions.

algorithm, coreset, feldman, (15 more...)

2011.09384

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
Asia > Afghanistan > Parwan Province > Charikar (0.04)
Asia > Middle East > Israel > Haifa District > Haifa (0.04)
(2 more...)

Genre:

Overview (0.66)
Research Report (0.64)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications > Networks > Sensor Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
(2 more...)

Leroy, Arthur, Latouche, Pierre, Guedj, Benjamin, Gey, Servane

Cluster-Specific Predictions with Multi-Task Gaussian Processes

arXiv.org Machine LearningNov-17-2020

A model involving Gaussian processes (GPs) is introduced to simultaneously handle multi-task learning, clustering, and prediction for multiple functional data. This procedure acts as a model-based clustering method for functional data as well as a learning step for subsequent predictions for new tasks. The model is instantiated as a mixture of multi-task GPs with common mean processes. A variational EM algorithm is derived for dealing with the optimisation of the hyper-parameters along with the hyper-posteriors' estimation of latent variables and processes. We establish explicit formulas for integrating the mean processes and the latent clustering variables within a predictive distribution, accounting for uncertainty on both aspects. This distribution is defined as a mixture of cluster-specific GP predictions, which enhances the performances when dealing with group-structured data. The model handles irregular grid of observations and offers different hypotheses on the covariance structure for sharing additional information across tasks. The performances on both clustering and prediction tasks are assessed through various simulated scenarios and real datasets. The overall algorithm, called MagmaClust, is publicly available as an R package.

algorithm, magmaclust, prediction, (16 more...)

2011.07866

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
Europe > France > Île-de-France > Paris > Paris (0.04)
Oceania > New Zealand (0.04)
(3 more...)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.45)

arXiv.org Machine LearningNov-17-2020

Order preserving hierarchical agglomerative clustering

Bakkelund, Daniel

We present a method for hierarchical clustering of directed acyclic graphs and other strictly partially ordered data that preserves the data structure. In particular, if we have $a

dendrogram, partial dendrogram, relation, (15 more...)

2004.12488

Country:

Europe > Norway > Eastern Norway > Oslo (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)
(3 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Aguiar, Henrique, Santos, Mauro, Watkinson, Peter, Zhu, Tingting

Phenotyping Clusters of Patient Trajectories suffering from Chronic Complex Disease

arXiv.org Artificial IntelligenceNov-16-2020

Recent years have seen an increased focus into the tasks of predicting hospital inpatient risk of deterioration and trajectory evolution due to the availability of electronic patient data. A common approach to these problems involves clustering patients time-series information such as vital sign observations) to determine dissimilar subgroups of the patient population. Most clustering methods assume time-invariance of vital-signs and are unable to provide interpretability in clusters that is clinically relevant, for instance, event or outcome information. In this work, we evaluate three different clustering models on a large hospital dataset of vital-sign observations from patients suffering from Chronic Obstructive Pulmonary Disease. We further propose novel modifications to deal with unevenly sampled time-series data and unbalanced class distribution to improve phenotype separation. Lastly, we discuss further avenues of investigation for models to learn patient subgroups with distinct behaviour and phenotype.

patient trajectory, phenotyping cluster, representation, (13 more...)

2011.08356

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.16)
North America > United States > Washington > King County > Seattle (0.04)

Genre: Research Report (0.65)

Industry:

Health & Medicine > Therapeutic Area > Pulmonary/Respiratory Diseases (1.00)
Health & Medicine > Diagnostic Medicine > Vital Signs (0.80)
Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.69)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.35)

Brehmer, Johann, Macaluso, Sebastian, Pappadopulo, Duccio, Cranmer, Kyle

Hierarchical clustering in particle physics through reinforcement learning

arXiv.org Artificial IntelligenceNov-16-2020

Particle physics experiments often require the reconstruction of decay patterns through a hierarchical clustering of the observed final-state particles. We show that this task can be phrased as a Markov Decision Process and adapt reinforcement learning algorithms to solve it. In particular, we show that Monte-Carlo Tree Search guided by a neural policy can construct high-quality hierarchical clusterings and outperform established greedy and beam search baselines.

aaaa4hicpvlbb9y4fz7d3rbuld 1x3h1kiaxyzhm, latexit latexit sha1, osca zyfsiwf7 wy0yglfwcdcaenkaxscre5t2z utarou1xygu9to, (10 more...)

2011.08191

Country:

North America > United States > New York (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Cohen-Shapira, Noy, Rokach, Lior

Automatic selection of clustering algorithms using supervised graph embedding

arXiv.org Machine LearningNov-16-2020

The widespread adoption of machine learning (ML) techniques and the extensive expertise required to apply them have led to increased interest in automated ML solutions that reduce the need for human intervention. One of the main challenges in applying ML to previously unseen problems is algorithm selection - the identification of high-performing algorithm(s) for a given dataset, task, and evaluation measure. This study addresses the algorithm selection challenge for data clustering, a fundamental task in data mining that is aimed at grouping similar objects. We present MARCO-GE, a novel meta-learning approach for the automated recommendation of clustering algorithms. MARCO-GE first transforms datasets into graphs and then utilizes a graph convolutional neural network technique to extract their latent representation. Using the embedding representations obtained, MARCO-GE trains a ranking meta-model capable of accurately recommending top-performing algorithms for a new dataset and clustering evaluation measure. Extensive evaluation on 210 datasets, 13 clustering algorithms, and 10 clustering measures demonstrates the effectiveness of our approach and its dominance in terms of predictive and generalization performance over state-of-the-art clustering meta-learning approaches.

algorithm, dataset, marco-ge, (15 more...)

2011.08225

Country:

Asia > Middle East > Israel > Southern District > Beer-Sheva (0.04)
North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report > New Finding (1.00)

Industry:

Education (0.93)
Health & Medicine > Pharmaceuticals & Biotechnology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

arXiv.org Machine LearningNov-16-2020

PC-GAIN: Pseudo-label Conditional Generative Adversarial Imputation Networks for Incomplete Data

Wang, Yufeng, Li, Dan, Li, Xiang, Yang, Min

Datasets with missing values are very common in real world applications. GAIN, a recently proposed deep generative model for missing data imputation, has been proved to outperform many state-of-the-art methods. But GAIN only uses a reconstruction loss in the generator to minimize the imputation error of the non-missing part, ignoring the potential category information which can reflect the relationship between samples. In this paper, we propose a novel unsupervised missing data imputation method named PC-GAIN, which utilizes potential category information to further enhance the imputation power. Specifically, we first propose a pre-training procedure to learn potential category information contained in a subset of low-missing-rate data. Then an auxiliary classifier is determined based on the synthetic pseudo-labels. Further, this classifier is incorporated into the generative adversarial framework to help the generator to yield higher quality imputation results. The proposed method can significantly improve the imputation quality of GAIN. Experimental results on various benchmark datasets show that our method is also superior to other baseline models.

category information, imputation, pc-gain, (12 more...)

2011.0777

Country:

Asia > China > Shandong Province > Yantai (0.04)
Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
Asia > China > Shanghai > Shanghai (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Data Science > Data Quality (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

arXiv.org Machine LearningNov-15-2020

Estimation of the number of clusters on d-dimensional sphere

Fujita, Kazuhisa

Spherical data is distributed on the sphere. The data appears in various fields such as meteorology, biology, and natural language processing. However, a method for analysis of spherical data does not develop enough yet. One of the important issues is an estimation of the number of clusters in spherical data. To address the issue, I propose a new method called the Spherical X-means (SX-means) that can estimate the number of clusters on d-dimensional sphere. The SX-means is the model-based method assuming that the data is generated from a mixture of von Mises-Fisher distributions. The present paper explains the proposed method and shows its performance of estimation of the number of clusters.

estimation, sx-means, von mise-fisher distribution, (12 more...)

2011.0753

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
North America > United States > New York (0.04)
Asia > Malaysia (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)