AITopics

2006.07197

Country:

North America > United States > New York > New York County > New York City (0.14)
North America > United States > California (0.04)
Europe > Norway (0.04)
(7 more...)

Genre: Research Report > New Finding (1.00)

Industry: Energy > Power Industry > Utilities (0.67)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)

Chanpuriya, Sudhanshu, Musco, Cameron, Sotiropoulos, Konstantinos, Tsourakakis, Charalampos E.

Node Embeddings and Exact Low-Rank Representations of Complex Networks

arXiv.org Machine Learning Oct-16-2020

Low-dimensional embeddings, from classical spectral embeddings to modern neural-net-inspired methods, are a cornerstone in the modeling and analysis of complex networks. Recent work by Seshadhri et al. (PNAS 2020) suggests that such embeddings cannot capture local structure arising in complex networks. In particular, they show that any network generated from a natural low-dimensional model cannot be both sparse and have high triangle density (high clustering coefficient), two hallmark properties of many real-world networks. In this work we show that the results of Seshadhri et al. are intimately connected to the model they use rather than the low-dimensional structure of complex networks. Specifically, we prove that a minor relaxation of their model can generate sparse graphs with high triangle density. Surprisingly, we show that this same model leads to exact low-dimensional factorizations of many real-world networks. We give a simple algorithm based on logistic principal component analysis (LPCA) that succeeds in finding such exact embeddings. Finally, we perform a large number of experiments that verify the ability of very low-dimensional embeddings to capture local structure in real-world networks.
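As a rough illustration of the logistic-PCA idea mentioned in the abstract above, the following sketch fits two low-rank factor matrices to a small adjacency matrix by plain gradient descent on the logistic loss. The abstract does not describe the authors' actual algorithm or optimizer, so everything here is an assumption: the names `fit_lpca` and `logistic_loss`, the toy graph, the rank, the learning rate, and the step count are all hypothetical. Whether the thresholded reconstruction matches the input exactly depends on the graph, the rank, and how long the optimization runs.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def logits(X, Y):
    """Matrix of inner products <X[i], Y[j]>."""
    return [[sum(a * b for a, b in zip(xi, yj)) for yj in Y] for xi in X]


def logistic_loss(A, X, Y):
    """Negative log-likelihood of A under edge probabilities sigmoid(X Y^T)."""
    Z = logits(X, Y)
    n = len(A)
    # log(1 + exp(-s*z)) with s = +1 for edges and s = -1 for non-edges.
    return sum(
        math.log1p(math.exp(-(2 * A[i][j] - 1) * Z[i][j]))
        for i in range(n)
        for j in range(n)
    )


def fit_lpca(A, rank, steps=1000, lr=0.05, seed=0):
    """Gradient descent on the logistic loss (illustrative, not the paper's solver)."""
    rng = random.Random(seed)
    n = len(A)
    X = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n)]
    Y = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n)]
    for _ in range(steps):
        Z = logits(X, Y)
        # sigmoid(z) - A is the derivative of the loss with respect to each logit.
        R = [[sigmoid(Z[i][j]) - A[i][j] for j in range(n)] for i in range(n)]
        gX = [[sum(R[i][j] * Y[j][k] for j in range(n)) for k in range(rank)]
              for i in range(n)]
        gY = [[sum(R[i][j] * X[i][k] for i in range(n)) for k in range(rank)]
              for j in range(n)]
        X = [[X[i][k] - lr * gX[i][k] for k in range(rank)] for i in range(n)]
        Y = [[Y[j][k] - lr * gY[j][k] for k in range(rank)] for j in range(n)]
    return X, Y


# Toy graph (assumed for illustration): a triangle 0-1-2 with node 3 attached to node 2.
A = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]

X, Y = fit_lpca(A, rank=2)

# Reconstruct by thresholding: predict an edge wherever the logit is positive,
# i.e. wherever the modelled edge probability exceeds 0.5.
A_hat = [[1 if z > 0 else 0 for z in row] for row in logits(X, Y)]
print("loss:", logistic_loss(A, X, Y))
print("exact reconstruction:", A_hat == A)
```

Using two separate factors `X` and `Y` rather than a single symmetric one is a design choice of this sketch, not something the abstract specifies.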

artificial intelligence, data mining, machine learning, (17 more...)

2006.05592

Country:

Asia > Middle East > Jordan (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Israel (0.04)

Genre: Research Report (0.50)

Industry: Health & Medicine (0.68)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.34)

arXiv.org Machine Learning Oct-15-2020

Cascade of Phase Transitions for Multi-Scale Clustering

Bonnaire, T., Decelle, A., Aghanim, N.

Many optimisation and inference problems have been shown to have an equivalent formulation in statistical physics [1, 2] that allowed a brand-new look at some longstanding problems and improved the understanding of complex systems [3, 4]. In particular, the identification of the phase diagram of a model can bring interesting new insights such as knowing if a given information can be retrieved depending on the model's parameters and the [...] Following these steps, we aim at showing how the latter formulation can be useful to understand and analyse the outcome of GMMs. In particular, we exploit the cascade of phase transitions occurring during annealing procedures of the EM algorithm to build a hierarchical multi-scale description of a dataset. By defining an overlap between the ground truth and the inferred partitions, we show on artificial datasets how it can be interpreted as an order parameter whose value follows the sequence of phase transitions.

artificial intelligence, dataset, machine learning, (17 more...)

2010.07955

Country:

Europe > Spain > Galicia > Madrid (0.05)
Europe > France (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

arXiv.org Artificial Intelligence Oct-14-2020

Aerodynamic Data Predictions Based on Multi-task Learning

Hu, Liwei, Xiang, Yu, Zhan, Jun, Shi, Zifang, Wang, Wenzheng

The quality of datasets is one of the key factors that affect the accuracy of aerodynamic data models. For example, in the uniformly sampled Burgers' dataset, the insufficient high-speed data is overwhelmed by massive low-speed data. Predicting high-speed data is more difficult than predicting low-speed data because the amount of high-speed data is limited, i.e. the quality of the Burgers' dataset is not satisfactory. To improve the quality of datasets, traditional methods usually employ data resampling to produce enough data for the insufficient parts of the original datasets before modeling, which increases computational costs. Recently, mixtures of experts have been used in natural language processing to deal with different parts of sentences, which provides a solution for eliminating the need for data resampling in aerodynamic data modeling. Motivated by this, we propose multi-task learning (MTL), a dataset quality-adaptive learning scheme, which combines task allocation and aerodynamic characteristics learning to spread the load of the entire learning task. The task allocation divides a whole learning task into several independent subtasks, while the aerodynamic characteristics learning learns these subtasks simultaneously to achieve better precision. Two experiments with poor-quality datasets are conducted to verify the data quality-adaptivity of the MTL. The results show that the MTL is more accurate than FCNs and GANs on poor-quality datasets.

artificial intelligence, denote, machine learning, (16 more...)

2010.09475

Country:

Asia > China > Sichuan Province > Chengdu (0.05)
Asia > Middle East > Jordan (0.04)
Asia > Japan (0.04)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.93)

Yajima, Yuta, Inokuchi, Akihiro

Refining Similarity Matrices to Cluster Attributed Networks Accurately

arXiv.org Artificial Intelligence Oct-14-2020

As a result of the recent popularity of social networks and the increase in the number of research papers published across all fields, attributed networks, which consist of relationships between objects with attributes, such as people and papers, are becoming increasingly large. Therefore, various studies on clustering attributed networks into sub-networks are being actively conducted. When clustering attributed networks using spectral clustering, the clustering accuracy is strongly affected by the quality of the similarity matrices, which are input into spectral clustering and represent the similarities between pairs of objects. In this paper, we aim to increase the accuracy by refining the matrices before applying spectral clustering to them. We verify the practicability of our proposed method by comparing the accuracy of spectral clustering with similarity matrices before and after refining them.

artificial intelligence, data mining, machine learning, (17 more...)

2010.06854

Country: Asia > Middle East > Jordan (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Data Science > Data Mining (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.48)

#artificialintelligence Oct-13-2020, 01:31:41 GMT

Hierarchy Clustering

Just like KMeans clustering, our intention is to create clusters within our dataset, grouping related data so that we can determine different classes or groupings and make predictions based on this information in a wide array of applications. However, in the areas in which KMeans fails, Hierarchy Clustering attempts to alleviate the burden somewhat by offering several techniques, such as single-link clustering or Ward clustering, that the user chooses depending on the layout of their dataset. Hierarchy Clustering at the end of the day is just a regular clustering algorithm, with its advantages and disadvantages, and is by no means the successor of KMeans. Generally, Hierarchy Clustering works as follows: you start off with your dataset, which may be spaced out strangely or have a weird layout with uneven densities, where clusters are not easily differentiable and you honestly have no idea how to separate them. That's alright; that is usually the case in real-world problems.
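The excerpt above stops before describing the merging procedure, so the following is only a minimal sketch of the standard bottom-up (agglomerative) scheme, assuming that is what the article goes on to describe. It shows the two linkage choices the excerpt names, single-link and Ward. The function names, the toy points, and the naive pairwise search are all hypothetical and chosen for readability, not speed.

```python
from itertools import combinations
from math import dist


def centroid(cluster):
    """Coordinate-wise mean of a list of points."""
    n = len(cluster)
    return tuple(sum(coords) / n for coords in zip(*cluster))


def single_link(a, b):
    """Single linkage: distance between the two closest members."""
    return min(dist(p, q) for p in a for q in b)


def ward(a, b):
    """Ward linkage: increase in within-cluster sum of squares if a and b merge."""
    weight = len(a) * len(b) / (len(a) + len(b))
    return weight * dist(centroid(a), centroid(b)) ** 2


def agglomerate(points, n_clusters, linkage=single_link):
    """Repeatedly merge the pair of clusters with the smallest linkage value."""
    clusters = [[p] for p in points]  # every point starts as its own cluster
    while len(clusters) > n_clusters:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]  # i < j, so index i stays valid
        del clusters[j]
    return clusters


# Two well-separated groups of three points each (toy data, assumed).
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]

single_result = agglomerate(points, 2, single_link)
ward_result = agglomerate(points, 2, ward)
print(single_result)
print(ward_result)
```

On data this clean both linkages recover the same two groups; they tend to differ on elongated or unevenly dense clusters, which is where the choice of linkage matters.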

artificial intelligence, clustering, machine learning, (12 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.56)

Fuchs, Robin, Pommeret, Denys, Viroli, Cinzia

Mixed data Deep Gaussian Mixture Model: A clustering model for mixed datasets

arXiv.org Machine Learning Oct-13-2020

Clustering mixed data presents numerous challenges inherent to the very heterogeneous nature of the variables. Two major difficulties lie in the initialisation of the algorithms and in making variables comparable between types. This work is concerned with these two problems. We introduce a two-head, model-based clustering method called Mixed data Deep Gaussian Mixture Model (MDGMM) that can be viewed as an automatic way to merge the clusterings performed separately on continuous and non-continuous data. We also design a new initialisation strategy and a data-driven method that selects "on the fly" the best specification of the model and the optimal number of clusters for a given dataset. Besides, our model provides continuous low-dimensional representations of the data, which can be a useful tool to visualize mixed datasets. Finally, we validate the performance of our approach by comparing its results with state-of-the-art mixed data clustering models over several commonly used datasets.

algorithm, artificial intelligence, machine learning, (17 more...)

2010.06661

Country:

North America > Canada > Ontario > Toronto (0.14)
Europe > France > Provence-Alpes-Côte d'Azur > Bouches-du-Rhône > Marseille (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
Europe > Italy > Emilia-Romagna > Metropolitan City of Bologna > Bologna (0.04)

Genre: Research Report (0.64)

Industry: Health & Medicine > Therapeutic Area > Oncology (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

arXiv.org Machine Learning Oct-13-2020

Penalized model-based clustering of fMRI data

DiLernia, Andrew, Quevedo, Karina, Camchong, Jazmin, Lim, Kelvin, Pan, Wei, Zhang, Lin

Functional magnetic resonance imaging (fMRI) data have become increasingly available and are useful for describing functional connectivity (FC), the relatedness of neuronal activity in regions of the brain. This FC of the brain provides insight into certain neurodegenerative diseases and psychiatric disorders, and thus is of clinical importance. To help inform physicians regarding patient diagnoses, unsupervised clustering of subjects based on FC is desired, allowing the data to inform us of groupings of patients based on shared features of connectivity. Since heterogeneity in FC is present even between patients within the same group, it is important to allow subject-level differences in connectivity, while still pooling information across patients within each group to describe group-level FC. To this end, we propose a random covariance clustering model (RCCM) to concurrently cluster subjects based on their FC networks, estimate the unique FC networks of each subject, and infer shared network features. Although current methods exist for estimating FC or clustering subjects using fMRI data, our novel contribution is to cluster or group subjects based on similar FC of the brain while simultaneously providing group- and subject-level FC network estimates. The competitive performance of RCCM relative to other methods is demonstrated through simulations in various settings, achieving both improved clustering of subjects and estimation of FC networks. Utility of the proposed method is demonstrated with application to a resting-state fMRI data set collected on 43 healthy controls and 61 participants diagnosed with schizophrenia.

artificial intelligence, glasso & k-means 0, machine learning, (15 more...)

2010.06408

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.28)
Europe > Austria > Vienna (0.14)
North America > United States > New York (0.04)
(3 more...)

Genre:

Research Report > New Finding (0.46)
Research Report > Experimental Study (0.46)

Industry:

Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Health Care Technology (1.00)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology > Mental Health (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

arXiv.org Artificial Intelligence Oct-12-2020

The Impact of Isolation Kernel on Agglomerative Hierarchical Clustering Algorithms

Han, Xin, Zhu, Ye, Ting, Kai Ming, Li, Gang

Agglomerative hierarchical clustering (AHC) is one of the popular clustering approaches. Existing AHC methods, which are based on a distance measure, have one key issue: they have difficulty in identifying adjacent clusters with varied densities, regardless of the cluster extraction methods applied on the resultant dendrogram. In this paper, we identify the root cause of this issue and show that the use of a data-dependent kernel (instead of distance or existing kernel) provides an effective means to address it. We analyse the condition under which existing AHC methods fail to extract clusters effectively, and the reason why the data-dependent kernel is an effective remedy. This leads to a new approach to kernelise existing hierarchical clustering algorithms such as traditional AHC algorithms, HDBSCAN, GDL and PHA. In each of these algorithms, our empirical evaluation shows that a recently introduced Isolation Kernel produces a higher quality or purer dendrogram than distance, Gaussian Kernel and adaptive Gaussian Kernel.

algorithm, dendrogram, linkage function, (15 more...)

doi: 10.1016/j.patcog.2023.109517

2010.05473

Country:

Oceania > Australia (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Kansas (0.04)
(5 more...)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

arXiv.org Artificial Intelligence Oct-11-2020

Local Connectivity in Centroid Clustering

P, Deepak

Clustering is a fundamental task in unsupervised learning, one that aims to group a dataset into clusters of similar objects. There has been recent interest in embedding normative considerations around fairness within clustering formulations. In this paper, we propose 'local connectivity' as a crucial factor in assessing membership desert in centroid clustering. We use local connectivity to refer to the support that the local neighborhood of an object offers for its membership in the cluster in question. We motivate the need to consider local connectivity of objects in cluster assignment, and provide ways to quantify local connectivity in a given clustering. We then exploit concepts from density-based clustering and devise LOFKM, a clustering method that seeks to deepen local connectivity in clustering outputs, while staying within the framework of centroid clustering. Through an empirical evaluation over real-world datasets, we illustrate that LOFKM achieves notable improvements in local connectivity at reasonable costs to clustering quality, illustrating the effectiveness of the method.

artificial intelligence, local connectivity, machine learning, (14 more...)

doi: 10.1145/3410566.3410601

2010.05353

Country:

Asia > South Korea > Seoul > Seoul (0.05)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
North America > United States > California > Alameda County > Oakland (0.04)

Genre:

Research Report (0.50)
Overview (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)