AITopics

2211.06331

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > New York > New York County > New York City (0.05)
Europe > Netherlands (0.04)
(4 more...)

Genre: Research Report (0.82)

Industry: Information Technology (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

#artificialintelligenceNov-9-2022, 01:01:15 GMT

10 Most Used Machine Learning Algorithms In Python With Code

Understanding what Artificial Intelligence is and learning how Machine Learning and Deep Learning power it, are overwhelming experiences. In ML, there is something called the "No Free Lunch" theorems which states that no machine learning algorithms works best for every problem, and it's particularly relevant for supervised learning. For example, you can't say that decision trees are always better than neural networks or vice-versa. There are various factors at play, such as the size and structure of your dataset. As a result, you should try many different algorithms for your problem, while using a hold-out "test set" of data to decide performance and select the winning algorithm.

algorithm, machine learning algorithm, regression, (12 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.73)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.51)

Scott, Tyler R., Liu, Ting, Mozer, Michael C., Gallagher, Andrew C.

An Empirical Study on Clustering Pretrained Embeddings: Is Deep Strictly Better?

arXiv.org Artificial IntelligenceNov-9-2022

Recent research in clustering face embeddings has found that unsupervised, shallow, heuristic-based methods -- including $k$-means and hierarchical agglomerative clustering -- underperform supervised, deep, inductive methods. While the reported improvements are indeed impressive, experiments are mostly limited to face datasets, where the clustered embeddings are highly discriminative or well-separated by class (Recall@1 above 90% and often nearing ceiling), and the experimental methodology seemingly favors the deep methods. We conduct a large-scale empirical study of 17 clustering methods across three datasets and obtain several robust findings. Notably, deep methods are surprisingly fragile for embeddings with more uncertainty, where they match or even perform worse than shallow, heuristic-based methods. When embeddings are highly discriminative, deep methods do outperform the baselines, consistent with past results, but the margin between methods is much smaller than previously reported. We believe our benchmarks broaden the scope of supervised clustering methods beyond the face domain and can serve as a foundation on which these methods could be improved. To enable reproducibility, we include all necessary details in the appendices, and plan to release the code.

artificial intelligence, hyperparameter, machine learning, (15 more...)

2211.05183

Country:

North America > United States > Colorado > Boulder County > Boulder (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (1.00)

Industry: Information Technology (0.93)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Dorabiala, Olga, Webster, Jennifer, Kutz, Nathan, Aravkin, Aleksandr

Spatiotemporal k-means

arXiv.org Artificial IntelligenceNov-9-2022

The widespread use of sensor and data acquisition technologies, including IOT, GPS, RFID, LIDAR, satellite, and cellular networks allows for, among other applications, the continuous monitoring of the positions of moving objects of interest. These technologies create rich spatiotemporal data that is found across many scientific and real-world domains including ecologists' studies of collective animal behavior [13], the surveillance of large groups of people for suspicious activity [17], and traffic management [12]. Often, the data collected is large and unlabeled, motivating the development of unsupervised learning methods that can efficiently extract information about object behavior with no human supervision. In this study, we propose a method of spatiotemporal k-means (STKM) clustering that is able to analyze the multi-scale relationships within spatiotemporal data. Clustering is a major unsupervised data mining tool used to gain insight from unlabeled data by grouping objects based on some similarity measure [6, 11]. The most common methods for unsupervised clustering include k-means, Gaussian mixture models, and hierarchical clustering [18], all of which are workhorse algorithms for the data science industry.

artificial intelligence, machine learning, time step, (16 more...)

2211.05337

Country:

North America > United States > Washington > King County > Seattle (0.14)
North America > United States > Illinois (0.04)
Europe > Portugal > Porto > Porto (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.66)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

#artificialintelligenceNov-8-2022, 07:16:16 GMT

How Hierarchical Clustering works part1

Abstract: The time needed to apply a hierarchical clustering algorithm is most often dominated by the number of computations of a pairwise dissimilarity measure. Such a constraint, for larger data sets, puts at a disadvantage the use of all the classical linkage criteria but the single linkage one. However, it is known that the single linkage clustering algorithm is very sensitive to outliers, produces highly skewed dendrograms, and therefore usually does not reflect the true underlying data structure -- unless the clusters are well-separated. To overcome its limitations, we propose a new hierarchical clustering linkage criterion called Genie. Namely, our algorithm links two clusters in such a way that a chosen economic inequity measure (e.g., the Gini- or Bonferroni-index) of the cluster sizes does not drastically increase above a given threshold. The presented benchmarks indicate a high practical usefulness of the introduced method: it most often outperforms the Ward or average linkage in terms of the clustering quality while retaining the single linkage's speed.

algorithm, graph, hierarchical clustering work part1, (12 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Merchant, Arpit, Mathioudakis, Michael, Wang, Yanhao

Graph Summarization via Node Grouping: A Spectral Algorithm

Graph summarization via node grouping is a popular method to build concise graph representations by grouping nodes from the original graph into supernodes and encoding edges into superedges such that the loss of adjacency information is minimized. Such summaries have immense applications in large-scale graph analytics due to their small size and high query processing efficiency. In this paper, we reformulate the loss minimization problem for summarization into an equivalent integer maximization problem. By initially allowing relaxed (fractional) solutions for integer maximization, we analytically expose the underlying connections to the spectral properties of the adjacency matrix. Consequently, we design an algorithm called SpecSumm that consists of two phases. In the first phase, motivated by spectral graph theory, we apply k-means clustering on the k largest (in magnitude) eigenvectors of the adjacency matrix to assign nodes to supernodes. In the second phase, we propose a greedy heuristic that updates the initial assignment to further improve summary quality. Finally, via extensive experiments on 11 datasets, we show that SpecSumm efficiently produces high-quality summaries compared to state-of-the-art summarization algorithms and scales to graphs with millions of nodes.

data mining, eigenvector, machine learning, (15 more...)

doi: 10.1145/3539597.3570441

2211.04169

Country:

North America > United States > New York > New York County > New York City (0.06)
Europe > Finland > Uusimaa > Helsinki (0.05)
Asia > Middle East > Jordan (0.04)
Asia > China > Shanghai > Shanghai (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.34)

Korir, Evans Kiptoo, Vizi, Zsolt

Clustering of countries based on the associated social contact patterns in epidemiological modelling

Mathematical models have been used to understand the spread patterns of infectious diseases such as Coronavirus Disease 2019 (COVID-19). The transmission component of the models can be modelled in an age-dependent manner via introducing contact matrix for the population, which describes the contact rates between the age groups. Since social contact patterns vary from country to country, we can compare and group the countries using the corresponding contact matrices. In this paper, we present a framework for clustering countries based on their contact matrices with respect to an underlying epidemic model. Since the pipeline is generic and modular, we demonstrate its application in a COVID-19 model from R\"ost et. al. which gives a hint about which countries can be compared in a pandemic situation, when only non-pharmaceutical interventions are available.

artificial intelligence, data mining, machine learning, (18 more...)

2211.06426

Country:

Europe > Belgium (0.06)
Europe > Netherlands (0.06)
Europe > Hungary (0.06)
(44 more...)

Genre: Research Report > Experimental Study (0.34)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.95)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)

An Incremental Phase Mapping Approach for X-ray Diffraction Patterns using Binary Peak Representations

Jha, Dipendra, Narayanachari, K. V. L. V., Zhang, Ruifeng, Liao, Justin, Keane, Denis T., Liao, Wei-keng, Choudhary, Alok, Chung, Yip-Wah, Bedzyk, Michael, Agrawal, Ankit

Despite the huge advancement in knowledge discovery and data mining techniques, the X-ray diffraction (XRD) analysis process has mostly remained untouched and still involves manual investigation, comparison, and verification. Due to the large volume of XRD samples from high-throughput XRD experiments, it has become impossible for domain scientists to process them manually. Recently, they have started leveraging standard clustering techniques, to reduce the XRD pattern representations requiring manual efforts for labeling and verification. Nevertheless, these standard clustering techniques do not handle problem-specific aspects such as peak shifting, adjacent peaks, background noise, and mixed phases; hence, resulting in incorrect composition-phase diagrams that complicate further steps. Here, we leverage data mining techniques along with domain expertise to handle these issues. In this paper, we introduce an incremental phase mapping approach based on binary peak representations using a new threshold based fuzzy dissimilarity measure. The proposed approach first applies an incremental phase computation algorithm on discrete binary peak representation of XRD samples, followed by hierarchical clustering or manual merging of similar pure phases to obtain the final composition-phase diagram. We evaluate our method on the composition space of two ternary alloy systems- Co-Ni-Ta and Co-Ti-Ta. Our results are verified by domain scientists and closely resembles the manually computed ground-truth composition-phase diagrams. The proposed approach takes us closer towards achieving the goal of complete end-to-end automated XRD analysis.

artificial intelligence, machine learning, pure phase, (15 more...)

2211.04011

Country:

North America > United States > Illinois > Cook County > Evanston (0.04)
North America > United States > Massachusetts > Middlesex County > Reading (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(2 more...)

Genre: Research Report > New Finding (0.48)

Industry: Government > Regional Government (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Minimalist Data Wrangling with Python

Gagolewski, Marek

Minimalist Data Wrangling with Python is envisaged as a student's first introduction to data science, providing a high-level overview as well as discussing key concepts in detail. We explore methods for cleaning data gathered from different sources, transforming, selecting, and extracting features, performing exploratory data analysis and dimensionality reduction, identifying naturally occurring data clusters, modelling patterns in data, comparing data between groups, and reporting the results. This textbook is a non-profit project. Its online and PDF versions are freely available at https://datawranglingpy.gagolewski.com/.

machine learning, pattern recognition, programming language, (25 more...)

doi: 10.5281/zenodo.6451068

2211.0463

Country:

Africa > Kenya (0.04)
Europe > Poland > Masovia Province > Warsaw (0.04)
Africa > Ethiopia (0.04)
(55 more...)

Genre:

Summary/Review (1.00)
Research Report > Experimental Study (1.00)
Instructional Material > Course Syllabus & Notes (1.00)
Research Report > New Finding (0.67)

Industry:

Media (1.00)
Leisure & Entertainment (1.00)
Information Technology (1.00)
(7 more...)

Technology:

Information Technology > Software > Programming Languages (1.00)
Information Technology > Data Science > Data Quality > Data Cleaning (1.00)
Information Technology > Data Science > Data Mining (1.00)
(7 more...)

Ortiz-Bouza, Meiby, Aviyente, Selin

Optimal Graph Filters for Clustering Attributed Graphs

The second class of methods uses graph embedding, Many real-world systems can be represented as graphs where the in particular graph autoencoders. Graph convolutional network different entities are presented by nodes and their interactions by (GCN) based methods such as Graph autoencoders (GAE) [13], variational edges. An important task in studying large datasets is graph clustering. GAE (VGAE) [14], adversarially regularized graph autoencoder While there has been a lot of work on graph clustering using the (ARGA), adversarially regularized variational graph autoencoder connectivity between the nodes, many real-world networks also have (ARVGA) [15] and marginalized graph autoencoder for graph node attributes. Clustering attributed graphs requires joint modeling clustering (MGAE) [16] have demonstrated state-of-the-art performance of graph structure and node attributes. Recent work has focused on on several attributed graph clustering tasks. These methods graph convolutional networks and graph convolutional filters to combine directly use GCN as a feature extractor, where each convolutional structural and content information. However, these methods are layer is equivalent to a first-order low-pass graph filter that only takes mostly limited to lowpass filtering and do not explicitly optimize the the most immediate neighbors of each node into account.

artificial intelligence, machine learning, node, (18 more...)

2211.04634

Country:

North America > United States > Rhode Island > Providence County > Providence (0.04)
North America > United States > Michigan (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)