AITopics | Clustering

Clustering

Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters). It is a main task of exploratory data mining, and a common technique for statistical data analysis, used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, bioinformatics, data compression, and computer graphics. (Wikipedia)

Understanding KMeans Clustering for Data Science Beginners

#artificialintelligence, Aug-8-2021, 15:10:58 GMT

Clustering is an unsupervised learning method that separates a population of data points into several groups, such that data points in the same group are more similar to each other than to the data points in other groups. In other words, it groups objects according to the similarity and dissimilarity between them. KMeans clustering is an unsupervised machine learning algorithm that performs this task. In this method, the 'n' observations are grouped into 'K' clusters based on distance. The algorithm tries to minimize the within-cluster variance (so that similar observations fall in the same cluster).
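
As a rough illustration of the procedure summarized above (a sketch, not code from the linked article), the following pure-Python example alternates the two steps of the standard Lloyd iteration: assign each observation to its nearest centroid, then move each centroid to the mean of its members. The function names `kmeans` and `within_cluster_ss`, the toy points, and the choice of the first K points as initial centroids are assumptions made for this example.

```python
import math

def kmeans(points, k, iters=100):
    """Lloyd-style K-means; the first k points seed the centroids (an assumption)."""
    centroids = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids

def within_cluster_ss(points, labels, centroids):
    """Sum of squared distances from each point to its assigned centroid."""
    return sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))

# Hypothetical toy data: two well-separated groups in 2-D.
points = [(1.0, 1.0), (9.0, 9.0), (1.0, 2.0), (9.0, 8.0), (2.0, 1.0), (8.0, 9.0)]
labels, centroids = kmeans(points, 2)
wcss = within_cluster_ss(points, labels, centroids)
```

The quantity returned by `within_cluster_ss` is the objective that the iteration tries to reduce. In practice the result depends on the initial centroids, so library implementations typically use randomized or k-means++ seeding with several restarts.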

clustering, data science beginner, kmean clustering, (9 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (0.37)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.36)

Kinematics clustering enables head impact subtyping for better traumatic brain injury prediction

Zhan, Xianghao, Li, Yiheng, Liu, Yuzhe, Cecchi, Nicholas J., Gevaert, Olivier, Zeineh, Michael M., Grant, Gerald A., Camarillo, David B.

arXiv.org Artificial Intelligence, Aug-7-2021

Traumatic brain injury can be caused by various types of head impacts. However, due to different kinematic characteristics, many brain injury risk estimation models are not generalizable across the variety of impacts that humans may sustain. The current definitions of head impact subtypes are based on impact sources (e.g., football, traffic accident), which may not reflect the intrinsic kinematic similarities of impacts across the impact sources. To investigate the potential new definitions of impact subtypes based on kinematics, 3,161 head impacts from various sources including simulation, college football, mixed martial arts, and car racing were collected. We applied the K-means clustering to cluster the impacts on 16 standardized temporal features from head rotation kinematics. Then, we developed subtype-specific ridge regression models for cumulative strain damage (using the threshold of 15%), which significantly improved the estimation accuracy compared with the baseline method which mixed impacts from different sources and developed one model (R^2 from 0.7 to 0.9). To investigate the effect of kinematic features, we presented the top three critical features (maximum resultant angular acceleration, maximum angular acceleration along the z-axis, maximum linear acceleration along the y-axis) based on regression accuracy and used logistic regression to find the critical points for each feature that partitioned the subtypes. This study enables researchers to define head impact subtypes in a data-driven manner, which leads to more generalizable brain injury risk estimation.
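
The benefit of fitting one regression model per subtype, rather than one pooled model, can be sketched on toy data. This is not the authors' pipeline: it assumes a single feature, a ridge model without intercept (closed form w = Σxy / (Σx² + λ)), and subtype labels that are already known. The names `ridge_1d`, `fit_per_subtype`, `sse`, and all numbers are hypothetical.

```python
def ridge_1d(xs, ys, lam):
    """Closed-form ridge slope for y ~ w * x with no intercept."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

def fit_per_subtype(xs, ys, subtype, lam):
    """Fit one ridge slope per subtype label; returns {label: slope}."""
    models = {}
    for lab in sorted(set(subtype)):
        idx = [i for i, s in enumerate(subtype) if s == lab]
        models[lab] = ridge_1d([xs[i] for i in idx], [ys[i] for i in idx], lam)
    return models

def sse(xs, ys, slopes):
    """Sum of squared errors given one slope per sample."""
    return sum((y - w * x) ** 2 for x, y, w in zip(xs, ys, slopes))

# Hypothetical toy data: two subtypes with different feature-to-outcome slopes.
xs = [1.0, 2.0, 3.0, 1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0, 5.0, 10.0, 15.0]
subtype = [0, 0, 0, 1, 1, 1]  # e.g. cluster labels from K-means

per_subtype = fit_per_subtype(xs, ys, subtype, lam=1.0)
pooled = ridge_1d(xs, ys, lam=1.0)

sse_subtype = sse(xs, ys, [per_subtype[s] for s in subtype])
sse_pooled = sse(xs, ys, [pooled] * len(xs))
```

On this toy data the pooled slope falls between the two subtype slopes and fits neither group well, which is the intuition behind subtype-specific models.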

artificial intelligence, head impact, machine learning, (17 more...)

arXiv.org Artificial Intelligence

doi: 10.1007/s10439-022-03020-0

2108.03498

Country:

North America > United States > California > Santa Clara County > Stanford (0.05)
North America > United States > New York (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
Africa > Middle East > Egypt > Giza Governorate > Giza (0.04)

Genre: Research Report > Experimental Study (1.00)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.89)

Clustering Large Data Sets with Incremental Estimation of Low-density Separating Hyperplanes

Hofmeyr, David P.

arXiv.org Machine Learning, Aug-7-2021

An efficient method for obtaining low-density hyperplane separators in the unsupervised context is proposed. Low-density separators can be used to obtain a partition of a set of data based on their allocations to the different sides of the separators. The proposed method is based on applying stochastic gradient descent to the integrated density on the hyperplane with respect to a convolution of the underlying distribution and a smoothing kernel. In the case where the bandwidth of the smoothing kernel is decreased towards zero, the bias of these updates with respect to the true underlying density tends to zero, and convergence to a minimiser of the density on the hyperplane can be obtained. A post-processing of the partition induced by a collection of low-density hyperplanes yields an efficient and accurate clustering method which is capable of automatically selecting an appropriate number of clusters. Experiments with the proposed approach show that it is highly competitive in terms of both speed and accuracy when compared with relevant benchmarks. Code to implement the proposed approach is available in the form of an R package from https://github.com/DavidHofmeyr/iMDH.

hyperplane, kv 1, sequence, (17 more...)

arXiv.org Machine Learning

2108.03442

Country:

Europe > Austria > Vienna (0.14)
Africa > South Africa (0.04)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Nearest Neighborhood-Based Deep Clustering for Source Data-absent Unsupervised Domain Adaptation

Tang, Song, Yang, Yan, Ma, Zhiyuan, Hendrich, Norman, Zeng, Fanyu, Ge, Shuzhi Sam, Zhang, Changshui, Zhang, Jianwei

arXiv.org Artificial Intelligence, Aug-3-2021

In the classic setting of unsupervised domain adaptation (UDA), the labeled source data are available in the training phase. However, in many real-world scenarios, owing to some reasons such as privacy protection and information security, the source data is inaccessible, and only a model trained on the source domain is available. This paper proposes a novel deep clustering method for this challenging task. Aiming at the dynamical clustering at feature-level, we introduce extra constraints hidden in the geometric structure between data to assist the process. Concretely, we propose a geometry-based constraint, named semantic consistency on the nearest neighborhood (SCNNH), and use it to encourage robust clustering. To reach this goal, we construct the nearest neighborhood for every target data and take it as the fundamental clustering unit by building our objective on the geometry. Also, we develop a more SCNNH-compliant structure with an additional semantic credibility constraint, named semantic hyper-nearest neighborhood (SHNNH). After that, we extend our method to this new geometry. Extensive experiments on three challenging UDA datasets indicate that our method achieves state-of-the-art results. The proposed method has significant improvement on all datasets (as we adopt SHNNH, the average accuracy increases by over 3.0% on the large-scaled dataset). Code is available at https://github.com/tntek/N2DCX.

adaptation, domain adaptation, proc, (17 more...)

arXiv.org Artificial Intelligence

2107.12585

Country:

Asia > China > Shanghai > Shanghai (0.04)
Asia > China > Jiangsu Province > Nanjing (0.04)
Asia > Middle East > Jordan (0.04)
(4 more...)

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(3 more...)

Electrical peak demand forecasting- A review

Dai, Shuang, Meng, Fanlin, Dai, Hongsheng, Wang, Qian, Chen, Xizhong

arXiv.org Artificial Intelligence, Aug-3-2021

The power system is undergoing rapid evolution with the roll-out of advanced metering infrastructure and local energy applications (e.g. electric vehicles) as well as the increasing penetration of intermittent renewable energy at both transmission and distribution level, which characterizes the peak load demand with stronger randomness and less predictability and therefore poses a threat to the power grid security. Since storing large quantities of electricity to satisfy load demand is neither economically nor environmentally friendly, effective peak demand management strategies and reliable peak load forecast methods become essential for optimizing the power system operations. To this end, this paper provides a timely and comprehensive overview of peak load demand forecast methods in the literature. To the best of our knowledge, this is the first comprehensive review on this topic. In this paper we first give a precise and unified problem definition of peak load demand forecast. Second, 139 papers on peak load forecast methods were systematically reviewed, with methods classified into different stages based on the timeline. Third, a comparative analysis of peak load forecast methods is summarized and different optimizing methods to improve the forecast performance are discussed. The paper ends with a comprehensive summary of the reviewed papers and a discussion of potential future research directions.

demand forecast, forecast, forecasting, (11 more...)

arXiv.org Artificial Intelligence

2108.01393

Country:

North America > United States > California (0.14)
Asia > Thailand (0.14)
Asia > Middle East > Iraq (0.14)
(31 more...)

Genre:

Overview (1.00)
Research Report > New Finding (0.69)

Industry:

Information Technology > Security & Privacy (1.00)
Energy > Renewable (1.00)
Energy > Power Industry > Utilities (0.93)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Modeling & Simulation (1.00)
Information Technology > Data Science > Data Quality (1.00)
(8 more...)

Predicting Popularity of Images Over 30 Days

Dutta, Amartya, Barbhuiya, Ferdous Ahmed

arXiv.org Artificial Intelligence, Aug-3-2021

The current work addresses the problem of predicting the popularity of images before they are even uploaded. The method focuses specifically on Flickr images. Social features of each image, as well as those of the user who uploaded it, have been recorded. The dataset also includes the engagement score of each image, which is the ground-truth value of the views obtained by the image over a period of 30 days. The work aims to predict the popularity of images on Flickr over a period of 30 days using the social features of the user and the image, as well as the visual features of the images. The method states that the engagement sequence of an image can be said to depend on two independent quantities, namely the scale and shape of an image. Once the shape and scale of an image have been predicted, they are combined to obtain the predicted sequence of the image over 30 days. The current work follows previous work in the same direction, with certain speculations and suggestions for improvement.

engagement score, figure 4, popularity, (15 more...)

arXiv.org Artificial Intelligence

2108.01326

Genre: Research Report (0.40)

Industry: Information Technology > Services (0.56)

Technology:

Information Technology > Communications > Social Media (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.49)

Distribution free optimality intervals for clustering

Meilă, Marina, Zhang, Hanyu

arXiv.org Machine Learning, Jul-30-2021

We address the problem of validating the output of clustering algorithms. Given data $\mathcal{D}$ and a partition $\mathcal{C}$ of these data into $K$ clusters, when can we say that the clusters obtained are correct or meaningful for the data? This paper introduces a paradigm in which a clustering $\mathcal{C}$ is considered meaningful if it is good with respect to a loss function such as the K-means distortion, and stable, i.e. the only good clustering up to small perturbations. Furthermore, we present a generic method to obtain post-inference guarantees of near-optimality and stability for a clustering $\mathcal{C}$. The method can be instantiated for a variety of clustering criteria (also called loss functions) for which convex relaxations exist. Obtaining the guarantees amounts to solving a convex optimization problem. We demonstrate the practical relevance of this method by obtaining guarantees for the K-means and the Normalized Cut clustering criteria on realistic data sets. We also prove that asymptotic instability implies finite sample instability w.h.p., allowing inferences about the population clusterability from a sample. The guarantees do not depend on any distributional assumptions, but they depend on the data set $\mathcal{D}$ admitting a stable clustering.
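
For reference, the K-means distortion mentioned in the abstract is commonly written as follows. The symbols $x_i$ (data points in $\mathcal{D}$), $C_k$ (the $k$-th cluster of $\mathcal{C}$), and $\mu_k$ (its mean) are notation assumed here for illustration and may differ from the paper's own.

```latex
\mathrm{Loss}(\mathcal{C}) \;=\; \sum_{k=1}^{K} \sum_{i \in C_k} \lVert x_i - \mu_k \rVert^2,
\qquad
\mu_k \;=\; \frac{1}{|C_k|} \sum_{i \in C_k} x_i .
```

Under this reading, a clustering is "good" when its loss is close to the smallest achievable value for the given $K$.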

optimality interval, relaxation, stability, (15 more...)

arXiv.org Machine Learning

2107.14442

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Asia > Afghanistan > Parwan Province > Charikar (0.05)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(9 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Efficient Sparse Spherical k-Means for Document Clustering

Knittel, Johannes, Koch, Steffen, Ertl, Thomas

arXiv.org Artificial Intelligence, Jul-30-2021

Spherical k-Means is frequently used to cluster document collections because it performs reasonably well in many settings and is computationally efficient. However, the time complexity increases linearly with the number of clusters k, which limits the suitability of the algorithm for larger values of k depending on the size of the collection. Optimizations targeted at the Euclidean k-Means algorithm largely do not apply because the cosine distance is not a metric. We therefore propose an efficient indexing structure to improve the scalability of Spherical k-Means with respect to k. Our approach exploits the sparsity of the input vectors and the convergence behavior of k-Means to reduce the number of comparisons on each iteration significantly.
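
To make the baseline concrete, here is a minimal sketch of one plain Spherical k-Means iteration over sparse vectors. It is not the indexing structure proposed in the paper. It assumes documents are stored as dictionaries mapping terms to weights and scaled to unit length, so that cosine similarity reduces to a dot product over the document's non-zero terms. The function names and the toy documents are hypothetical.

```python
import math

def normalize(vec):
    """Scale a sparse vector (dict: term -> weight) to unit length."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()}

def sparse_dot(doc, centroid):
    """Dot product that only visits the document's non-zero terms."""
    return sum(w * centroid.get(t, 0.0) for t, w in doc.items())

def assign(docs, centroids):
    """Assign each unit-length document to its most similar centroid."""
    return [
        max(range(len(centroids)), key=lambda j: sparse_dot(d, centroids[j]))
        for d in docs
    ]

def update(docs, labels, k):
    """New centroid = unit-length sum of the member vectors."""
    sums = [dict() for _ in range(k)]
    for d, lab in zip(docs, labels):
        for t, w in d.items():
            sums[lab][t] = sums[lab].get(t, 0.0) + w
    return [normalize(s) if s else s for s in sums]

# Hypothetical toy corpus: term-weight dictionaries.
raw = [
    {"cat": 2.0, "dog": 1.0},
    {"cat": 1.0, "dog": 2.0},
    {"stock": 3.0, "bond": 1.0},
    {"stock": 1.0, "bond": 1.0},
]
docs = [normalize(d) for d in raw]
centroids = [docs[0], docs[2]]  # seed centroids chosen by hand

labels = assign(docs, centroids)
new_centroids = update(docs, labels, 2)
```

The assignment step compares every document with every centroid, which is where the linear dependence on k described in the abstract comes from.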

centroid, input vector, vector, (12 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3469096.3474937

2108.00895

Country:

Europe > Germany > Baden-Württemberg > Stuttgart Region > Stuttgart (0.06)
Europe > Ireland > Munster > County Limerick > Limerick (0.05)
North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.49)

Demonstrating REACT: a Real-time Educational AI-powered Classroom Tool

Kulkarni, Ajay, Gkountouna, Olga

arXiv.org Artificial Intelligence, Jul-29-2021

We present a demonstration of REACT, a new Real-time Educational AI-powered Classroom Tool that employs EDM techniques for supporting the decision-making process of educators. REACT is a data-driven tool with a user-friendly graphical interface. It analyzes students' performance data and provides context-based alerts as well as recommendations to educators for course planning. Furthermore, it incorporates model-agnostic explanations for bringing explainability and interpretability in the process of decision making. This paper demonstrates a use case scenario of our proposed tool using a real-world dataset and presents the design of its architecture and user interface. This demonstration focuses on the agglomerative clustering of students based on their performance (i.e., incorrect responses and hints used) during an in-class activity. This formation of clusters of students with similar strengths and weaknesses may help educators to improve their course planning by identifying at-risk students, forming study groups, or encouraging tutoring between students of different strengths.
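
The kind of grouping described above can be sketched with a naive bottom-up (agglomerative) procedure. This is not REACT's implementation: the abstract does not state the linkage criterion or distance, so average linkage with Euclidean distance is assumed here, and the function name `agglomerative` and the student records are hypothetical.

```python
import math

def agglomerative(points, n_clusters):
    """Naive average-linkage agglomerative clustering; returns lists of point indices."""
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        # Average pairwise Euclidean distance between two clusters.
        total = sum(math.dist(points[i], points[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters and merge them.
        _, a, b = min(
            (linkage(clusters[a], clusters[b]), a, b)
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters

# Hypothetical records: (incorrect responses, hints used) per student.
students = [(1, 0), (2, 1), (8, 6), (9, 7), (1, 1)]
groups = agglomerative(students, 2)
```

Each resulting group lists the indices of students with similar performance profiles, which an educator could then inspect when forming study groups.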

demonstration, student, visualization, (13 more...)

arXiv.org Artificial Intelligence

2108.07693

Country: Asia > India (0.05)

Genre:

Research Report (0.40)
Instructional Material (0.34)

Industry:

Education > Educational Technology (0.68)
Education > Educational Setting (0.46)

Technology:

Information Technology > Human Computer Interaction > Interfaces (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.91)
Information Technology > Architecture > Real Time Systems (0.87)

Advanced K-Means: Controlling Groups Sizes and Selecting Features

#artificialintelligence, Jul-28-2021, 05:46:13 GMT

The algorithm uses ideas from Linear Programming, in particular Network Models. Network models are used, among other things, in logistics to optimise the flow of goods across a network of roads. We can see in the simple figure above that we have 5 nodes with directed arcs (the arrows) between them. Each node has a demand (negative) or supply (positive) value and the arcs have flow and cost values. For instance, the arc 2–4 has a flow of 4 and a cost of $2. Similarly, node 1 supplies 20 units and node 4 requires 5 units.
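
The bookkeeping behind such a network model can be sketched in a few lines. Only three values below come from the text (node 1 supplies 20, node 4 requires 5, and arc 2–4 carries a flow of 4 at a cost of $2 per unit); every other node value, arc, flow, and cost is hypothetical, as are the function names. The sketch checks that a given flow is balanced and computes its total cost. It does not search for the cheapest flow.

```python
# Supply (positive) or demand (negative) per node.
supply = {1: 20, 2: 0, 3: -5, 4: -5, 5: -10}

# Directed arcs: (from, to) -> (flow, cost per unit).
arcs = {
    (1, 2): (12, 4),
    (1, 3): (8, 3),
    (2, 4): (4, 2),  # the arc quoted in the text
    (2, 5): (8, 1),
    (3, 4): (1, 2),
    (3, 5): (2, 5),
}

def net_outflow(node):
    """Flow leaving a node minus flow entering it."""
    out = sum(f for (u, _), (f, _) in arcs.items() if u == node)
    inc = sum(f for (_, v), (f, _) in arcs.items() if v == node)
    return out - inc

def is_balanced():
    """Flow conservation: net outflow equals supply at every node."""
    return all(net_outflow(n) == s for n, s in supply.items())

def total_cost():
    """Total shipping cost: sum of flow times unit cost over all arcs."""
    return sum(f * c for f, c in arcs.values())

balanced = is_balanced()
cost = total_cost()
```

A minimum-cost flow solver would treat the flows as unknowns and minimize `total_cost` subject to the conservation constraint checked by `is_balanced`, plus any arc capacity limits.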

algorithm, dataset, group size and selecting feature, (11 more...)

#artificialintelligence

Country:

Europe > United Kingdom (0.05)
Europe > Switzerland (0.05)
Europe > Sweden (0.05)
(20 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.33)
