Clustering: Instructional Materials


Semi-supervised New Event Type Induction and Description via Contrastive Loss-Enforced Batch Attention

arXiv.org Artificial Intelligence

Existing work (Ji and Grishman, 2008; McClosky et al., 2011; Li et al., 2013; Chen et al., 2015; Du and Cardie, 2020; Li et al., 2021a) traditionally uses a predefined list of event types and their respective annotations to learn an event extraction model. However, these annotations are both expensive and time-consuming to create. This problem is amplified when considering specialization-intensive domains such as scientific literature, which require years of specialized experience to understand even a specific niche. We consider the attention weight between two event mentions as a learned similarity, and we ensure that the attention mechanism learns to align similar events using a semi-supervised contrastive loss. By doing this, we are able to leverage the large variety of semantic information in pretrained language models for clustering unseen types using a trained attention head. Unlike Huang and Ji (2020), we are able to separate clustering from learning, allowing specific task-suited clustering algorithms to be selected.
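The core mechanism can be sketched in a few lines: treat row-softmaxed dot products within a batch as pairwise attention, and apply a supervised-contrastive-style loss that concentrates each labeled mention's attention mass on mentions of the same type. This is a hypothetical numpy sketch of the idea, not the paper's implementation; the function names and the unlabeled-mention convention (label -1) are my own.

```python
import numpy as np

def batch_attention(embeddings):
    """Row-wise softmax over scaled dot-product similarities (self-pairs masked)."""
    d = embeddings.shape[1]
    scores = embeddings @ embeddings.T / np.sqrt(d)
    np.fill_diagonal(scores, -np.inf)           # a mention should not attend to itself
    scores -= scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    return weights / weights.sum(axis=1, keepdims=True)

def contrastive_attention_loss(attn, labels):
    """Semi-supervised contrastive loss: for each labeled mention, reward the
    attention mass placed on same-type mentions. labels: int array, -1 = unlabeled."""
    labeled = labels >= 0
    loss, n = 0.0, 0
    for i in np.where(labeled)[0]:
        pos = labeled & (labels == labels[i])
        pos[i] = False                          # exclude the mention itself
        if pos.any():
            loss += -np.log(attn[i, pos].sum() + 1e-12)
            n += 1
    return loss / max(n, 1)
```

Minimizing this loss pushes the learned attention head to act as a similarity function, which can then be handed to any off-the-shelf clustering algorithm.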


Improving performance of aircraft detection in satellite imagery while limiting the labelling effort: Hybrid active learning

arXiv.org Artificial Intelligence

The earth observation industry provides satellite imagery with high spatial resolution and short revisit times. To allow efficient operational use of these images, automating certain tasks has become necessary. In the defense domain, aircraft detection on satellite imagery is a valuable tool for analysts. High-performance detectors for such a task can only be obtained by leveraging deep learning, and thus by using a large amount of labeled data. To obtain labels of high enough quality, the knowledge of military experts is needed. We propose a hybrid clustering active learning method that selects the most relevant data to label, limiting the amount of data required while further improving performance. It combines diversity- and uncertainty-based active learning selection methods. For aircraft detection by segmentation, we show that this method provides better or competitive results compared to other active learning methods.
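A minimal sketch of how such a hybrid selection might look, assuming a plain k-means diversity step followed by a per-cluster entropy-based uncertainty step; the function names and the exact combination rule are assumptions, not the paper's method:

```python
import numpy as np

def entropy(probs):
    """Predictive entropy per sample; higher = more uncertain."""
    p = np.clip(probs, 1e-12, 1.0)
    return -(p * np.log(p)).sum(axis=1)

def hybrid_select(features, probs, n_query, rng=None):
    """Pick n_query samples to label: k-means clustering for diversity,
    then the most uncertain sample inside each cluster."""
    rng = np.random.default_rng(rng)
    # --- diversity step: lightweight k-means over the unlabeled pool ---
    centers = features[rng.choice(len(features), n_query, replace=False)].astype(float)
    for _ in range(20):
        assign = ((features[:, None] - centers[None]) ** 2).sum(-1).argmin(axis=1)
        for k in range(n_query):
            if (assign == k).any():
                centers[k] = features[assign == k].mean(axis=0)
    # --- uncertainty step: most entropic sample per cluster ---
    unc = entropy(probs)
    picked = []
    for k in range(n_query):
        members = np.where(assign == k)[0]
        if members.size:
            picked.append(int(members[np.argmax(unc[members])]))
    return sorted(set(picked))
```

Clustering spreads the query budget across the feature space, while the entropy criterion spends it where the current detector is least sure.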


Graph Coloring with Physics-Inspired Graph Neural Networks

arXiv.org Artificial Intelligence

We show how graph neural networks can be used to solve the canonical graph coloring problem. We frame graph coloring as a multi-class node classification problem and use an unsupervised training strategy based on the Potts model from statistical physics. Generalizations to other multi-class problems such as community detection, data clustering, and the minimum clique cover problem are straightforward. We provide numerical benchmark results and illustrate our approach with an end-to-end application for a real-world scheduling use case within a comprehensive encode-process-decode framework. Our optimization approach performs on par with or outperforms existing solvers, and scales to problems with millions of variables.
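The Potts-model objective itself is compact: with a soft colour distribution per node, penalise, for every edge, the probability that both endpoints share a colour. The sketch below optimises per-node logits directly with plain gradient descent as a stand-in for the paper's GNN parameterisation; the names and hyperparameters are illustrative.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def potts_loss(logits, edges):
    """Unsupervised Potts objective: sum over edges of the probability that
    both endpoints receive the same colour (0 for a proper colouring)."""
    p = softmax(logits)
    return sum(float(p[u] @ p[v]) for u, v in edges)

def color_graph(edges, n_nodes, n_colors, steps=300, lr=1.0, seed=0):
    """Gradient descent on the Potts loss over per-node colour logits."""
    rng = np.random.default_rng(seed)
    logits = rng.normal(size=(n_nodes, n_colors))
    for _ in range(steps):
        p = softmax(logits)
        g_p = np.zeros_like(p)
        for u, v in edges:                 # d(loss)/d(p_u) = sum of neighbour p_v
            g_p[u] += p[v]
            g_p[v] += p[u]
        # backprop through the softmax
        g_logits = p * (g_p - (p * g_p).sum(axis=1, keepdims=True))
        logits -= lr * g_logits
    return softmax(logits).argmax(axis=1)
```

Because the loss needs no labels, the same recipe transfers directly to the other multi-class problems the abstract mentions by swapping in the appropriate edge penalty.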


Tk-merge: Computationally Efficient Robust Clustering Under General Assumptions

arXiv.org Machine Learning

We address general-shaped clustering problems under very weak parametric assumptions with a two-step hybrid robust clustering algorithm based on trimmed k-means and hierarchical agglomeration. The algorithm has low computational complexity and effectively identifies clusters even in the presence of data contamination. We also present natural generalizations of the approach, as well as an adaptive procedure to estimate the amount of contamination in a data-driven fashion. Our proposal outperforms state-of-the-art robust, model-based methods in our numerical simulations and real-world applications related to color quantization for image analysis, human mobility patterns based on GPS data, biomedical images of diabetic retinopathy, and functional data across weather stations.
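A rough sketch of the two-step idea, assuming trimmed k-means (discard a fixed fraction of the points farthest from their centre before each update) followed by greedy agglomeration of the fitted centres; the details below are illustrative, not the authors' exact algorithm:

```python
import numpy as np

def tk_merge(X, k0=6, k_final=2, trim=0.1, iters=20, seed=0):
    """Step 1: trimmed k-means with k0 components (robust to contamination).
    Step 2: greedily merge the k0 centres until k_final clusters remain."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k0, replace=False)].astype(float)
    for _ in range(iters):
        d = ((X[:, None] - centers[None]) ** 2).sum(-1)
        assign, dist = d.argmin(1), d.min(1)
        keep = dist <= np.quantile(dist, 1 - trim)      # trimming step
        for k in range(k0):
            m = keep & (assign == k)
            if m.any():
                centers[k] = X[m].mean(0)
    # hierarchical agglomeration of the k0 centres by group-mean distance
    groups = [[k] for k in range(k0)]
    while len(groups) > k_final:
        best, pair = np.inf, None
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                gap = ((centers[groups[i]].mean(0) - centers[groups[j]].mean(0)) ** 2).sum()
                if gap < best:
                    best, pair = gap, (i, j)
        i, j = pair
        groups[i] += groups.pop(j)
    # map every point through its nearest fine centre to a merged cluster
    fine = ((X[:, None] - centers[None]) ** 2).sum(-1).argmin(1)
    label = np.empty(k0, dtype=int)
    for g, members in enumerate(groups):
        label[members] = g
    return label[fine]
```

Overfitting with k0 small spheres and then merging is what lets the method recover general-shaped clusters while the trimming keeps contaminated points from dragging the centres.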


Deep Graph Clustering via Dual Correlation Reduction

arXiv.org Artificial Intelligence

Deep graph clustering, which aims to reveal the underlying graph structure and divide the nodes into different groups, has attracted intensive attention in recent years. However, we observe that, in the process of node encoding, existing methods suffer from representation collapse, which tends to map all data into the same representation. Consequently, the discriminative capability of the node representations is limited, leading to unsatisfactory clustering performance. To address this issue, we propose a novel self-supervised deep graph clustering method termed Dual Correlation Reduction Network (DCRN), which reduces information correlation in a dual manner. Specifically, we first design a siamese network to encode samples. Then, by forcing the cross-view sample correlation matrix and the cross-view feature correlation matrix to approximate two identity matrices, we reduce the information correlation at both levels, improving the discriminative capability of the resulting features. Moreover, to alleviate representation collapse caused by over-smoothing in GCNs, we introduce a propagation regularization term that enables the network to capture long-distance information with a shallow network structure. Extensive experimental results on six benchmark datasets demonstrate the effectiveness of the proposed DCRN against existing state-of-the-art methods.
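The correlation-reduction idea can be illustrated at the feature level: standardise two views' embeddings, form their cross-view correlation matrix, and penalise its distance from the identity. This is a minimal sketch of that single ingredient (close in spirit to Barlow-Twins-style objectives), not DCRN itself:

```python
import numpy as np

def cross_view_correlation(z1, z2):
    """Feature-level cross-view correlation of two (n, d) embeddings,
    with columns standardised to zero mean and unit variance."""
    z1 = (z1 - z1.mean(0)) / (z1.std(0) + 1e-12)
    z2 = (z2 - z2.mean(0)) / (z2.std(0) + 1e-12)
    return (z1.T @ z2) / len(z1)

def correlation_reduction_loss(z1, z2):
    """Drive the cross-view correlation matrix toward the identity:
    diagonal -> 1 (the two views agree per dimension),
    off-diagonal -> 0 (different dimensions decorrelate)."""
    c = cross_view_correlation(z1, z2)
    return float(((c - np.eye(c.shape[0])) ** 2).sum())
```

Pushing off-diagonal entries to zero is precisely what prevents every dimension from encoding the same signal, i.e. the representation collapse the abstract describes.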


Robust Convergence in Federated Learning through Label-wise Clustering

arXiv.org Artificial Intelligence

Non-IID datasets and the heterogeneous environments of local clients are regarded as major issues in Federated Learning (FL), slowing convergence and preventing satisfactory performance. In this paper, we propose a novel label-wise clustering algorithm that guarantees trainability among geographically dispersed, heterogeneous local clients by selecting only local models trained on datasets whose class labels are approximately uniformly distributed, which tends to minimize the loss faster and increase accuracy across the FL network. Through experiments on six common non-IID scenarios, we empirically show that vanilla FL aggregation fails to converge robustly, generating biased pre-trained local models and, in the worst case, drifting the local weights in ways that undermine trainability. Moreover, we quantitatively estimate the expected performance of local models before training, which lets the global server select the optimal clients and saves additional computational cost. Finally, to resolve non-convergence in such non-IID situations, we design clustering algorithms based on local input class labels that accommodate client diversity and group clients so that the overall system attains swift convergence as global training continues. Our experiments show that the proposed label-wise clustering achieves prompt and robust convergence compared to other FL algorithms when local training datasets are non-IID or when IID and non-IID data coexist.
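The client-selection ingredient can be sketched by scoring each client's local label histogram for uniformity (normalised entropy) and keeping only near-uniform clients; the threshold and scoring function below are illustrative assumptions, not the paper's exact criteria:

```python
import numpy as np

def label_uniformity(counts):
    """Entropy of a client's label histogram, normalised to [0, 1];
    1.0 means the local class labels are perfectly uniform."""
    p = np.asarray(counts, float)
    p = p / p.sum()
    p = p[p > 0]
    return float(-(p * np.log(p)).sum() / np.log(len(counts)))

def select_clients(client_label_counts, threshold=0.9):
    """Keep only clients whose local label distribution is close to uniform,
    mirroring the idea of aggregating near-IID local models."""
    return [i for i, c in enumerate(client_label_counts)
            if label_uniformity(c) >= threshold]
```

Because the score depends only on label counts, the global server can apply it before any local training happens, which is what saves the computational cost the abstract mentions.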


Graph-based Ensemble Machine Learning for Student Performance Prediction

arXiv.org Artificial Intelligence

Student performance prediction is a critical research problem for understanding students' needs, providing proper learning opportunities and resources, and improving teaching quality. However, traditional machine learning methods fail to produce stable and accurate predictions. In this paper, we propose a graph-based ensemble machine learning method that improves the stability of single machine learning methods via the consensus of multiple methods. Specifically, we leverage both supervised prediction methods and unsupervised clustering methods, and build an iterative approach that propagates over a bipartite graph and converges to more stable and accurate predictions. Extensive experiments demonstrate the effectiveness of our proposed method in predicting student performance more accurately; our model outperforms the best traditional machine learning algorithms by up to 14.8% in prediction accuracy.
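One way to picture consensus propagation on a student-predictor bipartite graph: alternate between averaging predictor outputs into per-student consensus scores, and reweighting each predictor by its agreement with that consensus. This is a hedged sketch of the general idea, not the paper's algorithm:

```python
import numpy as np

def consensus_ensemble(preds, iters=50):
    """Iterative consensus over a bipartite student-predictor graph.
    preds: (n_predictors, n_students) array of per-student scores."""
    preds = np.asarray(preds, float)
    w = np.ones(len(preds)) / len(preds)
    for _ in range(iters):
        consensus = w @ preds                            # predictor -> student pass
        err = ((preds - consensus) ** 2).mean(axis=1)    # student -> predictor pass
        w = 1.0 / (err + 1e-6)                           # reward agreement
        w = w / w.sum()
    return consensus
```

Predictors that disagree with the emerging consensus are progressively down-weighted, which is how the ensemble gains stability over any single method.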


A step-by-step guide for clustering images

#artificialintelligence

With unsupervised clustering, we aim to determine "natural" or "data-driven" groups in the data without using a priori knowledge about labels or categories. The challenge of using different unsupervised clustering methods is that each implicitly imposes a structure on the data, so different methods produce different partitionings and thus different groupings. Thus the question arises: what is a "good" clustering? Figure 2A depicts a bunch of samples in a 2-dimensional space. Intuitively, we may describe a cluster as a group of samples (here, the images) that lie clustered together. Without using any label information, I would say there are two clusters.
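In practice, such a clustering is easy to run once images have been mapped to feature vectors (e.g. pretrained-CNN embeddings or PCA-reduced pixels). A minimal k-means sketch on toy 2-D features, standing in for image features:

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Minimal k-means. In an image pipeline, X would hold feature vectors
    extracted from the images rather than raw 2-D coordinates."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)].astype(float)
    for _ in range(iters):
        # assign each sample to its nearest centre, then recompute centres
        assign = ((X[:, None] - centers[None]) ** 2).sum(-1).argmin(1)
        for j in range(k):
            if (assign == j).any():
                centers[j] = X[assign == j].mean(0)
    return assign, centers
```

Swapping k-means for another method (hierarchical, DBSCAN, ...) on the same features is exactly what produces the different partitionings discussed above.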


Top 5 Machine Learning Algorithms for Data Science and ML Interviews

#artificialintelligence

Hello guys, you may know that machine learning and artificial intelligence have become more and more important in this increasingly digital world. They now provide a competitive edge to businesses, like Netflix's movie recommendations. If you have just started in this field and are looking for what to learn, I will share 5 essential machine learning algorithms you can learn as a beginner. These algorithms form the basis of the most common machine learning projects. Knowing them well will help you understand a project and its model quickly and change them as per your needs.


Complete Machine Learning & Data Science with Python

#artificialintelligence

Machine learning is constantly being applied to new industries. Learn machine learning with hands-on examples. Topics covered include: what machine learning is; machine learning terminology; classification vs. regression; evaluating performance with classification and regression error metrics; cross-validation and the bias-variance trade-off; data visualization with matplotlib and seaborn; machine learning with scikit-learn; linear regression; logistic regression; k-nearest neighbors; decision trees and random forests; support vector machines; and unsupervised learning with k-means clustering, hierarchical clustering, principal component analysis (PCA), and recommender systems. Python instructors on OAK Academy specialize in everything from software development to data analysis, and are known for their effective teaching. Python is a general-purpose, object-oriented, high-level programming language. It is a multi-paradigm language, supporting procedural and functional styles alongside object orientation. Python is widely used across many industries and platforms, though it has some limitations because it is an interpreted, dynamically typed language. One common use of Python is scripting, which means automating tasks; DevOps engineers, for example, use Python for scripting. Python also has a simple syntax that makes it an excellent programming language for a beginner to learn. Machine learning describes systems that make predictions using a model trained on real-world data, and it is being applied to virtually every field today, including medical diagnoses, facial recognition, weather forecasting, and image processing.