AITopics | Clustering

Clustering

Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters). It is a main task of exploratory data mining, and a common technique for statistical data analysis, used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, bioinformatics, data compression, and computer graphics. (Wikipedia)

Mining Human Mobility Data to Discover Locations and Habits

Andrade, Thiago, Cancela, Brais, Gama, João

arXiv.org Machine Learning, Sep-25-2019

Many aspects of life are tied to the places people visit, and the mobile devices people carry are increasingly pervasive. The positioning technologies that serve these devices, such as cellular antennas (GSM networks), global navigation satellite systems (GPS), and more recently the WiFi positioning system (WPS), provide large amounts of spatio-temporal data continuously. Detecting significant places and the frequency of movements between them is therefore fundamental to understanding human behavior. In this paper, we propose a method for discovering user habits without any a priori or external knowledge: we introduce a density-based clustering for spatio-temporal data to identify meaningful places, and we apply a Gaussian Mixture Model (GMM) over the set of meaningful places to identify the representations of individual habits. To evaluate the proposed method we use two real-world datasets. One dataset contains high-density GPS data and the other contains GSM mobile phone data in a coarse representation. The results show that the proposed method is suitable for this task, as many unique habits were identified. These habits can be used to understand users' behavior and to build characteristic profiles that give an overview of the mobility patterns in the data.

dataset, meaningful place, trajectory, (14 more...)

1909.11406

Country:

South America > Brazil (0.05)
Europe > Portugal > Porto > Porto (0.04)
North America > United States > Hawaii (0.04)
(2 more...)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Data Science (0.93)

Determining offshore wind installation times using machine learning and open data

Tranberg, Bo, Kratmann, Kasper Koops, Stege, Jason

arXiv.org Machine Learning, Sep-25-2019

The installation process of offshore wind turbines requires the use of expensive jack-up vessels. These vessels regularly report their position via the Automatic Identification System (AIS). This paper introduces a novel approach of applying machine learning to AIS data from jack-up vessels. We apply the new method to 13 offshore wind farms in Danish, German and British waters. For each of the wind farms we identify individual turbine locations, individual installation times, time in transit and time in harbor for the respective vessel. This is done in an automated way exclusively using AIS data with no prior knowledge of turbine locations, thus enabling a detailed description of the entire installation process.

ais data, installation, installation time, (14 more...)

1909.11313

Country: Europe > Netherlands > South Holland > Rotterdam (0.04)

Genre: Research Report > Promising Solution (0.34)

Industry: Energy > Renewable > Wind (1.00)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.69)

kjahan/clustering

#artificialintelligence, Sep-23-2019, 06:13:28 GMT

This implementation programmatically optimizes the number of clusters (k) and, at the end of the clustering process, stores the clusters to disk. You can test the code with the San Francisco crime data in the "inputs" folder (i.e. …). Note that if you want to test with your own location data, you first need to copy your CSV file into the "inputs" folder. Next, pass your filename as a parameter to the clustering program. Your CSV file should use the "Lat,Lon" format.

csv, kjahan, san francisco crime data

Country: North America > United States > California > San Francisco County > San Francisco (0.41)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.72)
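
The kjahan/clustering entry above describes choosing k automatically for "Lat,Lon" CSV data but does not say how. The following is a minimal sketch of one way to do it, not the repository's code: it assumes plain k-means with a deterministic farthest-point initialization and an elbow-style stopping rule. The function names, the `min_gain` threshold, and the use of Euclidean distance on raw degrees are all assumptions made for illustration.

```python
import csv
import io


def load_points(csv_text):
    """Parse CSV text with a 'Lat,Lon' header into (lat, lon) float tuples."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [(float(row["Lat"]), float(row["Lon"])) for row in reader]


def sq_dist(a, b):
    # Squared Euclidean distance on raw coordinates (a simplification:
    # real latitude/longitude data may call for a geodesic distance).
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def init_centroids(points, k):
    """Deterministic farthest-point initialization (an assumed choice)."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )
    return centroids


def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))
        for p in points
    ]


def kmeans(points, k, max_iters=100):
    """Lloyd's algorithm; returns (centroids, labels, within-cluster SSE)."""
    centroids = init_centroids(points, k)
    for _ in range(max_iters):
        labels = assign(points, centroids)
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append((
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                ))
            else:
                updated.append(centroids[c])  # keep an empty cluster's centroid
        if updated == centroids:
            break
        centroids = updated
    labels = assign(points, centroids)
    sse = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return centroids, labels, sse


def choose_k(points, k_max=6, min_gain=0.25):
    """Increase k until the relative SSE improvement drops below min_gain."""
    best_k, best_fit, prev_sse = None, None, None
    for k in range(1, min(k_max, len(points)) + 1):
        fit = kmeans(points, k)
        if prev_sse is not None:
            gain = (prev_sse - fit[2]) / prev_sse if prev_sse > 0 else 0.0
            if gain < min_gain:
                break
        best_k, best_fit, prev_sse = k, fit, fit[2]
    return best_k, best_fit[0], best_fit[1]
```

Calling `choose_k(load_points(text))` returns the selected k, the centroids, and one label per input row. The elbow threshold is a tuning knob; a silhouette score or a gap statistic would be equally reasonable criteria.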

No Free Lunch But A Cheaper Supper: A General Framework for Streaming Anomaly Detection

Calikus, Ece, Nowaczyk, Slawomir, Sant'Anna, Anita, Dikmen, Onur

arXiv.org Artificial Intelligence, Sep-23-2019

In recent years, there has been increased research interest in detecting anomalies in temporal streaming data. A variety of algorithms have been developed in the data mining community, which can be divided into two categories (i.e., general and ad hoc). In most cases, general approaches assume the one-size-fits-all solution model where a single anomaly detector can detect all anomalies in any domain. To date, there exists no single general method that has been shown to outperform the others across different anomaly types, use cases and datasets. On the other hand, ad hoc approaches that are designed for a specific application lack flexibility. Adapting an existing algorithm is not straightforward if the specific constraints or requirements for the existing task change. In this paper, we propose SAFARI, a general framework formulated by abstracting and unifying the fundamental tasks in streaming anomaly detection, which provides a flexible and extensible anomaly detection procedure. SAFARI helps to facilitate more elaborate algorithm comparisons by allowing us to isolate the effects of shared and unique characteristics of different algorithms on detection performance. Using SAFARI, we have implemented various anomaly detectors and identified a research gap that motivates us to propose a novel learning strategy in this work. We conducted an extensive evaluation study of 20 detectors that are composed using SAFARI and compared their performances using real-world benchmark datasets with different properties. The results indicate that there is no single superior detector that works well for every case, proving our hypothesis that "there is no free lunch" in the streaming anomaly detection world. Finally, we discuss the benefits and drawbacks of each method in-depth and draw a set of conclusions to guide future users of SAFARI.

anomaly, dataset, detection, (17 more...)

1909.06927

Country:

Europe > Sweden > Halland County > Halmstad (0.04)
North America > United States > Minnesota (0.04)
North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.04)
Asia > Middle East > Jordan (0.04)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.93)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.67)

57 Best Machine Learning Course Online & Tutorial Digital Learning Land

#artificialintelligence, Sep-22-2019, 02:12:47 GMT

Data visualization: In this section, you will learn how to create simple plots such as scatter plots, histograms, and bar charts. Data manipulation: You will learn about data manipulation in detail. GUI programming: This section is a combination of live instructor-led training and self-paced learning. Developing web maps and representing information using plots: In this section, you will understand how to design Python applications. Computer vision using OpenCV and visualization using Bokeh: You will also learn to design Python applications in this section.

instructor, machine learning, student, (13 more...)

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > Michigan (0.04)

Genre: Instructional Material > Course Syllabus & Notes (1.00)

Industry:

Education > Educational Setting > Online (1.00)
Education > Educational Technology > Educational Software > Computer Based Training (0.31)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.68)

Minimal Learning Machine: Theoretical Results and Clustering-Based Reference Point Selection

Hämäläinen, Joonas, Alencar, Alisson S. C., Kärkkäinen, Tommi, Mattos, César L. C., Júnior, Amauri H. Souza, Gomes, João P. P.

arXiv.org Machine Learning, Sep-22-2019

The Minimal Learning Machine (MLM) is a nonlinear supervised approach based on learning a linear mapping between distance matrices computed in the input and output data spaces, where distances are calculated with respect to a subset of points called reference points. Its simple formulation has attracted several recent works on extensions and applications. In this paper, we aim to address some open questions related to the MLM. First, we detail theoretical aspects that ensure the interpolation and universal approximation capabilities of the MLM, which were previously verified only empirically. Second, we identify the task of selecting reference points as having major importance for the MLM's generalization capability; furthermore, we assess several clustering-based methods in regression scenarios. Based on an extensive empirical evaluation, we conclude that the evaluated methods are both scalable and useful. Specifically, for a small number of reference points, the clustering-based methods outperformed the standard random selection of the original MLM formulation.

dataset, reference point, selection, (11 more...)

1909.09978

Country:

Europe > Finland > Central Finland > Jyväskylä (0.04)
South America > Brazil > Ceará > Fortaleza (0.04)
North America > United States > New Mexico > Bernalillo County > Albuquerque (0.04)
(4 more...)

Genre: Research Report > New Finding (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
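
The "linear mapping between distance matrices" in the Minimal Learning Machine entry above can be written out as follows. This is a sketch of the commonly cited MLM formulation rather than anything stated in the abstract: the symbols, the choice of Euclidean distances, and the least-squares estimate (which assumes the input distance matrix has full column rank) are assumptions.

```latex
% Assumed notation (not from the abstract): N training pairs (x_i, y_i),
% K reference points m_k in the input space and t_k in the output space.
\begin{align}
  [D_x]_{ik} &= \lVert x_i - m_k \rVert, &
  [\Delta_y]_{ik} &= \lVert y_i - t_k \rVert, \\
  \Delta_y &= D_x B + E, &
  \hat{B} &= (D_x^{\top} D_x)^{-1} D_x^{\top} \Delta_y .
\end{align}
% Prediction for a new input x: estimate its output-space distances,
% then find the output whose distances to the t_k match them best.
\begin{equation}
  \hat{\delta}(x) = \bigl[\lVert x - m_1 \rVert, \dots, \lVert x - m_K \rVert\bigr] \hat{B},
  \qquad
  \hat{y} = \arg\min_{y} \sum_{k=1}^{K}
    \bigl( \lVert y - t_k \rVert^{2} - \hat{\delta}_k(x)^{2} \bigr)^{2} .
\end{equation}
```

Under this reading, the choice of the reference points fixes the columns of both distance matrices, which is why their selection (random in the original formulation, clustering-based in this paper) affects generalization.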

An Investigation of Quantum Deep Clustering Framework with Quantum Deep SVM & Convolutional Neural Network Feature Extractor

Bishwas, Arit Kumar, Mani, Ashish, Palade, Vasile

arXiv.org Artificial Intelligence, Sep-21-2019

In this paper, we propose a deep quantum SVM formulation and demonstrate a quantum clustering framework based on the quantum deep SVM formulation, deep convolutional neural networks, and quantum K-Means clustering. We investigate the run-time computational complexity of the proposed quantum deep clustering framework and compare it with a possible classical implementation. Our investigation shows that the proposed quantum version of the deep clustering formulation offers a significant performance gain (exponential speed-ups in many sections) over the possible classical implementation. The proposed theoretical quantum deep clustering framework is also a novel step towards quantum-classical machine learning formulations that aim for maximum performance.

formulation, quantum, svm, (16 more...)

1909.09852

Country:

Asia > India (0.04)
North America > United States > California (0.04)
Europe > United Kingdom > England > West Midlands > Coventry (0.04)

Genre: Research Report (0.50)

Industry:

Telecommunications > Networks (0.40)
Information Technology > Networks (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.69)

Application of Fuzzy Clustering for Text Data Dimensionality Reduction

Karami, Amir

arXiv.org Machine Learning, Sep-20-2019

Large textual corpora are often represented by the document-term frequency matrix, whose elements are term frequencies; however, this matrix has two problems: sparsity and high dimensionality. Four dimension reduction strategies are used to address these problems. Of the four strategies, unsupervised feature transformation (UFT) is a popular and efficient strategy to map the terms to a new basis in the document-term frequency matrix. Although several UFT-based methods have been developed, fuzzy clustering has not been considered for dimensionality reduction. This research explores fuzzy clustering as a new UFT-based approach to create a lower-dimensional representation of documents. Performance of fuzzy clustering with and without global term weighting methods is shown to exceed that of principal component analysis and singular value decomposition. This study also explores the effect of different fuzzifier values on fuzzy clustering for dimensionality reduction.

classification evaluation, fuzzy clustering, knowledge engineering and data mining, (8 more...)

1909.10881

Country:

North America > United States > South Carolina (0.04)
Africa > Senegal > Kolda Region > Kolda (0.04)
North America > United States > Maryland > Baltimore County (0.04)
(3 more...)

Genre: Research Report (0.82)

Industry:

Health & Medicine > Therapeutic Area (0.46)
Health & Medicine > Pharmaceuticals & Biotechnology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
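
To make the role of the fuzzifier in the fuzzy clustering entry above concrete, here is a minimal sketch of the standard fuzzy c-means membership update. It assumes that each document's membership degrees over the clusters serve as its lower-dimensional representation, which the abstract does not spell out; the function name `fcm_memberships` and the toy vectors are hypothetical.

```python
import math


def fcm_memberships(points, centers, fuzzifier=2.0):
    """Fuzzy c-means membership degrees; each returned row sums to 1.

    points and centers are sequences of equal-length numeric tuples.
    A larger fuzzifier gives softer (more evenly spread) memberships.
    """
    if fuzzifier <= 1.0:
        raise ValueError("fuzzifier must be greater than 1")
    exponent = 2.0 / (fuzzifier - 1.0)
    rows = []
    for p in points:
        dists = [math.dist(p, c) for c in centers]
        if 0.0 in dists:
            # The point coincides with one or more centers:
            # share full membership among those centers.
            hits = [1.0 if d == 0.0 else 0.0 for d in dists]
            total = sum(hits)
            rows.append([h / total for h in hits])
            continue
        # u_j = 1 / sum_k (d_j / d_k)^(2 / (m - 1))
        rows.append([
            1.0 / sum((d_j / d_k) ** exponent for d_k in dists)
            for d_j in dists
        ])
    return rows
```

With c cluster centers, a document described by thousands of term frequencies is reduced to a row of c membership values. Computing the centers themselves (the other half of the fuzzy c-means iteration) is omitted here.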

Online Hierarchical Clustering Approximations

Menon, Aditya Krishna, Rajagopalan, Anand, Sumengen, Baris, Citovsky, Gui, Cao, Qin, Kumar, Sanjiv

arXiv.org Machine Learning, Sep-20-2019

Hierarchical clustering is a widely used approach for clustering datasets at multiple levels of granularity. Despite its popularity, existing algorithms such as hierarchical agglomerative clustering (HAC) are limited to the offline setting, and thus require the entire dataset to be available. This prohibits their use on large datasets commonly encountered in modern learning applications. In this paper, we consider hierarchical clustering in the online setting, where points arrive one at a time. We propose two algorithms that seek to optimize the Moseley and Wang (MW) revenue function, a variant of the Dasgupta cost. These algorithms offer different tradeoffs between efficiency and MW revenue performance. The first algorithm, OTD, is a highly efficient Online Top Down algorithm which provably achieves a 1/3-approximation to the MW revenue under a data separation assumption. The second algorithm, OHAC, is an online counterpart to offline HAC, which is known to yield a 1/3-approximation to the MW revenue, and produce good quality clusters in practice. We show that OHAC approximates offline HAC by leveraging a novel split-merge procedure. We empirically show that OTD and OHAC offer significant efficiency and cluster quality gains respectively over baselines.

algorithm, hierarchy, linkage, (16 more...)

1909.09667

Country:

North America > United States > New York > New York County > New York City (0.04)
Asia > Afghanistan > Parwan Province > Charikar (0.04)
North America > United States > Maryland > Montgomery County > Bethesda (0.04)
(3 more...)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
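
The Moseley and Wang (MW) revenue named in the online hierarchical clustering entry above can be evaluated for a given tree with a few lines of code. The sketch below uses the usual definition, in which each pair of points contributes its similarity times (n minus the number of leaves under the pair's lowest common ancestor). The nested-tuple tree encoding and the function names are assumptions made for illustration; this is an evaluator for the objective, not the paper's OTD or OHAC algorithms.

```python
def leaves(tree):
    """Return the leaf ids under a tree given as nested tuples of ints."""
    if isinstance(tree, tuple):
        out = []
        for child in tree:
            out.extend(leaves(child))
        return out
    return [tree]


def mw_revenue(tree, sim, n=None):
    """Moseley-Wang revenue of a hierarchy.

    tree: nested tuples whose leaves are integer point ids, e.g. ((0, 1), (2, 3)).
    sim:  symmetric similarity matrix indexed by point id.
    n:    total number of points (defaults to the leaf count of the whole tree).
    """
    if n is None:
        n = len(leaves(tree))
    if not isinstance(tree, tuple):
        return 0.0  # a single leaf separates no pairs
    child_leaves = [leaves(child) for child in tree]
    size = sum(len(group) for group in child_leaves)
    total = 0.0
    # Pairs split between two different children have this node as their LCA.
    for a in range(len(child_leaves)):
        for b in range(a + 1, len(child_leaves)):
            for i in child_leaves[a]:
                for j in child_leaves[b]:
                    total += sim[i][j] * (n - size)
    # Pairs inside one child are accounted for further down the tree.
    for child in tree:
        total += mw_revenue(child, sim, n)
    return total
```

A tree that merges similar points low in the hierarchy earns more revenue, since their lowest common ancestor covers few leaves. The Dasgupta cost uses the leaf count itself as the weight, so for a binary tree the two objectives sum to n times the total pairwise similarity.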

Consensual aggregation of clusters based on Bregman divergences to improve predictive models

Fischer, Aurélie, Has, Sothea, Mougeot, Mathilde

arXiv.org Machine Learning, Sep-20-2019

A new procedure to construct predictive models in supervised learning problems by paying attention to the clustering structure of the input data is introduced. We are interested in situations where the input data consists of more than one unknown cluster, and where there exist different underlying models on these clusters. Thus, instead of constructing a single predictive model on the whole dataset, we propose to use a K-means clustering algorithm with different options of Bregman divergences, to recover the clustering structure of the input data. Then one dedicated predictive model is fit per cluster. For each divergence, we construct a simple local predictor on each observed cluster. We obtain one estimator, the collection of the K simple local predictors, per divergence, and we propose to combine them in a smart way based on a consensus idea. Several versions of consensual aggregation in both classification and regression problems are considered. A comparison of the performances of all constructed estimators on different simulated and real data assesses the excellent performance of our method. In a large variety of prediction problems, the consensual aggregation procedure outperforms all the other models.

bregman divergence, divergence, procedure, (17 more...)

1909.0937

Country:

North America > United States > Illinois > Cook County > Chicago (0.04)
Europe > France > Île-de-France > Paris > Paris (0.04)

Genre:

Research Report (0.64)
Workflow (0.46)

Industry:

Information Technology > Security & Privacy (0.68)
Education > Educational Setting (0.46)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.87)
