AITopics

1908.04771

Country:

North America > United States > Michigan > Washtenaw County > Ann Arbor (0.14)
Asia > China > Jiangsu Province (0.04)

Genre: Research Report (1.00)

Industry: Water & Waste Management > Water Management > Lifecycle > Treatment (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Vaibhav, Huang, Po-Yao, Frederking, Robert

RWR-GAE: Random Walk Regularization for Graph Auto Encoders

arXiv.org Machine Learning Aug-12-2019

Node embeddings have become a ubiquitous technique for representing graph data in a low-dimensional space. Graph autoencoders, one of the most widely adopted deep models, learn graph embeddings in an unsupervised way by minimizing the reconstruction error for the graph data. However, their reconstruction loss ignores the distribution of the latent representation, which leads to inferior embeddings. To mitigate this problem, we propose a random-walk-based method to regularize the representations learnt by the encoder. We show that the proposed enhancement beats the existing state-of-the-art models by a large margin (up to 7.5%) on the node clustering task, and achieves state-of-the-art accuracy on the link prediction task for three standard datasets: Cora, Citeseer, and PubMed. Code available at https://github.com/MysteryVaibhav/DW-GAE.
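The abstract does not spell out how the random walks feed the regularizer. As a rough illustration only, the sketch below assumes uniform random walks over an adjacency list and a skip-gram style window that yields (center, context) node pairs, which a regularizer could then encourage to have similar embeddings. The names `random_walk` and `context_pairs`, the toy graph, and the window size are all hypothetical and not taken from the paper or its repository.

```python
import random

def random_walk(adj, start, length, rng):
    """Uniform random walk of at most `length` nodes starting at `start`."""
    walk = [start]
    while len(walk) < length:
        neighbors = adj[walk[-1]]
        if not neighbors:  # dead end: stop the walk early
            break
        walk.append(rng.choice(neighbors))
    return walk

def context_pairs(walk, window):
    """(center, context) pairs that lie within `window` steps on the walk."""
    pairs = []
    for i, center in enumerate(walk):
        lo, hi = max(0, i - window), min(len(walk), i + window + 1)
        pairs.extend((center, walk[j]) for j in range(lo, hi) if j != i)
    return pairs

# Toy undirected path graph 0 - 1 - 2 - 3 (an assumption for illustration).
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
rng = random.Random(0)
walk = random_walk(adj, start=0, length=5, rng=rng)
pairs = context_pairs(walk, window=1)
```

In a full model, pairs like these would typically enter a skip-gram style loss added to the autoencoder's reconstruction loss; that combination is left out here because the abstract gives no details.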

data mining, machine learning, node, (18 more...)

1908.04003

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

arXiv.org Artificial Intelligence Aug-12-2019

Multi-view Clustering with the Cooperation of Visible and Hidden Views

Deng, Zhaohong, Liu, Ruixiu, Zhang, Te, Xu, Peng, Choi, Kup-Sze, Qin, Bin, Wang, Shitong

Multi-view data are becoming common in real-world modeling tasks, and many multi-view clustering algorithms have thus been proposed. Existing algorithms usually focus on the cooperation of different views in the original space but neglect the influence of the hidden information among these visible views, or they consider only the hidden information between the views. These algorithms are therefore not efficient, since the available information is not fully exploited, particularly the otherness information in different views and the consistency information between them. In practice, the otherness and consistency information in multi-view data are both very useful for effective clustering analysis. In this study, a Multi-View clustering algorithm developed with the Cooperation of Visible and Hidden views, i.e., MV-Co-VH, is proposed. The MV-Co-VH algorithm first projects the multiple views from their different visible spaces into a common hidden space, using non-negative matrix factorization (NMF), to obtain the common hidden view data. Collaborative learning is then implemented in the clustering procedure based on the visible views and the shared hidden view. The results of extensive experiments on UCI multi-view datasets and real-world image multi-view datasets show that the clustering performance of the proposed algorithm is competitive with or even better than that of existing algorithms.
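The projection step can be pictured with a small sketch. The abstract names NMF but gives no objective, so the code below assumes the simplest variant: each view X_v (features by samples) is factorized as W_v H with one hidden matrix H shared by all views, fitted with standard multiplicative updates on the summed squared error. The function names, toy data, and rank are hypothetical, and the collaborative clustering step that follows in MV-Co-VH is not shown.

```python
import random

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def mult_update(M, num, den, eps=1e-9):
    """Element-wise multiplicative update M * num / den (keeps M non-negative)."""
    return [[m * n / (d + eps) for m, n, d in zip(rm, rn, rd)]
            for rm, rn, rd in zip(M, num, den)]

def loss(views, Ws, H):
    """Sum over views of the squared Frobenius error ||X_v - W_v H||^2."""
    total = 0.0
    for X, W in zip(views, Ws):
        R = matmul(W, H)
        total += sum((x - r) ** 2 for rx, rr in zip(X, R) for x, r in zip(rx, rr))
    return total

def shared_nmf(views, rank, iters=200, seed=0):
    rng = random.Random(seed)
    n = len(views[0][0])  # number of samples, identical across views
    Ws = [[[rng.random() + 0.1 for _ in range(rank)] for _ in X] for X in views]
    H = [[rng.random() + 0.1 for _ in range(n)] for _ in range(rank)]
    history = [loss(views, Ws, H)]
    for _ in range(iters):
        Ht = transpose(H)
        HHt = matmul(H, Ht)
        # Per-view basis update: W_v <- W_v * (X_v H^T) / (W_v H H^T)
        Ws = [mult_update(W, matmul(X, Ht), matmul(W, HHt)) for X, W in zip(views, Ws)]
        # Shared hidden view update: numerators and denominators summed over views
        num = den = None
        for X, W in zip(views, Ws):
            Wt = transpose(W)
            n_v, d_v = matmul(Wt, X), matmul(matmul(Wt, W), H)
            num = n_v if num is None else add(num, n_v)
            den = d_v if den is None else add(den, d_v)
        H = mult_update(H, num, den)
        history.append(loss(views, Ws, H))
    return Ws, H, history

# Two toy views of the same four samples (columns); values are illustrative only.
views = [
    [[1, 0, 2, 0], [0, 1, 0, 2]],
    [[3, 3, 0, 0], [0, 0, 3, 3], [1, 1, 1, 1]],
]
Ws, H, history = shared_nmf(views, rank=2)
```

Each column of `H` is the hidden-view representation of one sample. Under this assumed objective the procedure is equivalent to running NMF on the vertically stacked views.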

artificial intelligence, data mining, machine learning, (18 more...)

1908.04766

Country: Asia > China (0.28)

Genre: Research Report > New Finding (0.66)

Industry: Water & Waste Management > Water Management (0.47)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

arXiv.org Artificial Intelligence Aug-12-2019

On Defending Against Label Flipping Attacks on Malware Detection Systems

Taheri, Rahim, Javidan, Reza, Shojafar, Mohammad, Pooranian, Zahra, Miri, Ali, Conti, Mauro

Label manipulation attacks are a subclass of data poisoning attacks in adversarial machine learning used against different applications, such as malware detection. These types of attacks represent a serious threat to detection systems in environments with high noise rates or uncertainty, such as complex networks and the Internet of Things (IoT). Recent work in the literature has suggested using the $K$-Nearest Neighbors (KNN) algorithm to defend against such attacks. However, such an approach can suffer from low or even incorrect detection accuracy. In this paper, we design an architecture to tackle the Android malware detection problem in IoT systems. We develop an attack mechanism based on the Silhouette clustering method, modified for mobile Android platforms. We propose two Convolutional Neural Network (CNN)-type deep learning algorithms against this Silhouette Clustering-based Label Flipping Attack (SCLFA). We show the effectiveness of these two defense algorithms, Label-based Semi-supervised Defense (LSD) and Clustering-based Semi-supervised Defense (CSD), in correcting labels that have been attacked. We evaluate the performance of the proposed algorithms by varying the machine learning parameters on three Android datasets (Drebin, Contagio, and Genome) and three types of features (API, intent, and permission). Our evaluation shows that using random forest feature selection and varying ratios of features can result in an improvement of up to 19% in accuracy when compared with the state-of-the-art method in the literature.
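To make the idea of a silhouette-driven label flip concrete, here is a minimal sketch. It assumes binary labels, treats each label as a cluster, and flips the label of any sample whose silhouette value falls below a threshold (that is, samples sitting closer to the other class). The flipping rule, the threshold of 0, the function names, and the toy 2-D points are assumptions for illustration and should not be read as the paper's exact SCLFA procedure.

```python
import math

def silhouette_values(points, labels):
    """Silhouette value of each point, treating its (binary) label as its cluster."""
    values = []
    for i, (p, lab) in enumerate(zip(points, labels)):
        same = [math.dist(p, q)
                for j, (q, l) in enumerate(zip(points, labels)) if l == lab and j != i]
        other = [math.dist(p, q) for q, l in zip(points, labels) if l != lab]
        if not same or not other:
            values.append(0.0)  # undefined case: treat as neutral
            continue
        a = sum(same) / len(same)    # mean distance to own class
        b = sum(other) / len(other)  # mean distance to the other class
        values.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return values

def flip_labels(points, labels, threshold=0.0):
    """Flip the 0/1 label of every sample whose silhouette is below `threshold`."""
    sil = silhouette_values(points, labels)
    return [1 - lab if s < threshold else lab for lab, s in zip(labels, sil)]

# Toy feature vectors: the last sample is labeled 1 but lies inside class 0.
points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5), (0.5, 0.5)]
labels = [0, 0, 0, 1, 1, 1, 1]
flipped = flip_labels(points, labels)
```

Only the last sample has a negative silhouette here, so it is the only one whose label changes.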

algorithm, artificial intelligence, machine learning, (18 more...)

1908.04473

Country:

Europe (1.00)
North America > Canada (0.28)
North America > United States (0.28)

Genre: Research Report > Promising Solution (0.48)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Machine Learning Aug-10-2019

A Critical Note on the Evaluation of Clustering Algorithms

Zhong, Li, Zhang, Tiantian, Yuan, Bo

Experimental evaluation is a major research methodology for investigating clustering algorithms. For this purpose, a number of benchmark datasets have been widely used in the literature, and their quality plays an important role in the value of the research work. However, in most existing studies, little attention has been paid to the specific properties of the datasets, which are often regarded as black-box problems. In our work, with the help of advanced visualization and dimension reduction techniques, we show that there are potential issues with some of the popular benchmark datasets used to evaluate clustering algorithms, issues that may seriously compromise research quality and may even produce completely misleading results. We suggest that significant effort needs to be devoted to improving the current practice of experimental evaluation of clustering algorithms through a principled analysis of each benchmark dataset of interest.

artificial intelligence, data mining, machine learning, (19 more...)

1908.03782

Country:

North America > United States (0.14)
Europe (0.14)

Genre: Research Report (0.40)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

arXiv.org Machine Learning Aug-9-2019

Unexpected Effects of Online K-means Clustering

Moshkovitz, Michal

In this paper we study k-means clustering in the online setting. In the offline setting the main parameters are the number of centers, k, and the size of the dataset, n. Performance guarantees are given as a function of these parameters. In the online setting new factors come into play: the ordering of the dataset and whether n is known in advance or not. One of the main results of this paper is the discovery that these new factors have dramatic effects on the quality of the clustering algorithms. For example, for constant k: (1) $\Omega(n)$ centers are needed if the order is arbitrary, (2) if the order is random and n is unknown in advance, the number of centers reduces to $\Theta(\log n)$, and (3) if n is known, then the number of centers reduces to a constant. For different values of the new factors, we show upper and lower bounds that are exactly the same up to a constant, thus achieving optimal bounds.

algorithm, artificial intelligence, machine learning, (18 more...)

1908.06818

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

arXiv.org Machine Learning Aug-9-2019

Deep Kernel Learning for Clustering

Wu, Chieh, Khan, Zulqarnain, Chang, Yale, Ioannidis, Stratis, Dy, Jennifer

We propose a deep learning approach for discovering kernels tailored to identifying clusters over sample data. Our neural network produces sample embeddings that are motivated by, and are at least as expressive as, spectral clustering. Our training objective, based on the Hilbert-Schmidt Independence Criterion, can be optimized via gradient adaptations on the Stiefel manifold, leading to significant acceleration over spectral methods relying on eigendecompositions. Finally, our trained embedding can be directly applied to out-of-sample data. We show experimentally that our approach outperforms several state-of-the-art deep clustering methods, as well as traditional approaches such as $k$-means and spectral clustering, over a broad array of real-life and synthetic datasets.
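For readers unfamiliar with the criterion behind the training objective, the sketch below evaluates the standard biased empirical estimate of HSIC (the Hilbert-Schmidt Independence Criterion), tr(K H L H) / (n - 1)^2, where H is the centering matrix. It only scores how strongly a data kernel K depends on a label kernel L. The Gaussian kernel, the same-label kernel, the bandwidth, and the toy data are assumptions for illustration; the paper's network and its optimization on the Stiefel manifold are not reproduced.

```python
import math

def gaussian_kernel(xs, sigma=1.0):
    """Gaussian (RBF) kernel matrix for 1-D samples."""
    return [[math.exp(-((a - b) ** 2) / (2 * sigma ** 2)) for b in xs] for a in xs]

def label_kernel(labels):
    """Kernel that is 1 when two samples share a cluster label, else 0."""
    return [[1.0 if a == b else 0.0 for b in labels] for a in labels]

def center(K):
    """Double-center a kernel matrix, i.e. compute H K H with H = I - (1/n) 1 1^T."""
    n = len(K)
    row = [sum(r) / n for r in K]
    col = [sum(K[i][j] for i in range(n)) / n for j in range(n)]
    total = sum(row) / n
    return [[K[i][j] - row[i] - col[j] + total for j in range(n)] for i in range(n)]

def hsic(K, L):
    """Biased empirical HSIC: tr(K H L H) / (n - 1)^2."""
    n = len(K)
    Kc = center(K)
    # tr(H K H L) equals the sum over i, j of Kc[i][j] * L[j][i]
    return sum(Kc[i][j] * L[j][i] for i in range(n) for j in range(n)) / (n - 1) ** 2

# Two well-separated 1-D groups (illustrative data only).
xs = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
K = gaussian_kernel(xs)
aligned = hsic(K, label_kernel([0, 0, 0, 1, 1, 1]))  # labels follow the groups
mixed = hsic(K, label_kernel([0, 1, 0, 1, 0, 1]))    # labels ignore the groups
```

A labeling that follows the cluster structure yields a larger HSIC value than one that ignores it, which is why maximizing HSIC can serve as a clustering objective.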

artificial intelligence, machine learning, optimization problem, (17 more...)

1908.03515

Country: North America > United States (0.29)

Genre: Research Report (0.64)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Mosavi, Amir, Ozturk, Pinar, Chau, Kwok-wing

Flood Prediction Using Machine Learning Models: Literature Review

arXiv.org Machine Learning Aug-7-2019

Floods are among the most destructive natural disasters and are highly complex to model. Research on the advancement of flood prediction models has contributed to risk reduction, policy suggestion, minimization of the loss of human life, and reduction of the property damage associated with floods. To mimic the complex mathematical expressions of the physical processes of floods, machine learning (ML) methods have contributed greatly over the past two decades to the advancement of prediction systems, providing better performance and cost-effective solutions. Due to the vast benefits and potential of ML, its popularity has increased dramatically among hydrologists. By introducing novel ML methods and hybridizing existing ones, researchers aim to discover more accurate and efficient prediction models. The main contribution of this paper is to demonstrate the state of the art of ML models in flood prediction and to give insight into the most suitable models. In this paper, the literature in which ML models were benchmarked through a qualitative analysis of robustness, accuracy, effectiveness, and speed is investigated in particular, to provide an extensive overview of the various ML algorithms used in the field. The performance comparison of ML models provides an in-depth understanding of the different techniques within the framework of a comprehensive evaluation and discussion. As a result, this paper introduces the most promising prediction methods for both long-term and short-term floods. Furthermore, the major trends in improving the quality of flood prediction models are investigated. Among them, hybridization, data decomposition, algorithm ensemble, and model optimization are reported as the most effective strategies for the improvement of ML methods.

neural network, prediction, upstream oil & gas, (22 more...)

doi: 10.3390/w10111536

1908.02781

Country:

Oceania > Australia (0.28)
North America > Canada (0.28)
Asia > China (0.14)
(9 more...)

Genre:

Research Report > New Finding (1.00)
Overview (1.00)

Industry: Energy > Oil & Gas (1.00)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)
(4 more...)

Yelibi, Lionel, Gebbie, Tim

Agglomerative Fast Super-Paramagnetic Clustering

arXiv.org Machine Learning Aug-7-2019

Concretely, we show that the proposed algorithm does in fact recover the correct super-paramagnetic cluster configurations that are near the entropy maxima. Previous case studies include data clustering of stocks [15] and gene data [4], temporal states of financial markets [8], and state detection for adaptive machine learning in trading [5]. There is an endless variety of potential use cases for this type of fast big-data clustering technology. Building on prior work, we propose and demonstrate an alternative to fast Super-Paramagnetic Clustering (f-SPC) [15] using a modern and streamlined implementation of the "Merging Algorithm" first suggested by Giada [4], one that can recover the same cluster configurations for a variety of test cases, but with significantly reduced compute times. We again use the Noh Ansatz [11] and the Maximum Likelihood Estimation approach introduced by Giada and Marsili [4]. We call the new algorithm Agglomerative Super-Paramagnetic Clustering (ASPC); it has the benefit of being less computationally expensive than the PGAs implemented in [5, 6, 15].

algorithm, configuration, correlation matrix, (13 more...)

1908.00951

Country: Africa > South Africa > Western Cape > Cape Town (0.04)

Genre: Research Report (0.40)

Industry: Banking & Finance (0.90)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.71)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.54)

arXiv.org Artificial Intelligence Aug-7-2019

Transferring knowledge from monitored to unmonitored areas for forecasting parking spaces

Ionita, Andrei, Pomp, André, Cochez, Michael, Meisen, Tobias, Decker, Stefan

Smart cities around the world have begun monitoring parking areas in order to estimate available parking spots and help drivers looking for parking. Current results are indeed promising. However, existing approaches are limited by the high cost of the sensors that need to be installed throughout the city in order to achieve an accurate estimation. This work investigates extending the estimation of parking information from areas equipped with sensors to areas where sensors are missing. To this end, the similarity between city neighborhoods is determined based on background data, i.e., from geographic information systems. Using the derived similarity values, we analyze the adaptation of occupancy rates from monitored to unmonitored parking areas.
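One simple way to picture transferring occupancy rates by neighborhood similarity is a similarity-weighted average. The sketch below assumes each area is described by a numeric feature vector derived from background data, uses cosine similarity, and estimates the occupancy of an unmonitored area from the monitored ones. The similarity measure, the weighting scheme, the function names, and the toy features are all assumptions for illustration; the abstract does not state which measure or adaptation rule the authors use.

```python
import math

def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def estimate_occupancy(target, monitored):
    """Similarity-weighted mean of occupancy rates.

    target:    feature vector of the unmonitored area
    monitored: list of (feature_vector, occupancy_rate) for areas with sensors
    """
    # Negative similarities are clipped to zero so they cannot act as weights.
    weights = [max(cosine_similarity(target, f), 0.0) for f, _ in monitored]
    total = sum(weights)
    if total == 0:
        return None  # no similar monitored area: no estimate
    return sum(w * rate for w, (_, rate) in zip(weights, monitored)) / total

# Hypothetical features, e.g. (share of shops, share of housing) per neighborhood.
monitored = [([1.0, 0.0], 0.8), ([0.0, 1.0], 0.2)]
same_as_first = estimate_occupancy([1.0, 0.0], monitored)
in_between = estimate_occupancy([1.0, 1.0], monitored)
```

An area identical to a monitored neighborhood inherits that neighborhood's rate, while an area equally similar to both receives their average.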

artificial intelligence, data mining, machine learning, (18 more...)

1908.03629

Country:

Europe > Germany (0.68)
North America > United States > California (0.29)

Genre: Research Report (1.00)

Industry:

Transportation > Infrastructure & Services (1.00)
Transportation > Ground > Road (1.00)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(4 more...)