AITopics

2102.03827

Country:

South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
Europe > Spain > Galicia > A Coruña Province > Santiago de Compostela (0.04)
Europe > France (0.04)

Genre: Research Report > Promising Solution (0.34)

Industry: Information Technology (0.50)

Technology:

Information Technology > Software (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)

Vicente, Serge, Murua, Alejandro

Determinantal consensus clustering

arXiv.org Machine Learning, Feb-7-2021

Random restarts of a given algorithm produce many partitions, which are combined to yield a consensus clustering. Ensemble methods such as consensus clustering have been recognized as more robust approaches to data clustering than single clustering algorithms. We propose the use of determinantal point processes (DPPs) for the random restart of clustering algorithms based on initial sets of center points, such as k-medoids or k-means. The relation between DPPs and kernel-based methods makes DPPs suitable for describing and quantifying similarity between objects. DPPs favor diversity of the center points within subsets, so subsets of similar points are less likely to be generated than subsets of very distinct points. The current and most popular sampling technique selects center points uniformly at random. We show through extensive simulations that, unlike DPPs, this technique fails both to ensure diversity and to obtain good coverage of all data facets. These two properties are key to DPPs achieving good performance with small ensembles. Simulations with artificial datasets and applications to real datasets show that determinantal consensus clustering outperforms classical algorithms such as k-medoids and k-means consensus clustering, which are based on uniform random sampling of center points.
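The contrast between DPP and uniform sampling of center points can be made concrete with a small example. A minimal sketch, assuming a Gaussian (RBF) similarity kernel with a hypothetical `bandwidth` parameter: a k-DPP gives each size-k subset S of candidate centers a probability proportional to det(L_S), the determinant of the kernel submatrix, so a subset containing near-duplicate points has a determinant close to zero and is rarely drawn. The exact sampler below enumerates every subset, which is only feasible for tiny datasets; it illustrates the distribution, not the authors' sampler.

```python
import itertools

import numpy as np


def rbf_kernel(X, bandwidth=1.0):
    # Gaussian similarity kernel L_ij = exp(-||x_i - x_j||^2 / (2 * bandwidth^2)).
    # The kernel family and bandwidth are illustrative assumptions.
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))


def k_dpp_weights(L, k):
    # Unnormalized k-DPP weight det(L_S) for every size-k subset S.
    subsets = list(itertools.combinations(range(L.shape[0]), k))
    weights = np.array([np.linalg.det(L[np.ix_(s, s)]) for s in subsets])
    # Clip tiny negative values caused by floating-point round-off.
    return subsets, np.clip(weights, 0.0, None)


def sample_centers_k_dpp(X, k, rng, bandwidth=1.0):
    # Exact k-DPP draw by brute-force enumeration (tiny n only).
    subsets, weights = k_dpp_weights(rbf_kernel(X, bandwidth), k)
    idx = rng.choice(len(subsets), p=weights / weights.sum())
    return list(subsets[idx])
```

With three points where two are nearly identical, the pair of near-duplicates gets a weight of about 0.01 while each diverse pair gets a weight of about 1, whereas uniform sampling would pick all three pairs equally often. Calling `sample_centers_k_dpp` with different generator states would give one diverse set of initial centers per restart.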

algorithm, configuration, dataset

2102.03948

Country:

North America > Canada > Quebec > Montreal (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Genre: Research Report (0.82)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

arXiv.org Machine Learning, Feb-7-2021

A self-adaptive and robust fission clustering algorithm via heat diffusion and maximal turning angle

Han, Yu, Lu, Shizhan, Xu, Haiyan

Cluster analysis, which focuses on the grouping and categorization of similar elements, is widely used in various fields of research. A novel and fast clustering algorithm, the fission clustering algorithm, was proposed recently. In this article, we propose a robust fission clustering (RFC) algorithm and a self-adaptive noise identification method. The RFC and the self-adaptive noise identification method are combined into a self-adaptive robust fission clustering (SARFC) algorithm. Several frequently used datasets were used to test the performance of the proposed clustering approach and to compare its results with those of other algorithms. The comprehensive comparisons indicate that the proposed method has advantages over other common methods.

algorithm, fission, robust fission

2102.03794

Country:

Asia > China > Jiangsu Province > Nanjing (0.05)
North America > United States > Washington > King County > Seattle (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)

Genre: Research Report (0.50)

Industry: Health & Medicine (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Galhotra, Sainyam, Saisubramanian, Sandhya, Zilberstein, Shlomo

Learning to Generate Fair Clusters from Demonstrations

arXiv.org Artificial Intelligence, Feb-7-2021

Fair clustering is the process of grouping similar entities together while satisfying a mathematically well-defined fairness metric as a constraint. Due to the practical challenges in precise model specification, the prescribed fairness constraints are often incomplete and act as proxies for the intended fairness requirement, leading to biased outcomes when the system is deployed. We examine how to identify the intended fairness constraint for a problem based on limited demonstrations from an expert. Each demonstration is a clustering over a subset of the data. We present an algorithm that identifies the fairness metric from demonstrations and generates clusters using existing off-the-shelf clustering techniques, and we analyze its theoretical properties. To extend our approach to novel fairness metrics for which clustering algorithms do not currently exist, we present a greedy method for clustering. Additionally, we investigate how to generate interpretable solutions using our approach. Empirical evaluation on three real-world datasets demonstrates the effectiveness of our approach in quickly identifying the underlying fairness and interpretability constraints, which are then used to generate fair and interpretable clusters.
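The abstract treats fairness as a constraint expressed by a well-defined metric but does not name one. As an illustration only, the sketch below computes a balance-style score: for two groups it is the minimum over clusters of min(r/b, b/r), where r and b are the counts of each group in a cluster. The function name and the multi-group generalization (smallest group count divided by largest group count) are assumptions, and this is not the metric the authors learn from demonstrations.

```python
from collections import Counter, defaultdict


def cluster_balance(labels, groups):
    """Balance-style fairness score for a clustering (hypothetical helper).

    labels: cluster id per point; groups: protected-group id per point.
    Returns the minimum over clusters of (smallest group count / largest
    group count); 1.0 means every cluster contains equal counts of every group.
    """
    all_groups = set(groups)
    members = defaultdict(Counter)
    for cluster, group in zip(labels, groups):
        members[cluster][group] += 1
    scores = []
    for counts in members.values():
        # Groups absent from a cluster count as 0, which drives the score to 0.
        per_group = [counts.get(g, 0) for g in all_groups]
        scores.append(min(per_group) / max(per_group))
    return min(scores)
```

A fairness constraint of this kind would then read "balance must be at least some threshold"; a cluster that contains no member of some group scores 0.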

constraint, demonstration, node

2102.03977

Country:

North America > United States > California (0.14)
North America > United States > Texas (0.04)
North America > United States > Massachusetts > Hampshire County > Amherst (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

arXiv.org Machine Learning, Feb-5-2021

Projection Robust Wasserstein Barycenter

Huang, Minhui, Ma, Shiqian, Lai, Lifeng

Collecting and aggregating information from several probability measures or histograms is a fundamental task in machine learning. A popular approach to this task is to compute the barycenter of the probability measures under the Wasserstein metric. However, approximating the Wasserstein barycenter is numerically challenging because of the curse of dimensionality. This paper proposes the projection robust Wasserstein barycenter (PRWB), which mitigates the curse of dimensionality. This new model projects the probability measures onto a lower-dimensional subspace that maximizes the Wasserstein barycenter objective. The resulting problem is a max-min problem over the Stiefel manifold, which is numerically challenging in practice. Combining the iterative Bregman projection algorithm and Riemannian optimization, we propose two new algorithms for computing the PRWB. We analyze the arithmetic complexity of the proposed algorithms for obtaining an $\epsilon$-stationary solution. We incorporate the PRWB into a discrete distribution clustering algorithm, and numerical results on real text datasets confirm that our PRWB model significantly improves clustering performance.
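The PRWB algorithms build on iterative Bregman projection (IBP). For reference, the sketch below implements only the standard IBP iteration for an entropy-regularized Wasserstein barycenter of histograms that share one support; it omits the Stiefel-manifold projection and the Riemannian optimization that define PRWB. The regularization `eps`, the iteration count, and the function name are illustrative assumptions.

```python
import numpy as np


def entropic_barycenter(A, C, weights, eps=0.01, n_iter=1000):
    """Entropy-regularized Wasserstein barycenter via iterative Bregman projections.

    A: (m, n) array, one histogram per row, all on the same n-point support.
    C: (n, n) symmetric ground-cost matrix.
    weights: (m,) nonnegative barycentric weights summing to 1.
    """
    K = np.exp(-C / eps)                  # Gibbs kernel
    V = np.ones_like(A)                   # one scaling vector v_k per histogram
    p = A[0]
    for _ in range(n_iter):
        # Project onto the first-marginal constraint: diag(u_k) K v_k = a_k.
        U = A / (V @ K.T)
        KtU = U @ K                       # row k holds K^T u_k
        # The barycenter is the weighted geometric mean of the second marginals.
        p = np.exp(weights @ np.log(KtU))
        # Project every coupling's second marginal onto the common barycenter p.
        V = p[None, :] / KtU
    return p
```

For two Gaussian bumps centered at 0.25 and 0.75 on a grid over [0, 1] with squared-distance cost and equal weights, the result is a single bump centered at 0.5 (blurred somewhat by the entropic term), rather than the two-bump average a Euclidean mean would give.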

algorithm, complexity, wasserstein barycenter

2102.0339

Country:

North America > United States > California > Yolo County > Davis (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.63)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.48)

Sahin, Özge, Czado, Claudia

Vine copula mixture models and clustering for non-Gaussian data

arXiv.org Machine Learning, Feb-5-2021

Most finite mixture models cannot accommodate asymmetric tail dependencies within components and fail to capture non-elliptical clusters in clustering applications. Since vine copulas are very flexible in capturing these types of dependencies, we propose a novel vine copula mixture model for continuous data. We discuss the model selection and parameter estimation problems and further formulate a new model-based clustering algorithm. The use of vine copulas in clustering allows for a range of shapes and dependency structures for the clusters. Our simulation experiments show a significant gain in clustering accuracy when the components exhibit notably asymmetric tail dependencies and/or non-Gaussian margins. We also apply the proposed method to real data sets. We show that the model-based clustering algorithm with vine copula mixture models outperforms other model-based clustering techniques, especially for non-Gaussian multivariate data.

algorithm, copula, mixture model

2102.03257

Country:

Europe > Austria > Vienna (0.14)
North America > United States > New York (0.04)
North America > Canada > British Columbia (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)

Genre: Research Report > New Finding (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

arXiv.org Artificial Intelligence, Feb-5-2021

Corner Case Generation and Analysis for Safety Assessment of Autonomous Vehicles

Sun, Haowei, Feng, Shuo, Yan, Xintao, Liu, Henry X.

Testing and evaluation are crucial steps in the development and deployment of Connected and Automated Vehicles (CAVs). To evaluate the performance of CAVs comprehensively, it is necessary to test them in safety-critical scenarios, which rarely occur in naturalistic driving environments. Purposely and systematically generating these corner cases is therefore an important problem. Most existing studies focus on generating adversarial examples for the perception systems of CAVs, whereas limited effort has been devoted to the decision-making systems, which are the focus of this paper. Because CAVs need to interact with numerous background vehicles (BVs) over long durations, the variables that define corner cases are usually high-dimensional, which makes generating them challenging. In this paper, a unified framework is proposed to generate corner cases for decision-making systems. To address the challenge of high dimensionality, the driving environment is formulated as a Markov Decision Process, and deep reinforcement learning techniques are applied to learn the behavior policy of BVs. With the learned policy, BVs behave and interact with the CAVs more aggressively, resulting in more corner cases. To further analyze the generated corner cases, feature extraction and clustering techniques are used. By selecting representative cases from each cluster as well as outliers, the valuable corner cases can be identified among all generated corner cases. Simulation results in a highway driving environment show that the proposed methods can effectively generate and identify valuable corner cases.
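The abstract says representative cases and outliers are selected from each cluster but does not state the selection rule. A minimal sketch, assuming the corner cases have already been turned into feature vectors and clustered by some method: the representative is the member closest to its cluster centroid, and an outlier is a member whose centroid distance exceeds the cluster's mean distance by more than `outlier_z` standard deviations. Both rules and the function name are hypothetical.

```python
import numpy as np


def representatives_and_outliers(features, labels, outlier_z=2.0):
    """Pick one representative per cluster and flag outlying members.

    features: (n, d) array of extracted corner-case features (assumed given).
    labels: (n,) array of cluster ids from any clustering method.
    Representative: the member closest to its cluster centroid.
    Outlier (hypothetical rule): distance to the centroid above
    mean + outlier_z * std of that cluster's distances.
    """
    reps, outliers = {}, []
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        centroid = features[idx].mean(axis=0)
        dist = np.linalg.norm(features[idx] - centroid, axis=1)
        reps[int(c)] = int(idx[np.argmin(dist)])
        threshold = dist.mean() + outlier_z * dist.std()
        outliers.extend(int(i) for i in idx[dist > threshold])
    return reps, sorted(outliers)
```

Keeping one representative per cluster plus the outliers reduces a large set of generated corner cases to a short list for manual review; the z-score threshold is only one of many possible outlier rules.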

corner case, feng, vehicle

2102.03483

Country: North America > United States > Michigan > Washtenaw County > Ann Arbor (0.14)

Genre: Research Report (0.82)

Industry:

Transportation > Ground > Road (1.00)
Information Technology (1.00)
Automobiles & Trucks (1.00)
Government > Regional Government > North America Government > United States Government (0.93)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.68)

Petschnigg, Christina, Spitzner, Markus, Weitzendorf, Lucas, Pilz, Jürgen

From a Point Cloud to a Simulation Model: Bayesian Segmentation and Entropy based Uncertainty Estimation for 3D Modelling

arXiv.org Machine Learning, Feb-4-2021

The 3D modelling of indoor environments and the generation of process simulations play an important role in factory and assembly planning. In brownfield planning, existing data are often outdated and incomplete, especially for older plants, which were mostly planned in 2D. Thus, current environment models cannot be generated directly from existing data, and a holistic approach to building such a factory model in a highly automated fashion is largely missing. Major steps in generating an environment model of a production plant include data collection and pre-processing, object identification, and pose estimation. In this work, we develop a methodical workflow that starts with the digitalization of large-scale indoor environments and ends with the generation of a static environment or simulation model. The object identification step is realized using a Bayesian neural network capable of point cloud segmentation. We show how the information on network uncertainty generated by a Bayesian segmentation framework can be used to build a more accurate environment model. The data collection and point cloud segmentation steps, as well as the resulting model accuracy, are evaluated on a real-world data set collected at the assembly line of a large-scale automotive production plant. The segmentation network is further evaluated on the publicly available Stanford Large-Scale 3D Indoor Spaces data set. The Bayesian segmentation network clearly outperforms the frequentist baseline and allows us to increase the accuracy of model placement in a simulation scene considerably.
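Entropy-based uncertainty from a Bayesian segmentation network is commonly computed as the entropy of the averaged class probabilities over several stochastic forward passes (for example, Monte Carlo dropout). A minimal sketch under that assumption; the paper's exact estimator may differ, and the function name and array layout are illustrative.

```python
import numpy as np


def predictive_entropy(mc_probs, eps=1e-12):
    """Per-point predictive entropy from Monte Carlo softmax samples.

    mc_probs: array of shape (T, N, K) holding T stochastic forward passes
    for N points and K classes (layout is an assumption).
    Returns an (N,) array; higher values mean the network is less certain.
    """
    mean_probs = mc_probs.mean(axis=0)  # (N, K) averaged class probabilities
    # eps guards against log(0) for classes with zero probability.
    return -(mean_probs * np.log(mean_probs + eps)).sum(axis=1)
```

A point predicted identically in every pass gets an entropy near 0, while a point whose passes disagree between two classes gets an entropy near log 2. How such per-point scores feed into object placement is not specified in the abstract.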

accuracy, laser scan, point cloud

2102.02488

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.04)
North America > United States > California > Alameda County > Oakland (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Workflow (1.00)

Industry: Information Technology (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

#artificialintelligence, Feb-3-2021, 05:11:36 GMT

K Means Clustering

Clustering is an unsupervised machine learning technique. It is the process of dividing a dataset into groups whose members share similar features. Commonly used clustering algorithms include K-Means clustering, hierarchical clustering, density-based clustering, and model-based clustering. In this article, we discuss K-Means clustering in detail. First, we import the essential libraries.
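As a minimal sketch of the algorithm the post discusses, assuming only NumPy (the post's own library choices are not shown here), Lloyd's algorithm for K-Means alternates an assignment step and an update step until the centroids stop moving. The function names and the random-point initialization are illustrative choices.

```python
import numpy as np


def assign(X, centroids):
    # Assignment step: index of the nearest centroid for every point.
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)


def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm for K-Means on an (n, d) array X."""
    rng = np.random.default_rng(seed)
    # Initialize with k distinct data points chosen uniformly at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        labels = assign(X, centroids)
        # Update step: move each centroid to the mean of its members,
        # keeping the old centroid if its cluster became empty.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    # Final assignment so labels always match the returned centroids.
    return assign(X, centroids), centroids
```

For example, `labels, centroids = kmeans(X, k=3)` on an (n, 2) array returns one cluster index per row and a (3, 2) array of centroids. Because the result depends on the initial centroids, practical implementations usually run several restarts or use a smarter seeding scheme.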

algorithm, k-means, means clustering

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

arXiv.org Artificial Intelligence, Feb-3-2021

LinkLouvain: Link-Aware A/B Testing and Its Application on Online Marketing Campaign

Cai, Tianchi, Cheng, Daxi, Liang, Chen, Liu, Ziqi, Gu, Lihong, Xie, Huizhi, Zhang, Zhiqiang, Zeng, Xiaodong, Gu, Jinjie

Many online marketing campaigns aim to promote user interaction. The average treatment effect (ATE) of campaign strategies needs to be monitored throughout the campaign. A/B testing is usually conducted for this purpose, but interaction between users can introduce interference into standard A/B testing. With the help of link prediction, we design a network A/B testing method, LinkLouvain, that minimizes graph interference and gives an accurate and sound estimate of the campaign's ATE. In this paper, we analyze the network A/B testing problem in a real-world online marketing campaign, describe our proposed LinkLouvain method, and evaluate it on real-world data. Our method significantly outperforms the compared methods and is deployed in the online marketing campaign.
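The abstract does not spell out LinkLouvain's estimator. A minimal sketch of the general idea behind network A/B testing, assuming users have already been grouped into communities (for instance by Louvain on a predicted-link graph, which is not computed here): randomize whole communities so that linked users tend to share an arm, then estimate the ATE with a difference in means. This is a generic cluster-randomized design, not LinkLouvain itself, and both helper names are hypothetical.

```python
import numpy as np


def assign_by_community(communities, rng):
    """Cluster-randomized design: every user in a community gets the same arm.

    communities: list of lists of user ids (assumed precomputed).
    Exactly half of the communities (rounded down) are treated.
    Returns a dict mapping user id -> True (treatment) / False (control).
    """
    order = rng.permutation(len(communities))
    treated_ids = set(order[: len(communities) // 2].tolist())
    return {user: (c in treated_ids)
            for c, members in enumerate(communities) for user in members}


def difference_in_means(assignment, outcomes):
    """Naive ATE estimate: mean outcome of treated users minus mean of control users."""
    treated = [outcomes[u] for u, t in assignment.items() if t]
    control = [outcomes[u] for u, t in assignment.items() if not t]
    return float(np.mean(treated) - np.mean(control))
```

Randomizing whole communities keeps most predicted links inside a single arm, which limits the interference the abstract describes; the naive difference in means still ignores any links that cross community boundaries.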

artificial intelligence, data mining, machine learning

doi: 10.1007/978-3-030-73200-4_34

2102.01902

Country:

North America > United States > New York (0.04)
Asia > China (0.04)

Genre: Research Report > Experimental Study (0.32)

Industry: Marketing (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications > Social Media (0.97)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.49)