AITopics

2206.08091

Country:

Asia > Bangladesh > Dhaka Division > Dhaka District > Dhaka (0.04)
Oceania > Australia (0.04)
North America > United States > Washington > King County > Seattle (0.04)
(10 more...)

Genre: Research Report > Promising Solution (0.34)

Industry: Information Technology (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Hadikhani, Parham, Lai, Daphne Teck Ching, Ong, Wee-Hong

A Novel Skeleton-Based Human Activity Discovery Using Particle Swarm Optimization with Gaussian Mutation

Human activity discovery aims to cluster the activities performed by humans without any prior information on what defines each activity. Most methods presented in human activity recognition are supervised, where there are labeled inputs to train the system. In reality, it is difficult to label activities data because of its huge volume and the variety of human activities. This paper proposes an unsupervised framework to perform human activity discovery in 3D skeleton sequences. First, an approach for data pre-processing is presented. In this stage, important frames are selected based on kinetic energy. Next, the displacement of joints, statistical displacements, angles, and orientation features are extracted to represent the activities information. Since not all extracted features have useful information, the dimension of features is reduced using PCA. Most methods proposed for human activity discovery are not fully unsupervised. They use pre-segmented videos before categorizing activities. To deal with this, we have used a sliding time window to segment the time series of activities with some overlapping. Then, activities are discovered by our proposed Hybrid Particle swarm optimization (PSO) with Gaussian Mutation and K-means (HPGMK) algorithm to provide diverse solutions. PSO is used due to its straightforward idea and powerful global search capability which can identify the ideal solution in a few iterations. Finally, k-means is applied to the outcome centroids from each iteration of the PSO to overcome the slow convergence rate of PSO. The experiment results on five datasets show that the proposed framework has superior performance in discovering activities compared to the other state-of-the-art methods and has increased accuracy of at least 4% on average.

artificial intelligence, evolutionary algorithm, machine learning, (19 more...)

2201.05314

Country:

Asia > Brunei (0.14)
Asia > Vietnam > Long An Province > Tân An (0.04)
Africa > Central African Republic > Ombella-M'Poko > Bimbo (0.04)

Genre: Research Report > Promising Solution (0.87)

Industry: Health & Medicine (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

Bagged $k$-Distance for Mode-Based Clustering Using the Probability of Localized Level Sets

Hang, Hanyuan

In this paper, we propose an ensemble learning algorithm named \textit{bagged $k$-distance for mode-based clustering} (\textit{BDMBC}) by putting forward a new measurement called the \textit{probability of localized level sets} (\textit{PLLS}), which enables us to find all clusters for varying densities with a global threshold. On the theoretical side, we show that with a properly chosen number of nearest neighbors $k_D$ in the bagged $k$-distance, the sub-sample size $s$, the bagging rounds $B$, and the number of nearest neighbors $k_L$ for the localized level sets, BDMBC can achieve optimal convergence rates for mode estimation. It turns out that with a relatively small $B$, the sub-sample size $s$ can be much smaller than the number of training data $n$ at each bagging round, and the number of nearest neighbors $k_D$ can be reduced simultaneously. Moreover, we establish optimal convergence results for the level set estimation of the PLLS in terms of Hausdorff distance, which reveals that BDMBC can find localized level sets for varying densities and thus enjoys local adaptivity. On the practical side, we conduct numerical experiments to empirically verify the effectiveness of BDMBC for mode estimation and level set estimation, which demonstrates the promising accuracy and efficiency of our proposed algorithm.

artificial intelligence, data mining, machine learning, (18 more...)

2210.09786

Country:

North America > United States > New York (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
(3 more...)

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Spenrath, Yorick, Hassani, Marwan, Van Dongen, Boudewijn F.

Clustering-based Aggregations for Prediction in Event Streams

Predicting the behaviour of shoppers provides valuable information for retailers, such as the expected spend of a shopper or the total turnover of a supermarket. The ability to make predictions on an individual level is useful, as it allows supermarkets to accurately perform targeted marketing. However, given the expected number of shoppers and their diverse behaviours, making accurate predictions on an individual level is difficult. This problem does not only arise in shopper behaviour, but also in various business processes, such as predicting when an invoice will be paid. In this paper we present CAPiES, a framework that focuses on this trade-off in an online setting. By making predictions on a larger number of entities at a time, we improve the predictive accuracy but at the potential cost of usefulness since we can say less about the individual entities. CAPiES is developed in an online setting, where we continuously update the prediction model and make new predictions over time. We show the existence of the trade-off in an experimental evaluation in two real-world scenarios: a supermarket with over 160 000 shoppers and a paint factory with over 171 000 invoices.

data mining, machine learning, prediction, (17 more...)

2210.09738

Country:

Europe > Netherlands > North Brabant > Eindhoven (0.04)
North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report (0.82)

Industry:

Retail (1.00)
Consumer Products & Services (0.76)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

Leveraging Cluster Analysis to Understand Educational Game Player Experiences and Support Design

Swanson, Luke, Gagnon, David, Scianna, Jennifer, McCloskey, John, Spevacek, Nicholas, Slater, Stefan, Harpstead, Erik

Luke Swanson, Field Day Lab, University of Wisconsin-Madison David Gagnon, Field Day Lab, University of Wisconsin-Madison Jennifer Scianna, Field Day Lab, University of Wisconsin-Madison John McCloskey, Field Day Lab, University of Wisconsin-Madison Nicholas Spevacek, Field Day Lab, University of Wisconsin-Madison Stefan Slater, Graduate School of Education, University of Pennsylvania Erik Harpstead, Human-Computer Interaction Institute, Carnegie Mellon University Abstract: The ability for an educational game designer to understand their audience's play styles and resulting experience is an essential tool for improving their game's design. As a game is subjected to large-scale player testing, the designers require inexpensive, automated methods for categorizing patterns of player-game interactions. In this paper we present a simple, reusable process using best practices for data clustering, feasible for use within a small educational game studio. We utilize the method to analyze a real-time strategy game, processing game telemetry data to determine categories of players based on their in-game actions, the feedback they received, and their progress through the game. Introduction Playtesting is a well-adopted method for iteratively testing and improving educational games. As a game moves through development phases, members of the target audience are given versions of the game to play, and in exchange generate feedback. This feedback can then be used to validate the design decisions made during the game's development, and to direct the next iterations of work.

artificial intelligence, category, machine learning, (14 more...)

2210.09911

Country:

North America > United States > Wisconsin > Dane County > Madison (1.00)
North America > United States > Pennsylvania (0.24)

Genre: Research Report (0.40)

Industry:

Leisure & Entertainment > Games > Computer Games (1.00)
Education > Educational Technology > Educational Software > Computer Based Training (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.34)

Marinoni, Andrea, Jutten, Christian, Girolami, Mark

A graph representation based on fluid diffusion model for data analysis: theoretical aspects and enhanced community detection

Representing data by means of graph structures identifies one of the most valid approach to extract information in several data analysis applications. This is especially true when multimodal datasets are investigated, as records collected by means of diverse sensing strategies are taken into account and explored. Nevertheless, classic graph signal processing is based on a model for information propagation that is configured according to heat diffusion mechanism. This system provides several constraints and assumptions on the data properties that might be not valid for multimodal data analysis, especially when large scale datasets collected from heterogeneous sources are considered, so that the accuracy and robustness of the outcomes might be severely jeopardized. In this paper, we introduce a novel model for graph definition based on fluid diffusion. The proposed approach improves the ability of graph-based data analysis to take into account several issues of modern data analysis in operational scenarios, so to provide a platform for precise, versatile, and efficient understanding of the phenomena underlying the records under exam, and to fully exploit the potential provided by the diversity of the records in obtaining a thorough characterization of the data and their significance. In this work, we focus our attention to using this fluid diffusion model to drive a community detection scheme, i.e., to divide multimodal datasets into many groups according to similarity among nodes in an unsupervised fashion. Experimental results achieved by testing real multimodal datasets in diverse application scenarios show that our method is able to strongly outperform state-of-the-art schemes for community detection in multimodal data analysis.

artificial intelligence, data mining, machine learning, (17 more...)

2112.04388

Genre: Research Report > Promising Solution (0.48)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

Lawless, Connor, Gunluk, Oktay

Cluster Explanation via Polyhedral Descriptions

Clustering is an unsupervised learning problem that aims to partition unlabelled data points into groups with similar features. Traditional clustering algorithms provide limited insight into the groups they find as their main focus is accuracy and not the interpretability of the group assignments. This has spurred a recent line of work on explainable machine learning for clustering. In this paper we focus on the cluster description problem where, given a dataset and its partition into clusters, the task is to explain the clusters. We introduce a new approach to explain clusters by constructing polyhedra around each cluster while minimizing either the complexity of the resulting polyhedra or the number of features used in the description. We formulate the cluster description problem as an integer program and present a column generation approach to search over an exponential number of candidate half-spaces that can be used to build the polyhedra. To deal with large datasets, we introduce a novel grouping scheme that first forms smaller groups of data points and then builds the polyhedra around the grouped data, a strategy which out-performs simply sub-sampling data. Compared to state of the art cluster description algorithms, our approach is able to achieve competitive interpretability with improved description accuracy.

artificial intelligence, cluster description, machine learning, (18 more...)

2210.08798

Country: South America > Paraguay > Asunción > Asunción (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)

Rahman, Zillur, Hossain, Md. Sabir, Hasan, Mohammad, Imteaj, Ahmed

An enhanced method of initial cluster center selection for K-means algorithm

Clustering is one of the widely used techniques to find out patterns from a dataset that can be applied in different applications or analyses. K-means, the most popular and simple clustering algorithm, might get trapped into local minima if not properly initialized and the initialization of this algorithm is done randomly. In this paper, we propose a novel approach to improve initial cluster selection for K-means algorithm. This algorithm is based on the fact that the initial centroids must be well separated from each other since the final clusters are separated groups in feature space. The Convex Hull algorithm facilitates the computing of the first two centroids and the remaining ones are selected according to the distance from previously selected centers. To ensure the selection of one center per cluster, we use the nearest neighbor technique. To check the robustness of our proposed algorithm, we consider several real-world datasets. We obtained only 7.33%, 7.90%, and 0% clustering error in Iris, Letter, and Ruspini data respectively which proves better performance than other existing systems. The results indicate that our proposed method outperforms the conventional K means approach by accelerating the computation when the number of clusters is greater than 2.

artificial intelligence, centroid, machine learning, (13 more...)

doi: 10.1109/ASYU52992.2021.9599017

2210.09507

Country:

Asia > Bangladesh (0.05)
North America > United States > California > Alameda County > Oakland (0.04)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

AlMahamid, Fadi, Grolinger, Katarina

Agglomerative Hierarchical Clustering with Dynamic Time Warping for Household Load Curve Clustering

Energy companies often implement various demand response (DR) programs to better match electricity demand and supply by offering the consumers incentives to reduce their demand during critical periods. Classifying clients according to their consumption patterns enables targeting specific groups of consumers for DR. Traditional clustering algorithms use standard distance measurement to find the distance between two points. The results produced by clustering algorithms such as K-means, K-medoids, and Gaussian Mixture Models depend on the clustering parameters or initial clusters. In contrast, our methodology uses a shape-based approach that combines Agglomerative Hierarchical Clustering (AHC) with Dynamic Time Warping (DTW) to classify residential households' daily load curves based on their consumption patterns. While DTW seeks the optimal alignment between two load curves, AHC provides a realistic initial clusters center. In this paper, we compare the results with other clustering algorithms such as K-means, K-medoids, and GMM using different distance measures, and we show that AHC using DTW outperformed other clustering algorithms and needed fewer clusters.

artificial intelligence, clustering, machine learning, (15 more...)

doi: 10.1109/CCECE49351.2022.9918481

2210.09523

Country:

North America > Canada > Ontario > Middlesex County > London (0.04)
North America > United States (0.04)
Asia > China (0.04)

Genre:

Research Report (0.64)
Overview (0.46)

Industry: Energy > Power Industry (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Sachdeva, Ragav, Cordeiro, Filipe R, Belagiannis, Vasileios, Reid, Ian, Carneiro, Gustavo

ScanMix: Learning from Severe Label Noise via Semantic Clustering and Semi-Supervised Learning

arXiv.org Artificial IntelligenceOct-16-2022

We propose a new training algorithm, ScanMix, that explores semantic clustering and semi-supervised learning (SSL) to allow superior robustness to severe label noise and competitive robustness to non-severe label noise problems, in comparison to the state of the art (SOTA) methods. ScanMix is based on the expectation maximisation framework, where the E-step estimates the latent variable to cluster the training images based on their appearance and classification results, and the M-step optimises the SSL classification and learns effective feature representations via semantic clustering. We present a theoretical result that shows the correctness and convergence of ScanMix, and an empirical result that shows that ScanMix has SOTA results on CIFAR-10/-100 (with symmetric, asymmetric and semantic label noise), Red Mini-ImageNet (from the Controlled Noisy Web Labels), Clothing1M and WebVision. In all benchmarks with severe label noise, our results are competitive to the current SOTA.

artificial intelligence, machine learning, scanmix, (16 more...)

2103.11395

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
South America > Brazil > Pernambuco (0.04)
Europe > Germany > Saxony-Anhalt > Magdeburg (0.04)
(2 more...)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (0.85)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)