AITopics

2109.05675

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States > Colorado > Boulder County > Boulder (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (0.82)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

#artificialintelligence Sep-11-2021, 03:30:17 GMT

Important Clustering Algorithms in Machine Learning

Clustering is an unsupervised machine learning task in which we draw inferences from datasets consisting of input data without labelled responses. We give a clustering algorithm a large amount of unlabelled input data and let it find whatever groupings it can, so that each data point is assigned to a specific group.

important clustering algorithm, input data, machine learning

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

arXiv.org Machine Learning Sep-11-2021

On the Fundamental Limits of Matrix Completion: Leveraging Hierarchical Similarity Graphs

Ahn, Junhyung, Elmahdy, Adel, Mohajer, Soheil, Suh, Changho

We study the matrix completion problem that leverages hierarchical similarity graphs as side information in the context of recommender systems. Under a hierarchical stochastic block model that well respects practically-relevant social graphs and a low-rank rating matrix model, we characterize the exact information-theoretic limit on the number of observed matrix entries (i.e., optimal sample complexity) by proving sharp upper and lower bounds on the sample complexity. In the achievability proof, we demonstrate that the probability of error of the maximum likelihood estimator vanishes for a sufficiently large number of users and items, if all sufficient conditions are satisfied. On the other hand, the converse (impossibility) proof is based on the genie-aided maximum likelihood estimator. Under each necessary condition, we present examples of a genie-aided estimator to prove that the probability of error does not vanish for a sufficiently large number of users and items. One important consequence of this result is that exploiting the hierarchical structure of social graphs yields a substantial gain in sample complexity relative to the one that simply identifies different groups without resorting to the relational structure across them. More specifically, we analyze the optimal sample complexity and identify different regimes whose characteristics rely on quality metrics of side information of the hierarchical similarity graph. Finally, we present simulation results to corroborate our theoretical findings and show that the characterized information-theoretic limit can be asymptotically achieved.

In recent years, personalized recommender systems have emerged in an extensive range of Web applications to predict the preferences of their users and provide them with new and relevant items based on the scarce data about the users and/or items [2]. There are two major paradigms of recommender systems: (i) content-based filtering systems; (ii) collaborative filtering systems. The content-based filtering approach exploits a profile of users' preferences and/or properties of the items to carry out the recommendation task.

matrix, rating matrix, rating vector, (14 more...)

2109.05408

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.27)
Asia > South Korea > Daejeon > Daejeon (0.04)

Genre: Research Report (0.81)

Industry: Media (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.45)

Yang, Zhenwei, Bagheri, Ayoub, van der Heijden, P. G. M

Neural Networks for Latent Budget Analysis of Compositional Data

arXiv.org Machine Learning Sep-10-2021

Compositional data are non-negative data collected in a rectangular matrix with a constant row sum. Due to the non-negativity, the focus is on conditional proportions that add up to 1 for each row. A row of conditional proportions is called an observed budget. Latent budget analysis (LBA) assumes a mixture of latent budgets that explains the observed budgets. LBA is usually fitted to a contingency table, where the rows are levels of one or more explanatory variables and the columns the levels of a response variable. In prospective studies, there is only knowledge about the explanatory variables of individuals and the interest lies in predicting the response variable. Thus, a form of LBA is needed that has the functionality of prediction. Previous studies proposed a constrained neural network (NN) extension of LBA that was hampered by an unsatisfying prediction ability. Here we propose LBA-NN, a feedforward NN model that yields a similar interpretation to LBA but equips LBA with a better ability of prediction. A stable and plausible interpretation of LBA-NN is obtained through the use of importance plots and an importance table, which show the relative importance of all explanatory variables on the response variable. An LBA-NN-K-means approach that applies K-means clustering on the importance table is used to produce K clusters that are comparable to K latent budgets in LBA. Here we provide different experiments where LBA-NN is implemented and compared with LBA. In our analysis, LBA-NN outperforms LBA in prediction in terms of accuracy, specificity, recall and mean square error. We provide open-source software at GitHub.
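
To make the notion of an observed budget concrete, the sketch below divides each row of a contingency table by its row sum. The table values and the function name `observed_budgets` are hypothetical and chosen only for illustration; this is not the authors' LBA-NN software.

```python
def observed_budgets(table):
    """Convert each row of non-negative counts into conditional proportions."""
    budgets = []
    for row in table:
        total = sum(row)
        if total <= 0:
            raise ValueError("each row needs a positive sum")
        budgets.append([count / total for count in row])
    return budgets


# Hypothetical contingency table: rows are levels of an explanatory
# variable, columns are levels of the response variable.
counts = [
    [30, 50, 20],
    [10, 10, 80],
]
budgets = observed_budgets(counts)
for row in budgets:
    print([round(p, 2) for p in row])
```

Each printed row sums to 1, which is the constant-row-sum property the abstract describes.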

category, lba-nn, response variable, (14 more...)

2109.04875

Country:

Oceania (0.04)
North America > United States > New York (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
(4 more...)

Genre: Research Report > Experimental Study (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.34)

#artificialintelligence Sep-8-2021, 17:07:10 GMT

K-means Clustering and its use-case in the Security Domain

K-means clustering is one of the simplest and most popular unsupervised machine learning algorithms. Unsupervised learning is a machine learning technique in which there are no labels for the training data; instead, the algorithm tries to learn the underlying patterns or distributions that govern the data. Clustering is one of the most common exploratory data analysis techniques used to get an intuition about the structure of the data. It can be defined as the task of identifying subgroups in the data such that data points in the same subgroup (cluster) are very similar while data points in different clusters are very different.
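
As a rough illustration of how K-means finds such subgroups, here is a minimal sketch of the standard Lloyd iteration in plain Python. The sample points, the seed, and the helper names `sq_dist` and `kmeans` are assumptions made for this example and do not come from the article.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iterations=20, seed=0):
    """Lloyd's algorithm; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    labels = None
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids


# Hypothetical data: two well-separated groups of 2-D points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
labels, centroids = kmeans(points, k=2)
print(labels)
print(centroids)
```

Note that K-means needs the number of clusters `k` up front, and on less clearly separated data the result can depend on the random starting centroids.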

algorithm, k-means clustering, unsupervised learning, (8 more...)

Industry: Information Technology > Security & Privacy (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Hafshejani, Sajad Fathi, Moaberfard, Zahra

Initialization for Nonnegative Matrix Factorization: a Comprehensive Review

arXiv.org Artificial Intelligence Sep-8-2021

Non-negative matrix factorization (NMF) has become a popular method for representing meaningful data by extracting a non-negative basis feature from an observed non-negative data matrix. Its ability to identify hidden structure in data places it among the more powerful methods in machine learning. NMF is a well-known non-convex optimization problem, and the initial point has a significant effect on finding an efficient local solution. In this paper, we investigate the most popular initialization procedures proposed for NMF so far. We describe each method and present some of its advantages and disadvantages. Finally, we present numerical results to illustrate the performance of each algorithm.
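
To show where the initial point enters, the sketch below runs the classical multiplicative-update rules for NMF from a random non-negative start. Random initialization is only one simple option and is not claimed to be one of the procedures the review recommends; the matrix `v`, the seeds, and all function names are hypothetical.

```python
import random


def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def frob_error(v, w, h):
    """Squared Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, rank, iterations=200, seed=0, eps=1e-12):
    """Multiplicative updates for V ~ WH, started from a random point."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    # Random positive initialization: the choice under study in the review.
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n)]
    return w, h


# Hypothetical non-negative data matrix.
v = [[1, 2, 3], [2, 4, 6], [1, 0, 1], [3, 2, 5]]
w0, h0 = nmf(v, rank=2, iterations=0)  # the starting point itself
for seed in (0, 1):
    w, h = nmf(v, rank=2, seed=seed)
    print(seed, frob_error(v, w, h))
```

Comparing the printed errors across seeds gives a quick feel for how much the starting point can matter on a given matrix.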

algorithm, artificial intelligence, machine learning, (16 more...)

doi: 10.1007/s41060-022-00370-9

2109.03874

Country:

Asia > Middle East > Iran > Fars Province > Shiraz (0.04)
Africa > Senegal > Kolda Region > Kolda (0.04)
North America > United States > Arizona > Maricopa County > Mesa (0.04)
Asia > Middle East > Lebanon (0.04)

Genre: Overview (1.00)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

López-Oriona, Ángel, Vilar, José A., D'Urso, Pierpaolo

Quantile-based fuzzy clustering of multivariate time series in the frequency domain

arXiv.org Machine Learning Sep-8-2021

A novel procedure to perform fuzzy clustering of multivariate time series generated from different dependence models is proposed. Different amounts of dissimilarity between the generating models or changes on the dynamic behaviours over time are some arguments justifying a fuzzy approach, where each series is associated to all the clusters with specific membership levels. Our procedure considers quantile-based cross-spectral features and consists of three stages: (i) each element is characterized by a vector of proper estimates of the quantile cross-spectral densities, (ii) principal component analysis is carried out to capture the main differences reducing the effects of the noise, and (iii) the squared Euclidean distance between the first retained principal components is used to perform clustering through the standard fuzzy C-means and fuzzy C-medoids algorithms. The performance of the proposed approach is evaluated in a broad simulation study where several types of generating processes are considered, including linear, nonlinear and dynamic conditional correlation models. Assessment is done in two different ways: by directly measuring the quality of the resulting fuzzy partition and by taking into account the ability of the technique to determine the overlapping nature of series located equidistant from well-defined clusters. The procedure is compared with the few alternatives suggested in the literature, substantially outperforming all of them whatever the underlying process and the evaluation scheme. Two specific applications involving air quality and financial databases illustrate the usefulness of our approach.
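
The idea of associating each series with every cluster at a specific membership level can be sketched with the standard fuzzy C-means membership update, applied here to squared distances such as those between retained principal components. This shows only the generic membership formula, not the authors' full three-stage procedure; the function name and the distance values are hypothetical.

```python
def fuzzy_memberships(sq_dists, m=2.0):
    """Standard fuzzy C-means membership update for a single element.

    sq_dists: squared distances from the element to each cluster centre.
    m: fuzziness parameter (> 1); larger values give softer memberships.
    """
    if m <= 1:
        raise ValueError("m must be greater than 1")
    # An element that coincides with a centre belongs fully to it.
    zeros = [i for i, d in enumerate(sq_dists) if d == 0]
    if zeros:
        return [1.0 / len(zeros) if i in zeros else 0.0
                for i in range(len(sq_dists))]
    exponent = 1.0 / (m - 1.0)
    inverse = [(1.0 / d) ** exponent for d in sq_dists]
    total = sum(inverse)
    return [weight / total for weight in inverse]


# Hypothetical squared distances to two cluster centres.
equidistant = fuzzy_memberships([4.0, 4.0])  # element halfway between clusters
near_first = fuzzy_memberships([1.0, 9.0])   # element close to the first centre
print(equidistant)
print(near_first)
```

An element equidistant from two centres receives equal memberships, which is the overlapping case the abstract uses in its second evaluation scheme.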

algorithm, procedure, time series, (14 more...)

2109.03728

Country:

South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > United States (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(5 more...)

Genre:

Research Report > New Finding (0.45)
Research Report > Promising Solution (0.34)

Industry:

Information Technology (1.00)
Health & Medicine (1.00)
Banking & Finance > Trading (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Mondal, Anindya, Das, Mayukhmali

Moving Object Detection for Event-based Vision using k-means Clustering

arXiv.org Artificial Intelligence Sep-4-2021

Event-based cameras are bio-inspired sensors that mimic the working of the human eye (Gallego et al. [2020]). While frame-based cameras capture images at a definite frame rate determined by an external clock, each pixel in an event-based camera memorizes the log intensity each time an event is sent and simultaneously monitors for a sufficient change in magnitude from this memorized value (Gallego et al. [2020]). The event is recorded by the camera and is transmitted by the sensor in the form of its location {x, y}, its time of occurrence (timestamp) t and its polarity p (taking a binary value +1 or -1, representing whether the pixel is brighter or darker) (Chen et al. [2020]). The working of an event-based camera is shown in Figure 1. The sensors used in event-based cameras are data-driven, for their output depends on the amount of motion or brightness change in the scene (Gallego et al. [2020]). The greater the motion, the more events are generated. The events are recorded at microsecond resolution and are transmitted with sub-millisecond latency, making these sensors react quickly to visual stimuli (Gallego et al. [2020]). Thus, while frame-based cameras capture the absolute brightness of a scene, event-based cameras capture per-pixel brightness changes asynchronously, which makes traditional computer vision algorithms unsuitable for processing event data. Detection of moving objects is an important task in automation, where a computer differentiates between a moving object and a stationary one.
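
The per-pixel behaviour described above can be sketched as a small simulation: a pixel memorizes its log intensity, and when the change from that memorized value is large enough it emits an event (x, y, t, p). This is a simplified model under stated assumptions: the contrast threshold of 0.2, the +1/-1 polarity encoding, the intensity samples, and the rule of at most one event per sample are illustrative choices, not values from the paper.

```python
import math


def pixel_events(x, y, samples, threshold=0.2):
    """Emit (x, y, t, p) events for one pixel from (t, intensity) samples."""
    events = []
    _, first_intensity = samples[0]
    memorized = math.log(first_intensity)
    for t, intensity in samples[1:]:
        change = math.log(intensity) - memorized
        if abs(change) >= threshold:
            polarity = 1 if change > 0 else -1  # brighter or darker
            events.append((x, y, t, polarity))
            memorized = math.log(intensity)  # re-memorize at each event
    return events


# Hypothetical intensity trace for the pixel at (3, 5).
trace = [(0, 100.0), (1, 101.0), (2, 130.0), (3, 128.0), (4, 90.0)]
events = pixel_events(3, 5, trace)
print(events)
```

Small fluctuations produce no output at all, which is why a static scene generates few events while motion generates many.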

event-based data, event-based vision, object detection, (15 more...)

2109.01879

Country:

Asia > India > West Bengal > Kolkata (0.04)
North America > Canada > Ontario > Essex County > Windsor (0.04)
Asia > Taiwan (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

#artificialintelligence Sep-3-2021, 23:42:27 GMT

Supervised vs Unsupervised Learning, Explained

In this article, I'll explain supervised vs unsupervised learning. The tutorial will start by discussing some foundational concepts and then it will explain supervised and unsupervised learning separately, in more detail. If you need something specific, just click on the link. The following links will take you to specific sections of the article. Having said that, if you're confused about supervised vs unsupervised learning, you'll probably want to read the whole article from start to finish. If you're somewhat new to machine learning, you've probably heard the terms "supervised" and "unsupervised" learning.

learning, supervised learning, unsupervised learning, (14 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.98)

#artificialintelligence Sep-3-2021, 03:56:09 GMT

Hierarchical Clustering in Machine Learning

Hierarchical clustering, also known as Hierarchical Cluster Analysis (HCA), is an unsupervised machine learning method that groups unlabelled data into clusters. It resembles K-means clustering but differs in that the number of clusters is not fixed in advance, which avoids the difficulties K-means faces from a predetermined number of clusters.
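
A minimal sketch of one common variant, bottom-up (agglomerative) clustering with single linkage, shows why no cluster count is needed up front: the algorithm simply records every merge, and the history can be cut at any level afterwards. The linkage choice, the sample points, and the function name `agglomerative` are assumptions made for this example.

```python
import math


def agglomerative(points):
    """Single-linkage agglomerative clustering; returns the merge history."""
    clusters = [[i] for i in range(len(points))]  # start with singletons
    merges = []
    while len(clusters) > 1:
        best = None
        # Find the pair of clusters whose closest members are nearest.
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(
                    math.dist(points[i], points[j])
                    for i in clusters[a]
                    for j in clusters[b]
                )
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((sorted(clusters[a]), sorted(clusters[b]), d))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return merges


# Hypothetical 1-D data, stored as 1-tuples.
points = [(0.0,), (1.0,), (5.0,), (6.5,), (20.0,)]
merges = agglomerative(points)
for left, right, distance in merges:
    print(left, right, distance)
```

Stopping after the first two merges leaves three clusters, while stopping after three leaves two, so the number of clusters is chosen after the fact from the merge distances.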

clustering, hierarchical clustering, machine learning, (4 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)