AITopics

1901.01498

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > Canada > Alberta > Census Division No. 15 > Improvement District No. 9 > Banff (0.04)
Europe > France (0.04)
(2 more...)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.47)

Sundermann, Camila V., Domingues, Marcos A., Marcacini, Ricardo M., Rezende, Solange O.

Combining Privileged Information to Improve Context-Aware Recommender Systems

arXiv.org Artificial IntelligenceJan-4-2019

A recommender system is an information filtering technology which can be used to predict preference ratings of items (products, services, movies, etc) and/or to output a ranking of items that are likely to be of interest to the user. Context-aware recommender systems (CARS) learn and predict the tastes and preferences of users by incorporating available contextual information in the recommendation process. One of the major challenges in context-aware recommender systems research is the lack of automatic methods to obtain contextual information for these systems. Considering this scenario, in this paper, we propose to use contextual information from topic hierarchies of the items (web pages) to improve the performance of context-aware recommender systems. The topic hierarchies are constructed by an extension of the LUPI-based Incremental Hierarchical Clustering method that considers three types of information: traditional bag-of-words (technical information), and the combination of named entities (privileged information I) with domain terms (privileged information II). We evaluated the contextual information in four context-aware recommender systems. Different weights were assigned to each type of information. The empirical results demonstrated that topic hierarchies with the combination of the two kinds of privileged information can provide better recommendations.

artificial intelligence, information, machine learning, (14 more...)

arXiv.org Artificial Intelligence

1511.0229

Country:

South America > Brazil > São Paulo (0.04)
North America > United States > Georgia > Fulton County > Atlanta (0.04)
Europe > Netherlands > North Holland > Amsterdam (0.04)

Genre: Research Report > Experimental Study (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.69)

arXiv.org Machine LearningJan-4-2019

MultiDEC: Multi-Modal Clustering of Image-Caption Pairs

Yang, Sean, Huang, Kuan-Hao, Howe, BIll

In this paper, we propose a method for clustering image-caption pairs by simultaneously learning image representations and text representations that are constrained to exhibit similar distributions. These image-caption pairs arise frequently in high-value applications where structured training data is expensive to produce but free-text descriptions are common. MultiDEC initializes parameters with stacked autoencoders, then iteratively minimizes the Kullback-Leibler divergence between the distribution of the images (and text) to that of a combined joint target distribution. We regularize by penalizing non-uniform distributions across clusters. The representations that minimize this objective produce clusters that outperform both single-view and multi-view techniques on large benchmark image-caption datasets.

joint target distribution, multidec, target distribution, (17 more...)

1901.0186

Country: North America > United States > California > Los Angeles County > Los Angeles (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

arXiv.org Machine LearningJan-3-2019

Learning Graph Embedding with Adversarial Training Methods

Pan, Shirui, Hu, Ruiqi, Fung, Sai-fu, Long, Guodong, Jiang, Jing, Zhang, Chengqi

Graph embedding aims to transfer a graph into vectors to facilitate subsequent graph analytics tasks like link prediction and graph clustering. Most approaches on graph embedding focus on preserving the graph structure or minimizing the reconstruction errors for graph data. They have mostly overlooked the embedding distribution of the latent codes, which unfortunately may lead to inferior representation in many cases. In this paper, we present a novel adversarially regularized framework for graph embedding. By employing the graph convolutional network as an encoder, our framework embeds the topological information and node content into a vector representation, from which a graph decoder is further built to reconstruct the input graph. The adversarial training principle is applied to enforce our latent codes to match a prior Gaussian or Uniform distribution. Based on this framework, we derive two variants of adversarial models, the adversarially regularized graph autoencoder (ARGA) and its variational version, adversarially regularized variational graph autoencoder (ARVGA), to learn the graph embedding effectively. We also exploit other potential variations of ARGA and ARVGA to get a deeper understanding on our designs. Experimental results compared among twelve algorithms for link prediction and twenty algorithms for graph clustering validate our solutions.

autoencoder, graph, graph autoencoder, (13 more...)

1901.0125

Country:

Oceania > Australia > New South Wales > Sydney (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
Europe > Sweden > Stockholm > Stockholm (0.04)
Asia > China > Hong Kong (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

arXiv.org Machine LearningJan-3-2019

Modified Multidimensional Scaling and High Dimensional Clustering

Ding, Xiucai, Sun, Qiang

Multidimensional scaling is an important dimension reduction tool in statistics and machine learning. Yet few theoretical results characterizing its statistical performance exist, not to mention any in high dimensions. By considering a unified framework that includes low, moderate and high dimensions, we study multidimensional scaling in the setting of clustering noisy data. Our results suggest that, the classical multidimensional scaling can be modified to further improve the quality of embedded samples, especially when the noise level increases. To this end, we propose {\it modified multidimensional scaling} which applies a nonlinear transformation to the sample eigenvalues. The nonlinear transformation depends on the dimensionality, sample size and moment of noise. We show that modified multidimensional scaling followed by various clustering algorithms can achieve exact recovery, i.e., all the cluster labels can be recovered correctly with probability tending to one. Numerical simulations and two real data applications lend strong support to our proposed methodology.

algorithm, matrix, probability, (15 more...)

1810.10172

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States > New York (0.04)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)
(4 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (0.68)
Health & Medicine > Therapeutic Area > Oncology (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.89)

Kar, Soummya, Swenson, Brian

Clustering with Distributed Data

arXiv.org Machine LearningJan-1-2019

We consider K-means clustering in networked environments (e.g., internet of things (IoT) and sensor networks) where data is inherently distributed across nodes and processing power at each node may be limited. We consider a clustering algorithm referred to as networked K-means, or N K-means, which relies only on local neighborhood information exchange. Information exchange is limited to low-dimensional statistics and not raw data at the agents. The proposed approach develops a parametric family of multi-agent clustering objectives (parameterized by ρ) and associated distributed N K-means algorithms (also parameterized by ρ). The NK-means algorithm with parameter ρ converges to a set of fixed points relative to the associated multi-agent objective (designated as generalized minima). By appropriate choice of ρ, the set of generalized minima may be brought arbitrarily close to the set of Lloyd's minima. Thus, the NK-means algorithm may be used to compute Lloyd's minima of the collective dataset up to arbitrary accuracy. Keywords: K-means clustering, Lloyd's minima, distributed algorithms, distributed machine learning, network information processing

algorithm, lloyd, minima, (16 more...)

1901.00214

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Asia > Middle East > Iran > Tehran Province > Tehran (0.04)
(6 more...)

Genre: Research Report (0.64)

Industry: Information Technology > Security & Privacy (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Lindenbaum, Ofir, Stanley, Jay, Wolf, Guy, Krishnaswamy, Smita

Geometry Based Data Generation

We propose a new type of generative model for high-dimensional data that learns a manifold geometry of the data, rather than density, and can generate points evenly along this manifold. This is in contrast to existing generative models that represent data density, and are strongly affected by noise and other artifacts of data collection. We demonstrate how this approach corrects sampling biases and artifacts, thus improves several downstream data analysis tasks, such as clustering and classification. Finally, we demonstrate that this approach is especially useful in biology where, despite the advent of single-cell technologies, rare subpopulations and gene-interaction relationships are affected by biased sampling. We show that SUGAR can generate hypothetical populations, and it is able to reveal intrinsic patterns and mutual-information relationships between genes on a single-cell RNA sequencing dataset of hematopoiesis.

artificial intelligence, machine learning, sugar, (15 more...)

Country:

North America > United States > Connecticut > New Haven County > New Haven (0.04)
North America > Canada > Quebec > Montreal (0.04)

Industry:

Health & Medicine > Therapeutic Area > Immunology (0.69)
Health & Medicine > Pharmaceuticals & Biotechnology (0.68)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.66)

Mao, Xueyu, Sarkar, Purnamrita, Chakrabarti, Deepayan

Overlapping Clustering Models, and One (class) SVM to Bind Them All

People belong to multiple communities, words belong to multiple topics, and books cover multiple genres; overlapping clusters are commonplace. Many existing overlapping clustering methods model each person (or word, or book) as a non-negative weighted combination of "exemplars" who belong solely to one community, with some small noise. Geometrically, each person is a point on a cone whose corners are these exemplars. This basic form encompasses the widely used Mixed Membership Stochastic Blockmodel of networks and its degree-corrected variants, as well as topic models such as LDA. We show that a simple one-class SVM yields provably consistent parameter inference for all such models, and scales to large datasets. Experimental results on several simulated and real datasets show our algorithm (called SVM-cone) is both accurate and scalable.

artificial intelligence, bayesian inference, machine learning, (20 more...)

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > Texas > Travis County > Austin (0.04)
North America > Canada > Quebec > Montreal (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.48)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)

Supervising Unsupervised Learning

Garg, Vikas

We introduce a framework to transfer knowledge acquired from a repository of (heterogeneous) supervised datasets to new unsupervised datasets. Our perspective avoids the subjectivity inherent in unsupervised learning by reducing it to supervised learning, and provides a principled way to evaluate unsupervised algorithms. We demonstrate the versatility of our framework via rigorous agnostic bounds on a variety of unsupervised problems. In the context of clustering, our approach helps choose the number of clusters and the clustering algorithm, remove the outliers, and provably circumvent Kleinberg's impossibility result. Experiments across hundreds of problems demonstrate improvements in performance on unsupervised data with simple algorithms despite the fact our problems come from heterogeneous domains. Additionally, our framework lets us leverage deep networks to learn common features across many small datasets, and perform zero shot learning.

algorithm, artificial intelligence, machine learning, (17 more...)

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > Canada > Quebec > Montreal (0.04)
Asia > Middle East > Jordan (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.92)

Query K-means Clustering and the Double Dixie Cup Problem

Chien, I, Pan, Chao, Milenkovic, Olgica

We consider the problem of approximate $K$-means clustering with outliers and side information provided by same-cluster queries and possibly noisy answers. Our solution shows that, under some mild assumptions on the smallest cluster size, one can obtain an $(1+\epsilon)$-approximation for the optimal potential with probability at least $1-\delta$, where $\epsilon>0$ and $\delta\in(0,1)$, using an expected number of $O(\frac{K^3}{\epsilon \delta})$ noiseless same-cluster queries and comparison-based clustering of complexity $O(ndK + \frac{K^3}{\epsilon \delta})$; here, $n$ denotes the number of points and $d$ the dimension of space. Compared to a handful of other known approaches that perform importance sampling to account for small cluster sizes, the proposed query technique reduces the number of queries by a factor of roughly $O(\frac{K^6}{\epsilon^3})$, at the cost of possibly missing very small clusters. We extend this settings to the case where some queries to the oracle produce erroneous information, and where certain points, termed outliers, do not belong to any clusters. Our proof techniques differ from previous methods used for $K$-means clustering analysis, as they rely on estimating the sizes of the clusters and the number of points needed for accurate centroid estimation and subsequent nontrivial generalizations of the double Dixie cup problem. We illustrate the performance of the proposed algorithm both on synthetic and real datasets, including MNIST and CIFAR $10$.

algorithm, artificial intelligence, machine learning, (18 more...)

Country:

North America > United States > Illinois (0.04)
North America > Canada > Quebec > Montreal (0.04)
Asia > India > West Bengal > Kolkata (0.04)
Asia > Afghanistan > Parwan Province > Charikar (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)