AITopics

doi: 10.1214/14-EJS934

1404.6289

Country: North America > United States > California > Los Angeles County > Los Angeles (0.28)

Genre: Research Report (0.81)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Norouzi, Mohammad, Punjani, Ali, Fleet, David J.

Fast Exact Search in Hamming Space with Multi-Index Hashing

arXiv.org Artificial IntelligenceApr-24-2014

There is growing interest in representing image data and feature descriptors using compact binary codes for fast near neighbor search. Although binary codes are motivated by their use as direct indices (addresses) into a hash table, codes longer than 32 bits are not being used as such, as it was thought to be ineffective. We introduce a rigorous way to build multiple hash tables on binary code substrings that enables exact k-nearest neighbor search in Hamming space. The approach is storage efficient and straightforward to implement. Theoretical analysis shows that the algorithm exhibits sub-linear run-time behavior for uniformly distributed codes. Empirical results show dramatic speedups over a linear scan baseline for datasets of up to one billion codes of 64, 128, or 256 bits.

information retrieval, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

1307.2982

Country: North America (0.28)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.92)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.66)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.60)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.54)

Bora, Dibya Jyoti, Gupta, Dr. Anil Kumar

A Comparative study Between Fuzzy Clustering Algorithm and Hard Clustering Algorithm

arXiv.org Artificial IntelligenceApr-24-2014

Data clustering is an important area of data mining. This is an unsupervised study where data of similar types are put into one cluster while data of another types are put into different cluster. Fuzzy C means is a very important clustering technique based on fuzzy logic. Also we have some hard clustering techniques available like K-means among the popular ones. In this paper a comparative study is done between Fuzzy clustering algorithm and hard clustering algorithm.

algorithm, artificial intelligence, machine learning, (14 more...)

arXiv.org Artificial Intelligence

doi: 10.14445/22312803/IJCTT-V10P119

1404.6059

Country: North America > United States > California (0.15)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

arXiv.org Artificial IntelligenceApr-24-2014

Unsupervised Text Extraction from G-Maps

Adak, Chandranath

This paper represents an text extraction method from Google maps, GIS maps/images. Due to an unsupervised approach there is no requirement of any prior knowledge or training set about the textual and non-textual parts. Fuzzy CMeans clustering technique is used for image segmentation and Prewitt method is used to detect the edges. Connected component analysis and gridding technique enhance the correctness of the results. The proposed method reaches 98.5% accuracy level on the basis of experimental data sets.

machine learning, natural language, text extraction, (18 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/ICHCI-IEEE.2013.6887782

1404.6075

Country: Asia > India > West Bengal (0.15)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.35)

Tolvanen, Ville, Jylänki, Pasi, Vehtari, Aki

Approximate Inference for Nonstationary Heteroscedastic Gaussian process Regression

arXiv.org Machine LearningApr-22-2014

This paper presents a novel approach for approximate integration over the uncertainty of noise and signal variances in Gaussian process (GP) regression. Our efficient and straightforward approach can also be applied to integration over input dependent noise variance (heteroscedasticity) and input dependent signal variance (nonstationarity) by setting independent GP priors for the noise and signal variances. We use expectation propagation (EP) for inference and compare results to Markov chain Monte Carlo in two simulated data sets and three empirical examples. The results show that EP produces comparable results with less computational burden.

artificial intelligence, machine learning, variance, (19 more...)

doi: 10.1109/MLSP.2014.6958906

1404.5443

Genre: Research Report > New Finding (0.49)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.82)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)

Sasaki, Hiroaki, Hyvärinen, Aapo, Sugiyama, Masashi

Clustering via Mode Seeking by Direct Estimation of the Gradient of a Log-Density

arXiv.org Machine LearningApr-20-2014

Mean shift clustering finds the modes of the data probability density by identifying the zero points of the density gradient. Since it does not require to fix the number of clusters in advance, the mean shift has been a popular clustering algorithm in various application fields. A typical implementation of the mean shift is to first estimate the density by kernel density estimation and then compute its gradient. However, since good density estimation does not necessarily imply accurate estimation of the density gradient, such an indirect two-step approach is not reliable. In this paper, we propose a method to directly estimate the gradient of the log-density without going through density estimation. The proposed method gives the global solution analytically and thus is computationally efficient. We then develop a mean-shift-like fixed-point algorithm to find the modes of the density for clustering. As in the mean shift, one does not need to set the number of clusters in advance. We empirically show that the proposed clustering method works much better than the mean shift especially for high-dimensional data. Experimental results further indicate that the proposed method outperforms existing clustering methods.

artificial intelligence, gradient, machine learning, (16 more...)

1404.5028

Country: Europe > Finland (0.15)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Carlsson, Gunnar, Mémoli, Facundo, Ribeiro, Alejandro, Segarra, Santiago

Hierarchical Quasi-Clustering Methods for Asymmetric Networks

This paper introduces hierarchical quasi-clustering methods, a generalization of hierarchical clustering for asymmetric networks where the output structure preserves the asymmetry of the input data. We show that this output structure is equivalent to a finite quasi-ultrametric space and study admissibility with respect to two desirable properties. We prove that a modified version of single linkage is the only admissible quasi-clustering method. Moreover, we show stability of the proposed method and we establish invariance properties fulfilled by it. Algorithms are further developed and the value of quasi-clustering analysis is illustrated with a study of internal migration within United States.

artificial intelligence, hierarchical quasi-clustering method, machine learning, (16 more...)

1404.4655

Country: North America > United States > California (0.28)

Genre: Research Report (0.40)

Industry:

Government > Regional Government > North America Government > United States Government (1.00)
Banking & Finance (0.93)
Energy (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Shrivastava, Anshumali, Li, Ping

A New Space for Comparing Graphs

Finding a new mathematical representations for graph, which allows direct comparison between different graph structures, is an open-ended research direction. Having such a representation is the first prerequisite for a variety of machine learning algorithms like classification, clustering, etc., over graph datasets. In this paper, we propose a symmetric positive semidefinite matrix with the $(i,j)$-{th} entry equal to the covariance between normalized vectors $A^ie$ and $A^je$ ($e$ being vector of all ones) as a representation for graph with adjacency matrix $A$. We show that the proposed matrix representation encodes the spectrum of the underlying adjacency matrix and it also contains information about the counts of small sub-structures present in the graph such as triangles and small paths. In addition, we show that this matrix is a \emph{"graph invariant"}. All these properties make the proposed matrix a suitable object for representing graphs. The representation, being a covariance matrix in a fixed dimensional metric space, gives a mathematical embedding for graphs. This naturally leads to a measure of similarity on graph objects. We define similarity between two given graphs as a Bhattacharya similarity measure between their corresponding covariance matrix representations. As shown in our experimental study on the task of social network classification, such a similarity measure outperforms other widely used state-of-the-art methodologies. Our proposed method is also computationally efficient. The computation of both the matrix representation and the similarity value can be performed in operations linear in the number of edges. This makes our method scalable in practice. We believe our theoretical and empirical results provide evidence for studying truncated power iterations, of the adjacency matrix, to characterize social networks.

artificial intelligence, machine learning, social media, (19 more...)

1404.4644

Country: North America > United States (0.46)

Genre: Research Report > New Finding (1.00)

Industry: Information Technology > Services (0.56)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning > Representation Of Examples (0.34)

Paul, Saurabh, Boutsidis, Christos, Magdon-Ismail, Malik, Drineas, Petros

Random Projections for Linear Support Vector Machines

Let X be a data matrix of rank \rho, whose rows represent n points in d-dimensional space. The linear support vector machine constructs a hyperplane separator that maximizes the 1-norm soft margin. We develop a new oblivious dimension reduction technique which is precomputed and can be applied to any input matrix X. We prove that, with high probability, the margin and minimum enclosing ball in the feature space are preserved to within \epsilon-relative error, ensuring comparable generalization as in the original space in the case of classification. For regression, we show that the margin is preserved to \epsilon-relative error with high probability. We present extensive experiments with real and synthetic data to support our theory.

artificial intelligence, machine learning, matrix, (13 more...)

1211.6085

Country: North America > United States (0.67)

Genre: Research Report (0.82)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (1.00)

Mardani, Morteza, Mateos, Gonzalo, Giannakis, Georgios B.

Subspace Learning and Imputation for Streaming Big Data Matrices and Tensors

Extracting latent low-dimensional structure from high-dimensional data is of paramount importance in timely inference tasks encountered with `Big Data' analytics. However, increasingly noisy, heterogeneous, and incomplete datasets as well as the need for {\em real-time} processing of streaming data pose major challenges to this end. In this context, the present paper permeates benefits from rank minimization to scalable imputation of missing data, via tracking low-dimensional subspaces and unraveling latent (possibly multi-way) structure from \emph{incomplete streaming} data. For low-rank matrix data, a subspace estimator is proposed based on an exponentially-weighted least-squares criterion regularized with the nuclear norm. After recasting the non-separable nuclear norm into a form amenable to online optimization, real-time algorithms with complementary strengths are developed and their convergence is established under simplifying technical assumptions. In a stationary setting, the asymptotic estimates obtained offer the well-documented performance guarantees of the {\em batch} nuclear-norm regularized estimator. Under the same unifying framework, a novel online (adaptive) algorithm is developed to obtain multi-way decompositions of \emph{low-rank tensors} with missing entries, and perform imputation as a byproduct. Simulated tests with both synthetic as well as real Internet and cardiac magnetic resonance imagery (MRI) data confirm the efficacy of the proposed algorithms, and their superior performance relative to state-of-the-art alternatives.

artificial intelligence, data mining, machine learning, (21 more...)

doi: 10.1109/TSP.2015.2417491

1404.4667

Country:

Europe (0.92)
North America > United States > Minnesota (0.28)

Genre: Research Report > New Finding (0.46)

Industry:

Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
Health & Medicine > Diagnostic Medicine (0.93)
Information Technology (0.93)

Technology:

Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)
Information Technology > Data Science > Data Mining > Big Data (0.90)