AITopics

Ninth International AAAI Conference on Web and Social Media

Genre: Research Report (0.73)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.87)

Zhang, Zhengwu, Pati, Debdeep, Srivastava, Anuj

Bayesian Clustering of Shapes of Curves

arXiv.org Machine LearningApr-1-2015

The general goal here is to choose groups (clusters) of objects so as to maximize homogeneity within clusters and minimize homogeneity across clusters. The clustering problem has been addressed by researchers in many disciplines. A few well-known methods are metric based e.g. K-means (MacQueen et al., 1967), hierarchical clustering (Ward, 1963), clustering based on principal components, spectral clustering (Ng et al., 2002) and so on (Jain and Dubes, 1988; Ozawa, 1985). Traditional clustering methods are complemented by methods based on a probability model where one assumes a data generating distribution (e.g., Gaussian) and infers clustering configurations that maximize certain objective function (Banfield and Raftery, 1993; Fraley and Raftery, 1998, 2002, 2006; MacCullagh and Yang, 2008). A modelbased clustering can be useful in addressing challenges posed by traditional clustering methods. This is because a probability model allows the number of clusters to be treated as a parameter in the model, and can be embedded in a Bayesian framework providing quantification of uncertainty in the number of clusters and clustering configurations.

artificial intelligence, machine learning, matrix, (16 more...)

1504.00377

Country: North America > United States (0.46)

Genre: Research Report (0.50)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

arXiv.org Artificial IntelligenceMar-22-2015

Construction of FuzzyFind Dictionary using Golay Coding Transformation for Searching Applications

Kowsari, Kamran, Yammahi, Maryam, Bari, Nima, Vichr, Roman, Alsaby, Faisal, Berkovich, Simon Y.

Searching through a large volume of data is very critical for companies, scientists, and searching engines applications due to time complexity and memory complexity. In this paper, a new technique of generating FuzzyFind Dictionary for text mining was introduced. We simply mapped the 23 bits of the English alphabet into a FuzzyFind Dictionary or more than 23 bits by using more FuzzyFind Dictionary, and reflecting the presence or absence of particular letters. This representation preserves closeness of word distortions in terms of closeness of the created binary vectors within Hamming distance of 2 deviations. This paper talks about the Golay Coding Transformation Hash Table and how it can be used on a FuzzyFind Dictionary as a new technology for using in searching through big data. This method is introduced by linear time complexity for generating the dictionary and constant time complexity to access the data and update by new data sets, also updating for new data sets is linear time depends on new data points. This technique is based on searching only for letters of English that each segment has 23 bits, and also we have more than 23-bit and also it could work with more segments as reference table.

data mining, information retrieval, machine learning, (16 more...)

arXiv.org Artificial Intelligence

doi: 10.14569/IJACSA.2015.060313

1503.06483

Country: North America > United States (0.95)

Genre: Research Report (0.40)

Industry:

Government (0.47)
Education (0.46)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(2 more...)

arXiv.org Machine LearningMar-18-2015

Nonparametric Nearest Neighbor Descent Clustering based on Delaunay Triangulation

Qiu, Teng, Li, Yongjie

Abstract: In our physically inspired in-tree (IT) based clustering algorithm and the series after it, there is only one free parameter involved in computing the potential value of each point. In this work, based on the Delaunay Triangulation or its dual Voronoi tessellation, we propose a nonparametric process to compute potential values by the local information. This computation, though nonparametric, is relatively very rough, and consequently, many local extreme points will be generated. However, unlike those gradient-based methods, our ITbased methods are generally insensitive to those local extremes. This positively demonstrates the superiority of these parametric (previous) and nonparametric (in this work) ITbased methods. 1 Introduction In (1), we proposed a physically inspired clustering algorithm, in which an in-tree (IT) structure was first constructed. This IT structure organizes the data points into the clusters with several undesired connections (edges) between them requiring to be removed.

artificial intelligence, delaunay triangulation, machine learning, (14 more...)

1502.04837

Country: Asia > China (0.15)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.56)

arXiv.org Machine LearningMar-18-2015

IT-map: an Effective Nonlinear Dimensionality Reduction Method for Interactive Clustering

Qiu, Teng, Li, Yongjie

In our previous works (1, 2), we have shown its potential in cluster analysis. Combinations of the IT structure with the Semi-Supervised learning concept (3), Rodriguez and Laio's "Decision Graph" (4), and Frey and Dueck's "Affinity Propagation" (AP) (5), have resulted in effective cluster analysis methods. For example, based on the IT structure, the application scope of AP was extended from spherical to nonspherical cluster detection (2). In this paper, we will show another potential of the IT structure: nonlinear dimensionality reduction, for which an effective combination is made with the "isometric mapping" (Isomap) proposed by Tenenbaum et al (6). Isomap is a simple and effective dimensionality reduction method which extends the application scope of multidimensional scaling (MDS) from linear to nonlinear structure. It contains three steps: first construct the K-nearest-neighborhood (KNN) graph, then compute the graph distances (the shortest path distances in the neighborhood graph) and lastly compute the low-dimensional embedding by classical MDS. In effect, the constructed KNN graph for data points is unfolded in the low-dimensional Euclidean space, which is effective especially for preserving in the embedding the topology relationship of data points on manifolds. The crux of the success for Isomap is that it takes as the input for classical MDS the graph distances, instead of the straight-line Euclidian ones, for all pairs of data points.

artificial intelligence, dataset, machine learning, (13 more...)

1501.0645

Country: Asia > China (0.14)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Dimensionality Reduction (0.84)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.69)

Chen, Yudong, Xu, Jiaming

Statistical-Computational Tradeoffs in Planted Problems and Submatrix Localization with a Growing Number of Clusters and Submatrices

arXiv.org Machine LearningMar-13-2015

We consider two closely related problems: planted clustering and submatrix localization. The planted clustering problem assumes that a random graph is generated based on some underlying clusters of the nodes; the task is to recover these clusters given the graph. The submatrix localization problem concerns locating hidden submatrices with elevated means inside a large real-valued random matrix. Of particular interest is the setting where the number of clusters/submatrices is allowed to grow unbounded with the problem size. These formulations cover several classical models such as planted clique, planted densest subgraph, planted partition, planted coloring, and stochastic block model, which are widely used for studying community detection and clustering/bi-clustering. For both problems, we show that the space of the model parameters (cluster/submatrix size, cluster density, and submatrix mean) can be partitioned into four disjoint regions corresponding to decreasing statistical and computational complexities: (1) the \emph{impossible} regime, where all algorithms fail; (2) the \emph{hard} regime, where the computationally expensive Maximum Likelihood Estimator (MLE) succeeds; (3) the \emph{easy} regime, where the polynomial-time convexified MLE succeeds; (4) the \emph{simple} regime, where a simple counting/thresholding procedure succeeds. Moreover, we show that each of these algorithms provably fails in the previous harder regimes. Our theorems establish the minimax recovery limit, which are tight up to constants and hold with a growing number of clusters/submatrices, and provide a stronger performance guarantee than previously known for polynomial-time algorithms. Our study demonstrates the tradeoffs between statistical and computational considerations, and suggests that the minimax recovery limit may not be achievable by polynomial-time algorithms.

artificial intelligence, machine learning, probability, (18 more...)

1402.1267

Country: North America > United States (1.00)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

arXiv.org Machine LearningMar-12-2015

Spectral Clustering for Divide-and-Conquer Graph Matching

Lyzinski, Vince, Sussman, Daniel L., Fishkind, Donniell E., Pao, Henry, Chen, Li, Vogelstein, Joshua T., Park, Youngser, Priebe, Carey E.

Graph matching is an increasingly important problem in inferential graph statistics, with applications across a broad spectrum of fields including computer vision ([38], [10]), shape matching and object recognition ([4], [7]), and biology and neuroscience ([22], [34], [37]), to name a few. The graph matching problem (GMP) seeks to find an alignment between the vertex sets of two graphs that best preserves common structure across graphs. Unfortunately, the GMP is inherently combinatorial, and no efficient exact graph matching algorithms are known. Indeed, even the simpler problem of determining if two graphs are isomorphic is famously of unknown complexity ([19], 1 [30]), and if the graphs are allowed to be loopy, weighted and directed, then the simplest version of GMP is equivalent to the NPhard quadratic assignment problem. Due to its wide applicability, there exist a vast number of approximating algorithms for GMP; see the paper "30 Years of Graph Matching in Pattern Recognition" ([11]) for an excellent survey of the existing literature.

artificial intelligence, graph, machine learning, (15 more...)

1310.1297

Country: North America > United States (0.46)

Genre: Research Report (1.00)

Industry:

Government > Military (0.67)
Health & Medicine > Therapeutic Area > Neurology (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Vision (0.86)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

Wu, Zhaohui (The Pennsylvania State University) | Giles, C. Lee (The Pennsylvania State University)

Sense-Aaware Semantic Analysis: A Multi-Prototype Word Representation Model Using Wikipedia

AAAI ConferencesMar-6-2015

Human languages are naturally ambiguous, which makes it difficult to automatically understand the semantics of text. Most vector space models (VSM) treat all occurrences of a word as the same and build a single vector to represent the meaning of a word, which fails to capture any ambiguity. We present sense-aware semantic analysis (SaSA), a multi-prototype VSM for word representation based on Wikipedia, which could account for homonymy and polysemy. The "sense-specific'' prototypes of a word are produced by clustering Wikipedia pages based on both local and global contexts of the word in Wikipedia. Experimental evaluations on semantic relatedness for both isolated words and words in sentential contexts and word sense induction demonstrate its effectiveness.

artificial intelligence, machine learning, natural language, (21 more...)

Twenty-Ninth AAAI Conference on Artificial Intelligence

Country:

Europe > United Kingdom > England > Greater London > London (0.04)
North America > United States > Pennsylvania > Centre County > University Park (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
(2 more...)

Industry:

Transportation > Ground (0.46)
Banking & Finance > Trading (0.46)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Semantic Networks (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.68)

Gunaratna, Kalpa (Kno.e.sis, Wright State University) | Thirunarayan, Krishnaparasad (Kno.e.sis, Wright State University) | Sheth, Amit (Kno.e.sis, Wright State University)

FACES: Diversity-Aware Entity Summarization Using Incremental Hierarchical Conceptual Clustering

AAAI ConferencesMar-6-2015

Semantic Web documents that encode facts about entities on the Web have been growing rapidly in size and evolving over time. Creating summaries on lengthy Semantic Web documents for quick identification of the corresponding entity has been of great contemporary interest. In this paper, we explore automatic summarization techniques that characterize and enable identification of an entity and create summaries that are human friendly. Specifically, we highlight the importance of diversified (faceted) summaries by combining three dimensions: diversity, uniqueness, and popularity. Our novel diversity-aware entity summarization approach mimics human conceptual clustering techniques to group facts and picks representative facts from each group to form concise (i.e., short) and comprehensive (i.e., improved coverage through diversity) summaries. We evaluate our approach against the state-of-the-art techniques and show that our work improves both the quality and the efficiency of entity summarization.

algorithm, entity summary, facet, (16 more...)

Twenty-Ninth AAAI Conference on Artificial Intelligence

Country: North America > United States > Ohio > Montgomery County > Dayton (0.04)

Genre: Research Report > Promising Solution (0.34)

Industry: Leisure & Entertainment (0.46)

Technology:

Information Technology > Information Management > Search (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.88)
Information Technology > Communications > Web > Semantic Web (0.70)
Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (0.70)

Ghoshdastidar, Debarghya (Indian Institute of Science, Bangalore) | Dukkipati, Ambedkar (Indian Institute of Science, Bangalore)

Spectral Clustering Using Multilinear SVD: Analysis, Approximations and Applications

AAAI ConferencesMar-6-2015

Spectral clustering, a graph partitioning technique, has gained immense popularity in machine learning in the context of unsupervised learning. This is due to convincing empirical studies, elegant approaches involved and the theoretical guarantees provided in the literature. To tackle some challenging problems that arose in computer vision etc., recently, a need to develop spectral methods that incorporate multi-way similarity measures surfaced. This, in turn, leads to a hypergraph partitioning problem. In this paper, we formulate a criterion for partitioning uniform hypergraphs, and show that a relaxation of this problem is related to the multilinear singular value decomposition (SVD) of symmetric tensors. Using this, we provide a spectral technique for clustering based on higher order affinities, and derive a theoretical bound on the error incurred by this method. We also study the complexity of the algorithm and use Nystr ̈om’s method and column sampling techniques to develop approximate methods with significantly reduced complexity. Experiments on geometric grouping and motion segmentation demonstrate the practical significance of the proposed methods.

algorithm, hypergraph, tensor, (16 more...)

Twenty-Ninth AAAI Conference on Artificial Intelligence

Country:

Asia > Middle East > Jordan (0.05)
Asia > India > Karnataka > Bengaluru (0.04)

Genre: Research Report (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.94)