AITopics

1506.06272

Country: North America > United States > California > Los Angeles County > Los Angeles (0.28)

Genre: Research Report (0.64)

Industry:

Government > Regional Government > North America Government > United States Government (0.46)
Government > Military (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Detectability thresholds and optimal algorithms for community structure in dynamic networks

Ghasemian, Amir, Zhang, Pan, Clauset, Aaron, Moore, Cristopher, Peel, Leto

We study the fundamental limits on learning latent community structure in dynamic networks. Specifically, we study dynamic stochastic block models where nodes change their community membership over time, but where edges are generated independently at each time step. In this setting (which is a special case of several existing models), we are able to derive the detectability threshold exactly, as a function of the rate of change and the strength of the communities. Below this threshold, we claim that no algorithm can identify the communities better than chance. We then give two algorithms that are optimal in the sense that they succeed all the way down to this limit. The first uses belief propagation (BP), which gives asymptotically optimal accuracy, and the second is a fast spectral clustering algorithm, based on linearizing the BP equations. We verify our analytic and algorithmic results via numerical simulation, and close with a brief discussion of extensions and open questions.
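As an illustration of the setting described in this abstract, here is a minimal sketch of a generator for a toy dynamic stochastic block model. The function name and the parameters `p_in`, `p_out`, and `eta` are assumptions made for this example, as is the rule that a node keeps its community with probability `eta` and otherwise draws a new one uniformly at random. It is not the authors' code, and it does not implement the BP or spectral algorithms.

```python
import random


def sample_dynamic_sbm(n, k, steps, p_in, p_out, eta, seed=0):
    """Sample (labels, edges) snapshots from a toy dynamic block model.

    n: number of nodes, k: number of communities, steps: time steps.
    p_in / p_out: edge probability within / between communities.
    eta: probability that a node keeps its community at the next step
         (assumed update rule; otherwise the label is redrawn uniformly).
    """
    rng = random.Random(seed)
    labels = [rng.randrange(k) for _ in range(n)]
    history = []
    for _ in range(steps):
        # Edges are generated independently at each time step.
        edges = []
        for i in range(n):
            for j in range(i + 1, n):
                p = p_in if labels[i] == labels[j] else p_out
                if rng.random() < p:
                    edges.append((i, j))
        history.append((list(labels), edges))
        # Community memberships change over time.
        labels = [g if rng.random() < eta else rng.randrange(k) for g in labels]
    return history
```

Setting `eta=1.0` gives a static model observed repeatedly, while `eta=0.0` redraws every label at each step, so the snapshots carry no shared community information.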

artificial intelligence, dynamic network, us government, (20 more...)

doi: 10.1103/PhysRevX.6.031005

1506.06179

Country:

North America > United States > Colorado > Boulder County > Boulder (0.14)
Europe > Spain (0.14)

Genre: Research Report (0.50)

Industry:

Energy > Oil & Gas (0.88)
Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Data Science > Data Mining (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.48)

A general framework for the IT-based clustering methods

Qiu, Teng, Li, Yongjie

Previously, we proposed a physically inspired rule for organizing data points into a sparse yet effective structure, called the in-tree (IT) graph, which can capture a wide class of underlying cluster structures, especially in density-based datasets. Although some redundant edges between clusters must still be removed computationally, the IT graph has a major advantage over the k-nearest-neighbor (k-NN) or minimal spanning tree (MST) graph: its redundant edges are much more distinguishable and can thus be easily identified by several methods we proposed previously. In this paper, we propose a general framework for reconstructing the IT graph from an initial neighborhood graph, such as the k-NN or MST graph, and the corresponding graph distances. Our previous way of constructing the IT graph turns out to be a special case of this framework. The general framework 1) enables the IT graph to capture a wider class of underlying cluster structures, especially manifolds, and 2) should be more effective for clustering sparse or graph-based datasets.

artificial intelligence, graph, machine learning, (16 more...)

1506.06068

Country: Asia > China (0.14)

Genre: Research Report (0.50)

Industry: Health & Medicine (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Pahikkala, Tapio, Viljanen, Markus, Airola, Antti, Waegeman, Willem

Spectral Analysis of Symmetric and Anti-Symmetric Pairwise Kernels

Many real-world phenomena can be described in terms of pairwise relationships between entities. When learning pairwise relations, symmetry and anti-symmetry are two types of prior knowledge constraints that commonly appear when both objects in a pair belong to the same domain. A typical example of an application where relationships are often assumed to be symmetric is the prediction of protein-protein interactions: if protein A interacts with protein B, then conversely B interacts with A. A typical example of an anti-symmetric relation is a preference relation: if A is preferred over B, then conversely B is not preferred over A. Commonly used symmetric pairwise kernels include the symmetrized Kronecker [Ben-Hur and Noble, 2005] and Cartesian [Kashima et al., 2009] kernels, as well as the metric learning kernel [Vert et al., 2007]. Such kernels are analyzed in more detail by Brunner et al. [2012]. Typical examples of anti-symmetric kernels are the transitive kernel of Herbrich et al. [2000], used for learning to rank, and the anti-symmetric Kronecker product kernel [Pahikkala et al., 2010] for learning intransitive preference relations.
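To make the symmetry constraints concrete, the following sketch builds symmetrized and anti-symmetrized Kronecker product kernels on pairs from a base kernel on single objects. The function names, the choice of a linear base kernel, and the omission of any normalization constant are assumptions for illustration; the cited papers may scale these kernels differently.

```python
def linear_kernel(x, y):
    """Base kernel on single objects (illustrative choice: dot product)."""
    return sum(a * b for a, b in zip(x, y))


def symmetric_kronecker(pair1, pair2, base=linear_kernel):
    """k(a,c)k(b,d) + k(a,d)k(b,c): unchanged when a pair is swapped."""
    (a, b), (c, d) = pair1, pair2
    return base(a, c) * base(b, d) + base(a, d) * base(b, c)


def antisymmetric_kronecker(pair1, pair2, base=linear_kernel):
    """k(a,c)k(b,d) - k(a,d)k(b,c): flips sign when a pair is swapped."""
    (a, b), (c, d) = pair1, pair2
    return base(a, c) * base(b, d) - base(a, d) * base(b, c)
```

Swapping the two objects inside either argument leaves the symmetric kernel unchanged and negates the anti-symmetric one, which is the prior knowledge each kernel encodes.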

artificial intelligence, kernel, machine learning, (17 more...)

1506.0595

Country: North America > United States (0.93)

Genre: Research Report (0.50)

Industry: Education (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.30)

Lan, Shiwei, Shahbaba, Babak

Sampling constrained probability distributions using Spherical Augmentation

Statistical models with constrained probability distributions are abundant in machine learning. Some examples include regression models with norm constraints (e.g., Lasso), probit, many copula models, and latent Dirichlet allocation (LDA). Bayesian inference involving probability distributions confined to constrained domains can be quite challenging for commonly used sampling algorithms. In this paper, we propose a novel augmentation technique that handles a wide range of constraints by mapping the constrained domain to a sphere in the augmented space. By moving freely on the surface of this sphere, sampling algorithms handle constraints implicitly and generate proposals that remain within boundaries when mapped back to the original space. Our proposed method, called Spherical Augmentation, provides a mathematically natural and computationally efficient framework for sampling from constrained probability distributions. We show the advantages of our method over state-of-the-art sampling algorithms, such as exact Hamiltonian Monte Carlo, using several examples including truncated Gaussian distributions, Bayesian Lasso, Bayesian bridge regression, reconstruction of quantized stationary Gaussian process, and LDA for topic modeling.
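The idea of mapping a constrained domain to a sphere can be sketched for one simple case. The example below assumes the constrained domain is the unit ball and lifts each point to the upper hemisphere of a sphere in one extra dimension. The function names are hypothetical, and the sketch covers only the change of coordinates, not the sampler or the change-of-variables weights a full method would need.

```python
import math


def ball_to_sphere(theta):
    """Lift a point of the unit ball to the unit sphere in one more dimension.

    The added coordinate is sqrt(1 - ||theta||^2), so the result has norm 1.
    """
    squared_norm = sum(t * t for t in theta)
    if squared_norm > 1.0:
        raise ValueError("theta must lie inside the unit ball")
    return list(theta) + [math.sqrt(1.0 - squared_norm)]


def sphere_to_ball(point):
    """Map a point on the sphere back by dropping the added coordinate."""
    return list(point[:-1])
```

Any point on the sphere maps back to a point with norm at most 1, so a sampler that moves along the sphere cannot propose values outside the ball.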

artificial intelligence, constraint, machine learning, (18 more...)

1506.05936

Country: North America > United States > California (0.67)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.48)

Ashtiani, Hassan, Ben-David, Shai

Representation Learning for Clustering: A Statistical Framework

We address the problem of communicating domain knowledge from a user to the designer of a clustering algorithm. We propose a protocol in which the user provides a clustering of a relatively small random sample of a data set. The algorithm designer then uses that sample to come up with a data representation under which $k$-means clustering results in a clustering (of the full data set) that is aligned with the user's clustering. We provide a formal statistical model for analyzing the sample complexity of learning a clustering representation with this paradigm. We then introduce a notion of capacity of a class of possible representations, in the spirit of the VC-dimension, showing that classes of representations that have finite such dimension can be successfully learned with sample size error bounds, and end our discussion with an analysis of that dimension for classes of representations induced by linear embeddings.

artificial intelligence, machine learning, mapping, (14 more...)

1506.059

Country: North America > Canada (0.28)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.93)

Vempala, Santosh S., Xiao, Ying

Max vs Min: Tensor Decomposition and ICA with nearly Linear Sample Complexity

We present a simple, general technique for reducing the sample complexity of matrix and tensor decomposition algorithms applied to distributions. We use the technique to give a polynomial-time algorithm for standard ICA with sample complexity nearly linear in the dimension, thereby improving substantially on previous bounds. The analysis is based on properties of random polynomials, namely the spacings of an ensemble of polynomials. Our technique also applies to other applications of tensor decompositions, including spherical Gaussian mixture models.

algorithm, artificial intelligence, machine learning, (18 more...)

1412.2954

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Karahan, Esin, Rojas-Lopez, Pedro A., Bringas-Vega, Maria L., Valdes-Hernandez, Pedro A., Valdes-Sosa, Pedro A.

Tensor Analysis and Fusion of Multimodal Brain Images

Current high-throughput data acquisition technologies probe dynamical systems with different imaging modalities, generating massive data sets at different spatial and temporal resolutions and posing challenging problems in multimodal data fusion. A case in point is the attempt to parse out the brain structures and networks that underpin human cognitive processes by analysis of different neuroimaging modalities (functional MRI, EEG, NIRS, etc.). We emphasize that the multimodal, multi-scale nature of neuroimaging data is well reflected by a multi-way (tensor) structure where the underlying processes can be summarized by a relatively small number of components or "atoms". We introduce Markov-Penrose diagrams, an integration of Bayesian DAG and tensor network notation, to analyze these models. These diagrams not only clarify matrix and tensor EEG and fMRI time/frequency analysis and inverse problems, but also help understand multimodal fusion via Multiway Partial Least Squares and Coupled Matrix-Tensor Factorization. We show here, for the first time, that Granger causal analysis of brain networks is a tensor regression problem, thus allowing the atomic decomposition of brain networks. Analysis of EEG and fMRI recordings shows the potential of the methods and suggests their use in other scientific domains.

artificial intelligence, data mining, machine learning, (21 more...)

doi: 10.1109/JPROC.2015.2455028

1506.0604

Country:

Asia (0.67)
North America > United States > California (0.67)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Health Care Technology (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Data Science > Data Integration (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (1.00)

Kandasamy, Kirthevasan, Krishnamurthy, Akshay, Poczos, Barnabas, Wasserman, Larry, Robins, James M.

Influence Functions for Machine Learning: Nonparametric Estimators for Entropies, Divergences and Mutual Informations

arXiv.org Artificial Intelligence, Jun-19-2015

Entropies, divergences, and mutual informations are classical information-theoretic quantities that play fundamental roles in statistics, machine learning, and across the mathematical sciences. In addition to their use as analytical tools, they arise in a variety of applications including hypothesis testing, parameter estimation, feature selection, and optimal experimental design. In many of these applications, it is important to estimate these functionals from data so that they can be used in downstream algorithmic or scientific tasks. In this paper, we develop a recipe for estimating statistical functionals of one or more nonparametric distributions based on the notion of influence functions. Entropy estimators are used in applications ranging from independent components analysis [Learned-Miller and John, 2003] and intrinsic dimension estimation [Carter et al., 2010] to several signal processing applications [Hero et al., 2002].

artificial intelligence, estimator, machine learning, (16 more...)

1411.4342

Genre: Research Report (0.70)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Liebman, Elad, Chor, Benny, Stone, Peter

Representative Selection in Non Metric Datasets

arXiv.org Artificial Intelligence, Jun-19-2015

This paper considers the problem of representative selection: choosing a subset of data points from a dataset that best represents its overall set of elements. This subset needs to inherently reflect the type of information contained in the entire set, while minimizing redundancy. For such purposes, clustering may seem like a natural approach. However, existing clustering methods are not ideally suited for representative selection, especially when dealing with non-metric data, where only a pairwise similarity measure exists. In this paper we propose $\delta$-medoids, a novel approach that can be viewed as an extension to the $k$-medoids algorithm and is specifically suited for sample representative selection from non-metric data. We empirically validate $\delta$-medoids in two domains, namely music analysis and motion analysis. We also show some theoretical bounds on the performance of $\delta$-medoids and the hardness of representative selection in general.

algorithm, artificial intelligence, machine learning, (15 more...)

doi: 10.1080/08839514.2015.1071092

1502.07428

Country: North America > United States > Texas (0.28)

Genre: Research Report > Promising Solution (0.34)

Industry:

Leisure & Entertainment (1.00)
Media > Music (0.93)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)