Collaborating Authors

 Sedghi, Hanie


Knowledge Completion for Generics using Guided Tensor Factorization

arXiv.org Machine Learning

Given a knowledge base (KB) rich in facts about common nouns or generics, such as "all trees produce oxygen" or "some animals live in forests", we consider the problem of deriving additional such facts at high precision. While this problem has received much attention for named entity KBs such as Freebase, little emphasis has been placed on generics despite their importance for capturing general knowledge. Different from named entity KBs, generics KBs involve implicit or explicit quantification, have more complex underlying regularities, are substantially more incomplete, and violate the commonly used local closed-world assumption (LCWA). Consequently, existing completion methods struggle with this new task. We observe that external information, such as relation schemas and entity taxonomies, if used correctly, can be surprisingly powerful in addressing the challenges associated with generics. Using this insight, we propose a simple yet effective knowledge guided tensor factorization approach that achieves state-of-the-art results on two generics KBs for science, doubling their size at 74%-86% precision. Further, to address the paucity of facts about rare entities such as oriole (a bird), we present a novel taxonomy guided submodular active learning method to collect additional annotations that are over five times more effective in inferring further new facts than multiple active learning baselines.
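
A minimal sketch of the kind of factorization involved (a hypothetical toy, not the paper's guided method or its KBs): a CP-style factorization of a (subject, relation, object) fact tensor, where a made-up entity taxonomy and relation schema are used only to filter candidate facts before scoring. All entities, relations, and the schema below are invented for illustration.

```python
# Hypothetical toy: CP-style factorization of a (subject, relation, object) fact tensor,
# with a toy schema standing in for the "knowledge guidance" described in the abstract.
import numpy as np

rng = np.random.default_rng(0)
n_entities, n_relations, rank = 50, 4, 8

# Observed generics facts as (subject, relation, object) index triples (made up).
facts = [(0, 1, 12), (0, 1, 13), (2, 1, 12), (3, 0, 4), (5, 2, 9)]

E = 0.1 * rng.standard_normal((n_entities, rank))   # entity embeddings
R = 0.1 * rng.standard_normal((n_relations, rank))  # relation embeddings

def score(s, r, o):
    return float(np.sum(E[s] * R[r] * E[o]))

# Squared-loss gradient descent pushing observed triples toward score 1.
lr = 0.05
for _ in range(300):
    for s, r, o in facts:
        err = score(s, r, o) - 1.0
        gs, go, gr = err * R[r] * E[o], err * R[r] * E[s], err * E[s] * E[o]
        E[s] -= lr * gs
        E[o] -= lr * go
        R[r] -= lr * gr

# Toy guidance: a schema saying relation 1 links "plant" entities to "substance" entities.
entity_type = {i: ("plant" if i < 10 else "substance") for i in range(n_entities)}
schema = {1: ("plant", "substance")}

def schema_consistent(s, r, o):
    dom, rng_type = schema.get(r, (None, None))
    return dom is None or (entity_type[s] == dom and entity_type[o] == rng_type)

# Rank unobserved, schema-consistent candidates for relation 1 by factorization score.
candidates = [(s, 1, o) for s in range(10) for o in range(10, 20)
              if (s, 1, o) not in facts and schema_consistent(s, 1, o)]
print(sorted(candidates, key=lambda t: -score(*t))[:5])
```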


Training Input-Output Recurrent Neural Networks through Spectral Methods

arXiv.org Machine Learning

We consider the problem of training input-output recurrent neural networks (RNN) for sequence labeling tasks. We propose a novel spectral approach for learning the network parameters. It is based on the decomposition of the cross-moment tensor between the output and a non-linear, score-function-based transformation of the input. We guarantee consistent learning with polynomial sample and computational complexity under transparent conditions such as non-degeneracy of model parameters, polynomial activations for the neurons, and a Markovian evolution of the input sequence. We also extend our results to bidirectional RNNs, which use both past and future information to output the label at each time point and are employed in many NLP tasks such as POS tagging.
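
A minimal sketch of the core ingredient, shown in the simplest possible setting (a single neuron with Gaussian input; this is illustrative only and not the paper's RNN algorithm, which decomposes a higher-order cross-moment tensor): by Stein's identity, the cross-moment of the label with the second-order score function of a standard Gaussian input is rank one in the weight vector, so its top eigenvector recovers the weight direction.

```python
# Illustrative sketch (not the paper's method): score-function cross-moment for one neuron.
import numpy as np

rng = np.random.default_rng(0)
d, n = 10, 200_000

w_true = rng.standard_normal(d)
w_true /= np.linalg.norm(w_true)

X = rng.standard_normal((n, d))            # Gaussian input, so its score functions are explicit
y = np.tanh(X @ w_true) ** 2               # toy label from a single neuron (even nonlinearity)

# Empirical cross-moment with the second-order score function S2(x) = x x^T - I:
# by Stein's identity this equals E[sigma''(w.x)] * w w^T, a rank-one matrix in w.
M = (X * y[:, None]).T @ X / n - y.mean() * np.eye(d)

eigvals, eigvecs = np.linalg.eigh(M)
w_hat = eigvecs[:, np.argmax(np.abs(eigvals))]
print(abs(w_hat @ w_true))                 # close to 1: the weight direction is recovered
```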


Provable Tensor Methods for Learning Mixtures of Generalized Linear Models

arXiv.org Machine Learning

We consider the problem of learning mixtures of generalized linear models (GLM) which arise in classification and regression problems. Typical learning approaches such as expectation maximization (EM) or variational Bayes can get stuck in spurious local optima. In contrast, we present a tensor decomposition method which is guaranteed to correctly recover the parameters. The key insight is to employ certain feature transformations of the input, which depend on the input generative model. Specifically, we employ score function tensors of the input and compute their cross-correlation with the response variable. We establish that the decomposition of this tensor consistently recovers the parameters, under mild non-degeneracy conditions. We demonstrate that the computational and sample complexity of our method is a low-order polynomial in the input and latent dimensions.
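
In the same spirit as the sketch above, but one order higher: a minimal sketch under simplifying assumptions (orthonormal component weights, Gaussian input, an exponential link; a toy, not the paper's full estimator). The third-order score-function cross-moment is a weighted sum of rank-one terms in the component weight vectors, and tensor power iteration with deflation pulls them out one at a time.

```python
# Illustrative sketch (toy assumptions, not the paper's estimator): third-order score-function
# cross-moment of a 2-component GLM mixture, decomposed by tensor power iteration + deflation.
import numpy as np

rng = np.random.default_rng(1)
d, n, k = 6, 300_000, 2

W = np.linalg.qr(rng.standard_normal((d, d)))[0][:, :k]   # orthonormal GLM weight vectors
z = rng.integers(0, k, size=n)                             # latent mixture component per sample
X = rng.standard_normal((n, d))
y = np.exp(np.einsum('nd,dn->n', X, W[:, z]))              # exponential (log-link) mean function

# Empirical cross-moment with the third-order score function:
# S3(x)_{ijk} = x_i x_j x_k - x_i d_{jk} - x_j d_{ik} - x_k d_{ij}.
I = np.eye(d)
m1 = (y[:, None] * X).mean(axis=0)
T = np.einsum('n,ni,nj,nk->ijk', y, X, X, X) / n
T -= (np.einsum('i,jk->ijk', m1, I) + np.einsum('j,ik->ijk', m1, I)
      + np.einsum('k,ij->ijk', m1, I))

def power_iteration(T, iters=100):
    v = rng.standard_normal(T.shape[0]); v /= np.linalg.norm(v)
    for _ in range(iters):
        v = np.einsum('ijk,j,k->i', T, v, v)
        v /= np.linalg.norm(v)
    return np.einsum('ijk,i,j,k->', T, v, v, v), v

# Recover the components one at a time, deflating the tensor after each.
for _ in range(k):
    lam, v = power_iteration(T)
    print(np.round(np.abs(W.T @ v), 3))          # one entry near 1 per recovered component
    T -= lam * np.einsum('i,j,k->ijk', v, v, v)
```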


Beating the Perils of Non-Convexity: Guaranteed Training of Neural Networks using Tensor Methods

arXiv.org Machine Learning

Training neural networks is a challenging non-convex optimization problem, and backpropagation or gradient descent can get stuck in spurious local optima. We propose a novel algorithm based on tensor decomposition for guaranteed training of two-layer neural networks. We provide risk bounds for our proposed method, with a polynomial sample complexity in the relevant parameters, such as input dimension and number of neurons. While learning arbitrary target functions is NP-hard, we provide transparent conditions on the function and the input for learnability. Our training method is based on tensor decomposition, which provably converges to the global optimum, under a set of mild non-degeneracy conditions. It consists of simple, embarrassingly parallel linear and multi-linear operations, and is competitive with standard stochastic gradient descent (SGD) in terms of computational complexity. Thus, we propose a computationally efficient method with guaranteed risk bounds for training neural networks with one hidden layer.
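
Extending the sketches above to a toy two-layer network whose hidden-layer weight columns are not orthogonal (hypothetical polynomial activation and Gaussian input; not the paper's implementation): the second-order score-function moment is used to whiten the third-order moment, which orthogonalizes the weight components so that tensor power iteration can recover the first-layer directions.

```python
# Illustrative sketch (toy, not the paper's algorithm): whitened tensor decomposition of
# score-function cross-moments for y = sum_j sigma(a_j . x) with non-orthogonal columns a_j.
import numpy as np

rng = np.random.default_rng(2)
d, k, n = 8, 3, 400_000
sigma = lambda z: z**2 + z**3                       # toy polynomial activation

A = rng.standard_normal((d, k))
A /= np.linalg.norm(A, axis=0)                      # unit-norm, generally non-orthogonal columns
X = rng.standard_normal((n, d))
y = sigma(X @ A).sum(axis=1)

# Second- and third-order score-function cross-moments (Stein identities).
I = np.eye(d)
M2 = (X * y[:, None]).T @ X / n - y.mean() * I
m1 = (y[:, None] * X).mean(axis=0)
T3 = np.einsum('n,ni,nj,nk->ijk', y, X, X, X) / n
T3 -= (np.einsum('i,jk->ijk', m1, I) + np.einsum('j,ik->ijk', m1, I)
       + np.einsum('k,ij->ijk', m1, I))

# Whiten: choose W with W^T M2 W = I_k, which orthogonalizes the components inside T3.
s, U = np.linalg.eigh(M2)
s, U = s[::-1][:k], U[:, ::-1][:, :k]
W = U / np.sqrt(s)
Tw = np.einsum('ijk,ia,jb,kc->abc', T3, W, W, W)

def power_iteration(T, iters=200):
    v = rng.standard_normal(T.shape[0]); v /= np.linalg.norm(v)
    for _ in range(iters):
        v = np.einsum('ijk,j,k->i', T, v, v); v /= np.linalg.norm(v)
    return np.einsum('ijk,i,j,k->', T, v, v, v), v

for _ in range(k):
    lam, v = power_iteration(Tw)
    a_hat = U @ (np.sqrt(s) * v)                    # un-whiten back to the input space
    a_hat /= np.linalg.norm(a_hat)
    print(np.round(np.abs(A.T @ a_hat), 2).max())   # near 1: matches one column of A
    Tw -= lam * np.einsum('i,j,k->ijk', v, v, v)
```

Each moment and contraction above is a plain (multi-)linear operation over independent samples, which is the sense in which such a pipeline is embarrassingly parallel.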


Multi-Step Stochastic ADMM in High Dimensions: Applications to Sparse Optimization and Noisy Matrix Decomposition

arXiv.org Machine Learning

We propose an efficient ADMM method with guarantees for high-dimensional problems. We provide explicit bounds for the sparse optimization problem and the noisy matrix decomposition problem. For sparse optimization, we establish that the modified ADMM method has an optimal convergence rate of $\mathcal{O}(s\log d/T)$, where $s$ is the sparsity level, $d$ is the data dimension and $T$ is the number of steps. This matches the minimax lower bounds for sparse estimation. For matrix decomposition into sparse and low rank components, we provide the first guarantees for any online method, and prove a convergence rate of $\tilde{\mathcal{O}}((s+r)\beta^2(p)/T) + \mathcal{O}(1/p)$ for a $p\times p$ matrix, where $s$ is the sparsity level, $r$ is the rank and $\Theta(\sqrt{p})\leq \beta(p)\leq \Theta(p)$. Our guarantees match the minimax lower bound with respect to $s,r$ and $T$. In addition, we match the minimax lower bound with respect to the matrix dimension $p$, i.e. $\beta(p)=\Theta(\sqrt{p})$, for many important statistical models, including the independent noise model, the linear Bayesian network and the latent Gaussian graphical model under some conditions. Our ADMM method is based on epoch-based annealing and consists of inexpensive steps which involve projections onto simple norm balls. Experiments show that for both sparse optimization and matrix decomposition problems, our algorithm outperforms the state-of-the-art methods. In particular, we reach higher accuracy with the same time complexity.
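
A minimal sketch of the epoch-based annealing idea for the sparse case, heavily simplified: plain projected stochastic gradient steps stand in for the full ADMM splitting and dual updates, which are omitted here. Each epoch runs cheap stochastic steps followed by the standard Euclidean projection onto an l1 ball, and the ball is re-centered and its radius annealed between epochs.

```python
# Simplified sketch (not the paper's ADMM updates): epoch-based annealing with l1-ball projections
# for sparse linear regression.
import numpy as np

def project_l1_ball(v, radius):
    """Euclidean projection of v onto the l1 ball of the given radius."""
    if np.abs(v).sum() <= radius:
        return v
    u = np.sort(np.abs(v))[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > css - radius)[0][-1]
    theta = (css[rho] - radius) / (rho + 1.0)
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

rng = np.random.default_rng(3)
d, s, n = 200, 5, 5000
theta_true = np.zeros(d); theta_true[:s] = 1.0
X = rng.standard_normal((n, d))
y = X @ theta_true + 0.1 * rng.standard_normal(n)

theta = np.zeros(d)
center = theta.copy()
radius, lr = 2.0 * s, 1.0 / d
for epoch in range(6):
    for i in rng.permutation(n):
        grad = (X[i] @ theta - y[i]) * X[i]               # stochastic gradient of the squared loss
        theta = center + project_l1_ball(theta - lr * grad - center, radius)
    center = theta.copy()                                 # re-center the feasible ball
    radius *= 0.7                                         # anneal its radius across epochs
    lr *= 0.5
print(np.linalg.norm(theta - theta_true))                 # estimation error after annealing
```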


Provable Methods for Training Neural Networks with Sparse Connectivity

arXiv.org Machine Learning

We provide novel guaranteed approaches for training feedforward neural networks with sparse connectivity. We build on techniques developed previously for learning linear networks and show that they can also be effectively adapted to learning non-linear networks. We operate on moments involving the label and the score function of the input, and show that their factorization provably yields the weight matrix of the first layer of a deep network under mild conditions. In practice, the output of our method can be employed as an effective initializer for gradient descent.
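
A minimal sketch of the initializer idea for a single sparsely connected unit (a hypothetical toy, reusing the second-order score moment from the earlier sketch): the label/score-function moment gives a spectral estimate of the incoming weight vector, small entries are pruned to respect the sparse connectivity, and the result seeds ordinary gradient descent.

```python
# Illustrative sketch (toy, not the paper's method): spectral estimate + pruning as an initializer.
import numpy as np

rng = np.random.default_rng(4)
d, n, s = 30, 100_000, 4
support = rng.choice(d, s, replace=False)
w_true = np.zeros(d); w_true[support] = 1.0 / np.sqrt(s)   # sparsely connected unit

X = rng.standard_normal((n, d))
y = np.tanh(X @ w_true) ** 2 + 0.05 * rng.standard_normal(n)

# Spectral step: top eigenvector of the label/score-function moment E[y (x x^T - I)].
M = (X * y[:, None]).T @ X / n - y.mean() * np.eye(d)
vals, vecs = np.linalg.eigh(M)
w0 = vecs[:, np.argmax(np.abs(vals))]
w0[np.abs(w0) < 0.1] = 0.0                 # prune: exploit the sparse first-layer connectivity
w0 /= np.linalg.norm(w0)

# Refinement: a few full-batch gradient steps on the squared loss, seeded by w0.
w = w0.copy()
for _ in range(100):
    t = np.tanh(X @ w)
    grad = X.T @ ((t ** 2 - y) * 2 * t * (1 - t ** 2)) / n
    w -= 0.2 * grad
print(abs(w0 @ w_true), abs(w @ w_true) / np.linalg.norm(w))   # alignment with the true weights
```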


Learning Mixed Membership Community Models in Social Tagging Networks through Tensor Methods

arXiv.org Machine Learning

Community detection in graphs has been extensively studied both in theory and in applications. However, detecting communities in hypergraphs is more challenging. In this paper, we propose a tensor decomposition approach for guaranteed learning of communities in a special class of hypergraphs modeling social tagging systems or folksonomies. A folksonomy is a tripartite 3-uniform hypergraph consisting of (user, tag, resource) hyperedges. We posit a probabilistic mixed membership community model, and prove that the tensor method consistently learns the communities under efficient sample complexity and separation requirements.
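
A minimal sketch on a planted toy folksonomy (illustrative only, not the paper's mixed membership estimator or its guarantees): build the (user, tag, resource) hyperedge count tensor for two planted communities and run a small rank-2 CP decomposition by alternating least squares; the recovered factor loadings separate the planted communities.

```python
# Illustrative sketch (toy planted model, not the paper's estimator): CP decomposition of a
# tripartite 3-uniform hypergraph's hyperedge tensor via alternating least squares.
import numpy as np

rng = np.random.default_rng(5)
n_u = n_t = n_r = 30
comm = np.repeat([0, 1], 15)                          # planted community of each node (all modes)

# A hyperedge (user, tag, resource) is likely only when all three share a community.
P = ((comm[:, None, None] == comm[None, :, None])
     & (comm[:, None, None] == comm[None, None, :])).astype(float)
T = (rng.random((n_u, n_t, n_r)) < 0.02 + 0.3 * P).astype(float)

def unfold(T, mode):
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def khatri_rao(B, C):
    return np.einsum('ir,jr->ijr', B, C).reshape(-1, B.shape[1])

# Rank-2 CP decomposition by alternating least squares.
k = 2
A, B, C = (rng.random((m, k)) for m in (n_u, n_t, n_r))
for _ in range(50):
    A = unfold(T, 0) @ np.linalg.pinv(khatri_rao(B, C).T)
    B = unfold(T, 1) @ np.linalg.pinv(khatri_rao(A, C).T)
    C = unfold(T, 2) @ np.linalg.pinv(khatri_rao(A, B).T)

# Assign each user to the factor with the largest loading; compare to the planted labels.
labels = A.argmax(axis=1)
print(max((labels == comm).mean(), (labels != comm).mean()))   # near 1.0 up to label swap
```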


Multi-Step Stochastic ADMM in High Dimensions: Applications to Sparse Optimization and Matrix Decomposition

Neural Information Processing Systems

In this paper, we consider a multi-step version of the stochastic ADMM method with efficient guarantees for high-dimensional problems. We first analyze the simple setting, where the optimization problem consists of a loss function and a single regularizer (e.g. sparse optimization), and then extend to the multi-block setting with multiple regularizers and multiple variables (e.g. matrix decomposition into sparse and low rank components). For the sparse optimization problem, our method achieves the minimax rate of $O(s\log d/T)$ for $s$-sparse problems in $d$ dimensions in $T$ steps, and is thus unimprovable by any method up to constant factors. For the matrix decomposition problem with a general loss function, we analyze the multi-step ADMM with multiple blocks. We establish an $O(1/T)$ rate and efficient scaling as the size of the matrix grows. For natural noise models (e.g. independent noise), our convergence rate is minimax-optimal. Thus, we establish tight convergence guarantees for multi-block ADMM in high dimensions. Experiments show that for both sparse optimization and matrix decomposition problems, our algorithm outperforms the state-of-the-art methods.
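
A minimal sketch of the multi-block structure for the matrix decomposition case, in a simplified batch form rather than the paper's multi-step stochastic updates: each ADMM block update is an inexpensive proximal step, soft-thresholding for the sparse component and singular-value thresholding for the low-rank component, followed by a dual update on the consensus constraint.

```python
# Simplified batch sketch (not the paper's stochastic multi-step method): multi-block ADMM
# for decomposing an observed matrix M into a sparse component S plus a low-rank component L.
import numpy as np

rng = np.random.default_rng(6)
p, r, lam, rho = 60, 3, 0.15, 1.0

L_true = rng.standard_normal((p, r)) @ rng.standard_normal((r, p)) / np.sqrt(p)
S_true = np.zeros((p, p))
mask = rng.random((p, p)) < 0.05
S_true[mask] = rng.choice([-1.0, 1.0], mask.sum())
M = L_true + S_true

soft = lambda X, t: np.sign(X) * np.maximum(np.abs(X) - t, 0.0)

def svt(X, t):
    """Singular-value thresholding: proximal operator of the nuclear norm."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(np.maximum(s - t, 0.0)) @ Vt

S = np.zeros((p, p)); L = np.zeros((p, p)); U = np.zeros((p, p))   # U: scaled dual variable
for _ in range(200):
    L = svt(M - S + U, 1.0 / rho)                # low-rank block update
    S = soft(M - L + U, lam / rho)               # sparse block update
    U += M - L - S                               # dual ascent on the consensus constraint
print(np.linalg.norm(L - L_true) / np.linalg.norm(L_true),
      np.linalg.norm(S - S_true) / np.linalg.norm(S_true))   # relative recovery errors
```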


Score Function Features for Discriminative Learning: Matrix and Tensor Framework

arXiv.org Machine Learning

Feature learning forms the cornerstone for tackling challenging learning problems in domains such as speech, computer vision and natural language processing. In this paper, we consider a novel class of matrix and tensor-valued features, which can be pre-trained using unlabeled samples. We present efficient algorithms for extracting discriminative information, given these pre-trained features and labeled samples for any related task. Our class of features is based on higher-order score functions, which capture local variations in the probability density function of the input. We establish a theoretical framework to characterize the nature of discriminative information that can be extracted from score-function features, when used in conjunction with labeled samples. We employ efficient spectral decomposition algorithms (on matrices and tensors) for extracting discriminative components. The advantage of employing tensor-valued features is that we can extract richer discriminative information in the form of overcomplete representations. Thus, we present a novel framework for employing generative models of the input for discriminative learning.
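
A minimal sketch of the pre-train-then-contract pipeline, assuming a Gaussian generative model of the input (all names and the toy labeled task are hypothetical): the input density is fitted on unlabeled samples only, its second-order score function is contracted with the labels, and an eigendecomposition of the resulting matrix exposes the discriminative direction.

```python
# Illustrative sketch (toy, assuming a Gaussian input model): unsupervised score-function
# features contracted with labels, followed by a spectral step to extract a discriminative component.
import numpy as np

rng = np.random.default_rng(7)
d, n_unlab, n_lab = 15, 50_000, 50_000

# Correlated Gaussian input; its density is "pre-trained" from unlabeled samples only.
A_mix = rng.standard_normal((d, d)) / np.sqrt(d)
Sigma_true = A_mix @ A_mix.T + 0.5 * np.eye(d)
chol = np.linalg.cholesky(Sigma_true)
X_unlab = rng.standard_normal((n_unlab, d)) @ chol.T
mu_hat = X_unlab.mean(axis=0)
Prec = np.linalg.inv(np.cov(X_unlab, rowvar=False))

# Labeled task: y depends on the input only through one hidden direction w_true.
w_true = rng.standard_normal(d); w_true /= np.linalg.norm(w_true)
X = rng.standard_normal((n_lab, d)) @ chol.T
y = np.tanh(X @ w_true) ** 2

# Second-order score function of the fitted Gaussian, contracted with the labels:
# S2(x) = Prec (x - mu)(x - mu)^T Prec - Prec, and E[y S2(x)] is rank one in w_true (Stein).
Xc = (X - mu_hat) @ Prec
M = (Xc * y[:, None]).T @ Xc / n_lab - y.mean() * Prec

vals, vecs = np.linalg.eigh(M)
w_hat = vecs[:, np.argmax(np.abs(vals))]
print(abs(w_hat @ w_true))                       # close to 1: discriminative direction found
```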