Collaborating Authors

 Deyo, Sean


A transparent approach to data representation

arXiv.org Artificial Intelligence

In 2006 Netflix released a data set -- roughly 100 million ratings of 17,770 titles, given by 480,189 viewers -- and posed a challenge: Use this training data to predict the ratings in a separate, hidden set of ratings involving the same movies and viewers. The first to do so with a root-mean-square prediction error (RMSE) at least 10% lower than that of Netflix's own system would receive a prize. We take inspiration from the non-negative matrix factorization (NMF) problem. In NMF, one large m × n matrix M with non-negative values is factored as a product of two smaller non-negative matrices R and C of size m × l and l × n, respectively (where l ≪ m, n). Imagining the set of ratings as the M matrix, with each row corresponding to a viewer and each column corresponding to a movie, one can think of each row of R as an attribute vector for the corresponding viewer.
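
For readers unfamiliar with NMF, the sketch below shows the standard factorization M ≈ RC using Lee-Seung multiplicative updates on a toy ratings matrix. This is illustrative only: it is not the transparent representation method the paper develops, and the dimensions and data are made up.

```python
# Minimal sketch of standard non-negative matrix factorization (NMF) via
# Lee-Seung multiplicative updates -- illustrative only, not the paper's
# method, which only takes inspiration from the NMF setup.
import numpy as np

def nmf(M, l, n_iter=200, eps=1e-9):
    """Factor a non-negative m x n matrix M into R (m x l) and C (l x n)."""
    m, n = M.shape
    rng = np.random.default_rng(0)
    R = rng.random((m, l))
    C = rng.random((l, n))
    for _ in range(n_iter):
        # Multiplicative updates keep R and C non-negative at every step.
        C *= (R.T @ M) / (R.T @ R @ C + eps)
        R *= (M @ C.T) / (R @ C @ C.T + eps)
    return R, C

# Toy "ratings" matrix: rows are viewers, columns are movies (hypothetical data).
M = np.array([[5.0, 4.0, 0.0],
              [4.0, 5.0, 1.0],
              [0.0, 1.0, 5.0]])
R, C = nmf(M, l=2)
print(np.round(R @ C, 2))  # low-rank reconstruction of M
```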


A logical word embedding for learning grammar

arXiv.org Artificial Intelligence

We introduce the logical grammar embedding (LGE), a model inspired by pregroup grammars and categorial grammars that enables unsupervised inference of lexical categories and syntactic rules from a corpus of text. LGE produces comprehensible output summarizing its inferences, has a completely transparent process for producing novel sentences, and can learn from as few as a hundred sentences.
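
As a rough illustration of what "lexical categories and syntactic rules" mean in a categorial-grammar-style setting, the toy sketch below reduces a sentence to a single category using an invented lexicon and rules. It is not the LGE model; every word, category, and rule in it is hypothetical.

```python
# Toy illustration of lexical categories and category-rewrite rules --
# invented example, not the LGE model; the lexicon and rules are hypothetical.
LEXICON = {
    "the": "DET", "a": "DET",
    "dog": "N", "cat": "N",
    "chased": "V", "saw": "V",
}

# Syntactic rules as rewrites over category sequences (hypothetical grammar).
RULES = {("DET", "N"): "NP", ("NP", "V", "NP"): "S"}

def parse(words):
    cats = [LEXICON[w] for w in words]
    changed = True
    while changed:
        changed = False
        for pattern, result in RULES.items():
            k = len(pattern)
            for i in range(len(cats) - k + 1):
                if tuple(cats[i:i + k]) == pattern:
                    cats[i:i + k] = [result]
                    changed = True
    return cats  # a sentence counts as grammatical if this reduces to ["S"]

print(parse("the dog chased a cat".split()))  # ['S']
```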


Learning grammar with a divide-and-concur neural network

arXiv.org Artificial Intelligence

We implement a divide-and-concur iterative projection approach to context-free grammar inference. Unlike most state-of-the-art models of natural language processing, our method requires a relatively small number of discrete parameters, making the inferred grammar directly interpretable -- one can read off from a solution how to construct grammatically valid sentences. Another advantage of our approach is the ability to infer meaningful grammatical rules from just a few sentences, compared to the hundreds of gigabytes of training data many other models employ. We demonstrate several ways of applying our approach: classifying words and inferring a grammar from scratch, taking an existing grammar and refining its categories and rules, and taking an existing grammar and expanding its lexicon as it encounters new words in new data.
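
The divide-and-concur idea itself can be illustrated on a toy feasibility problem: each constraint projects its own copy of the variables ("divide"), and the copies are then averaged back into agreement ("concur"). The sketch below applies this with plain alternating projections to a point required to lie on both a circle and a line; the grammar constraints and the difference-map iteration used in the paper are not shown.

```python
# Minimal sketch of a divide-and-concur style iterative projection on a toy
# feasibility problem (a point that must lie on a circle and on a line).
# This only illustrates the divide/concur idea with plain alternating
# projections; it is not the paper's grammar-inference construction.
import numpy as np

def project_circle(x):            # constraint 1: ||x|| = 1
    return x / np.linalg.norm(x)

def project_line(x):              # constraint 2: x[0] + x[1] = 1.2
    n = np.array([1.0, 1.0])
    return x - (x @ n - 1.2) / (n @ n) * n

x = np.array([2.0, -0.5])         # arbitrary starting guess
for _ in range(100):
    # Divide: each constraint gets its own copy of x and projects it.
    replicas = np.stack([project_circle(x), project_line(x)])
    # Concur: force the copies to agree by averaging them.
    x = replicas.mean(axis=0)

print(np.round(x, 4))             # a point (nearly) on both the circle and the line
```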