Question-Answering with Grammatically-Interpretable Representations
Palangi, Hamid (Microsoft Research AI) | Smolensky, Paul (Microsoft Research AI, Johns Hopkins University) | He, Xiaodong (Microsoft Research AI) | Deng, Li (Citadel)
We introduce an architecture, the Tensor Product Recurrent Network (TPRN). In our application of TPRN, internal representations—learned by end-to-end optimization in a deep neural network performing a textual question-answering (QA) task—can be interpreted using basic concepts from linguistic theory. No performance penalty need be paid for this increased interpretability: the proposed model performs comparably to a state-of-the-art system on the SQuAD QA task. The internal representation which is interpreted is a Tensor Product Representation: for each input word, the model selects a symbol to encode the word, and a role in which to place the symbol, and binds the two together. The selection is via soft attention. The overall interpretation is built from interpretations of the symbols, as recruited by the trained model, and interpretations of the roles as used by the model. We find support for our initial hypothesis that symbols can be interpreted as lexical-semantic word meanings, while roles can be interpreted as approximations of grammatical roles (or categories) such as subject, wh-word, determiner, etc. Fine-grained analysis reveals specific correspondences between the learned roles and parts of speech as assigned by a standard tagger (Toutanova et al. 2003), and finds several discrepancies in the model's favor. In this sense, the model learns significant aspects of grammar, after having been exposed solely to linguistically unannotated text, questions, and answers: no prior linguistic knowledge is given to the model. What is given is the means to build representations using symbols and roles, with an inductive bias favoring use of these in an approximately discrete manner.
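As a concrete illustration of the binding mechanism described in this abstract, here is a minimal numpy sketch of soft-attention symbol/role selection followed by tensor product binding. The inventories, dimensions, and logits are invented for illustration; the actual TPRN learns all of these end-to-end inside a recurrent QA model.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Toy inventories: filler (symbol) vectors and role vectors.
# In TPRN these are learned end-to-end; here they are random stand-ins.
rng = np.random.default_rng(0)
n_symbols, n_roles, d_f, d_r = 10, 6, 8, 4
fillers = rng.standard_normal((n_symbols, d_f))
roles = rng.standard_normal((n_roles, d_r))

def bind_word(symbol_logits, role_logits):
    """Soft attention selects a symbol and a role, then binds them
    with an outer (tensor) product, as described in the abstract."""
    a_s = softmax(symbol_logits)   # attention over symbols
    a_r = softmax(role_logits)     # attention over roles
    f = a_s @ fillers              # soft-selected filler vector
    r = a_r @ roles                # soft-selected role vector
    return np.outer(f, r)          # d_f x d_r binding matrix

# One binding per input word; the sentence representation is their sum.
word_logits = [(rng.standard_normal(n_symbols), rng.standard_normal(n_roles))
               for _ in range(5)]
sentence = sum(bind_word(s, r) for s, r in word_logits)
print(sentence.shape)  # (8, 4)
```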
Harmonic Grammars for Formal Languages
Smolensky, Paul
Basic connectionist principles imply that grammars should take the form of systems of parallel soft constraints defining an optimization problem the solutions to which are the well-formed structures in the language. Such Harmonic Grammars have been successfully applied to a number of problems in the theory of natural languages. Here it is shown that formal languages too can be specified by Harmonic Grammars, rather than by conventional serial rewrite rule systems. 1 HARMONIC GRAMMARS In collaboration with Geraldine Legendre, Yoshiro Miyata, and Alan Prince, I have been studying how symbolic computation in human cognition can arise naturally as a higher-level virtual machine realized in appropriately designed lower-level connectionist networks. The basic computational principles of the approach are these: (1) a. When analyzed at the lower level, mental representations are distributed patterns of connectionist activity; when analyzed at a higher level, these same representations constitute symbolic structures.
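The following is a minimal sketch of the core idea, with invented constraints and weights: a Harmonic Grammar scores candidate structures by summing the weights of the soft constraints each one satisfies (violations carrying negative weight), and the language consists of the harmony maximizers.

```python
# A Harmonic Grammar as weighted soft constraints: harmony H(s) is the
# sum of w_c over every constraint c that structure s satisfies, with
# violated constraints contributing negative weight. The grammar's
# language is the set of harmony-maximizing structures.

def harmony(structure, constraints):
    """structure: any object; constraints: list of (weight, predicate)."""
    return sum(w for w, applies in constraints if applies(structure))

# Toy example: strings over {a, b}; reward strings that start with 'a'
# and penalize adjacent identical symbols. Weights are illustrative.
constraints = [
    (+2.0, lambda s: s.startswith("a")),
    (-1.0, lambda s: any(x == y for x, y in zip(s, s[1:]))),
]

candidates = ["ab", "aa", "ba", "abab"]
best = max(candidates, key=lambda s: harmony(s, constraints))
print(best, harmony(best, constraints))   # "ab", harmony 2.0
```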
Rule Induction through Integrated Symbolic and Subsymbolic Processing
McMillan, Clayton, Mozer, Michael C., Smolensky, Paul
We describe a neural network, called RuleNet, that learns explicit, symbolic condition-action rules in a formal string manipulation domain. RuleNet discovers functional categories over elements of the domain, and, at various points during learning, extracts rules that operate on these categories. The rules are then injected back into RuleNet and training continues, in a process called iterative projection. By incorporating rules in this way, RuleNet exhibits enhanced learning and generalization performance over alternative neural net approaches. By integrating symbolic rule learning and subsymbolic category learning, RuleNet has capabilities that go beyond a purely symbolic system. We show how this architecture can be applied to the problem of case-role assignment in natural language processing, yielding a novel rule-based solution.
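To make the iterative projection loop concrete, here is a runnable toy in numpy, with invented dimensions and data and no claim to match the published RuleNet implementation: a linear layer is trained by gradient descent, its weights are periodically snapped to the nearest hard condition-action rule matrix, and training continues from the projected rules.

```python
import numpy as np

rng = np.random.default_rng(0)
n_cond, n_act = 4, 3
true_rule = np.eye(n_act)[rng.integers(n_act, size=n_cond)]  # target rules
X = np.eye(n_cond)[rng.integers(n_cond, size=200)]           # conditions
Y = X @ true_rule                                            # actions

W = rng.standard_normal((n_cond, n_act)) * 0.1
for cycle in range(3):
    for _ in range(100):                     # subsymbolic phase: gradient steps
        grad = X.T @ (X @ W - Y) / len(X)
        W -= 0.5 * grad
    rules = np.eye(n_act)[W.argmax(axis=1)]  # symbolic phase: extract hard rules
    W = rules.copy()                         # projection: re-inject the rules

print((W == true_rule).all())                # True on this toy problem
```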
Distributed Recursive Structure Processing
Legendre, Geraldine, Miyata, Yoshiro, Smolensky, Paul
Harmonic grammar (Legendre, et al., 1990) is a connectionist theory of linguistic well-formedness based on the assumption that the well-formedness of a sentence can be measured by the harmony (negative energy) of the corresponding connectionist state. Assuming a lower-level connectionist network that obeys a few general connectionist principles but is otherwise unspecified, we construct a higher-level network with an equivalent harmony function that captures the most linguistically relevant global aspects of the lower-level network. In this paper, we extend the tensor product representation (Smolensky 1990) to fully recursive representations of recursively structured objects like sentences in the lower-level network. We show theoretically and with an example the power of the new technique for parallel distributed structure processing.
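A minimal numpy sketch of the recursive extension, under toy assumptions (orthonormal left/right role vectors, one-hot fillers): each cons operation binds the children to role vectors with a Kronecker product, constituents of different depths are kept in separate tensor slots, and a child is recovered exactly by contracting the last tensor factor with its role.

```python
import numpy as np

r0, r1 = np.array([1.0, 0.0]), np.array([0.0, 1.0])   # left/right roles
d = 3                                                  # filler dimension

def leaf(f):
    return {0: f}        # a depth-0 constituent is just its filler vector

def cons(left, right):
    """Bind children into a parent: child (x) role, summed per depth."""
    out = {}
    for t, v in left.items():
        out[t + 1] = out.get(t + 1, 0) + np.kron(v, r0)
    for t, v in right.items():
        out[t + 1] = out.get(t + 1, 0) + np.kron(v, r1)
    return out

def child(tree, role):
    """Unbind one child by contracting the last tensor factor with a role."""
    return {t - 1: v.reshape(-1, 2) @ role for t, v in tree.items() if t >= 1}

A, B, C = np.eye(d)                        # atomic fillers for the leaves
T = cons(leaf(A), cons(leaf(B), leaf(C)))  # represents the tree (A (B C))
print(child(T, r0)[0])                     # [1. 0. 0.], the left child A
```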
Skeletonization: A Technique for Trimming the Fat from a Network via Relevance Assessment
Mozer, Michael C., Smolensky, Paul
This paper proposes a means of using the knowledge in a network to determine the functionality or relevance of individual units, both for the purpose of understanding the network's behavior and improving its performance. The basic idea is to iteratively train the network to a certain performance criterion, compute a measure of relevance that identifies which input or hidden units are most critical to performance, and automatically trim the least relevant units. This skeletonization technique can be used to simplify networks by eliminating units that convey redundant information; to improve learning performance by first learning with spare hidden units and then trimming the unnecessary ones away, thereby constraining generalization; and to understand the behavior of networks in terms of minimal "rules."
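A brute-force toy version of the idea, with invented data: here relevance is measured by exhaustive ablation (the increase in error when a unit is zeroed out), whereas the paper approximates this quantity with a derivative of the error with respect to an attentional gating coefficient, which scales to real networks.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
y = X[:, 0] + 2.0 * X[:, 1]          # only input units 0 and 1 matter

W, _, _, _ = np.linalg.lstsq(X, y, rcond=None)   # "trained" linear weights

def error(mask):
    """Mean squared error with the given input units gated on/off."""
    return np.mean((X @ (W * mask) - y) ** 2)

# Relevance of unit i = error with unit i removed minus baseline error.
base = error(np.ones(5))
relevance = np.array([error(1.0 - np.eye(5)[i]) - base for i in range(5)])
print(relevance.round(3))            # near zero for the redundant units 2..4

keep = relevance > 0.01              # trim units below a small threshold
print(keep)                          # [ True  True False False False]
```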
Analysis of Distributed Representation of Constituent Structure in Connectionist Systems
Smolensky, Paul
A general method, the tensor product representation, is described for the distributed representation of value/variable bindings. The method allows the fully distributed representation of symbolic structures: the roles in the structures, as well as the fillers for those roles, can be arbitrarily non-local. Fully and partially localized special cases reduce to existing cases of connectionist representations of structured data; the tensor product representation generalizes these and the few existing examples of fully distributed representations of structures. The representation saturates gracefully as larger structures are represented; it permits recursive construction of complex representations from simpler ones; it respects the independence of the capacities to generate and maintain multiple bindings in parallel; it extends naturally to continuous structures and continuous representational patterns; it permits values to also serve as variables; it enables analysis of the interference of symbolic structures stored in associative memories; and it leads to characterization of optimal distributed representations of roles and a recirculation algorithm for learning them. Introduction Any model of complex information processing in networks of simple processors must solve the problem of representing complex structures over network elements. Connectionist models of realistic natural language processing, for example, must employ computationally adequate representations of complex sentences. Many connectionists feel that to develop connectionist systems with the computational power required by complex tasks, distributed representations must be used: an individual processing unit must participate in the representation of multiple items, and each item must be represented as a pattern of activity of multiple processors. Connectionist models have used more or less distributed representations of more or less complex structures, but little if any general analysis of the problem of distributed representation of complex information has been carried out. This paper reports results of an analysis of a general method called the tensor product representation.
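A minimal numpy sketch of binding and unbinding under toy assumptions (random filler and role vectors, hypothetical names): a binding is the outer product of a filler with a role, a structure is the sum of its bindings, and with linearly independent roles a filler is recovered exactly by contracting the structure with the dual role vectors.

```python
import numpy as np

rng = np.random.default_rng(0)
d_f, d_r = 5, 3
fillers = {"john": rng.standard_normal(d_f), "mary": rng.standard_normal(d_f)}
roles = rng.standard_normal((2, d_r))            # e.g. agent, patient roles

# Bind: structure = john (x) agent + mary (x) patient
S = np.outer(fillers["john"], roles[0]) + np.outer(fillers["mary"], roles[1])

# Unbind: contract with the dual role vectors. The pseudoinverse gives
# duals satisfying roles @ duals = I, so recovery is exact whenever the
# role vectors are linearly independent.
duals = np.linalg.pinv(roles)
john_hat = S @ duals[:, 0]
print(np.allclose(john_hat, fillers["john"]))    # True
```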