Smolensky, Paul
Compositional Generalization Across Distributional Shifts with Sparse Tree Operations
Soulos, Paul, Conklin, Henry, Opper, Mattia, Smolensky, Paul, Gao, Jianfeng, Fernandez, Roland
Neural networks continue to struggle with compositional generalization, and this issue is exacerbated by a lack of massive pre-training. One successful approach to developing neural systems that exhibit human-like compositional generalization is the use of \textit{hybrid} neurosymbolic techniques. However, these techniques run into the core issues that plague symbolic approaches to AI: scalability and flexibility. The reason for this failure is that, at their core, hybrid neurosymbolic models perform symbolic computation and relegate the scalable and flexible neural computation to parameterizing a symbolic system. We investigate a \textit{unified} neurosymbolic system where transformations in the network can be interpreted simultaneously as both symbolic and neural computation. We extend a unified neurosymbolic architecture called the Differentiable Tree Machine in two central ways. First, we significantly increase the model's efficiency through the use of sparse vector representations of symbolic structures. Second, we enable its application beyond the restricted set of tree2tree problems to the more general class of seq2seq problems. The improved model retains its prior generalization capabilities and, since there is a fully neural path through the network, avoids the pitfalls of other neurosymbolic techniques that elevate symbolic computation over neural computation.
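The efficiency gain comes from how symbolic tree structures are stored. The sketch below is a toy illustration with made-up dimensions, not the paper's implementation: it contrasts a dense tensor-product encoding of a tree (one filler-by-role outer product per occupied node, summed into a d_filler x d_role matrix) with a sparse alternative that keeps only the occupied tree positions.

# Toy comparison: dense vs. sparse encoding of a binary tree as role/filler pairs.
import numpy as np

d_filler, d_role = 8, 16                  # illustrative embedding sizes
rng = np.random.default_rng(0)

# Roles = tree addresses ("" = root, "0" = left child, "1" = right child, ...)
roles = {"": 0, "0": 1, "1": 2, "00": 3}
fillers = {addr: rng.standard_normal(d_filler) for addr in roles}

# Dense TPR: sum of outer products filler x role (memory grows with d_filler * d_role).
role_vecs = rng.standard_normal((d_role, len(roles)))
dense = sum(np.outer(fillers[a], role_vecs[:, i]) for a, i in roles.items())

# Sparse alternative: store only the occupied tree positions explicitly.
sparse = {a: fillers[a] for a in roles}   # address -> filler vector

print(dense.shape, len(sparse))           # (8, 16) vs. 4 stored nodes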
Mechanisms of Symbol Processing for In-Context Learning in Transformer Networks
Smolensky, Paul, Fernandez, Roland, Zhou, Zhenghao Herbert, Opper, Mattia, Gao, Jianfeng
Large Language Models (LLMs) have demonstrated impressive abilities in symbol processing through in-context learning (ICL). This success flies in the face of decades of predictions that artificial neural networks cannot master abstract symbol manipulation. We seek to understand the mechanisms that can enable robust symbol processing in transformer networks, illuminating both the unanticipated success, and the significant limitations, of transformers in symbol processing. Borrowing insights from symbolic AI on the power of Production System architectures, we develop a high-level language, PSL, that allows us to write symbolic programs to do complex, abstract symbol processing, and create compilers that precisely implement PSL programs in transformer networks which are, by construction, 100% mechanistically interpretable. We demonstrate that PSL is Turing Universal, so the work can inform the understanding of transformer ICL in general. The type of transformer architecture that we compile from PSL programs suggests a number of paths for enhancing transformers' capabilities at symbol processing. (Note: The first section of the paper gives an extended synopsis of the entire paper.)
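To make the Production System framing concrete, here is a minimal, generic production-system loop in Python. It is not PSL itself, whose syntax and compiler are defined in the paper; the rule format and the sequence-reversal example are illustrative assumptions only.

# Toy production-system loop: working memory holds symbols, and each rule
# fires when its condition matches; the first matching rule rewrites memory.
def run(productions, working_memory, max_steps=10):
    for _ in range(max_steps):
        for condition, action in productions:
            if condition(working_memory):
                working_memory = action(working_memory)
                break
        else:
            break   # no rule fired: halt
    return working_memory

# Example: reverse a sequence symbol by symbol.
rules = [
    (lambda wm: bool(wm["input"]),
     lambda wm: {"input": wm["input"][1:],
                 "output": [wm["input"][0]] + wm["output"]}),
]
print(run(rules, {"input": list("abc"), "output": []}))
# {'input': [], 'output': ['c', 'b', 'a']}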
Implicit Chain of Thought Reasoning via Knowledge Distillation
Deng, Yuntian, Prasad, Kiran, Fernandez, Roland, Smolensky, Paul, Chaudhary, Vishrav, Shieber, Stuart
To augment language models with the ability to reason, researchers usually prompt or finetune them to produce chain of thought reasoning steps before producing the final answer. However, although people use natural language to reason effectively, it may be that LMs could reason more effectively with some intermediate computation that is not in natural language. In this work, we explore an alternative reasoning approach: instead of explicitly producing the chain of thought reasoning steps, we use the language model's internal hidden states to perform implicit reasoning. The implicit reasoning steps are distilled from a teacher model trained on explicit chain-of-thought reasoning, and instead of doing reasoning "horizontally" by producing intermediate words one-by-one, we distill it such that the reasoning happens "vertically" among the hidden states in different layers. We conduct experiments on a multi-digit multiplication task and a grade school math problem dataset and find that this approach enables solving tasks previously not solvable without explicit chain-of-thought, at a speed comparable to no chain-of-thought.

To elicit the reasoning abilities of language models, a prevalent paradigm has been the chain-of-thought reasoning approach (Nye et al., 2021; Wei et al., 2022b; Kojima et al., 2022). Under this paradigm, models are trained or prompted to articulate intermediate steps before producing the final answer. Although this approach aligns with human problem-solving strategies, it might not fully leverage the computational potential of these language models. Consider the transformer architecture (Vaswani et al., 2017), which can manifest computation both "horizontally" by generating words in sequence and "vertically" by processing through its many layers of internal hidden states. With models like GPT-3 having as many as 96 layers (Brown et al., 2020), one might wonder: Why not let these models reason internally, "vertically" through their layers, and present the solution without necessarily articulating every intermediate step? Such an approach would not only save the significant time cost of autoregressively generating the chain-of-thought; it may also allow models to develop more efficient, if less human-interpretable, methods of reasoning, unconstrained by human conventions.
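As a rough illustration of the "vertical" distillation objective, the sketch below assumes hypothetical teacher and student models that expose per-layer hidden states of shape (layers, seq_len, d_model) with matching widths; aligning one teacher state per student layer and using an MSE term are simplifying assumptions, not the paper's exact recipe.

# Minimal sketch of distilling "horizontal" teacher reasoning into the student's layers.
import torch
import torch.nn.functional as F

def implicit_cot_loss(student_states, teacher_states, answer_loss):
    # student_states, teacher_states: (layers, seq_len, d_model); widths assumed equal.
    n_layers = student_states.shape[0]
    # Pick teacher hidden states spread along the explicit reasoning chain.
    idx = torch.linspace(0, teacher_states.shape[1] - 1, n_layers).long()
    teacher_targets = teacher_states[-1, idx]      # (n_layers, d_model), teacher top layer
    student_stack = student_states[:, -1]          # (n_layers, d_model), final input position
    distill = F.mse_loss(student_stack, teacher_targets.detach())
    # answer_loss is the usual cross-entropy on the final answer tokens.
    return answer_loss + distill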
Differentiable Tree Operations Promote Compositional Generalization
Soulos, Paul, Hu, Edward, McCurdy, Kate, Chen, Yunmo, Fernandez, Roland, Smolensky, Paul, Gao, Jianfeng
In the context of structure-to-structure transformation tasks, learning sequences of discrete symbolic operations poses significant challenges due to their non-differentiability. To facilitate the learning of these symbolic sequences, we introduce a differentiable tree interpreter that compiles high-level symbolic tree operations into subsymbolic matrix operations on tensors. We present a novel Differentiable Tree Machine (DTM) architecture that integrates our interpreter with an external memory and an agent that learns to sequentially select tree operations to execute the target transformation in an end-to-end manner. With respect to out-of-distribution compositional generalization on synthetic semantic parsing and language generation tasks, DTM achieves 100% accuracy while existing baselines such as Transformer, Tree Transformer, LSTM, and Tree2Tree LSTM achieve less than 30%. DTM remains highly interpretable in addition to its perfect performance.
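The following sketch shows, under simplifying assumptions (one-hot role vectors for tree positions up to depth two, a made-up filler dimension), how a symbolic tree operation such as extracting the left subtree can be compiled into a single matrix multiplication and hence stay differentiable; the actual DTM interpreter is more general than this.

# Sketch: a tree is a matrix T of shape (d_filler, n_roles), the sum of filler-role
# outer products; extracting the left subtree ("car") is a fixed linear map on roles.
import numpy as np

positions = ["", "0", "1", "00", "01", "10", "11"]   # root, children, grandchildren
idx = {p: i for i, p in enumerate(positions)}
n_roles = len(positions)

def left_matrix():
    # Maps the role of position "0x" to the role of position "x".
    M = np.zeros((n_roles, n_roles))
    for p in positions:
        if ("0" + p) in idx:
            M[idx[p], idx["0" + p]] = 1.0
    return M

def car(T):
    return T @ left_matrix().T   # a matrix multiply, hence differentiable

# Example: a tree whose left child holds filler f becomes a tree whose root holds f.
d = 4
f = np.ones(d)
T = np.outer(f, np.eye(n_roles)[idx["0"]])
print(np.allclose(car(T), np.outer(f, np.eye(n_roles)[idx[""]])))   # True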
Uncontrolled Lexical Exposure Leads to Overestimation of Compositional Generalization in Pretrained Models
Kim, Najoung, Linzen, Tal, Smolensky, Paul
Human linguistic capacity is often characterized by compositionality and the generalization it enables -- human learners can produce and comprehend novel complex expressions by composing known parts. Several benchmarks exploit distributional control across training and test to gauge compositional generalization, where certain lexical items only occur in limited contexts during training. While recent work using these benchmarks suggests that pretrained models achieve impressive generalization performance, we argue that exposure to pretraining data may break the aforementioned distributional control. Using the COGS benchmark of Kim and Linzen (2020), we test two modified evaluation setups that control for this issue: (1) substituting context-controlled lexical items with novel character sequences, and (2) substituting them with special tokens represented by novel embeddings. We find that both of these setups lead to lower generalization performance in T5 (Raffel et al., 2020), suggesting that previously reported results have been overestimated due to uncontrolled lexical exposure during pretraining. The performance degradation is more extreme with novel embeddings, and the degradation increases with the amount of pretraining data, highlighting an interesting case of inverse scaling.
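A minimal sketch of the second setup, with illustrative names and dimensions: the context-controlled lexical items are remapped to brand-new vocabulary rows whose embeddings are freshly initialized, so pretraining exposure cannot leak in. This is not the authors' evaluation script.

# Hedged sketch of the "novel embedding" control.
import torch

vocab_size, d_model, n_novel = 32100, 512, 10
emb = torch.nn.Embedding(vocab_size + n_novel, d_model)

with torch.no_grad():
    # Copy pretrained rows (a placeholder tensor here stands in for the real table)
    # and randomly initialize the novel rows.
    pretrained = torch.randn(vocab_size, d_model)
    emb.weight[:vocab_size] = pretrained
    torch.nn.init.normal_(emb.weight[vocab_size:], std=0.02)

# In the data, each context-controlled lexical item (e.g. the COGS-style "hedgehog")
# is rewritten to a reserved id in [vocab_size, vocab_size + n_novel) before training.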
Enriching Transformers with Structured Tensor-Product Representations for Abstractive Summarization
Jiang, Yichen, Celikyilmaz, Asli, Smolensky, Paul, Soulos, Paul, Rao, Sudha, Palangi, Hamid, Fernandez, Roland, Smith, Caitlin, Bansal, Mohit, Gao, Jianfeng
Abstractive summarization, the task of generating a concise summary of input documents, requires: (1) reasoning over the source document to determine the salient pieces of information scattered across the long document, and (2) composing a cohesive text by reconstructing these salient facts into a shorter summary that faithfully reflects the complex relations connecting these facts. In this paper, we adapt TP-TRANSFORMER (Schlag et al., 2019), an architecture that enriches the original Transformer (Vaswani et al., 2017) with the explicitly compositional Tensor Product Representation (TPR), for the task of abstractive summarization. The key feature of our model is a structural bias that we introduce by encoding two separate representations for each token to represent the syntactic structure (with role vectors) and semantic content (with filler vectors) separately. The model then binds the role and filler vectors into the TPR as the layer output. We argue that the structured intermediate representations enable the model to take better control of the contents (salient facts) and structures (the syntax that connects the facts) when generating the summary. Empirically, we show that our TP-TRANSFORMER outperforms the Transformer and the original TP-TRANSFORMER significantly on several abstractive summarization datasets based on both automatic and human evaluations. On several syntactic and semantic probing tasks, we demonstrate the emergent structural information in the role vectors and improved syntactic interpretability in the TPR layer outputs. Code and models are available at https://github.com/jiangycTarheel/TPT-Summ.
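The sketch below illustrates the role/filler split for one layer's output, with made-up projection names and a Hadamard-style binding standing in for the full tensor product; it is a conceptual sketch, not the released TP-TRANSFORMER code.

# Conceptual sketch: separate role (syntax) and filler (content) representations
# per token, bound together to form the layer output.
import torch

d_model, d_role, seq_len = 512, 64, 16
x = torch.randn(seq_len, d_model)                    # per-token hidden states

filler_proj = torch.nn.Linear(d_model, d_model)      # semantic content
role_proj = torch.nn.Linear(d_model, d_role)         # syntactic structure
role_up = torch.nn.Linear(d_role, d_model)

fillers = filler_proj(x)                             # (seq_len, d_model)
roles = torch.softmax(role_proj(x), dim=-1)          # (seq_len, d_role), soft role choice

# Hadamard-style binding: roles gate the fillers, a compressed stand-in
# for the full filler-role outer product.
output = fillers * torch.sigmoid(role_up(roles))     # (seq_len, d_model)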
Compositional Processing Emerges in Neural Networks Solving Math Problems
Russin, Jacob, Fernandez, Roland, Palangi, Hamid, Rosen, Eric, Jojic, Nebojsa, Smolensky, Paul, Gao, Jianfeng
A longstanding question in cognitive science concerns the learning mechanisms underlying compositionality in human cognition. Humans can infer the structured relationships (e.g., grammatical rules) implicit in their sensory observations (e.g., auditory speech), and use this knowledge to guide the composition of simpler meanings into complex wholes. Recent progress in artificial neural networks has shown that when large models are trained on enough linguistic data, grammatical structure emerges in their representations. We extend this work to the domain of mathematical reasoning, where it is possible to formulate precise hypotheses about how meanings (e.g., the quantities corresponding to numerals) should be composed according to structured rules (e.g., order of operations). Our work shows that neural networks are not only able to infer something about the structured relationships implicit in their training data, but can also deploy this knowledge to guide the composition of individual meanings into composite wholes.
Neuro-Symbolic Representations for Video Captioning: A Case for Leveraging Inductive Biases for Vision and Language
Akbari, Hassan, Palangi, Hamid, Yang, Jianwei, Rao, Sudha, Celikyilmaz, Asli, Fernandez, Roland, Smolensky, Paul, Gao, Jianfeng, Chang, Shih-Fu
Neuro-symbolic representations have proved effective in learning structure information in vision and language. In this paper, we propose a new model architecture for learning multi-modal neuro-symbolic representations for video captioning. Our approach uses a dictionary learning-based method of learning relations between videos and their paired text descriptions. We refer to these relations as relative roles and leverage them to make each token role-aware using attention. This results in a more structured and interpretable architecture that incorporates modality-specific inductive biases for the captioning task. Intuitively, the model is able to learn spatial, temporal, and cross-modal relations in a given pair of video and text. The disentanglement achieved by our proposal gives the model more capacity to capture multi-modal structure, which results in higher-quality captions for videos. Our experiments on two established video captioning datasets verify the effectiveness of the proposed approach based on automatic metrics. We further conduct a human evaluation to measure the grounding and relevance of the generated captions and observe consistent improvement for the proposed model. The code and trained models can be found at https://github.com/hassanhub/R3Transformer
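As a rough sketch of making tokens role-aware, the snippet below lets each token attend over a small learned dictionary of role vectors and be modulated by the result; the dictionary size, dimensions, and binding operation are illustrative assumptions rather than the released R3Transformer implementation.

# Rough sketch: role-aware tokens via attention over a learned role dictionary.
import torch

d_model, n_roles, seq_len = 256, 32, 20
tokens = torch.randn(seq_len, d_model)               # video or text token features
role_dict = torch.nn.Parameter(torch.randn(n_roles, d_model))

attn = torch.softmax(tokens @ role_dict.T / d_model ** 0.5, dim=-1)   # (seq_len, n_roles)
roles = attn @ role_dict                                              # (seq_len, d_model)
role_aware = tokens * roles                          # bind content with its inferred role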
HUBERT Untangles BERT to Improve Transfer across NLP Tasks
Moradshahi, Mehrad, Palangi, Hamid, Lam, Monica S., Smolensky, Paul, Gao, Jianfeng
We show that there is shared structure between different NLP datasets that HUBERT, but not BERT, is able to learn and leverage. Our experiment results show that untangling data-specific semantics from general language structure is key for better transfer among NLP tasks.

Built on the Transformer architecture (Vaswani et al., 2017), the BERT model (Devlin et al., 2018) has demonstrated great power for providing general-purpose vector embeddings of natural language: its representations have served as the basis of many successful deep Natural Language Processing (NLP) models on a variety of tasks (e.g., Liu et al., 2019a;b; Zhang et al., 2019). Recent studies (Coenen et al., 2019; Hewitt & Manning, 2019; Lin et al., 2019; Tenney et al., 2019) have shown that BERT representations carry considerable information about grammatical structure, which, by design, is a deep and general encapsulation of linguistic information. Symbolic computation over structured symbolic representations such as parse trees has long been used to formalize linguistic knowledge. To strengthen the generality of BERT's representations, we propose to import this type of computation into its architecture. Symbolic linguistic representations support the important distinction between content and form information. The form consists of a structure devoid of content, such as an unlabeled tree, a collection of nodes defined by their structural positions or roles (Newell, 1980), such as root, left-child-of-root, right-child-of-left-child-of-root, etc. In a particular linguistic expression such as "Kim referred to herself during the speech", these purely structural roles are filled with particular content-bearing symbols, including terminal words like Kim and non-terminal categories like NounPhrase. These role fillers have their own identities, which are preserved as they move from role to role across expressions: Kim retains its referent and its semantic properties whether it fills the subject or the object role in a sentence.
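A conceptual sketch of a TPR layer placed on top of contextual embeddings, with hypothetical dimensions: each token's vector is softly decomposed into a filler (content symbol) and a role (structural position), which are then re-bound as an outer product. This is a minimal illustration, not the HUBERT release.

# Minimal TPR head over contextual embeddings: soft filler and role selection, then binding.
import torch

d_bert, d_filler, d_role, n_fillers, n_roles = 768, 32, 8, 50, 35
h = torch.randn(10, d_bert)                          # contextual embeddings for 10 tokens

filler_emb = torch.nn.Parameter(torch.randn(n_fillers, d_filler))
role_emb = torch.nn.Parameter(torch.randn(n_roles, d_role))
to_filler = torch.nn.Linear(d_bert, n_fillers)
to_role = torch.nn.Linear(d_bert, n_roles)

f = torch.softmax(to_filler(h), dim=-1) @ filler_emb     # (10, d_filler) content
r = torch.softmax(to_role(h), dim=-1) @ role_emb         # (10, d_role) structure
tpr = torch.einsum('tf,tr->tfr', f, r).flatten(1)        # (10, d_filler * d_role) bound output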
Enhancing the Transformer with Explicit Relational Encoding for Math Problem Solving
Schlag, Imanol, Smolensky, Paul, Fernandez, Roland, Jojic, Nebojsa, Schmidhuber, Jürgen, Gao, Jianfeng
We incorporate Tensor-Product Representations within the Transformer in order to better support the explicit representation of relation structure. Our Tensor-Product Transformer (TP-Transformer) sets a new state of the art on the recently-introduced Mathematics Dataset containing 56 categories of free-form math word problems. The essential component of the model is a novel attention mechanism, called TP-Attention, which explicitly encodes the relations between each Transformer cell and the other cells from which values have been retrieved by attention. TP-Attention goes beyond linear combination of retrieved values, strengthening representation-building and resolving ambiguities introduced by multiple layers of standard attention. The TP-Transformer's attention maps give better insights into how it is capable of solving the Mathematics Dataset's challenging problems. Pretrained models and code will be made available after publication.

In this paper we propose a variation of the Transformer (Vaswani et al., 2017) that is designed to allow it to better incorporate structure into its representations. We test the proposal on a task where structured representations are expected to be particularly helpful: math word-problem solving, where, among other things, correctly parsing expressions and compositionally evaluating them is crucial.
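A single-head sketch of the TP-Attention idea with illustrative dimensions: the attention-weighted values are bound, via a Hadamard product, with a role vector computed from the attending token, rather than being passed on as a plain linear combination. The projection names here are assumptions made for the sketch.

# Single-head TP-Attention sketch: retrieved values are bound with a per-token role vector.
import torch

d_head, seq_len = 64, 12
x = torch.randn(seq_len, d_head * 4)                 # token representations

q_proj = torch.nn.Linear(x.shape[-1], d_head)
k_proj = torch.nn.Linear(x.shape[-1], d_head)
v_proj = torch.nn.Linear(x.shape[-1], d_head)
r_proj = torch.nn.Linear(x.shape[-1], d_head)        # role of the attending token

q, k, v, r = q_proj(x), k_proj(x), v_proj(x), r_proj(x)
attn = torch.softmax(q @ k.T / d_head ** 0.5, dim=-1)
values = attn @ v                                    # standard retrieved values
output = values * r                                  # binding marks "who retrieved what"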