Can Transformers Reason Logically? A Study in SAT Solving
Leyan Pan, Vijay Ganesh, Jacob Abernethy, Chris Esposo, Wenke Lee
arXiv.org Artificial Intelligence
A PARAT "program" is a sequence of array operations over SOps. Throughout this section, we refer to indices along the first dimension of an SOp as "positions" and to indices along the second dimension as "dimensions". The "inputs" to a program are arbitrary positional-encoding and token-embedding SOps, represented by the base classes PosEncSOp and TokEmbSOp, respectively. For example, the OneHotTokEmb class represents the one-hot embedding of the tokens, and Indices represents the numerical value of the index of each position. The rest of the program applies operations that compute new SOps from existing ones. We provide implementations of basic building-block operations including (but not limited to) the following: Mean(q, k, v), which represents the "Averaging Hard Attention" operation.
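To make the excerpt concrete, here is a minimal pure-Python sketch of what an averaging hard attention step computes over position-by-dimension arrays. It is not the PARAT implementation or its API: the function names `one_hot_tok_emb`, `indices`, and `mean_hard_attention`, the `causal` flag, and the dot-product scoring are all assumptions made for illustration. Each query position attends to the key positions with the highest score and takes the uniform average of their values.

```python
from typing import List, Sequence

# Hypothetical stand-in for an SOp: a list indexed as [position][dimension].
Matrix = List[List[float]]


def one_hot_tok_emb(tokens: Sequence[str], vocab: Sequence[str]) -> Matrix:
    """Illustrative one-hot token embedding (one row per position)."""
    return [[1.0 if tok == word else 0.0 for word in vocab] for tok in tokens]


def indices(n: int) -> Matrix:
    """Illustrative positional input: the index of each position."""
    return [[float(i)] for i in range(n)]


def mean_hard_attention(q: Matrix, k: Matrix, v: Matrix, causal: bool = True) -> Matrix:
    """Average v uniformly over the key positions with the maximal q.k score.

    Assumption: scores are plain dot products, and with causal=True a
    position may only attend to itself and earlier positions.
    """
    out: Matrix = []
    width = len(v[0])
    for i, q_i in enumerate(q):
        visible = i + 1 if causal else len(k)
        scores = [sum(a * b for a, b in zip(q_i, k[j])) for j in range(visible)]
        best = max(scores)
        # Exact comparison is fine here because the inputs are integer-valued.
        winners = [j for j, s in enumerate(scores) if s == best]
        out.append([sum(v[j][d] for j in winners) / len(winners) for d in range(width)])
    return out


tokens = ["a", "b", "a", "b"]
emb = one_hot_tok_emb(tokens, vocab=["a", "b"])
pos = indices(len(tokens))

# For each position: mean index of the visible positions holding the same token.
causal_out = mean_hard_attention(emb, emb, pos)
print(causal_out)  # [[0.0], [1.0], [1.0], [2.0]]
```

In this toy run, position 3 (the second "b") ties between positions 1 and 3 and therefore outputs their mean index, 2.0. With real-valued scores, the exact equality test for ties would need a tolerance.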
Oct-9-2024