AITopics | in1

Collaborating Authors

in1

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

2f3c6a4cd8af177f6456e7e51a916ff3-Supplemental.pdf

Neural Information Processing SystemsApr-25-2026, 08:28:51 GMT

"Name" is the name of the operation in our search space. "TFFunction" is the TensorFlow function that the name is mapped to when a DNA instruction is being converted to a line of TensorFlow code. "Argument Mapping" describes how the values in a DNA's argument set are mapped to the corresponding TensorFlow function arguments. This vocabulary is largely constructed from the lowest level TF operations needed to create Transformers (see Appendix A.5). We also add commonly used math primitives such as SIN and ABS. Here we provide additional implementation details. Relative Dimensions: We use relative dimensions [13] instead of absolute dimensions for each instruction's "dimension size" argument. This allows us to resize the models to fit within our parameter limits (32M to 38M parameters). The vocabulary for these relative dimensions is [1, 2, 4, 8, 12, 16, 24, 32, 48, 64].

dim, machine learning, natural language, (20 more...)

Neural Information Processing Systems

Country: North America > United States (0.28)

Genre:

Research Report > New Finding (0.46)
Research Report > Experimental Study (0.46)

Industry: Energy > Power Industry (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.34)

Add feedback

2f3c6a4cd8af177f6456e7e51a916ff3-Supplemental.pdf

Neural Information Processing SystemsFeb-8-2026, 02:37:16 GMT

dim, in0, in1, (17 more...)

Neural Information Processing Systems

Country:

North America > United States > Oklahoma (0.04)
North America > United States > California > San Francisco County > San Francisco (0.04)
Asia > Taiwan > Taiwan Province > Taipei (0.04)

Genre:

Research Report > New Finding (0.46)
Research Report > Experimental Study (0.46)

Industry: Energy > Power Industry (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Add feedback

Primer: Searching for Efficient Transformers for Language Modeling

So, David R., Mańke, Wojciech, Liu, Hanxiao, Dai, Zihang, Shazeer, Noam, Le, Quoc V.

arXiv.org Artificial IntelligenceSep-17-2021

Large Transformer models have been central to recent advances in natural language processing. The training and inference costs of these models, however, have grown rapidly and become prohibitively expensive. Here we aim to reduce the costs of Transformers by searching for a more efficient variant. Compared to previous approaches, our search is performed at a lower level, over the primitives that define a Transformer TensorFlow program. We identify an architecture, named Primer, that has a smaller training cost than the original Transformer and other variants for auto-regressive language modeling. Primer's improvements can be mostly attributed to two simple modifications: squaring ReLU activations and adding a depthwise convolution layer after each Q, K, and V projection in self-attention. Experiments show Primer's gains over Transformer increase as compute scale grows and follow a power law with respect to quality at optimal model sizes. We also verify empirically that Primer can be dropped into different codebases to significantly speed up training without additional tuning. For example, at a 500M parameter size, Primer improves the original T5 architecture on C4 auto-regressive language modeling, reducing the training cost by 4X. Furthermore, the reduced training cost means Primer needs much less compute to reach a target one-shot performance. For instance, in a 1.9B parameter configuration similar to GPT-3 XL, Primer uses 1/3 of the training compute to achieve the same one-shot performance as Transformer. We open source our models and several comparisons in T5 to help with reproducibility.

modification, primer, transformer, (13 more...)

arXiv.org Artificial Intelligence

2109.08668

Country:

North America > United States > Oklahoma (0.04)
North America > United States > New York (0.04)
North America > United States > California > San Francisco County > San Francisco (0.04)
(2 more...)

Genre: Research Report > New Finding (1.00)

Industry: Energy > Power Industry (0.45)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.88)

Add feedback