AITopics | matmul operation

Matmul or No Matmul in the Era of 1-bit LLMs

Malekar, Jinendra, Elbtity, Mohammed E., Zand, Ramtin

arXiv.org Artificial Intelligence, Aug-28-2024

The advent of 1-bit large language models (LLMs) has attracted considerable attention and opened up new research opportunities. However, 1-bit LLMs only improve a fraction of models by applying extreme quantization to the projection layers while leaving attention heads unchanged. Therefore, to avoid fundamentally wrong choices of goals in future research, it is crucial to understand the actual improvements in computation and memory usage that 1-bit LLMs can deliver. In this work, we present an adaptation of Amdahl's Law tailored for the 1-bit LLM context, which illustrates how partial improvements in 1-bit LLMs impact overall model performance. Through extensive experiments, we uncover key nuances across different model architectures and hardware configurations, offering a roadmap for future research in the era of 1-bit LLMs.
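The excerpt above does not give the paper's adapted formula, so the following is only a sketch of the classic Amdahl's Law that it builds on: if a fraction of the runtime is accelerated by some factor, the rest of the runtime bounds the overall gain. The function name and the numbers in the loop are hypothetical and chosen purely for illustration.

```python
def amdahl_speedup(fraction_improved: float, partial_speedup: float) -> float:
    """Overall speedup when only `fraction_improved` of the runtime gets faster.

    Classic Amdahl's Law, not the paper's tailored variant (an assumption).
    """
    if not 0.0 <= fraction_improved <= 1.0:
        raise ValueError("fraction_improved must be in [0, 1]")
    if partial_speedup <= 0.0:
        raise ValueError("partial_speedup must be positive")
    unchanged = 1.0 - fraction_improved            # e.g. parts left at full precision
    accelerated = fraction_improved / partial_speedup
    return 1.0 / (unchanged + accelerated)


# Hypothetical scenario: the quantized layers become 10x faster,
# but they account for different shares of the total runtime.
for fraction in (0.5, 0.9):
    print(fraction, round(amdahl_speedup(fraction, 10.0), 2))
```

Even with a 10x gain on the improved part, the overall speedup stays below 2x when that part is only half of the runtime, which is the kind of effect the abstract warns about.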

llm, matmul operation, opération, (15 more...)

arXiv.org Artificial Intelligence

2408.11939

Country: North America > United States > South Carolina > Richland County > Columbia (0.14)

Genre: Research Report > New Finding (0.46)

Industry: Information Technology > Services (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Scalable MatMul-free Language Modeling

Zhu, Rui-Jie, Zhang, Yu, Sifferman, Ethan, Sheaves, Tyler, Wang, Yiqiao, Richmond, Dustin, Zhou, Peng, Eshraghian, Jason K.

arXiv.org Artificial Intelligence, Jun-18-2024

Matrix multiplication (MatMul) typically dominates the overall computational cost of large language models (LLMs). This cost only grows as LLMs scale to larger embedding dimensions and context lengths. In this work, we show that MatMul operations can be completely eliminated from LLMs while maintaining strong performance at billion-parameter scales. Our experiments show that our proposed MatMul-free models achieve performance on-par with state-of-the-art Transformers that require far more memory during inference at a scale up to at least 2.7B parameters. We investigate the scaling laws and find that the performance gap between our MatMul-free models and full precision Transformers narrows as the model size increases. We also provide a GPU-efficient implementation of this model which reduces memory usage by up to 61% over an unoptimized baseline during training. By utilizing an optimized kernel during inference, our model's memory consumption can be reduced by more than 10x compared to unoptimized models. To properly quantify the efficiency of our architecture, we build a custom hardware solution on an FPGA which exploits lightweight operations beyond what GPUs are capable of. We processed billion-parameter scale models at 13W beyond human readable throughput, moving LLMs closer to brain-like efficiency. This work not only shows how far LLMs can be stripped back while still performing effectively, but also points at the types of operations future accelerators should be optimized for in processing the next generation of lightweight LLMs. Our code implementation is available at https://github.com/ridgerchu/matmulfreellm.
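The abstract does not say how the multiplications are removed. As a rough illustration only, the sketch below assumes weights restricted to -1, 0, and +1, in which case a matrix-vector product needs nothing but additions and subtractions. The function name and sample values are hypothetical, and this is not the authors' implementation.

```python
def ternary_matvec(weights, x):
    """Compute W @ x for W with entries in {-1, 0, +1} using only add/subtract.

    Ternary weights are an assumption made for this sketch.
    """
    out = []
    for row in weights:
        if len(row) != len(x):
            raise ValueError("row length must match the input vector length")
        acc = 0.0
        for w, value in zip(row, x):
            if w == 1:
                acc += value      # +1 weight: add the input
            elif w == -1:
                acc -= value      # -1 weight: subtract the input
            elif w != 0:
                raise ValueError("weights must be -1, 0, or +1")
            # 0 weight: the input is skipped entirely
        out.append(acc)
    return out


W = [[1, 0, -1],
     [-1, 1, 1]]
x = [2.0, 3.0, 5.0]
print(ternary_matvec(W, x))  # [-3.0, 6.0]
```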

arxiv preprint arxiv, matmul-free lm, opération, (13 more...)

arXiv.org Artificial Intelligence

2406.02528

Country:

North America > United States > California > Yolo County > Davis (0.04)
North America > United States > California > Santa Cruz County > Santa Cruz (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Optimizing Inference Performance of Transformers on CPUs

Dice, Dave, Kogan, Alex

arXiv.org Artificial Intelligence, Feb-17-2021

The Transformer architecture revolutionized the field of natural language processing (NLP). Transformers-based models (e.g., BERT) power many important Web services, such as search, translation, question-answering, etc. While enormous research attention is paid to the training of those models, relatively little efforts are made to improve their inference performance. This paper comes to address this gap by presenting an empirical analysis of scalability and performance of inferencing a Transformer-based model on CPUs. We identify the key component of the Transformer architecture where the bulk of the computation happens, namely, the matrix multiplication (matmul) operations, and propose three optimizations to speed them up. The first optimization is based on the observation that the performance of the matmul operation is heavily impacted not only by the shape (dimensions) of the source matrices and the available computing resources (the number of worker threads), but also by ...
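One way to see why the shape of the source matrices matters is to compare arithmetic work with the amount of data touched. The sketch below is a back-of-the-envelope cost model, not the paper's analysis: it assumes a dense (m x k) by (k x n) product, counts one multiply and one add per inner-product term, and assumes each matrix is read or written exactly once. The function name and the two shapes are hypothetical.

```python
def matmul_cost(m: int, k: int, n: int):
    """Return (flops, elements_touched) for an (m x k) @ (k x n) product."""
    flops = 2 * m * k * n               # one multiply and one add per term
    elements = m * k + k * n + m * n    # read A and B once, write C once
    return flops, elements


# Two shapes with the same FLOP count but very different data movement.
for shape in ((256, 256, 256), (1, 4096, 4096)):
    flops, elements = matmul_cost(*shape)
    print(shape, flops, round(flops / elements, 2))
```

Both shapes perform the same number of floating-point operations, yet the second does far less work per element moved, so under this simplified model it is more likely to be limited by memory bandwidth than by compute.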

matmul operation, matrix, opération, (13 more...)

arXiv.org Artificial Intelligence

2102.06621

Country:

North America > United States > Massachusetts > Middlesex County > Burlington (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Multiply Two Matrices Using TensorFlow MatMul

@machinelearnbot, Mar-29-2018, 07:17:53 GMT

We start by importing TensorFlow as tf, then print the version of TensorFlow in use, which is 1.5.0. In this video, we multiply two matrices using the tf.matmul operation. The first matrix is a 3x3 TensorFlow tensor with a minimum value of 1, a maximum value of 10, and data type int32.
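To make clear what the result of tf.matmul looks like, here is a plain-Python sketch of the same computation, written without TensorFlow so that it runs anywhere. The helper names are hypothetical, the shape of the second matrix is an assumption (the excerpt is cut off before describing it), and drawing integers from 1 to 10 inclusive is a simplification of the tutorial's min/max setting.

```python
import random


def random_int_matrix(rows, cols, low=1, high=10, seed=None):
    """Build a rows x cols matrix of integers in [low, high] (inclusive here)."""
    rng = random.Random(seed)
    return [[rng.randint(low, high) for _ in range(cols)] for _ in range(rows)]


def matmul(a, b):
    """Reference matrix product: result[i][j] = sum over k of a[i][k] * b[k][j]."""
    if not a or not b or len(a[0]) != len(b):
        raise ValueError("inner dimensions must match")
    inner, cols = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


first = random_int_matrix(3, 3, seed=0)
second = random_int_matrix(3, 3, seed=1)   # 3x3 is an assumption
print(matmul(first, second))
```

In TensorFlow itself the product would be a single call, tf.matmul(first, second), on two tensors with compatible shapes.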

data type, matrix, second matrix, (8 more...)

@machinelearnbot

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Understanding Tensorflow using Go

@machinelearnbot, Jun-1-2017, 06:45:14 GMT

Tensorflow is not a machine-learning-specific library; it is a general-purpose computation library that represents computations as graphs. Its core is implemented in C++, and there are bindings for different languages. The bindings for the Go programming language, unlike the Python ones, are useful not only for using Tensorflow in Go but also for understanding how Tensorflow is implemented under the hood. Officially, the Tensorflow developers released: ... Being a Gopher and not a Java lover, I started looking at the Go bindings to understand what kind of tasks they were created for. The first thing to note is that the Go API, by the maintainers' own admission, lacks Variable support: it is designed for using trained models, not for training models from scratch.
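The idea of representing a computation as a graph can be shown with a toy example: first build nodes that describe the operations, then evaluate the graph in a separate step. The sketch below is written in Python for brevity and uses hypothetical names throughout; it is not the Tensorflow API or the Go bindings, only an illustration of the define-then-run pattern.

```python
class Node:
    """One vertex of a tiny computation graph: an operation plus its inputs."""

    def __init__(self, op, inputs=(), value=None):
        self.op = op
        self.inputs = tuple(inputs)
        self.value = value


def constant(value):
    return Node("const", value=value)


def add(a, b):
    return Node("add", (a, b))


def mul(a, b):
    return Node("mul", (a, b))


def run(node):
    """Evaluate a node by recursively evaluating its inputs first."""
    if node.op == "const":
        return node.value
    args = [run(child) for child in node.inputs]
    if node.op == "add":
        return args[0] + args[1]
    if node.op == "mul":
        return args[0] * args[1]
    raise ValueError(f"unknown operation: {node.op}")


# Building the graph performs no arithmetic; run() does.
graph = mul(add(constant(2), constant(3)), constant(4))
print(run(graph))  # 20
```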

artificial intelligence, machine learning, opération, (18 more...)

@machinelearnbot

Technology:

Information Technology > Software (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
