AITopics | memory transfer

Accelerating Depthwise Separable Convolutions on Ultra-Low-Power Devices

Daghero, Francesco, Burrello, Alessio, Poncino, Massimo, Macii, Enrico, Pagliari, Daniele Jahier

arXiv.org Artificial Intelligence, Jun-18-2024

Depthwise separable convolutions are a fundamental component in efficient Deep Neural Networks, as they reduce the number of parameters and operations compared to traditional convolutions while maintaining comparable accuracy. However, their low data reuse opportunities make deploying them notoriously difficult. In this work, we perform an extensive exploration of alternatives to fuse the depthwise and pointwise kernels that constitute the separable convolutional block. Our approach aims to minimize time-consuming memory transfers by combining different data layouts. When targeting a commercial ultra-low-power device with a three-level memory hierarchy, the GreenWaves GAP8 SoC, we reduce the latency of end-to-end network execution by up to 11.40%. Furthermore, our kernels reduce activation data movements between L2 and L1 memories by up to 52.97%.

kernel, memory transfer, overhead, (12 more...)

arXiv:2406.12478

Country: Europe > Italy > Piedmont > Turin Province > Turin (0.04)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
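
The parameter and operation savings mentioned in the abstract above can be checked with simple arithmetic. The following is a minimal sketch, not code from the paper: the function names and the example layer shape (3x3 kernel, 64 input channels, 128 output channels, 32x32 output) are assumptions chosen for illustration, and biases are ignored.

```python
def standard_conv_cost(k, c_in, c_out, h_out, w_out):
    """Return (parameters, multiply-accumulates) for a k x k standard convolution."""
    params = k * k * c_in * c_out
    macs = params * h_out * w_out  # every weight is used once per output pixel
    return params, macs


def separable_conv_cost(k, c_in, c_out, h_out, w_out):
    """Return (parameters, multiply-accumulates) for a depthwise + pointwise pair."""
    depthwise = k * k * c_in      # one k x k filter per input channel
    pointwise = c_in * c_out      # 1 x 1 convolution that mixes channels
    params = depthwise + pointwise
    macs = params * h_out * w_out
    return params, macs


if __name__ == "__main__":
    # Hypothetical layer shape, chosen only for illustration.
    std_params, std_macs = standard_conv_cost(3, 64, 128, 32, 32)
    sep_params, sep_macs = separable_conv_cost(3, 64, 128, 32, 32)
    print("standard: ", std_params, "params,", std_macs, "MACs")
    print("separable:", sep_params, "params,", sep_macs, "MACs")
    print("reduction factor:", round(std_params / sep_params, 2))
```

The separable block performs far fewer operations per byte of activations it reads, which is consistent with the abstract's point that low data reuse makes memory transfers the bottleneck.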

Optimized Deployment of Deep Neural Networks for Visual Pose Estimation on Nano-drones

Risso, Matteo, Daghero, Francesco, Motetti, Beatrice Alessandra, Pagliari, Daniele Jahier, Macii, Enrico, Poncino, Massimo, Burrello, Alessio

arXiv.org Artificial Intelligence, Feb-23-2024

Miniaturized autonomous unmanned aerial vehicles (UAVs) are gaining popularity due to their small size, enabling new tasks such as indoor navigation or people monitoring. Nonetheless, their size and simple electronics pose severe challenges in implementing advanced onboard intelligence. This work proposes a new automatic optimization pipeline for visual pose estimation tasks using Deep Neural Networks (DNNs). The pipeline leverages two different Neural Architecture Search (NAS) algorithms to pursue a vast complexity-driven exploration in the DNNs' architectural space. The obtained networks are then deployed on an off-the-shelf nano-drone equipped with a parallel ultra-low power System-on-Chip, leveraging a set of novel software kernels for the efficient fused execution of critical DNN layer sequences. Our results improve on the state of the art, reducing inference latency by up to 3.22x at iso-error.

architecture, kernel, latency, (10 more...)

arXiv:2402.15273

Country: Europe > Italy > Piedmont > Turin Province > Turin (0.04)

Genre: Research Report > New Finding (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

SparQ Attention: Bandwidth-Efficient LLM Inference

Ribar, Luka, Chelombiev, Ivan, Hudlass-Galley, Luke, Blake, Charlie, Luschi, Carlo, Orr, Douglas

arXiv.org Artificial Intelligence, Dec-8-2023

Generative large language models (LLMs) have opened up numerous novel possibilities, but due to their significant computational requirements their ubiquitous use remains challenging. Some of the most useful applications require processing large numbers of samples at a time and using long contexts, both significantly increasing the memory communication load of the models. We introduce SparQ Attention, a technique for increasing the inference throughput of LLMs by reducing the memory bandwidth requirements within the attention blocks through selective fetching of the cached history. Our proposed technique can be applied directly to off-the-shelf LLMs during inference, without requiring any modification to the pre-training setup or additional fine-tuning. We show how SparQ Attention can decrease the attention memory bandwidth requirements up to eight times without any loss in accuracy by evaluating Llama 2 and Pythia models on a wide range of downstream tasks.

arxiv preprint arxiv, attention score, sparq attention, (15 more...)

arXiv:2312.04985

Country:

South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > United States > Michigan > Marquette County (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)
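
The "selective fetching of the cached history" described in the SparQ Attention abstract above can be illustrated with a simplified sketch. This is one plausible reading of the idea and not the authors' implementation: the function names and the parameters `r` (number of query components read) and `k` (number of cached positions fetched in full) are assumptions, and any refinements the paper applies beyond this basic selection step are omitted.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def full_attention(q, keys, values):
    """Exact single-query attention; reads every key and value row."""
    scale = math.sqrt(len(q))
    scores = [sum(qi * ki for qi, ki in zip(q, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]


def selective_attention(q, keys, values, r, k):
    """Approximate attention that fully fetches only k cached positions."""
    # Step 1: pick the r query components with the largest magnitude.
    top_dims = sorted(range(len(q)), key=lambda i: abs(q[i]), reverse=True)[:r]
    # Step 2: approximate scores, reading only those components of each key.
    approx = [sum(q[i] * key[i] for i in top_dims) for key in keys]
    # Step 3: keep the k positions with the highest approximate scores.
    top_pos = sorted(range(len(keys)), key=lambda p: approx[p], reverse=True)[:k]
    top_pos.sort()  # restore original sequence order
    # Step 4: exact attention over the fetched rows only.
    return full_attention(q, [keys[p] for p in top_pos], [values[p] for p in top_pos])


if __name__ == "__main__":
    q = [1.0, 0.0]
    keys = [[5.0, 0.0], [0.0, 5.0], [4.0, 1.0]]
    values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    print(selective_attention(q, keys, values, r=1, k=2))
    print(full_attention(q, keys, values))
```

In this sketch, step 2 reads `r` components per key instead of all of them, and step 4 reads `k` full rows instead of the whole cache, so the data fetched shrinks as `r` and `k` shrink. Setting `r` to the query dimension and `k` to the sequence length recovers exact attention.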

G-MAP: General Memory-Augmented Pre-trained Language Model for Domain Tasks

Wan, Zhongwei, Yin, Yichun, Zhang, Wei, Shi, Jiaxin, Shang, Lifeng, Chen, Guangyong, Jiang, Xin, Liu, Qun

arXiv.org Artificial Intelligence, Dec-7-2022

Recently, domain-specific PLMs have been proposed to boost the task performance of specific domains (e.g., biomedical and computer science) by continuing to pre-train general PLMs with domain-specific corpora. However, this Domain-Adaptive Pre-Training (DAPT; Gururangan et al. (2020)) tends to forget the general knowledge previously acquired by general PLMs, which leads to catastrophic forgetting and sub-optimal performance. To alleviate this problem, we propose a new framework, the General Memory Augmented Pre-trained Language Model (G-MAP), which augments the domain-specific PLM with a memory representation built from the frozen general PLM without losing any general knowledge. Specifically, we propose a new memory-augmented layer, and based on it, different augmentation strategies are explored to build the memory representation and then adaptively fuse it into the domain-specific PLM. We demonstrate the effectiveness of G-MAP on various domains (biomedical and computer science publications, news, and reviews) and different kinds of tasks (text classification, QA, NER), and the extensive results show that the proposed G-MAP can achieve SOTA results on all tasks.

machine learning, natural language, plm, (17 more...)

arXiv:2212.03613

Country:

Asia > Middle East > Jordan (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)

Genre: Research Report > New Finding (0.88)

Industry: Health & Medicine (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.67)

Solving Machine Learning Performance Anti-Patterns: a Systematic Approach

#artificialintelligence, Jul-19-2021, 11:05:47 GMT

These principles are in rough order of priority, and like all guidelines there are times they should be broken. Next we'll take a tour through some major patterns of suboptimal performance -- many of which map directly to violations of these principles. Machine learning systems show distinct patterns of resource consumption, and each of these patterns requires a different approach to improving performance. Real-world systems usually exhibit several different patterns in different parts of the inference pipeline, so quite often we'll need to apply several of the approaches below. For example, post-processing logic is highly prone to being CPU compute bound or synchronization bound, whereas the backbones of vision models are often GPU compute bound.

gpu, machine learning performance anti-pattern, operation, (12 more...)

Country: North America > Canada (0.05)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Object Detection from 9 FPS to 650 FPS in 6 Steps

#artificialintelligence, Nov-2-2020, 07:06:58 GMT

Making code run fast on GPUs requires a very different approach from making code run fast on CPUs because the hardware architecture is fundamentally different. If you come from a background of efficient coding on CPUs, then you'll have to adjust some assumptions about which patterns are best. Machine learning engineers of all kinds should care about squeezing performance from their models and hardware -- not just for production purposes, but also for research and training. In research as in development, a fast iteration loop leads to faster improvement. This article is a practical deep dive into making a specific deep learning model (Nvidia's SSD300) run fast on a powerful GPU server, but the general principles apply to all GPU programming.

artificial intelligence, machine learning, memory transfer, (19 more...)

Industry: Information Technology (0.36)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.72)

Generative Memory for Lifelong Reinforcement Learning

Raghavan, Aswin, Hostetler, Jesse, Chai, Sek

arXiv.org Artificial Intelligence, Feb-21-2019

Our research is focused on understanding and applying biological memory transfers to new AI systems that can fundamentally improve their performance, throughout their fielded lifetime experience. We leverage current understanding of biological memory transfer to arrive at AI algorithms for memory consolidation and replay. In this paper, we propose the use of generative memory that can be recalled in batch samples to train a multi-task agent in a pseudo-rehearsal manner. We show results motivating the need for task-agnostic separation of latent space for the generative memory to address issues of catastrophic forgetting in lifelong learning.

arxiv preprint arxiv, machine learning, reinforcement learning, (14 more...)

arXiv:1902.08349

Country: North America > United States (0.98)

Genre: Research Report (0.50)

Industry:

Education > Educational Setting (0.39)
Government > Regional Government > North America Government > United States Government (0.31)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.55)

Scientists sucked a memory out of a snail and stuck it in another snail

FOX News, May-15-2018, 15:00:46 GMT

[Image: Aplysia californica, also known as the California sea hare. Credit: Genny Anderson/CC by 4.0]

A new study strongly suggests that at least some memories are stored in genetic code, and that genetic code can act like memory soup. Suck it out of one animal and stick the code in a second animal, and that second animal can remember things that only the first animal knew. That might sound like science fiction or remind some readers of debunked ideas from decades past. But it's serious science: In a new study, researchers at the University of California, Los Angeles (UCLA) extracted RNA, a genetic messenger molecule, from one snail and implanted it in another snail. In both experiments, the recipient -- either the snail or the petri-neurons -- remembered something the donor snail had experienced.

artificial intelligence, experiment, snail, (17 more...)

Country: North America > United States > California > Los Angeles County > Los Angeles (0.55)

Genre: Research Report (0.50)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.58)

Technology: Information Technology > Artificial Intelligence (0.50)