MatFormer: Nested Transformer for Elastic Inference
Foundation models are applied in a broad spectrum of settings with different inference constraints, from massive multi-accelerator clusters to resource-constrained standalone mobile devices. However, the substantial costs associated with training these models often limit the number of unique model sizes that can be offered. Consequently, practitioners are compelled to select a model that may not be optimally aligned with their specific latency and cost requirements. We present MatFormer, a novel Transformer architecture designed to provide elastic inference across diverse deployment constraints. MatFormer achieves this by incorporating a nested Feed Forward Network (FFN) block structure within a standard Transformer model.
Devvrit, Sneha Kudugunta, Aditya Kusupati, Tim Dettmers, Kaifeng Chen, Inderjit Dhillon, Yulia Tsvetkov, Hannaneh Hajishirzi, Sham Kakade, Ali Farhadi, Prateek Jain
Transformer models are deployed in a wide range of settings, from multi-accelerator clusters to standalone mobile phones. The diverse inference constraints in these scenarios require practitioners to train foundation models such as PaLM 2, Llama, and ViTs as a series of models of varying sizes. Due to significant training costs, only a select few model sizes are trained and supported, limiting fine-grained control over relevant tradeoffs such as latency, cost, and accuracy. This work introduces MatFormer, a nested Transformer architecture designed to offer elasticity across a variety of deployment constraints. Each Feed Forward Network (FFN) block of a MatFormer model is jointly optimized with a few nested smaller FFN blocks. This training procedure allows Mix'n'Match of model granularities across layers: a trained universal MatFormer model enables extraction of hundreds of accurate smaller models that were never explicitly optimized. We empirically demonstrate MatFormer's effectiveness across different model classes (decoders and encoders), modalities (language and vision), and scales (up to 2.6B parameters). We find that a 2.6B decoder-only MatFormer language model (MatLM) allows us to extract smaller models spanning 1.5B to 2.6B parameters, each exhibiting validation loss and one-shot downstream evaluations comparable to their independently trained counterparts. Furthermore, we observe that smaller encoders extracted from a universal MatFormer-based ViT (MatViT) encoder preserve the metric-space structure needed for adaptive large-scale retrieval. Finally, we showcase that speculative decoding with the accurate and consistent submodels extracted from MatFormer can further reduce inference latency.
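The nesting described in the abstract can be sketched in a few lines: every smaller FFN granularity is a prefix slice of the universal model's weight matrices, so a submodel is extracted by slicing rather than retraining. This is a minimal NumPy sketch of that idea, not the paper's code; the shapes, seed, and function names are illustrative assumptions.

```python
import numpy as np

def nested_ffn(x, w_in, w_out, m):
    """FFN forward pass restricted to the first m hidden units.

    A nested sub-granularity reuses the leading m columns of w_in and
    the leading m rows of w_out, so smaller models are literal slices
    of the universal model's weights (a sketch of MatFormer's idea).
    """
    h = np.maximum(x @ w_in[:, :m], 0.0)  # ReLU over the nested sub-block
    return h @ w_out[:m, :]

# Toy universal model (hypothetical sizes, random weights for illustration).
rng = np.random.default_rng(0)
d_model, d_ff = 8, 32
w_in = rng.standard_normal((d_model, d_ff))
w_out = rng.standard_normal((d_ff, d_model))
x = rng.standard_normal((1, d_model))

# Mix'n'Match: at inference time each layer can independently pick a
# granularity (here m in {8, 16, 32}) without any additional training.
outputs = {m: nested_ffn(x, w_in, w_out, m) for m in (8, 16, 32)}
```

In the real architecture the granularities are jointly optimized during training so that every prefix slice is itself an accurate model; the slicing shown here is only the extraction step.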
Optimizing TensorFlow model serving with Kubernetes and Amazon Elastic Inference (Amazon Web Services)
The only aspect of the code that isn't straightforward is the need to enable EC2 instance termination protection while workers are processing videos, as shown in the following code example. After a job finishes processing, a similar API call disables termination protection. The example application uses termination protection because the jobs are long-running, and you don't want an EC2 instance terminated during a scale-in event while it is still processing a video. You can easily modify the inference code and optimize it for your use case, so this post doesn't examine it further. To review the Dockerfile for the inference code, see the amazon-elastic-inference-eks GitHub repo (/Dockerfile). The inference code itself is in the test.py file.
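A hedged sketch of the protect/unprotect pattern described above, not the post's actual code: since the concern is a scale-in event, this uses the real Auto Scaling SetInstanceProtection API via boto3. The function names, instance ID, and Auto Scaling group name are placeholder assumptions; the post's application may toggle protection through a different call.

```python
def scale_in_protection_request(instance_id, asg_name, protect):
    # Parameters for the Auto Scaling SetInstanceProtection API.
    # ProtectedFromScaleIn=True keeps this instance from being chosen
    # for termination during a scale-in event.
    return {
        "InstanceIds": [instance_id],
        "AutoScalingGroupName": asg_name,
        "ProtectedFromScaleIn": protect,
    }

def set_scale_in_protection(instance_id, asg_name, protect):
    # Deferred import so the request builder above works without boto3.
    import boto3
    autoscaling = boto3.client("autoscaling")
    autoscaling.set_instance_protection(
        **scale_in_protection_request(instance_id, asg_name, protect)
    )

# Worker loop pattern: protect before the long-running job, release after.
# set_scale_in_protection("i-0123456789abcdef0", "video-workers", True)
# ... process the video ...
# set_scale_in_protection("i-0123456789abcdef0", "video-workers", False)
```

The same shape works for API-level termination protection (EC2 ModifyInstanceAttribute with DisableApiTermination) if that is what the application uses instead.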
Run ONNX models with Amazon Elastic Inference (Amazon Web Services)
At re:Invent 2018, AWS announced Amazon Elastic Inference (EI), a new service that lets you attach just the right amount of GPU-powered inference acceleration to any Amazon EC2 instance. This is also available for Amazon SageMaker notebook instances and endpoints, bringing acceleration to built-in algorithms and to deep learning environments. In this blog post, I show how to use the models in the ONNX Model Zoo on GitHub to perform inference by using MXNet with Elastic Inference Accelerator (EIA) as a backend. Amazon Elastic Inference allows you to attach low-cost GPU-powered acceleration to Amazon EC2 and Amazon SageMaker instances to reduce the cost of running deep learning inference by up to 75 percent. Amazon Elastic Inference provides support for Apache MXNet, TensorFlow, and ONNX models.
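The MXNet-on-EIA workflow described above can be sketched roughly as follows. This is an illustrative outline, not the post's code: it assumes an image model from the ONNX Model Zoo, the EI-enabled Apache MXNet build, an attached accelerator exposing the `mx.eia()` context, and an ONNX graph whose input is named "data" (all of which vary by model and setup).

```python
import numpy as np

def preprocess(img):
    # Generic image preprocessing for an ONNX Model Zoo vision model:
    # HWC uint8 -> NCHW float32 scaled to [0, 1]. Many models also
    # require mean/std normalization, which is model-specific.
    x = img.astype(np.float32) / 255.0
    return np.transpose(x, (2, 0, 1))[np.newaxis, ...]

def run_onnx_on_eia(model_path, img):
    # Requires the EI-enabled MXNet build on an instance (or SageMaker
    # notebook/endpoint) with an Elastic Inference accelerator attached.
    import mxnet as mx
    # Import the ONNX graph into MXNet symbol + parameters.
    sym, arg_params, aux_params = mx.contrib.onnx.import_model(model_path)
    # Bind the module to the EIA context instead of mx.cpu()/mx.gpu().
    mod = mx.mod.Module(symbol=sym, context=mx.eia(), label_names=None)
    data = mx.nd.array(preprocess(img))
    mod.bind(for_training=False, data_shapes=[("data", data.shape)])
    mod.set_params(arg_params, aux_params, allow_missing=True)
    mod.forward(mx.io.DataBatch([data]))
    return mod.get_outputs()[0].asnumpy()
```

Swapping `mx.eia()` for `mx.cpu()` runs the same code without an accelerator, which is a convenient way to validate the model locally before attaching EI.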
AI Weekly: 6 important machine learning developments from AWS re:Invent
This week in Las Vegas, Amazon rolled out dozens of new features, upgrades, and products at AWS re:Invent. Here's a quick roundup of news out of the annual conference that may matter to members of the AI community. A disproportionate amount of money is spent on inference versus training when it comes to AI models, AWS CEO Andy Jassy said, and GPUs can be terribly inefficient at it. To address these issues, Amazon custom-designed a chip named Inferentia, due out next year, and created Elastic Inference, a service that identifies the parts of a neural network that can benefit from acceleration. To speed up training of AI models, Amazon introduced AWS-Optimized TensorFlow, which can train a model on the ResNet-50 benchmark in 14 minutes.
Amazon's self-driving AI robo-car – THE TRUTH (it's a few inches in size) • The Register
It already has quite a few smart code confections: Rekognition, Lex, Polly, Transcribe, Comprehend, Translate, SageMaker, and Greengrass, among others. At its re:Invent gathering in Las Vegas today, AWS threw a handful of new flavors into the mix, among them: Elastic Inference, SageMaker Ground Truth, SageMaker RL, Amazon SageMaker Neo, Personalize, Forecast, Textract, and Comprehend Medical. It also teased a machine-learning inference chip called Inferentia, and a small radio-controlled car called DeepRacer for executing autonomous driving models in the real world and terrifying pets. It's a 1/18th-scale race car ostensibly intended to help people understand and implement reinforcement learning. It may also help with customer acquisition, retention, and spending.