Adaptive Computation


Adaptive Sparse Allocation with Mutual Choice & Feature Choice Sparse Autoencoders

Ayonrinde, Kola

arXiv.org Artificial Intelligence

Sparse autoencoders (SAEs) are a promising approach to extracting features from neural networks, enabling model interpretability as well as causal interventions on model internals. SAEs generate sparse feature representations using a sparsifying activation function that implicitly defines a set of token-feature matches. We frame the token-feature matching as a resource allocation problem constrained by a total sparsity upper bound. For example, TopK SAEs solve this allocation problem with the additional constraint that each token matches with at most $k$ features. In TopK SAEs, the $k$ active features per token constraint is the same across tokens, despite some tokens being more difficult to reconstruct than others. To address this limitation, we propose two novel SAE variants, Feature Choice SAEs and Mutual Choice SAEs, which each allow for a variable number of active features per token. Feature Choice SAEs solve the sparsity allocation problem under the additional constraint that each feature matches with at most $m$ tokens. Mutual Choice SAEs solve the unrestricted allocation problem where the total sparsity budget can be allocated freely between tokens and features. Additionally, we introduce a new auxiliary loss function, $\mathtt{aux\_zipf\_loss}$, which generalises the $\mathtt{aux\_k\_loss}$ to mitigate dead and underutilised features. Our methods result in SAEs with fewer dead features and improved reconstruction loss at equivalent sparsity levels as a result of the inherent adaptive computation. More accurate and scalable feature extraction methods provide a path towards better understanding and more precise control of foundation models.
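
To make the three allocation rules concrete, here is a minimal PyTorch sketch of the sparsifying step applied to a (tokens × features) pre-activation matrix. The function names and the `total_budget` parameter are illustrative assumptions, not taken from the paper's code.

```python
import torch

def topk_per_token(acts: torch.Tensor, k: int) -> torch.Tensor:
    """TopK SAE: every token keeps exactly its k largest pre-activations."""
    vals, idx = acts.topk(k, dim=-1)
    out = torch.zeros_like(acts)
    return out.scatter_(-1, idx, vals)

def feature_choice(acts: torch.Tensor, m: int) -> torch.Tensor:
    """Feature Choice SAE: every feature keeps its m strongest token matches."""
    vals, idx = acts.topk(m, dim=0)
    out = torch.zeros_like(acts)
    return out.scatter_(0, idx, vals)

def mutual_choice(acts: torch.Tensor, total_budget: int) -> torch.Tensor:
    """Mutual Choice SAE: spend the whole sparsity budget on the largest
    entries anywhere in the token-feature matrix (e.g. budget = k * n_tokens),
    so difficult tokens can claim more active features than easy ones."""
    flat = acts.flatten()
    vals, idx = flat.topk(total_budget)
    out = torch.zeros_like(flat)
    out[idx] = vals
    return out.view_as(acts)
```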


Duo-LLM: A Framework for Studying Adaptive Computation in Large Language Models

Alizadeh, Keivan, Mirzadeh, Iman, Shahrokhi, Hooman, Belenko, Dmitry, Sun, Frank, Cho, Minsik, Sekhavat, Mohammad Hossein, Nabi, Moin, Farajtabar, Mehrdad

arXiv.org Artificial Intelligence

Large Language Models (LLMs) typically generate outputs token by token using a fixed compute budget, leading to inefficient resource utilization. To address this shortcoming, recent advancements in mixture-of-experts (MoE) models, speculative decoding, and early exit strategies leverage the insight that computational demands can vary significantly based on the complexity and nature of the input. However, identifying optimal routing patterns for dynamic execution remains an open challenge, limiting the full potential of these adaptive methods. To address this need, we study adaptive computation in LLMs more systematically. We propose a novel framework that integrates smaller auxiliary modules within each Feed-Forward Network layer of the LLM. This design enables dynamic routing of tokens based on task complexity: tokens can be processed by either the small or big modules at each layer, or even bypass certain layers entirely. This allows us to introduce a novel notion of a token's difficulty, defined by its potential to benefit from additional computational resources. Importantly, by employing oracles to identify optimal patterns of adaptive computation, we gain valuable insights into the internal workings of LLMs and the routing processes in a simplified heterogeneous MoE setup. We show that trained routers operate differently from oracles and often yield suboptimal solutions. Notably, activating a large module in just one layer outperforms models that use large modules across all layers, underscoring the gap between practical implementations of routing in MoE models and theoretical optima for adaptive computation.
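
A rough PyTorch sketch of the layer design described above; `DuoFFN` and its routing-by-argmax are illustrative simplifications (hard routing would need a straight-through or Gumbel-softmax estimator during training, and the paper additionally studies oracle routing).

```python
import torch
import torch.nn as nn

class DuoFFN(nn.Module):
    """One FFN layer with a small and a big module plus a bypass path;
    a router decides per token which path to take."""
    def __init__(self, d_model: int, d_small: int, d_big: int):
        super().__init__()
        self.small = nn.Sequential(nn.Linear(d_model, d_small), nn.GELU(),
                                   nn.Linear(d_small, d_model))
        self.big = nn.Sequential(nn.Linear(d_model, d_big), nn.GELU(),
                                 nn.Linear(d_big, d_model))
        self.router = nn.Linear(d_model, 3)  # scores for: bypass / small / big

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        choice = self.router(x).argmax(dim=-1)   # (batch, seq), hard routing
        out = x.clone()                          # choice 0: bypass the layer
        small, big = choice == 1, choice == 2
        out[small] = x[small] + self.small(x[small])
        out[big] = x[big] + self.big(x[big])
        return out
```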


Scalable Adaptive Computation for Iterative Generation

Jabri, Allan, Fleet, David, Chen, Ting

arXiv.org Artificial Intelligence

Natural data is redundant, yet predominant architectures tile computation uniformly across their input and output space. We propose Recurrent Interface Networks (RINs), an attention-based architecture that decouples its core computation from the dimensionality of the data, enabling adaptive computation for more scalable generation of high-dimensional data. RINs focus the bulk of computation (i.e. global self-attention) on a set of latent tokens, using cross-attention to read and write (i.e. route) information between latent and data tokens. Stacking RIN blocks allows bottom-up (data to latent) and top-down (latent to data) feedback, leading to deeper and more expressive routing. While this routing introduces challenges, it is less problematic in recurrent computation settings where the task (and routing problem) changes gradually, such as iterative generation with diffusion models. We show how to leverage recurrence by conditioning the latent tokens at each forward pass of the reverse diffusion process with those from prior computation, i.e. latent self-conditioning. RINs yield state-of-the-art pixel diffusion models for image and video generation, scaling to 1024×1024 images without cascades or guidance, while being domain-agnostic and up to 10× more efficient than 2D and 3D U-Nets.
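
The read-compute-write pattern is easy to state in code. Below is a heavily condensed PyTorch sketch of one block (normalisation and MLPs omitted for brevity); `RINBlock` is an illustrative name, not the authors' implementation.

```python
import torch.nn as nn

class RINBlock(nn.Module):
    """Cross-attention reads data tokens into a small latent set, self-attention
    does the bulk computation on latents only, and cross-attention writes the
    result back to the data tokens, so cost scales with the latent count."""
    def __init__(self, dim: int, heads: int = 8):
        super().__init__()
        self.read = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.compute = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.write = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, latents, data):
        latents = latents + self.read(latents, data, data)[0]           # read
        latents = latents + self.compute(latents, latents, latents)[0]  # compute
        data = data + self.write(data, latents, latents)[0]             # write
        return latents, data
```

Latent self-conditioning then amounts to initialising `latents` at each denoising step from the latents produced at the previous step rather than from scratch.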


Adaptive Computation with Elastic Input Sequence

Xue, Fuzhao, Likhosherstov, Valerii, Arnab, Anurag, Houlsby, Neil, Dehghani, Mostafa, You, Yang

arXiv.org Artificial Intelligence

Humans have the ability to adapt the type of information they use, the procedure they employ, and the amount of time they spend when solving problems. However, most standard neural networks have a fixed function type and computation budget regardless of the sample's nature or difficulty. Adaptivity is a powerful paradigm: it not only gives practitioners flexibility in the downstream use of these models but can also serve as a powerful inductive bias for solving certain challenging classes of problems. In this work, we introduce a new approach called AdaTape, which allows for dynamic computation in neural networks through adaptive tape tokens. AdaTape utilizes an elastic input sequence by equipping an architecture with a dynamic read-and-write tape. Specifically, we adaptively generate input sequences using tape tokens obtained from a tape bank, which can be either trainable or derived from input data. We examine the challenges and requirements of obtaining dynamic sequence content and length, and propose the Adaptive Tape Reading (ATR) algorithm to achieve both goals. Through extensive experiments on image recognition tasks, we show that AdaTape achieves better performance at a comparable computational cost. To facilitate further research, we have released code at https://github.com/google-research/scenic.
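
As a loose illustration of the idea (not the paper's ATR algorithm, which selects tape tokens more carefully), the sketch below appends bank tokens whose relevance to the input crosses a threshold, so harder inputs can pull in more tokens; the function name, the mean-pooled query, and the parameters are all assumptions.

```python
import torch

def adaptive_tape_read(inputs: torch.Tensor, tape_bank: torch.Tensor,
                       threshold: float = 0.5, max_tape: int = 8) -> torch.Tensor:
    """inputs: (seq, dim) token embeddings; tape_bank: (bank, dim)."""
    query = inputs.mean(dim=0)                    # crude summary of the input
    scores = torch.sigmoid(tape_bank @ query)     # relevance of each bank token
    picked = torch.nonzero(scores > threshold).squeeze(-1)[:max_tape]
    return torch.cat([inputs, tape_bank[picked]], dim=0)  # elastic sequence
```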


7 Must Read Books To Learn 'Machine Learning' - OpenXcell

#artificialintelligence

It offers sufficient background material on linear algebra, probability, optimization, conditional random fields, L1 regularization, deep learning and more. In the introductory chapter, the book lays out the different kinds of problems that can be solved by machine learning and describes the types of methods that can be used to solve them. The book progresses on to discuss these and related issues in the chapters ahead. It uses the language of graphical models to specify models in a concise and intuitive way, and provides overviews of real-world applications of various techniques. MATLAB and GNU Octave code implementing the algorithms in the book can be freely downloaded from the book's website. It is not an easy read, but it is an authoritative book intended to be used as a textbook.


7 Must Read Books To Learn 'Machine Learning' - OpenXcell

#artificialintelligence

Arthur Samuel, an American pioneer in the fields of computer gaming, artificial intelligence, and machine learning, defined machine learning as the "field of study that gives computers the ability to learn without being explicitly programmed". There are computer programs that can teach themselves to grow and change when exposed to new data, and machine learning focuses on such programs. Machine learning and data mining both search through data to look for patterns, but data mining applications extract data for human comprehension, whereas machine learning uses that data to detect patterns and adjust the program's actions accordingly.


AdaViT: Adaptive Tokens for Efficient Vision Transformer

Yin, Hongxu, Vahdat, Arash, Alvarez, Jose, Mallya, Arun, Kautz, Jan, Molchanov, Pavlo

arXiv.org Artificial Intelligence

We introduce A-ViT, a method that adaptively adjusts the inference cost of vision transformers (ViTs) for images of different complexity. A-ViT achieves this by automatically reducing the number of tokens that are processed in the network as inference proceeds. We reformulate Adaptive Computation Time (ACT) for this task, extending halting to discard redundant spatial tokens. The appealing architectural properties of vision transformers enable our adaptive token reduction mechanism to speed up inference without modifying the network architecture or inference hardware. We demonstrate that A-ViT requires no extra parameters or sub-network for halting, as we base the learning of adaptive halting on the original network parameters. We further introduce a distributional prior regularization that stabilizes training compared to prior ACT approaches. On the image classification task (ImageNet-1K), we show that our proposed A-ViT is highly effective at filtering informative spatial features and cutting down the overall compute. The proposed method improves the throughput of DeiT-Tiny by 62% and DeiT-Small by 38% with only a 0.3% accuracy drop, outperforming prior art by a large margin. Project page at https://a-vit.github.io/
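
A minimal sketch of the per-token ACT-style halting update, assuming PyTorch; reading the halting score off a reserved channel of each token follows the paper's description, but the function and variable names are illustrative.

```python
import torch

def update_halting(tokens: torch.Tensor, cum_halt: torch.Tensor,
                   active: torch.Tensor, eps: float = 0.01):
    """tokens: (batch, seq, dim); cum_halt, active: (batch, seq).
    Accumulate a halting score per token across layers; once the running
    total crosses 1 - eps, the token is frozen and skipped thereafter."""
    halt = torch.sigmoid(tokens[..., 0])      # halting score from one channel
    cum_halt = cum_halt + halt * active       # only active tokens accumulate
    active = active * (cum_halt < 1.0 - eps)  # freeze tokens that crossed
    return cum_halt, active.float()
```

Between layers, halted tokens are masked out of attention, which is where the inference speed-up comes from.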


Semi-Supervised Learning (Adaptive Computation and Machine Learning series), by Olivier Chapelle, Bernhard Scholkopf, and Alexander Zien

#artificialintelligence

In the field of machine learning, semi-supervised learning (SSL) occupies the middle ground between supervised learning (in which all training examples are labeled) and unsupervised learning (in which no labeled data are given). Interest in SSL has increased in recent years, particularly because of application domains in which unlabeled data are plentiful, such as images, text, and bioinformatics. This first comprehensive overview of SSL presents state-of-the-art algorithms, a taxonomy of the field, selected applications, benchmark experiments, and perspectives on ongoing and future research. The book first presents the key assumptions and ideas underlying the field: smoothness, cluster or low-density separation, manifold structure, and transduction. The core of the book is the presentation of SSL methods, organized according to algorithmic strategies. After an examination of generative models, the book describes algorithms that implement the low-density separation assumption, graph-based methods, and algorithms that perform two-step learning.


Introduction to Machine Learning, fourth edition (Adaptive Computation and Machine Learning series), by Ethem Alpaydin

#artificialintelligence

The book covers a broad array of topics not usually included in introductory machine learning texts, including supervised learning, Bayesian decision theory, parametric methods, semiparametric methods, nonparametric methods, multivariate analysis, hidden Markov models, reinforcement learning, kernel machines, graphical models, Bayesian estimation, and statistical testing. The fourth edition offers a new chapter on deep learning that discusses training, regularizing, and structuring deep neural networks such as convolutional and generative adversarial networks; new material in the chapter on reinforcement learning that covers the use of deep networks, the policy gradient methods, and deep reinforcement learning; new material in the chapter on multilayer perceptrons on autoencoders and the word2vec network; and discussion of a popular method of dimensionality reduction, t-SNE. New appendixes offer background material on linear algebra and optimization. End-of-chapter exercises help readers to apply concepts learned. Introduction to Machine Learning can be used in courses for advanced undergraduate and graduate students and as a reference for professionals.


Deep Learning (Adaptive Computation and Machine Learning series), by Ian Goodfellow, Yoshua Bengio, and Aaron Courville

#artificialintelligence

"Written by three experts in the field, Deep Learning is the only comprehensive book on the subject." Deep learning is a form of machine learning that enables computers to learn from experience and understand the world in terms of a hierarchy of concepts. Because the computer gathers knowledge from experience, there is no need for a human computer operator to formally specify all the knowledge that the computer needs. The hierarchy of concepts allows the computer to learn complicated concepts by building them out of simpler ones; a graph of these hierarchies would be many layers deep. This book introduces a broad range of topics in deep learning.