Full Stack Optimization of Transformer Inference: a Survey

arXiv.org Artificial Intelligence

Recent advances in state-of-the-art DNN architecture design have been moving toward Transformer models. These models achieve superior accuracy across a wide range of applications. This trend has been consistent over the past several years since Transformer models were originally introduced. However, the amount of compute and bandwidth required for inference of recent Transformer models is growing at a significant rate, and this has made their deployment in latency-sensitive applications challenging. As such, there has been an increased focus on making Transformer models more efficient, with methods that range from changing the architecture design, all the way to developing dedicated domain-specific accelerators. In this work, we survey different approaches for efficient Transformer inference, including: (i) analysis and profiling of the bottlenecks in existing Transformer architectures and their similarities and differences with previous convolutional models; (ii) implications of Transformer architecture on hardware, including the impact of non-linear operations such as Layer Normalization, Softmax, and GELU, as well as linear operations, on hardware design; (iii) approaches for optimizing a fixed Transformer architecture; (iv) challenges in finding the right mapping and scheduling of operations for Transformer models; and (v) approaches for optimizing Transformer models by adapting the architecture using neural architecture search. Finally, we perform a case study by applying the surveyed optimizations on Gemmini, the open-source, full-stack DNN accelerator generator, and we show how each of these approaches can yield improvements, compared to previous benchmark results on Gemmini. Among other things, we find that a full-stack co-design approach with the aforementioned methods can result in up to 88.7x speedup with a minimal performance degradation for Transformer inference.


Attention is All you Need. Unveiling the Science Behind ChatGPT

#artificialintelligence

This article provides an overview of the ChatGPT language model, which has made significant contributions to the field of natural language processing. We discuss the limitations of traditional neural network architectures and introduce the transformer architecture, which uses self-attention mechanisms to handle long-term dependencies and variable-length inputs. We explain the key mechanisms behind ChatGPT, including attention, scaled dot-product attention, multi-head attention, position-wise feed-forward networks, embeddings, softmax, and positional encoding. We also discuss the applications of attention and the importance of training, including training data and batching, hardware and schedule, optimizer, and regularization. Finally, we present the results of ChatGPT in various tasks, such as machine translation and model variations, demonstrating its potential to revolutionize the field of NLP.
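
As a rough illustration of the scaled dot-product attention mentioned in this summary, the sketch below computes softmax(QK^T / sqrt(d_k)) V for small list-of-lists matrices. It is a minimal sketch and not code from the article: the function names and the pure-Python matrix representation are assumptions chosen for readability, and it omits masking, batching, and multiple heads.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(Q, K, V):
    """Compute softmax(Q K^T / sqrt(d_k)) V.

    Q, K, V are lists of row vectors (hypothetical representation);
    K and V must have the same number of rows.
    """
    d_k = len(K[0])
    scale = math.sqrt(d_k)
    output = []
    for q in Q:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in K]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        output.append(
            [sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))]
        )
    return output
```

With a query that is orthogonal to (or equally similar to) every key, the weights are uniform and the output is simply the mean of the value vectors.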


Double Matching Under Complementary Preferences

arXiv.org Artificial Intelligence

In this paper, we propose a new algorithm for addressing the problem of matching markets with complementary preferences, where agents' preferences are unknown a priori and must be learned from data. The presence of complementary preferences can lead to instability in the matching process, making this problem challenging to solve. To overcome this challenge, we formulate the problem as a bandit learning framework and propose the Multi-agent Multi-type Thompson Sampling (MMTS) algorithm. The algorithm combines the strengths of Thompson Sampling for exploration with a double matching technique to achieve a stable matching outcome. Our theoretical analysis demonstrates the effectiveness of MMTS as it is able to achieve stability at every matching step, satisfies the incentive-compatibility property, and has a sublinear Bayesian regret over time. Our approach provides a useful method for addressing complementary preferences in real-world scenarios.
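
For readers unfamiliar with Thompson Sampling, the building block this abstract names, the following sketch shows the standard single-agent Beta-Bernoulli version. It is not the MMTS algorithm: the multi-agent, multi-type setting and the double matching step are not modeled here, and the function name, the `true_probs` parameter, and the uniform Beta(1, 1) prior are assumptions made for illustration.

```python
import random


def thompson_sampling(true_probs, rounds, seed=0):
    """Single-agent Beta-Bernoulli Thompson Sampling (illustrative only).

    true_probs: hypothetical success probability of each arm, unknown to the learner.
    Returns the number of times each arm was pulled.
    """
    rng = random.Random(seed)
    n_arms = len(true_probs)
    alpha = [1] * n_arms  # Beta(1, 1) prior: one pseudo-success per arm
    beta = [1] * n_arms   # and one pseudo-failure per arm
    pulls = [0] * n_arms
    for _ in range(rounds):
        # Draw one plausible success rate per arm from its posterior.
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(n_arms)]
        arm = max(range(n_arms), key=lambda i: samples[i])
        # Observe a Bernoulli reward and update that arm's posterior.
        reward = 1 if rng.random() < true_probs[arm] else 0
        alpha[arm] += reward
        beta[arm] += 1 - reward
        pulls[arm] += 1
    return pulls
```

Because each arm is chosen in proportion to the posterior probability that it is the best one, exploration fades as evidence accumulates, and most pulls end up concentrated on the arm with the higher success rate.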


Placental Vessel Segmentation and Registration in Fetoscopy: Literature Review and MICCAI FetReg2021 Challenge Findings

arXiv.org Artificial Intelligence

Fetoscopy laser photocoagulation is a widely adopted procedure for treating Twin-to-Twin Transfusion Syndrome (TTTS). The procedure involves photocoagulating pathological anastomoses to regulate blood exchange among twins. The procedure is particularly challenging due to the limited field of view, poor manoeuvrability of the fetoscope, poor visibility, and variability in illumination. These challenges may lead to increased surgery time and incomplete ablation. Computer-assisted intervention (CAI) can provide surgeons with decision support and context awareness by identifying key structures in the scene and expanding the fetoscopic field of view through video mosaicking. Research in this domain has been hampered by the lack of high-quality data to design, develop and test CAI algorithms. Through the Fetoscopic Placental Vessel Segmentation and Registration (FetReg2021) challenge, which was organized as part of the MICCAI2021 Endoscopic Vision challenge, we released the first large-scale multicentre TTTS dataset for the development of generalized and robust semantic segmentation and video mosaicking algorithms. For this challenge, we released a dataset of 2060 images, pixel-annotated for vessels, tool, fetus and background classes, from 18 in-vivo TTTS fetoscopy procedures and 18 short video clips. Seven teams participated in this challenge and their model performance was assessed on an unseen test dataset of 658 pixel-annotated images from 6 fetoscopic procedures and 6 short clips. The challenge provided an opportunity for creating generalized solutions for fetoscopic scene understanding and mosaicking. In this paper, we present the findings of the FetReg2021 challenge alongside reporting a detailed literature review for CAI in TTTS fetoscopy. Through this challenge, its analysis and the release of multi-centre fetoscopic data, we provide a benchmark for future research in this field.


A Survey on Learnable Evolutionary Algorithms for Scalable Multiobjective Optimization

arXiv.org Artificial Intelligence

Recent decades have witnessed great advancements in multiobjective evolutionary algorithms (MOEAs) for multiobjective optimization problems (MOPs). However, these progressively improved MOEAs have not necessarily been equipped with scalable and learnable problem-solving strategies for new and grand challenges brought by the scaling-up MOPs with continuously increasing complexity from diverse aspects, mainly including expensive cost of function evaluations, many objectives, large-scale search space, time-varying environments, and multi-task. Under different scenarios, divergent thinking is required in designing new powerful MOEAs for solving them effectively. In this context, research studies on learnable MOEAs with machine learning techniques have received extensive attention in the field of evolutionary computation. This paper begins with a general taxonomy of scaling-up MOPs and learnable MOEAs, followed by an analysis of the challenges that these MOPs pose to traditional MOEAs. Then, we synthetically overview recent advances of learnable MOEAs in solving various scaling-up MOPs, focusing primarily on four attractive directions (i.e., learnable evolutionary discriminators for environmental selection, learnable evolutionary generators for reproduction, learnable evolutionary evaluators for function evaluations, and learnable evolutionary transfer modules for sharing or reusing optimization experience). The insight of learnable MOEAs is offered to readers as a reference to the general track of the efforts in this field.


Changes in Commuter Behavior from COVID-19 Lockdowns in the Atlanta Metropolitan Area

arXiv.org Artificial Intelligence

This paper analyzes the impact of COVID-19 related lockdowns in the Atlanta, Georgia metropolitan area by examining commuter patterns in three periods: prior to, during, and after the pandemic lockdown. A cellular phone location dataset is utilized in a novel pipeline to infer the home and work locations of thousands of users using the Density-based Spatial Clustering of Applications with Noise (DBSCAN) algorithm. The coordinates derived from the clustering are put through a reverse geocoding process from which word embeddings are extracted in order to categorize the industry of each workplace based on the workplace name and Point of Interest (POI) mapping. Frequencies of commute from home locations to work locations are analyzed in and across all three time periods. Public health and economic factors are discussed to explain potential reasons for the observed changes in commuter patterns.


Towards Interpretable Federated Learning

arXiv.org Artificial Intelligence

Federated learning (FL) enables multiple data owners to build machine learning models collaboratively without exposing their private local data. In order for FL to achieve widespread adoption, it is important to balance the need for performance, privacy-preservation and interpretability, especially in mission-critical applications such as finance and healthcare. Thus, interpretable federated learning (IFL) has become an emerging topic of research attracting significant interest from academia and industry alike. Its interdisciplinary nature can be challenging for new researchers to pick up. In this paper, we bridge this gap by providing (to the best of our knowledge) the first survey on IFL. We propose a unique IFL taxonomy which covers relevant works enabling FL models to explain the prediction results, support model debugging, and provide insights into the contributions made by individual data owners or data samples, which, in turn, is crucial for allocating rewards fairly to motivate active and reliable participation in FL. We conduct a comprehensive analysis of the representative IFL approaches, the commonly adopted performance evaluation metrics, and promising directions towards building versatile IFL techniques.


Principled and Efficient Transfer Learning of Deep Models via Neural Collapse

arXiv.org Artificial Intelligence

As model size continues to grow and access to labeled training data remains limited, transfer learning has become a popular approach in many scientific and engineering fields. This study explores the phenomenon of neural collapse (NC) in transfer learning for classification problems, which is characterized by the last-layer features and classifiers of deep networks having zero within-class variability in features and maximally and equally separated between-class feature means. Through the lens of NC, this work reports the following findings on transfer learning: (i) preventing within-class variability collapse to a certain extent during model pre-training on source data leads to better transferability, as it preserves the intrinsic structures of the input data better; (ii) obtaining features with more NC on downstream data during fine-tuning results in better test accuracy. These results provide new insight into commonly used heuristics in model pre-training, such as loss design, data augmentation, and projection heads, and lead to more efficient and principled methods for fine-tuning large pre-trained models. Compared to full model fine-tuning, our proposed fine-tuning methods achieve comparable or even better performance while reducing fine-tuning parameters by at least 70% as well as alleviating overfitting.
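
To make the notion of within-class variability collapse more concrete, the sketch below computes a simple proxy: the ratio of within-class scatter to between-class scatter of a set of feature vectors. This is an assumption-laden simplification and not the metric used in the paper; the function name and the list-based feature representation are hypothetical, and the ratio is undefined when all class means coincide.

```python
def within_between_ratio(features, labels):
    """Ratio of within-class to between-class scatter (simplified proxy).

    features: list of equal-length feature vectors (hypothetical last-layer features).
    labels:   class label for each feature vector.
    A value near 0 means features of each class sit almost exactly on their class mean.
    """
    dim = len(features[0])
    n = len(features)
    global_mean = [sum(f[j] for f in features) / n for j in range(dim)]
    within = 0.0
    between = 0.0
    for c in sorted(set(labels)):
        members = [f for f, y in zip(features, labels) if y == c]
        mean_c = [sum(f[j] for f in members) / len(members) for j in range(dim)]
        # Squared distance of each member to its own class mean.
        within += sum(
            sum((f[j] - mean_c[j]) ** 2 for j in range(dim)) for f in members
        )
        # Squared distance of the class mean to the global mean, weighted by class size.
        between += len(members) * sum(
            (mean_c[j] - global_mean[j]) ** 2 for j in range(dim)
        )
    return within / between
```

Under this proxy, fully collapsed features (every sample equal to its class mean) give a ratio of exactly zero, while features that retain some within-class spread give a positive value.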


Resources for Turkish Natural Language Processing: A critical survey

arXiv.org Artificial Intelligence

The recent (re)popularization of deep learning methods has further increased the importance of, and need for, data. Similarly, the other subfields of theoretical and applied linguistics have also seen a shift towards more data-driven methods. As a result, the availability of large and high-quality language data is essential for both linguistic research and practical NLP applications. In this paper, we present a comprehensive and critical survey of linguistic resources for Turkish.


Example Forgetting: A Novel Approach to Explain and Interpret Deep Neural Networks in Seismic Interpretation

arXiv.org Artificial Intelligence

In recent years, deep neural networks have significantly impacted the seismic interpretation process. Due to the simple implementation and low interpretation costs, deep neural networks are an attractive component for the common interpretation pipeline. However, neural networks are frequently met with distrust due to their property of producing semantically incorrect outputs when exposed to sections the model was not trained on. We address this issue by explaining model behaviour and improving generalization properties through example forgetting: First, we introduce a method that effectively relates semantically malfunctioned predictions to their respective positions within the neural network representation manifold. More concretely, our method tracks how models "forget" seismic reflections during training and establishes a connection to the decision boundary proximity of the target class. Second, we use our analysis technique to identify frequently forgotten regions within the training volume and augment the training set with state-of-the-art style transfer techniques from computer vision. We show that our method improves the segmentation performance on underrepresented classes while significantly reducing the forgotten regions in the F3 volume in the Netherlands.