feed forward network
Recent Advances in Non-convex Smoothness Conditions and Applicability to Deep Linear Neural Networks
Patel, Vivak, Varner, Christian
The presence of non-convexity in smooth optimization problems arising from deep learning has sparked new smoothness conditions in the literature and corresponding convergence analyses. We discuss these smoothness conditions, order them, provide conditions for determining whether they hold, and evaluate their applicability to training a deep linear neural network for binary classification.
Bi-Level Spatial and Channel-aware Transformer for Learned Image Compression
Soltani, Hamidreza, Ghasemi, Erfan
Recent advancements in learned image compression (LIC) methods have demonstrated superior performance over traditional hand-crafted codecs. These learning-based methods often employ convolutional neural networks (CNNs) or Transformer-based architectures. However, these nonlinear approaches frequently overlook the frequency characteristics of images, which limits their compression efficiency. To address this issue, we propose a novel Transformer-based image compression method that enhances the transformation stage by considering frequency components within the feature map. Our method integrates a novel Hybrid Spatial-Channel Attention Transformer Block (HSCATB), where a spatial-based branch independently handles high and low frequencies at the attention layer, and a Channel-aware Self-Attention (CaSA) module captures information across channels, significantly improving compression performance. Additionally, we introduce a Mixed Local-Global Feed Forward Network (MLGFFN) within the Transformer block to enhance the extraction of diverse and rich information, which is crucial for effective compression. These innovations collectively improve the transformation's ability to project data into a more decorrelated latent space, thereby boosting overall compression efficiency. Experimental results demonstrate that our framework surpasses state-of-the-art LIC methods in rate-distortion performance.
The CHIR Algorithm for Feed Forward Networks with Binary Weights
A new learning algorithm, Learning by Choice of Internal Representations (CHIR), was recently introduced. Whereas many algorithms reduce the learning process to minimizing a cost function over the weights, our method treats the internal representations as the fundamental entities to be determined. The algorithm applies a search procedure in the space of internal representations, and a cooperative adaptation of the weights (e.g. by using the perceptron learning rule). Since the introduction of its basic, single output version, the CHIR algorithm was generalized to train any feed forward network of binary neurons. Here we present the generalised version of the CHIR algorithm, and further demonstrate its versatility by describing how it can be modified in order to train networks with binary (±1) weights.
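The abstract names the perceptron learning rule as one way to adapt the weights once the internal representations are chosen. The sketch below shows only that rule for a single binary (+1/-1) neuron; it is not the CHIR search over internal representations, and it does not restrict the weights to binary values. The function names and the AND example are assumptions made for illustration.

```python
def sign(x):
    """Binary neuron output: +1 if x >= 0, else -1."""
    return 1 if x >= 0 else -1


def train_perceptron(samples, n_inputs, max_epochs=100):
    """Perceptron learning rule for one binary (+1/-1) neuron.

    samples: list of (inputs, target) pairs, all values in {+1, -1}.
    Returns (weights, bias, converged).
    """
    weights = [0] * n_inputs
    bias = 0
    for _ in range(max_epochs):
        errors = 0
        for inputs, target in samples:
            output = sign(sum(w * x for w, x in zip(weights, inputs)) + bias)
            if output != target:
                # Update only on misclassified patterns, moving toward the target.
                weights = [w + target * x for w, x in zip(weights, inputs)]
                bias += target
                errors += 1
        if errors == 0:
            return weights, bias, True
    return weights, bias, False


# Logical AND in +1/-1 coding (linearly separable, so the rule converges).
AND_SAMPLES = [((-1, -1), -1), ((-1, 1), -1), ((1, -1), -1), ((1, 1), 1)]
weights, bias, converged = train_perceptron(AND_SAMPLES, n_inputs=2)
```

In a CHIR-style setting, a rule of this kind would be applied per neuron, with the targets supplied by the currently chosen internal representation rather than by the training labels directly.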
Analytical Study of the Interplay between Architecture and Predictability
We study model feed forward networks as time series predictors in the stationary limit. The focus is on complex, yet non-chaotic, behavior. The main question we address is whether the asymptotic behavior is governed by the architecture, regardless of the details of the weights. We find hierarchies among classes of architectures with respect to the attractor dimension of the long term sequence they are capable of generating; a larger number of hidden units can generate higher dimensional attractors. In the case of a perceptron, we develop the stationary solution for general weights, and show that the flow is typically one dimensional.
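One way to picture a feed forward network acting as a time series generator is to feed each output back into a sliding input window. The sketch below does this for a single perceptron. The abstract gives no equations, so the tanh transfer function, the `gain` parameter, and the example weights are assumptions chosen only for illustration.

```python
import math


def generate_sequence(weights, seed, steps, gain=1.0):
    """Iterate S_t = tanh(gain * sum_j w_j * S_{t-j}), feeding outputs back as inputs.

    weights: w_1..w_N, applied to the N most recent values (most recent first).
    seed: initial window of N values, most recent first.
    """
    if len(seed) != len(weights):
        raise ValueError("seed must have one value per weight")
    window = list(seed)
    sequence = []
    for _ in range(steps):
        value = math.tanh(gain * sum(w * s for w, s in zip(weights, window)))
        sequence.append(value)
        # Slide the window: the newest output enters, the oldest value drops out.
        window = [value] + window[:-1]
    return sequence


# Hypothetical weights and seed, not taken from the paper.
sequence = generate_sequence(
    weights=[0.5, -1.2, 0.8], seed=[0.1, -0.3, 0.2], steps=200, gain=2.0
)
```

Discarding an initial transient and inspecting the remaining values is a simple way to look at the long term behavior of such a generator.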
Movement Analytics: Current Status, Application to Manufacturing, and Future Prospects from an AI Perspective
Baumgartner, Peter, Smith, Daniel, Rana, Mashud, Kapoor, Reena, Tartaglia, Elena, Schutt, Andreas, Rahman, Ashfaqur, Taylor, John, Dunstall, Simon
Data-driven decision making is becoming an integral part of manufacturing companies. Data is collected and commonly used to improve efficiency and produce high quality items for the customers. IoT-based and other forms of object tracking are an emerging tool for collecting movement data of objects/entities (e.g. human workers, moving vehicles, trolleys etc.) over space and time. Movement data can provide valuable insights like process bottlenecks, resource utilization, effective working time etc. that can be used for decision making and improving efficiency. Turning movement data into valuable information for industrial management and decision making requires analysis methods. We refer to this process as movement analytics. The purpose of this document is to review the current state of work for movement analytics both in manufacturing and more broadly. We survey relevant work from both a theoretical perspective and an application perspective. From the theoretical perspective, we put an emphasis on useful methods from two research areas: machine learning, and logic-based knowledge representation. We also review their combinations in view of movement analytics, and we discuss promising areas for future development and application. Furthermore, we touch on constraint optimization. From an application perspective, we review applications of these methods to movement analytics in a general sense and across various industries. We also describe currently available commercial off-the-shelf products for tracking in manufacturing, and we overview main concepts of digital twins and their applications.
Kformer: Knowledge Injection in Transformer Feed-Forward Layers
Yao, Yunzhi, Huang, Shaohan, Dong, Li, Wei, Furu, Chen, Huajun, Zhang, Ningyu
Recent days have witnessed a diverse set of knowledge injection models for pre-trained language models (PTMs); however, most previous studies neglect the PTMs' own ability with quantities of implicit knowledge stored in parameters. A recent study has observed knowledge neurons in the Feed Forward Network (FFN), which are responsible for expressing factual knowledge. In this work, we propose a simple model, Kformer, which takes advantage of the knowledge stored in PTMs and external knowledge via knowledge injection in Transformer FFN layers. Empirical results on two knowledge-intensive tasks, commonsense reasoning (i.e., SocialIQA) and medical question answering (i.e., MedQA-USMLE), demonstrate that Kformer can yield better performance than other knowledge injection technologies such as concatenation or attention-based injection. We think the proposed simple model and empirical findings may be helpful for the community to develop more powerful knowledge injection methods. Code is available at https://github.com/zjunlp/Kformer.
Transformers oversimplified
Deep learning has kept evolving throughout the years, and that is an important reason for its reputation. Deep learning practices highly emphasize the use of large buckets of parameters to extract useful information about the dataset we're dealing with. By having a large set of parameters, it becomes easier to classify/detect something as we have more data to identify distinctly. One notable milestone in the journey of Deep Learning so far, and specifically in Natural Language Processing, was the introduction of Language Models that highly improved the accuracy and efficiency of doing various NLP tasks. A sequence-to-sequence model is an encoder-decoder mechanism-based model that takes a sequence of inputs and returns a sequence of outputs as a result.
A Self-Attention Network for Hierarchical Data Structures with an Application to Claims Management
Löw, Leander, Spindler, Martin, Brechmann, Eike
Insurance companies must manage millions of claims per year. While most of these claims are non-fraudulent, fraud detection is core for insurance companies. The ultimate goal is a predictive model to single out the fraudulent claims and pay out the non-fraudulent ones immediately. Modern machine learning methods are well suited for this kind of problem. Health care claims often have a data structure that is hierarchical and of variable length. We propose one model based on piecewise feed forward neural networks (deep learning) and another model based on self-attention neural networks for the task of claim management. We show that the proposed methods outperform bag-of-words based models, hand designed features, and models based on convolutional neural networks, on a data set of two million health care claims. The proposed self-attention method performs the best.
Hear and Speak Your Natural -- NLP keras - Data Driven Investor - Medium
Humans evolved about 2.3 to 2.4 million years ago. Since the 18th century, scientists have thought the great apes to be closely related to human beings. In the 19th century, they speculated that the closest living relatives of humans were either chimpanzees or gorillas. Do you know what made us different from our closest living relatives? Humans have a persistent process of thinking.
Under The Hood of Neural Networks. Part 2: Recurrent.
In Part 1 of this series, we studied the Forward and Backward passes of a Feed Forward Fully-Connected network. Although Feed Forward networks are widespread and have many real-world applications, they have one main limitation: they cannot handle sequential data. This means that they cannot work with inputs of different sizes and they do not store information about previous states (memory). Thus, in this article, we will talk about Recurrent Neural Networks (RNNs), which overcome these limitations.
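The contrast described above can be sketched in a few lines: a dense layer accepts only inputs of one fixed size and keeps no state, while a simple recurrent pass carries a hidden state across a sequence of any length. This is a minimal sketch rather than the article's own code; the Elman-style update, the function names, and all parameter values are assumptions for illustration.

```python
import math


def feed_forward(x, weights, bias):
    """One dense layer: input length is fixed by the weight matrix, no state is kept."""
    if len(x) != len(weights[0]):
        raise ValueError("input size must match the layer's weight matrix")
    return [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(weights, bias)
    ]


def rnn_forward(sequence, w_in, w_rec, bias):
    """Simple recurrent pass over scalar inputs.

    h_t = tanh(w_in * x_t + W_rec h_{t-1} + b); the hidden state h is the memory.
    """
    hidden = [0.0] * len(bias)
    for x_t in sequence:
        # The comprehension reads the previous hidden state before it is replaced.
        hidden = [
            math.tanh(
                w_in[i] * x_t
                + sum(w_rec[i][j] * hidden[j] for j in range(len(hidden)))
                + bias[i]
            )
            for i in range(len(hidden))
        ]
    return hidden


# Hypothetical parameters for a two-unit hidden state.
W_IN = [0.5, -0.3]
W_REC = [[0.1, 0.2], [-0.2, 0.1]]
BIAS = [0.0, 0.0]

# The same recurrent weights handle sequences of different lengths.
short_state = rnn_forward([1.0, 0.5], W_IN, W_REC, BIAS)
long_state = rnn_forward([1.0, 0.5, -0.2, 0.7, 0.3], W_IN, W_REC, BIAS)
```

Reversing the order of a sequence changes the final hidden state, which is the sense in which the recurrent network remembers previous inputs.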