Perceptrons
Neural Implicit Flow: a mesh-agnostic dimensionality reduction paradigm of spatio-temporal data
Pan, Shaowu, Brunton, Steven L., Kutz, J. Nathan
High-dimensional spatio-temporal dynamics can often be encoded in a low-dimensional subspace. Engineering applications for modeling, characterization, design, and control of such large-scale systems often rely on dimensionality reduction to make solutions computationally tractable in real-time. Common existing paradigms for dimensionality reduction include linear methods, such as the singular value decomposition (SVD), and nonlinear methods, such as variants of convolutional autoencoders (CAE). However, these encoding techniques lack the ability to efficiently represent the complexity associated with spatio-temporal data, which often requires variable geometry, non-uniform grid resolution, adaptive meshing, and/or parametric dependencies. To resolve these practical engineering challenges, we propose a general framework called Neural Implicit Flow (NIF) that enables a mesh-agnostic, low-rank representation of large-scale, parametric, spatial-temporal data. NIF consists of two modified multilayer perceptrons (MLPs): (i) ShapeNet, which isolates and represents the spatial complexity, and (ii) ParameterNet, which accounts for any other input complexity, including parametric dependencies, time, and sensor measurements. We demonstrate the utility of NIF for parametric surrogate modeling, enabling the interpretable representation and compression of complex spatio-temporal dynamics, efficient many-spatial-query tasks, and improved generalization performance for sparse reconstruction.
Statistical learning theory and empirical risk
Here, I'll be giving an overview and theoretical concepts of the statistical learning. Supervised learning can play a key role in learning from examples. From this algorithm, useful information can be easily extracted from large datasets, the problem of learning from examples consecutively involves approximating functions from a sparse and noisy data. In supervised learning, network is trained on a dataset of the form, T {xk, dk} from k 1 to Q. It is observed that using MLP multilayer perceptron with sufficient number of hidden neurons, it is possible to approximate a given function to any arbitrary degree of accuracy.
Weight Decay in Multilayer Perceptrons in Deep Learning Computation
Weight decay, also known as L2 regularization, is a technique used in machine learning to prevent overfitting by adding a penalty term to the objective function that is being optimized. The goal of weight decay is to reduce the complexity of the model by limiting the size of the weights, which can help to prevent overfitting and improve the generalization ability of the model. Weight decay is typically implemented by adding a term to the objective function that is proportional to the sum of the squares of the weights. The strength of the weight decay penalty is controlled by a hyperparameter called the decay rate or regularization strength, which determines the amount of weight decay applied to the model. For example, let's say we are training a linear regression model to predict the price of a house based on the number of bedrooms and the square footage.
Smooth Mathematical Function from Compact Neural Networks
This is paper for the smooth function approximation by neural networks (NN). Mathematical or physical functions can be replaced by NN models through regression. In this study, we get NNs that generate highly accurate and highly smooth function, which only comprised of a few weight parameters, through discussing a few topics about regression. First, we reinterpret inside of NNs for regression; consequently, we propose a new activation function--integrated sigmoid linear unit (ISLU). Then special charateristics of metadata for regression, which is different from other data like image or sound, is discussed for improving the performance of neural networks. Finally, the one of a simple hierarchical NN that generate models substituting mathematical function is presented, and the new batch concept ``meta-batch" which improves the performance of NN several times more is introduced. The new activation function, meta-batch method, features of numerical data, meta-augmentation with metaparameters, and a structure of NN generating a compact multi-layer perceptron(MLP) are essential in this study.
Bridging the Gap Between Vision Transformers and Convolutional Neural Networks on Small Datasets
Lu, Zhiying, Xie, Hongtao, Liu, Chuanbin, Zhang, Yongdong
There still remains an extreme performance gap between Vision Transformers (ViTs) and Convolutional Neural Networks (CNNs) when training from scratch on small datasets, which is concluded to the lack of inductive bias. In this paper, we further consider this problem and point out two weaknesses of ViTs in inductive biases, that is, the spatial relevance and diverse channel representation. First, on spatial aspect, objects are locally compact and relevant, thus fine-grained feature needs to be extracted from a token and its neighbors. While the lack of data hinders ViTs to attend the spatial relevance. Second, on channel aspect, representation exhibits diversity on different channels. But the scarce data can not enable ViTs to learn strong enough representation for accurate recognition. To this end, we propose Dynamic Hybrid Vision Transformer (DHVT) as the solution to enhance the two inductive biases. On spatial aspect, we adopt a hybrid structure, in which convolution is integrated into patch embedding and multi-layer perceptron module, forcing the model to capture the token features as well as their neighboring features. On channel aspect, we introduce a dynamic feature aggregation module in MLP and a brand new "head token" design in multi-head self-attention module to help re-calibrate channel representation and make different channel group representation interacts with each other. The fusion of weak channel representation forms a strong enough representation for classification. With this design, we successfully eliminate the performance gap between CNNs and ViTs, and our DHVT achieves a series of state-of-the-art performance with a lightweight model, 85.68% on CIFAR-100 with 22.8M parameters, 82.3% on ImageNet-1K with 24.0M parameters. Code is available at https://github.com/ArieSeirack/DHVT.
On the Equivalence of the Weighted Tsetlin Machine and the Perceptron
Sharma, Jivitesh, Granmo, Ole-Christoffer, Jiao, Lei
Tsetlin Machine (TM) has been gaining popularity as an inherently interpretable machine leaning method that is able to achieve promising performance with low computational complexity on a variety of applications. The interpretability and the low computational complexity of the TM are inherited from the Boolean expressions for representing various sub-patterns. Although possessing favorable properties, TM has not been the go-to method for AI applications, mainly due to its conceptual and theoretical differences compared with perceptrons and neural networks, which are more widely known and well understood. In this paper, we provide detailed insights for the operational concept of the TM, and try to bridge the gap in the theoretical understanding between the perceptron and the TM. More specifically, we study the operational concept of the TM following the analytical structure of perceptrons, showing the resemblance between the perceptrons and the TM. Through the analysis, we indicated that the TM's weight update can be considered as a special case of the gradient weight update. We also perform an empirical analysis of TM by showing the flexibility in determining the clause length, visualization of decision boundaries and obtaining interpretable boolean expressions from TM. In addition, we also discuss the advantages of TM in terms of its structure and its ability to solve more complex problems.
Realizing Molecular Machine Learning through Communications for Biological AI: Future Directions and Challenges
Balasubramaniam, Sasitharan, Somathilaka, Samitha, Sun, Sehee, Ratwatte, Adrian, Pierobon, Massimiliano
Artificial Intelligence (AI) and Machine Learning (ML) are weaving their way into the fabric of society, where they are playing a crucial role in numerous facets of our lives. As we witness the increased deployment of AI and ML in various types of devices, we benefit from their use into energy-efficient algorithms for low powered devices. In this paper, we investigate a scale and medium that is far smaller than conventional devices as we move towards molecular systems that can be utilized to perform machine learning functions, i.e., Molecular Machine Learning (MML). Fundamental to the operation of MML is the transport, processing, and interpretation of information propagated by molecules through chemical reactions. We begin by reviewing the current approaches that have been developed for MML, before we move towards potential new directions that rely on gene regulatory networks inside biological organisms as well as their population interactions to create neural networks. We then investigate mechanisms for training machine learning structures in biological cells based on calcium signaling and demonstrate their application to build an Analog to Digital Converter (ADC). Lastly, we look at potential future directions as well as challenges that this area could solve.
Council Post: AutoML's Rise To Prominence
The concept of machine learning first came up when Alan Turing wrote a paper about whether machines could achieve artificial intelligence. In 1957, Frank Rosenblatt designed the first neural network, called the perceptron algorithm. They are called neural networks because they are thought to be designed based on a simplistic way of how the brain works in order to process information. Though there were some initial real-world applications for machine learning, such as the Madaline network, which could eliminate phone lines' background echo, it wouldn't rise back to prominence until computer vision applications emerged in 2012. In 2012, AlexNet, a deep neural network designed by Alex Krizhevsky achieved 84% accuracy in Imagenet's image classification contest.
Membership Inference Attacks Against Latent Factor Model
The advent of the information age has led to the problems of information overload and unclear demands. As an information filtering system, personalized recommendation systems predict users' behavior and preference for items and improves users' information acquisition efficiency. However, recommendation systems usually use highly sensitive user data for training. In this paper, we use the latent factor model as the recommender to get the list of recommended items, and we representing users from relevant items Compared with the traditional member inference against machine learning classifiers. We construct a multilayer perceptron model with two hidden layers as the attack model to complete the member inference. Moreover, a shadow recommender is established to derive the labeled training data for the attack model. The attack model is trained on the dataset generated by the shadow recommender and tested on the dataset generated by the target recommender. The experimental data show that the AUC index of our attack model can reach 0.857 on the real dataset MovieLens, which shows that the attack model has good performance.
CAINNFlow: Convolutional block Attention modules and Invertible Neural Networks Flow for anomaly detection and localization tasks
Yan, Ruiqing, Zhang, Fan, Huang, Mengyuan, Liu, Wu, Hu, Dongyu, Li, Jinfeng, Liu, Qiang, Jiang, Jinrong, Guo, Qianjin, Zheng, Linghan
Detection of object anomalies is crucial in industrial processes, but unsupervised anomaly detection and localization is particularly important due to the difficulty of obtaining a large number of defective samples and the unpredictable types of anomalies in real life. Among the existing unsupervised anomaly detection and localization methods, the NF-based scheme has achieved better results. However, the two subnets (complex functions) $s_{i}(u_{i})$ and $t_{i}(u_{i})$ in NF are usually multilayer perceptrons, which need to squeeze the input visual features from 2D flattening to 1D, destroying the spatial location relationship in the feature map and losing the spatial structure information. In order to retain and effectively extract spatial structure information, we design in this study a complex function model with alternating CBAM embedded in a stacked $3\times3$ full convolution, which is able to retain and effectively extract spatial structure information in the normalized flow model. Extensive experimental results on the MVTec AD dataset show that CAINNFlow achieves advanced levels of accuracy and inference efficiency based on CNN and Transformer backbone networks as feature extractors, and CAINNFlow achieves a pixel-level AUC of $98.64\%$ for anomaly detection in MVTec AD.