Perceptrons
Why Neural Networks Work
Mukherjee, Sayandev, Huberman, Bernardo A.
We argue that many properties of fully-connected feedforward neural networks (FCNNs), also called multi-layer perceptrons (MLPs), are explainable from the analysis of a single pair of operations, namely a random projection into a higher-dimensional space than the input, followed by a sparsification operation. For convenience, we call this pair of successive operations expand-and-sparsify following the terminology of Dasgupta. We show how expand-and-sparsify can explain the observed phenomena that have been discussed in the literature, such as the so-called Lottery Ticket Hypothesis, the surprisingly good performance of randomly-initialized untrained neural networks, the efficacy of Dropout in training and most importantly, the mysterious generalization ability of overparameterized models, first highlighted by Zhang et al. and subsequently identified even in non-neural network models by Belkin et al.
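The expand-and-sparsify pair described above can be illustrated with a minimal sketch. The Gaussian projection, the top-k rule used for sparsification, and the names `expand_and_sparsify`, `out_dim`, and `k` are assumptions made for illustration; the abstract does not specify them.

```python
import random

def expand_and_sparsify(x, out_dim, k, seed=0):
    """Randomly project x into out_dim dimensions, then keep only the k largest values."""
    rng = random.Random(seed)
    # Expand: a fixed random Gaussian matrix (out_dim x len(x)); it is never trained.
    weights = [[rng.gauss(0.0, 1.0) for _ in x] for _ in range(out_dim)]
    expanded = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    # Sparsify (assumed rule): zero everything except the k largest activations.
    top = set(sorted(range(out_dim), key=lambda i: expanded[i], reverse=True)[:k])
    return [v if i in top else 0.0 for i, v in enumerate(expanded)]

# A 3-dimensional input mapped to a 20-dimensional code with 4 active units.
code = expand_and_sparsify([0.2, -1.0, 0.5], out_dim=20, k=4)
```

The output dimension is deliberately larger than the input dimension, and only a small fraction of the units stay active.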
Sketch-Guided Text-to-Image Diffusion Models
Voynov, Andrey, Aberman, Kfir, Cohen-Or, Daniel
Text-to-Image models have introduced a remarkable leap in the evolution of machine learning, demonstrating high-quality synthesis of images from a given text prompt. However, these powerful pretrained models still lack control handles that can guide spatial properties of the synthesized images. In this work, we introduce a universal approach to guide a pretrained text-to-image diffusion model with a spatial map from another domain (e.g., sketch) during inference time. Unlike previous works, our method does not require training a dedicated model or a specialized encoder for the task. Our key idea is to train a Latent Guidance Predictor (LGP) - a small, per-pixel, Multi-Layer Perceptron (MLP) that maps latent features of noisy images to spatial maps, where the deep features are extracted from the core Denoising Diffusion Probabilistic Model (DDPM) network. The LGP is trained only on a few thousand images and constitutes a differential guiding map predictor, over which the loss is computed and propagated back to push the intermediate images to agree with the spatial map. The per-pixel training offers flexibility and locality, which allows the technique to perform well on out-of-domain sketches, including free-hand style drawings. We take a particular focus on the sketch-to-image translation task, revealing a robust and expressive way to generate images that follow the guidance of a sketch of arbitrary style or domain. Project page: sketch-guided-diffusion.github.io
ActiveRMAP: Radiance Field for Active Mapping And Planning
Zhan, Huangying, Zheng, Jiyang, Xu, Yi, Reid, Ian, Rezatofighi, Hamid
A high-quality 3D reconstruction of a scene from a collection of 2D images can be achieved through offline/online mapping methods. In this paper, we explore active mapping from the perspective of implicit representations, which have recently produced compelling results in a variety of applications. One of the most popular implicit representations, Neural Radiance Field (NeRF), first demonstrated photorealistic rendering results using multi-layer perceptrons, with promising offline 3D reconstruction as a by-product of the radiance field. More recently, researchers have also applied this implicit representation to online reconstruction and localization (i.e. implicit SLAM systems). However, the study of implicit representations for active vision tasks is still very limited. In this paper, we are particularly interested in applying the neural radiance field to active mapping and planning problems, which are closely coupled tasks in an active system. We present, for the first time, an RGB-only active vision framework using a radiance field representation for active 3D reconstruction and planning in an online manner. Specifically, we formulate this joint task as an iterative dual-stage optimization problem, where we alternately optimize for the radiance field representation and path planning. Experimental results suggest that the proposed method achieves competitive results compared to other offline methods and outperforms active reconstruction methods using NeRFs.
KiloNeuS: A Versatile Neural Implicit Surface Representation for Real-Time Rendering
Esposito, Stefano, Baieri, Daniele, Zellmann, Stefan, Hinkenjann, André, Rodolà, Emanuele
NeRF-based techniques fit wide and deep multi-layer perceptrons (MLPs) to a continuous radiance field that can be rendered from any unseen viewpoint. However, the lack of surface and normals definition and high rendering times limit their usage in typical computer graphics applications. Such limitations have recently been overcome separately, but solving them together remains an open problem. We present KiloNeuS, a neural representation that reconstructs an implicit surface, represented as a signed distance function (SDF), from multi-view images and enables real-time rendering by partitioning the space into thousands of tiny MLPs that are fast to evaluate. As we learn the implicit surface locally using independent models, obtaining a globally coherent geometry is non-trivial and needs to be addressed during training. We evaluate rendering performance on a GPU-accelerated ray-caster with in-shader neural network inference, resulting in an average of 46 FPS at high resolution, proving a satisfying tradeoff between storage costs and rendering quality. In fact, our evaluation for rendering quality and surface recovery shows that KiloNeuS outperforms its single-MLP counterpart. Finally, to exhibit the versatility of KiloNeuS, we integrate it into an interactive path-tracer taking full advantage of its surface normals. We consider our work a crucial first step toward real-time rendering of implicit neural representations under global illumination.
Improving the Robustness of Neural Multiplication Units with Reversible Stochasticity
Mistry, Bhumika, Farrahi, Katayoun, Hare, Jonathon
Multilayer Perceptrons struggle to learn certain simple arithmetic tasks. Specialist neural modules for arithmetic can outperform classical architectures with gains in extrapolation, interpretability and convergence speeds, but are highly sensitive to the training range. In this paper, we show that Neural Multiplication Units (NMUs) are unable to reliably learn tasks as simple as multiplying two inputs when given different training ranges. Causes of failure are linked to inductive and input biases which encourage convergence to solutions in undesirable optima. A solution, the stochastic NMU (sNMU), is proposed to apply reversible stochasticity, encouraging avoidance of such optima whilst converging to the true solution. Empirically, we show that stochasticity provides improved robustness with the potential to improve learned representations of upstream networks for numerical and image tasks.
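To make the idea of reversible stochasticity concrete, here is a hedged sketch of a single-output multiplication unit. The `nmu` function follows the usual NMU form, a product of `w_i * x_i + 1 - w_i` terms. The `snmu` function is one reading of the sNMU idea: multiply the inputs by noise, then divide the same noise back out. The uniform noise range and all function names are assumptions, not details taken from the abstract.

```python
import math
import random

def nmu(x, w):
    """Neural Multiplication Unit, single output: prod_i (w_i * x_i + 1 - w_i)."""
    return math.prod(wi * xi + 1.0 - wi for xi, wi in zip(x, w))

def snmu(x, w, rng, low=1.0, high=5.0):
    """Stochastic NMU sketch: scale inputs by noise, then reverse the scaling."""
    # Assumed noise model: one uniform sample per input, drawn on every call.
    noise = [rng.uniform(low, high) for _ in x]
    noisy_product = nmu([n * xi for n, xi in zip(noise, x)], w)
    # Applying the NMU to the noise alone gives the factor to divide out.
    return noisy_product / nmu(noise, w)

rng = random.Random(0)
weights = [1.0, 1.0, 0.0]   # select the first two inputs, ignore the third
inputs = [3.0, 4.0, 7.0]
exact = nmu(inputs, weights)          # 3 * 4 = 12
noisy = snmu(inputs, weights, rng)    # equals 12 up to floating-point error
```

With weights at exactly 0 or 1 the noise cancels, so the output is unchanged. For intermediate weights it does not cancel, which is what perturbs training away from such solutions in this sketch.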
Perceptron: AI that sees with sound, learns to walk and predicts seismic physics
Research in the field of machine learning and AI, now a key technology in practically every industry and company, is far too voluminous for anyone to read it all. This column, Perceptron, aims to collect some of the most relevant recent discoveries and papers -- particularly in, but not limited to, artificial intelligence -- and explain why they matter. This month, engineers at Meta detailed two recent innovations from the depths of the company's research labs: an AI system that compresses audio files and an algorithm that can accelerate protein-folding AI performance by 60x. Elsewhere, scientists at MIT revealed that they're using spatial acoustic information to help machines better envision their environments, simulating how a listener would hear a sound from any point in a room. Meta's compression work doesn't exactly reach unexplored territory. Last year, Google announced Lyra, a neural audio codec trained to compress low-bitrate speech.
Comparison of Data Representations and Machine Learning Architectures for User Identification on Arbitrary Motion Sequences
Schell, Christian, Hotho, Andreas, Latoschik, Marc Erich
Reliable and robust user identification and authentication are important and often necessary requirements for many digital services. They become paramount in social virtual reality (VR) to ensure trust, specifically in digital encounters with lifelike realistic-looking avatars as faithful replications of real persons. Recent research has shown that the movements of users in extended reality (XR) systems carry user-specific information and can thus be used to verify their identities. This article compares three different potential encodings of the motion data from head and hands (scene-relative, body-relative, and body-relative velocities), and the performances of five different machine learning architectures (random forest, multi-layer perceptron, fully recurrent neural network, long short-term memory, gated recurrent unit). We use the publicly available dataset "Talking with Hands" and publish all code to allow reproducibility and to provide baselines for future work. After hyperparameter optimization, the combination of a long short-term memory architecture and body-relative data outperformed competing combinations: the model correctly identifies any of the 34 subjects with an accuracy of 100% within 150 seconds. Altogether, our approach provides an effective foundation for behaviometric-based identification and authentication to guide researchers and practitioners. Data and code are published under https://go.uniwue.de/58w1r.
Multilayer Perceptron Network Discriminates Larval Zebrafish Genotype using Behaviour
Fusco, Christopher, Allen, Angel
Zebrafish are a common model organism used to identify new disease therapeutics. High-throughput drug screens can be performed on larval zebrafish in multi-well plates by observing changes in behaviour following a treatment. Analysis of this behaviour can be difficult, however, due to the high dimensionality of the data obtained. Statistical analysis of individual statistics (such as the distance travelled) is generally not powerful enough to detect meaningful differences between treatment groups. Here, we propose a method for classifying zebrafish models of Parkinson's disease by genotype at 5 days old. Using a set of 2D behavioural features, we train a multi-layer perceptron neural network. We further show that the use of integrated gradients can give insight into the impact of each behaviour feature on genotype classifications by the model. In this way, we provide a novel pipeline for classifying zebrafish larvae, beginning with feature preparation and ending with an impact analysis of said features.
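The integrated-gradients step mentioned above can be sketched as follows. The two-feature logistic `model` is a hypothetical stand-in for the trained network, the zero baseline is an assumption, and gradients are taken by finite differences purely to keep the example self-contained.

```python
import math

def model(x):
    """Hypothetical stand-in classifier: a logistic unit over two behaviour features."""
    return 1.0 / (1.0 + math.exp(-(1.5 * x[0] - 0.5 * x[1])))

def grad(f, x, eps=1e-5):
    """Central finite-difference gradient of f at x."""
    g = []
    for i in range(len(x)):
        hi = x[:i] + [x[i] + eps] + x[i + 1:]
        lo = x[:i] + [x[i] - eps] + x[i + 1:]
        g.append((f(hi) - f(lo)) / (2 * eps))
    return g

def integrated_gradients(f, x, baseline, steps=200):
    """Midpoint Riemann-sum approximation of integrated gradients."""
    total = [0.0] * len(x)
    for s in range(steps):
        alpha = (s + 0.5) / steps
        # Point on the straight line from the baseline to the input.
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        for i, gi in enumerate(grad(f, point)):
            total[i] += gi
    # Scale the averaged gradient by the input-baseline difference per feature.
    return [(xi - b) * t / steps for xi, b, t in zip(x, baseline, total)]

x, baseline = [2.0, 1.0], [0.0, 0.0]
attributions = integrated_gradients(model, x, baseline)
```

The attributions sum approximately to `model(x) - model(baseline)`, which serves as a quick sanity check on the approximation.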
Hierarchical Automatic Power Plane Generation with Genetic Optimization and Multilayer Perceptron
Liao, Haiguang, Patil, Vinay, Dong, Xuliang, Shanbhag, Devika, Fallon, Elias, Hogan, Taylor, Spasojevic, Mirko, Kara, Levent Burak
We present an automatic multilayer power plane generation method to accelerate the design of printed circuit boards (PCB). In PCB design, while automatic solvers have been developed to predict important indicators such as the IR-drop, power integrity, and signal integrity, the generation of the power plane itself still largely relies on laborious manual methods. Our automatic power plane generation approach is based on genetic optimization combined with a multilayer perceptron and is able to automatically generate power planes across a diverse set of problems with varying levels of difficulty. Our method GOMLP consists of an outer loop genetic optimizer (GO) and an inner loop multi-layer perceptron (MLP) that generate power planes automatically. The critical elements of our approach include contour detection, feature expansion, and a distance measure to enable island-minimizing complex power plane generation. We compare our approach to a baseline solution based on A*. The A* method consists of a sequential island generation and merging process, which can produce less-than-ideal solutions. Our experimental results show that on single layer power plane problems, our method outperforms A* in 71% of the problems with varying levels of board layout difficulty. We further describe H-GOMLP, which extends GOMLP to multilayer power plane problems using hierarchical clustering and net similarities based on the Hausdorff distance.
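The Hausdorff distance used for net similarity can be computed as in the short sketch below. Representing each net as a list of 2D points (for example, pin coordinates) is an assumption for illustration; the abstract does not say how nets are encoded.

```python
import math

def directed_hausdorff(a, b):
    """Largest distance from a point in a to its nearest neighbour in b."""
    return max(min(math.dist(p, q) for q in b) for p in a)

def hausdorff(a, b):
    """Symmetric Hausdorff distance between two non-empty point sets."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))

# Two hypothetical nets given as point sets on the board.
net_a = [(0.0, 0.0), (1.0, 0.0)]
net_b = [(0.0, 0.0), (1.0, 0.0), (1.0, 3.0)]
distance = hausdorff(net_a, net_b)  # 3.0: the point (1, 3) lies 3 units from net_a
```

A small distance means every point of one net lies close to some point of the other, which is the sense of similarity a hierarchical clustering step could use.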
Improving Multilayer-Perceptron(MLP)-based Network Anomaly Detection with Birch Clustering on CICIDS-2017 Dataset
Yin, Yuhua, Jang-Jaccard, Julian, Sabrina, Fariza, Kwak, Jin
Machine learning algorithms, including the Multi-layer Perceptron (MLP), have been widely used in intrusion detection systems. In this study, we propose a two-stage model that combines the Birch clustering algorithm and an MLP classifier to improve the performance of network anomaly multi-classification. In our proposed method, we first apply Birch or K-Means as an unsupervised clustering algorithm to the CICIDS-2017 dataset to pre-group the data. The generated pseudo-label is then added as an additional feature to the training of the MLP-based classifier. The experimental results show that using Birch and K-Means clustering for data pre-grouping can improve intrusion detection system performance. Our method can achieve 99.73% accuracy in multi-classification using Birch clustering, which is better than similar research using a stand-alone MLP model.
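The pre-grouping stage can be sketched with the standard library alone. This example uses a tiny k-means loop as a stand-in for Birch, and the four toy `flows` rows are invented for illustration; neither reflects the paper's actual configuration or the CICIDS-2017 data.

```python
def kmeans_labels(rows, centroids, iters=10):
    """Plain k-means; returns one cluster index per row."""
    labels = [0] * len(rows)
    for _ in range(iters):
        # Assign each row to its nearest centroid (squared Euclidean distance).
        labels = [
            min(range(len(centroids)),
                key=lambda c: sum((v - m) ** 2 for v, m in zip(row, centroids[c])))
            for row in rows
        ]
        # Move each centroid to the mean of its members; keep it if the cluster is empty.
        for c in range(len(centroids)):
            members = [row for row, lab in zip(rows, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Toy feature rows (hypothetical); the first and third rows seed the two centroids.
flows = [[0.1, 0.2], [0.0, 0.3], [5.0, 5.1], [5.2, 4.9]]
labels = kmeans_labels(flows, centroids=[list(flows[0]), list(flows[2])])

# Append the cluster pseudo-label to each row as one extra input feature.
augmented = [row + [float(lab)] for row, lab in zip(flows, labels)]
```

The `augmented` rows, now one column wider, are what the second-stage MLP classifier would be trained on.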