Just One Layer Norm Guarantees Stable Extrapolation

Neural Information Processing Systems

In spite of their prevalence, the behaviour of Neural Networks when extrapolating far from the training distribution remains poorly understood, with existing results limited to specific cases. In this work, we prove general results, the first of their kind, by applying Neural Tangent Kernel (NTK) theory to analyse infinitely wide neural networks trained until convergence. We prove that the inclusion of just one Layer Norm (LN) fundamentally alters the induced NTK, transforming it into a bounded-variance kernel. As a result, the output of an infinitely wide network with at least one LN remains bounded, even on inputs far from the training data. In contrast, we show that a broad class of networks without LN can produce pathologically large outputs for certain inputs. We support these theoretical findings with empirical experiments on finite-width networks, demonstrating that while standard NNs often exhibit uncontrolled growth outside the training domain, a single LN layer effectively mitigates this instability. Finally, we explore real-world implications of this extrapolatory stability, including applications to predicting residue sizes in proteins larger than those seen during training and estimating age from facial images of underrepresented ethnicities absent from the training set.
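A rough intuition for the boundedness claim can be sketched in a few lines of Python. This is not the paper's NTK analysis. It only illustrates, under the assumption of a plain Layer Norm without learned gain or bias, that the normalised vector has Euclidean norm at most sqrt(d) however large the input is. The helper names `layer_norm` and `l2_norm` are hypothetical.

```python
import math
import random


def layer_norm(x, eps=1e-5):
    """Normalise a vector to zero mean and (approximately) unit variance.

    Assumption: no learned gain/bias, unlike typical library implementations.
    """
    d = len(x)
    mean = sum(x) / d
    var = sum((v - mean) ** 2 for v in x) / d
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def l2_norm(x):
    """Euclidean norm of a vector."""
    return math.sqrt(sum(v * v for v in x))


random.seed(0)
x = [random.gauss(0.0, 1.0) for _ in range(8)]
bound = math.sqrt(len(x))

# Scale the same input far beyond its original range and compare norms.
norms = {}
for scale in (1.0, 1e3, 1e6):
    scaled = [scale * v for v in x]
    norms[scale] = l2_norm(layer_norm(scaled))
    print(f"scale {scale:g}: ||x|| = {l2_norm(scaled):.3g}, "
          f"||LN(x)|| = {norms[scale]:.4f}, bound = {bound:.4f}")
```

The input norm grows with the scale while the normalised norm stays below sqrt(d), which is the kind of saturation that any layer placed after the LN then inherits.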



Few-Shot Non-Parametric Learning with Deep Latent Variable Model

Neural Information Processing Systems

By only training a generative model in an unsupervised way, the framework utilizes the data distribution to build a compressor. Using a compressor-based distance metric derived from Kolmogorov complexity, together with few labeled data, NPC-LV classifies without further training.
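To make the idea of a compressor-based distance concrete, here is a minimal sketch using Normalized Compression Distance, one common metric of this kind, with nearest-neighbour classification. It is an illustration only: `zlib` stands in for the learned generative-model compressor described above, and the function names and toy data are hypothetical.

```python
import zlib


def clen(data: bytes) -> int:
    """Compressed length in bytes; zlib is a stand-in for a learned compressor."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized Compression Distance between two byte strings."""
    cx, cy, cxy = clen(x), clen(y), clen(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def classify(query: bytes, labeled):
    """Return the label of the labeled example closest to the query (1-NN)."""
    _, label = min(labeled, key=lambda pair: ncd(query, pair[0]))
    return label


# Toy labeled set: one example per class, no training step involved.
labeled = [
    (b"abc" * 90, "letters"),
    (b"0123456789" * 30, "digits"),
]
query = b"abc" * 40
prediction = classify(query, labeled)
print(prediction)
```

The only learned component in such a scheme is the compressor itself; the few labeled examples are used purely as reference points at query time.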


B General trade-offs

Neural Information Processing Systems

However, we make no serious efforts to find the optimal architecture. In fact, we use the same architecture for all our experiments, across the scales. We believe the performance on a particular task can be further improved by carefully curating the neural architecture.




Hybrid Feature- and Similarity-Based Models for Joint Prediction and Interpretation

arXiv.org Artificial Intelligence

Electronic health records (EHRs) include simple features like patient age together with more complex data like care history that are informative but not easily represented as individual features. To better harness such data, we developed an interpretable hybrid feature- and similarity-based model for supervised learning that combines feature and kernel learning for prediction and for investigation of causal relationships. We fit our hybrid models by convex optimization with a sparsity-inducing penalty on the kernel. Depending on the desired model interpretation, the feature and kernel coefficients can be learned sequentially or simultaneously. The hybrid models showed comparable or better predictive performance than solely feature- or similarity-based approaches in a simulation study and in a case study to predict two-year risk of loneliness or social isolation with EHR data from a complex primary health care population. Using the case study we also present new kernels for high-dimensional indicator-coded EHR data that are based on deviations from population-level expectations, and we identify considerations for causal interpretations.


Mphasis

#artificialintelligence

Now that we have an understanding of Bayes' Rule, let's try to use it to analyze linear regression models. Here i is the dimensionality of the data X, and Yj is the corresponding output for Xj. If i = 3, then Yj = w1*x1j + w2*x2j + w3*x3j, where j ranges from 1 to N and N is the number of data points we have. While the process of Bayesian modelling will be taken up in the next part, let us consider the model below as true, for now.
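As a small illustration of the linear model for i = 3, the sketch below computes Yj for a handful of data points. The weights and inputs are made-up values chosen only for the example, and `predict` is a hypothetical helper name.

```python
def predict(weights, x):
    """Linear model output: w1*x1 + w2*x2 + w3*x3 for one data point."""
    return sum(w * xi for w, xi in zip(weights, x))


# Hypothetical weights (w1, w2, w3) and N = 3 data points with i = 3 features each.
w = [0.5, -1.0, 2.0]
X = [
    [1.0, 2.0, 3.0],
    [0.0, 1.0, 0.0],
    [2.0, 2.0, 2.0],
]

# One output Yj per data point Xj.
Y = [predict(w, xj) for xj in X]
print(Y)
```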


Hands-on Experience with Gaussian Processes (GPs): Implementing GPs in Python - I

arXiv.org Machine Learning

This document serves to complement our website, which was developed with the aim of exposing students to Gaussian Processes (GPs). GPs are non-parametric Bayesian regression models that are largely used by statisticians and geospatial data scientists for modeling spatial data. Several open source libraries for Matlab [1], Python [2], R [3], etc. are already available for simple plug-and-use. The objective of this handout, and in turn the website, was to allow users to develop stand-alone GPs in Python by relying on minimal external dependencies. To this end, we only use the default Python modules and assist users in developing their own GPs from scratch, giving them an in-depth knowledge of what goes on under the hood. The module covers GP inference using maximum likelihood estimation (MLE) and gives examples of 1D (dummy) spatial data.
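The flavour of such a from-scratch GP can be sketched with the standard library alone. The code below is not taken from the handout or its website. It assumes a squared-exponential kernel, fixed signal and noise variances, and a coarse grid search over the length scale as a simple stand-in for gradient-based MLE. All function names and the dummy 1D data are hypothetical.

```python
import math


def rbf(a, b, ls, sv=1.0):
    """Squared-exponential kernel with length scale ls and signal variance sv."""
    return sv * math.exp(-0.5 * ((a - b) / ls) ** 2)


def cholesky(A):
    """Lower-triangular L with L @ L.T == A, for a positive-definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def cho_solve(L, b):
    """Solve (L @ L.T) x = b by forward then backward substitution."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def fit(xs, ys, ls, noise=1e-2):
    """Return the Cholesky factor of K + noise*I and alpha = (K + noise*I)^-1 y."""
    n = len(xs)
    K = [[rbf(xs[i], xs[j], ls) + (noise if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    L = cholesky(K)
    return L, cho_solve(L, ys)


def neg_log_marginal_likelihood(xs, ys, ls, noise=1e-2):
    """0.5 * y^T alpha + 0.5 * log|K| + 0.5 * n * log(2*pi)."""
    L, alpha = fit(xs, ys, ls, noise)
    data_fit = 0.5 * sum(y * a for y, a in zip(ys, alpha))
    half_log_det = sum(math.log(L[i][i]) for i in range(len(xs)))
    return data_fit + half_log_det + 0.5 * len(xs) * math.log(2 * math.pi)


def predict_mean(xs, alpha, x_star, ls):
    """Posterior mean at x_star: k(x_star, X) @ alpha."""
    return sum(rbf(x_star, x, ls) * a for x, a in zip(xs, alpha))


# Dummy 1D data: noiseless samples of sin(x) on a regular grid.
xs = [0.5 * i for i in range(12)]
ys = [math.sin(x) for x in xs]

# Grid-search "MLE" over the length scale (assumption: other parameters fixed).
candidates = [0.1, 0.3, 1.0, 3.0, 10.0]
nll = {ls: neg_log_marginal_likelihood(xs, ys, ls) for ls in candidates}
best = min(nll, key=nll.get)

_, alpha = fit(xs, ys, best)
print("selected length scale:", best)
print("posterior mean at x = 2.75:", predict_mean(xs, alpha, 2.75, best))
```

A grid search keeps the example short. Optimising all hyperparameters jointly with gradients of the marginal likelihood is the more usual approach once the basic pieces are in place.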