decision surface
LAMP: Extracting Locally Linear Decision Surfaces from LLM World Models
Chen, Ryan, Ko, Youngmin, Zhang, Zeyu, Cho, Catherine, Chung, Sunny, Giuffré, Mauro, Shung, Dennis L., Stadie, Bradly C.
We introduce LAMP (Linear Attribution Mapping Probe), a method that shines light onto a black-box language model's decision surface and studies how reliably a model maps its stated reasons to its predictions through a locally linear model approximating the decision surface. LAMP treats the model's own self-reported explanations as a coordinate system and fits a locally linear surrogate that links the weights of those explanations to the model's output. By doing so, it reveals which stated factors steer the model's decisions, and by how much. We apply LAMP to three tasks: sentiment analysis, controversial-topic detection, and safety-prompt auditing. Across these tasks, LAMP reveals that many LLMs exhibit locally linear decision landscapes. In addition, these surfaces correlate with human judgments on explanation quality and, on a clinical case-file data set, align with expert assessments. Since LAMP operates without requiring access to model gradients, logits, or internal activations, it serves as a practical and lightweight framework for auditing proprietary language models and enables assessment of whether a model behaves consistently with the explanations it provides.
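The idea of a locally linear surrogate over explanation weights can be illustrated with a minimal sketch. This is not the authors' implementation: `black_box` is a hypothetical stand-in for querying a language model, the two-factor weight vector, the perturbation radius, and the sample count are all assumptions, and ordinary least squares is used here only as the simplest possible surrogate.

```python
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x

def fit_linear(W, p):
    """Least-squares fit p ~ W.coef + intercept via the normal equations."""
    rows = [w + [1.0] for w in W]  # append a constant column for the intercept
    k = len(rows[0])
    AtA = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    Atp = [sum(r[i] * t for r, t in zip(rows, p)) for i in range(k)]
    return solve(AtA, Atp)

def r_squared(W, p, theta):
    """Fraction of output variance explained by the linear surrogate."""
    preds = [sum(a * b for a, b in zip(w + [1.0], theta)) for w in W]
    mean = sum(p) / len(p)
    ss_res = sum((t - q) ** 2 for t, q in zip(p, preds))
    ss_tot = sum((t - mean) ** 2 for t in p)
    return 1.0 - ss_res / ss_tot

def black_box(w):
    # Hypothetical stand-in for a model's output given explanation weights;
    # mildly nonlinear so the fit is only locally linear.
    return 0.2 + 0.5 * w[0] - 0.3 * w[1] + 0.05 * w[0] * w[1]

rng = random.Random(0)
anchor = [0.6, 0.4]  # assumed self-reported factor weights for one input
W = [[anchor[0] + rng.uniform(-0.05, 0.05),
      anchor[1] + rng.uniform(-0.05, 0.05)] for _ in range(50)]
p = [black_box(w) for w in W]

theta = fit_linear(W, p)       # [coef_factor_0, coef_factor_1, intercept]
score = r_squared(W, p, theta)
print(theta, score)
```

In this toy setting the fitted coefficients approximate the local slope of `black_box` at the anchor, and a high R² is what "locally linear" would look like; a low R² would suggest that the stated factors do not linearly account for the output in that neighbourhood.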
On Learning Deep O($n$)-Equivariant Hyperspheres
Melnyk, Pavlo, Felsberg, Michael, Wadenbäck, Mårten, Robinson, Andreas, Le, Cuong
In this paper, we utilize hyperspheres and regular n-simplexes and propose an approach to learning deep features equivariant under the transformations of nD reflections and rotations, encompassed by the powerful group of O(n). Namely, we propose O(n)-equivariant neurons with spherical decision surfaces that generalize to any dimension n, which we call Deep Equivariant Hyperspheres. We demonstrate how to combine them in a network that directly operates on the basis of the input ... The orthogonal group O(n) fully encapsulates the symmetry structure of an nD sphere, including both rotational and reflection symmetries. Integrating these symmetries into a model as an inductive bias is often a crucial requirement for problems in natural sciences and the respective applications, e.g., molecular analysis, protein design and assessment, or catalyst design (Rupp et al., 2012; Ramakrishnan et al., 2014; Townshend et al., 2021; Jing et al., 2021; Lan et al., 2022).
ANALYTiC: Understanding Decision Boundaries and Dimensionality Reduction in Machine Learning
The advent of compact, handheld devices has given us a pool of tracked movement data that can be used to infer trends and patterns and put them to use. With this flood of trajectory data from animals, humans, vehicles, etc., the idea of ANALYTiC originated: using active learning to infer semantic annotations from trajectories by learning from sets of labeled data. This study explores the application of dimensionality reduction and decision boundaries in combination with the existing active learning, highlighting patterns and clusters in the data. We test these features on three different trajectory datasets with the objective of exploiting the already labeled data and enhancing their interpretability. Our experimental analysis exemplifies the potential of these combined methodologies in improving the efficiency and accuracy of trajectory labeling. This study serves as a stepping-stone towards the broader integration of machine learning and visual methods in the context of movement data analysis.
How adversarial attacks can disrupt seemingly stable accurate classifiers
Sutton, Oliver J., Zhou, Qinghua, Tyukin, Ivan Y., Gorban, Alexander N., Bastounis, Alexander, Higham, Desmond J.
Adversarial attacks dramatically change the output of an otherwise accurate learning system using a seemingly inconsequential modification to a piece of input data. Paradoxically, empirical evidence indicates that even systems which are robust to large random perturbations of the input data remain susceptible to small, easily constructed, adversarial perturbations of their inputs. Here, we show that this may be seen as a fundamental feature of classifiers working with high dimensional input data. We introduce a simple, generic, and generalisable framework for which key behaviours observed in practical systems arise with high probability -- notably the simultaneous susceptibility of the (otherwise accurate) model to easily constructed adversarial attacks, and robustness to random perturbations of the input data. We confirm that the same phenomena are directly observed in practical neural networks trained on standard image classification problems, where even large additive random noise fails to trigger the adversarial instability of the network. A surprising takeaway is that even small margins separating a classifier's decision surface from training and testing data can hide adversarial susceptibility from being detected using randomly sampled perturbations. Counterintuitively, using additive noise during training or testing is therefore inefficient for eradicating or detecting adversarial examples, and more demanding adversarial training is required.
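The contrast between adversarial and random perturbations can be seen in a toy setting. The sketch below is not the paper's framework: it assumes a linear classifier with a unit normal vector `w` in 1000 dimensions, a point placed at an assumed margin of 0.5 from the decision surface, and random noise with ten times the margin's norm. All names and numbers are illustrative choices.

```python
import math
import random

def unit(v):
    """Scale a vector to unit Euclidean norm."""
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v]

def classify(w, x):
    """Linear classifier: sign of the dot product with the normal w."""
    return 1 if sum(a * b for a, b in zip(w, x)) >= 0 else -1

rng = random.Random(0)
dim = 1000
margin = 0.5

w = unit([rng.gauss(0, 1) for _ in range(dim)])
x = [margin * a for a in w]  # point at distance `margin` on the positive side

# Adversarial step: move straight towards the decision surface,
# only slightly further than the margin itself.
adversarial = [xi - 1.2 * margin * wi for xi, wi in zip(x, w)]
adversarial_flips = classify(w, adversarial) != classify(w, x)

# Random perturbations with a norm ten times larger than the margin.
trials = 200
noise_norm = 10 * margin
flips = 0
for _ in range(trials):
    direction = unit([rng.gauss(0, 1) for _ in range(dim)])
    noisy = [xi + noise_norm * di for xi, di in zip(x, direction)]
    flips += classify(w, noisy) != classify(w, x)
random_flip_rate = flips / trials

print(adversarial_flips, random_flip_rate)
```

In high dimension a random direction is almost orthogonal to `w`, so its component along the normal is roughly `noise_norm / sqrt(dim)`, far smaller than the margin. That is why the much larger random noise rarely changes the label while the small targeted step does.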
How AI Neural Networks Show That the Mind Is Not the Brain
Recently, I've been arguing that we can use artificial neural networks (ANNs) to prove that the mind is not the brain. This means if we can conclusively show the human mind can learn better than a neural network, then the mind is not the brain. For Premise A, I've argued that the differentiable neural network is a superior learning model compared to the brain neuron's "all or nothing principle". The neural network has a "hot" or "cold" signal that it can learn from iteratively, whereas the neuron has a binary "yes" or "no" signal that does not allow for gradual improvement, making learning impossible for brain neurons. This brings us to Premise B, where I will show that, nonetheless, the human mind can learn better than a neural network.
Dealing with sparse categorical variables in predictive modeling
One of the biggest challenges a data scientist must deal with is finding an efficient way to numerically encode qualitative features. Indeed, only numerical representations of categorical variables can be used as input to predictive models. The best-known method is called one-hot encoding, and it works by creating dummy variables. Hence, if a qualitative column has n modalities, n columns will be added to the dataset. Even if dummy encoding is, most of the time, an effective and flexible way to reach good performance, there are situations in which it is worth exploring other methods, like "Frequency encoding" and "Target encoding".
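The three encodings named above can be compared on a tiny example. This is a minimal sketch using only the standard library; the `colors` column and the binary `bought` target are made-up illustration data, and the function names are hypothetical rather than taken from any library.

```python
from collections import Counter, defaultdict

def one_hot(values):
    """One 0/1 column per distinct category (n modalities -> n columns)."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows

def frequency_encode(values):
    """Replace each category by its relative frequency in the column."""
    counts = Counter(values)
    n = len(values)
    return [counts[v] / n for v in values]

def target_encode(values, targets):
    """Replace each category by the mean target observed for it."""
    sums = defaultdict(float)
    counts = Counter(values)
    for v, t in zip(values, targets):
        sums[v] += t
    return [sums[v] / counts[v] for v in values]

# Made-up illustration data: a qualitative column and a binary target.
colors = ["red", "blue", "red", "green", "red", "blue"]
bought = [1, 0, 1, 0, 0, 1]

categories, onehot_rows = one_hot(colors)
freq = frequency_encode(colors)
tgt = target_encode(colors, bought)

print(categories)
print(onehot_rows[0], freq[0], tgt[0])
```

Frequency and target encoding each produce a single numeric column regardless of the number of modalities, which is what makes them attractive for sparse, high-cardinality features. Note that target encoding computed on the same rows it is later applied to leaks the target; in practice the category means are usually estimated on training data only.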
Plot a Decision Surface for Machine Learning Algorithms in Python
Classification algorithms learn how to assign class labels to examples, although their decisions can appear opaque. A popular diagnostic for understanding the decisions made by a classification algorithm is the decision surface. This is a plot that shows the predictions a fitted machine learning algorithm makes over a coarse grid spanning the input feature space. A decision surface plot is a powerful tool for understanding how a given model "sees" the prediction task and how it has decided to divide the input feature space by class label. In this tutorial, you will discover how to plot a decision surface for a classification machine learning algorithm.
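The grid-based procedure described above can be sketched without any plotting library. The example below is an assumption-laden stand-in for the tutorial's approach: it uses a hand-written nearest-centroid classifier on made-up 2D points and renders the surface as text, one character per grid cell, instead of a filled contour plot.

```python
def fit_centroids(X, y):
    """Nearest-centroid 'training': mean point of each class."""
    sums, counts = {}, {}
    for (a, b), label in zip(X, y):
        sa, sb = sums.get(label, (0.0, 0.0))
        sums[label] = (sa + a, sb + b)
        counts[label] = counts.get(label, 0) + 1
    return {k: (sums[k][0] / counts[k], sums[k][1] / counts[k]) for k in sums}

def predict(centroids, point):
    """Assign the label of the closest class centroid."""
    return min(
        centroids,
        key=lambda k: (point[0] - centroids[k][0]) ** 2
        + (point[1] - centroids[k][1]) ** 2,
    )

def decision_surface(centroids, x_range, y_range, steps):
    """Predict a label for every point of a steps x steps grid."""
    xs = [x_range[0] + i * (x_range[1] - x_range[0]) / (steps - 1)
          for i in range(steps)]
    ys = [y_range[0] + j * (y_range[1] - y_range[0]) / (steps - 1)
          for j in range(steps)]
    return [[predict(centroids, (x, y)) for x in xs] for y in ys]

# Made-up two-class data in a 2D feature space.
X = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (6.0, 6.0), (7.0, 5.5), (6.5, 7.0)]
y = [0, 0, 0, 1, 1, 1]

centroids = fit_centroids(X, y)
grid = decision_surface(centroids, (0.0, 8.0), (0.0, 8.0), 9)

# Text rendering: '.' for class 0, '#' for class 1, largest y at the top.
for row in reversed(grid):
    print("".join(".#"[label] for label in row))
```

With a plotting library, the same `grid` of predicted labels would typically be passed to a filled-contour or image function, with the training points overlaid as a scatter plot; a finer grid gives a smoother boundary.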
Embed Me If You Can: A Geometric Perceptron
Melnyk, Pavlo, Felsberg, Michael, Wadenbäck, Mårten
Solving geometric tasks using machine learning is a challenging problem. Standard feed-forward neural networks combine linear or, if the bias parameter is included, affine layers and activation functions. Their geometric modeling is limited, which is why we introduce the alternative model of the multilayer geometric perceptron (MLGP) with units that are geometric neurons, i.e., combinations of hypersphere neurons. The hypersphere neuron is obtained by applying a conformal embedding of Euclidean space. By virtue of Clifford algebra, it can be implemented as the Cartesian dot product. We validate our method on the public 3D Tetris dataset consisting of coordinates of geometric shapes and we show that our method has the capability of generalization over geometric transformations. We demonstrate that our model is superior to the vanilla multilayer perceptron (MLP) while having fewer parameters and no activation function in the hidden layers other than the embedding. In the presence of noise in the data, our model is also superior to the multilayer hypersphere perceptron (MLHP) proposed in prior work. In contrast to the latter, our method reflects the 3D-geometry and provides a topological interpretation of the learned coefficients in the geometric neurons.
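The statement that a hypersphere neuron reduces to a dot product after a conformal embedding can be checked numerically. The sketch below assumes one common convention for the embedding, with a point x mapped to (x, -1, -½‖x‖²) and a sphere with center c and radius r mapped to (c, ½(‖c‖² − r²), 1); the component ordering and function names are illustrative assumptions, not taken from the paper's code. Under this convention the dot product equals −½(‖x − c‖² − r²).

```python
def embed_point(x):
    """Embed a Euclidean point as (x, -1, -0.5 * ||x||^2)."""
    return list(x) + [-1.0, -0.5 * sum(v * v for v in x)]

def embed_sphere(center, radius):
    """Embed a sphere as (c, 0.5 * (||c||^2 - r^2), 1)."""
    sq_norm = sum(c * c for c in center)
    return list(center) + [0.5 * (sq_norm - radius ** 2), 1.0]

def dot(a, b):
    """Ordinary dot product of two equal-length vectors."""
    return sum(p * q for p, q in zip(a, b))

def hypersphere_neuron(x, center, radius):
    """Positive inside the sphere, zero on it, negative outside."""
    return dot(embed_point(x), embed_sphere(center, radius))

center, radius = (1.0, 2.0, 3.0), 2.0
inside = hypersphere_neuron((1.0, 2.0, 3.0), center, radius)
on_surface = hypersphere_neuron((3.0, 2.0, 3.0), center, radius)
outside = hypersphere_neuron((5.0, 2.0, 3.0), center, radius)
print(inside, on_surface, outside)
```

The sign of the output therefore gives a spherical decision surface, while the neuron itself remains a plain linear unit acting on the embedded input, so the sphere's center and radius can be read back from the learned weight vector.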