
Neural Information Processing Systems

Algorithm 5 finds such a network in poly(n, l) time. Lemma (Appendix B.8): the sequence r(k) defined by (14) is strictly increasing. Lemma 6: for any positive integer k, the sequence r(k) defined by (14) satisfies r(k) … The number of closed connected subsets satisfying Definition 1 is a minimum. The interior and frontier (boundary) of a set X are denoted Int X and Fr X, respectively. Proposition 1: for any family of closed connected subsets satisfying Definition 1, all subsets are the … Let S(k) be the collection of all permutations of the set [k].


Linear Regression in p-adic metric spaces

Baker, Gregory D., McCallum, Scott, Pattinson, Dirk

arXiv.org Artificial Intelligence

Many real-world machine learning problems involve inherently hierarchical data, yet traditional approaches rely on Euclidean metrics that fail to capture the discrete, branching nature of hierarchical relationships. We present a theoretical foundation for machine learning in p-adic metric spaces, which naturally respect hierarchical structure. Our main result proves that an n-dimensional plane minimizing the p-adic sum of distances to points in a dataset must pass through at least n + 1 of those points -- a striking contrast to Euclidean regression that highlights how p-adic metrics better align with the discrete nature of hierarchical data. As a corollary, a polynomial of degree n constructed to minimize the p-adic sum of residuals will pass through at least n + 1 points. As a further corollary, a polynomial of degree n approximating a higher-degree polynomial at a finite number of points will yield a difference polynomial with distinct rational roots. We demonstrate the practical significance of this result through two applications in natural language processing: analyzing hierarchical taxonomies and modeling grammatical morphology. These results suggest that p-adic metrics may be fundamental to properly handling hierarchical data structures in machine learning. In hierarchical data, interpolation between points often makes less sense than selecting actual observed points as representatives.
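The p-adic distance behind this result can be made concrete. Below is a minimal sketch (not from the paper) of the p-adic absolute value |x|_p = p^(-v_p(x)) on the rationals, whose induced metric is an ultrametric; the function names are my own.

```python
from fractions import Fraction

def vp(x, p):
    """p-adic valuation v_p(x) of a nonzero rational x: the power of p
    dividing the numerator minus the power dividing the denominator."""
    x = Fraction(x)
    v, n, d = 0, x.numerator, x.denominator
    while n % p == 0:
        n //= p
        v += 1
    while d % p == 0:
        d //= p
        v -= 1
    return v

def padic_abs(x, p):
    """p-adic absolute value |x|_p = p**(-v_p(x)), with |0|_p = 0."""
    x = Fraction(x)
    if x == 0:
        return Fraction(0)
    return Fraction(1, p) ** vp(x, p)
```

Under this metric, large powers of p are *small* (|8|_2 = 1/8), and the strong triangle inequality |x + y|_p <= max(|x|_p, |y|_p) holds, which is what gives the space its branching, tree-like geometry.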



Geometric Integration for Neural Control Variates

Meister, Daniel, Harada, Takahiro

arXiv.org Machine Learning

Thanks to our geometric subdivision, we can integrate the neural network analytically and use it as a control variate for Monte Carlo integration. The integral of the approximation provides a biased estimate (left), which is corrected by Monte Carlo integration of the residual integrand (center left), obtaining the final unbiased estimate (center right), which can achieve a lower error than vanilla Monte Carlo (right).

Abstract: Control variates are a variance-reduction technique for Monte Carlo integration. The principle involves approximating the integrand by a function that can be analytically integrated, and integrating by the Monte Carlo method only the residual difference between the integrand and the approximation, to obtain an unbiased estimate. Neural networks are universal approximators that could potentially be used as a control variate. However, the challenge lies in the analytic integration, which is not possible in general. In this manuscript, we study one of the simplest neural network models, the multilayer perceptron (MLP) with continuous piecewise linear activation functions, and its possible analytic integration. We propose an integration method based on integration-domain subdivision, employing techniques from computational geometry to solve this problem in 2D. We demonstrate that an MLP can be used as a control variate in combination with our integration method, showing applications in light transport simulation.

1. Introduction. To synthesize photorealistic images, we need to solve notoriously complex integrals that model the underlying light transport. In general, these integrals do not have an analytic solution, and thus we employ tools of numerical integration to solve them. Among those, Monte Carlo integration is prominent, providing a general and robust solution that deals efficiently with, for example, high dimensions or discontinuities that other numerical methods may struggle with. Monte Carlo converges to the correct solution as the number of samples increases; however, it may require a large number of samples to suppress the variance below an acceptable threshold, as the variance otherwise manifests as high-frequency noise in the rendered images.
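The control-variate principle described in the abstract can be sketched with a toy 1D integrand in place of the paper's MLP and geometric subdivision; the integrand, the control variate, and the function name below are illustrative assumptions, not the paper's method.

```python
import random

def mc_with_control_variate(f, g, G, a, b, n, rng):
    """Estimate the integral of f over [a, b] as G plus a Monte Carlo
    estimate of the residual integral of (f - g), where G is the exact
    (analytic) integral of the control variate g over [a, b].
    The estimator stays unbiased because only the residual is sampled."""
    total = 0.0
    for _ in range(n):
        x = rng.uniform(a, b)
        total += f(x) - g(x)
    return G + (b - a) * total / n

# Toy integrand f(x) = x**2 on [0, 1]; control variate g(x) = x,
# whose exact integral over [0, 1] is G = 0.5.
rng = random.Random(0)
est = mc_with_control_variate(lambda x: x * x, lambda x: x,
                              0.5, 0.0, 1.0, 10_000, rng)
```

Since the residual x**2 - x varies far less over [0, 1] than x**2 itself, the estimate has lower variance than plain Monte Carlo at the same sample count, which is the whole point of the technique.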


Identifiable Convex-Concave Regression via Sub-gradient Regularised Least Squares

Chung, William

arXiv.org Machine Learning

We propose a novel nonparametric regression method that models complex input-output relationships as the sum of convex and concave components. The method, Identifiable Convex-Concave Nonparametric Least Squares (ICCNLS), decomposes the target function into additive shape-constrained components, each represented via sub-gradient-constrained affine functions. To address the affine ambiguity inherent in convex-concave decompositions, we introduce global statistical orthogonality constraints, ensuring that residuals are uncorrelated with both the intercept and the input variables. This enforces identifiability of the decomposition and improves interpretability. We further incorporate L1, L2, and elastic-net regularisation on the sub-gradients to enhance generalisation and promote structural sparsity. The proposed method is evaluated on synthetic and real-world datasets, including healthcare pricing data, and demonstrates improved predictive accuracy and model simplicity compared to conventional CNLS and difference-of-convex (DC) regression approaches. Our results show that statistical identifiability, when paired with convex-concave structure and sub-gradient regularisation, yields interpretable models suited to forecasting, benchmarking, and policy evaluation.
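The convex-plus-concave structure, and the affine ambiguity that the orthogonality constraints are designed to remove, can be illustrated with a minimal sketch (not the paper's estimator): a convex piecewise-affine component as a max of affine pieces, a concave one as a min, and a check that moving an affine function from one component to the other leaves predictions unchanged.

```python
import numpy as np

def convex_part(x, A, b):
    """Pointwise max of affine pieces a_k . x + b_k: a convex CPA function."""
    return np.max(A @ x + b)

def concave_part(x, C, d):
    """Pointwise min of affine pieces: a concave CPA function."""
    return np.min(C @ x + d)

def model(x, A, b, C, d):
    """Sum of a convex and a concave component."""
    return convex_part(x, A, b) + concave_part(x, C, d)

rng = np.random.default_rng(0)
A, b = rng.normal(size=(3, 2)), rng.normal(size=3)
C, d = rng.normal(size=(3, 2)), rng.normal(size=3)
x = rng.normal(size=2)
base = model(x, A, b, C, d)

# Affine ambiguity: add w . x + c to every convex piece and subtract it
# from every concave piece -- the sum of the two components is unchanged.
w, c = np.array([0.5, -1.0]), 0.3
shifted = model(x, A + w, b + c, C - w, d - c)
```

Because the data cannot distinguish such shifted decompositions, extra constraints (here, the paper's statistical orthogonality conditions) are needed to pin down a unique convex/concave split.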


Model Informed Flows for Bayesian Inference of Probabilistic Programs

Ko, Joohwan, Domke, Justin

arXiv.org Machine Learning

Variational inference often struggles with the posterior geometry exhibited by complex hierarchical Bayesian models. Recent advances in flow-based variational families and Variationally Inferred Parameters (VIP) each address aspects of this challenge, but their formal relationship is unexplored. Here, we prove that the combination of VIP and a full-rank Gaussian can be represented exactly as a forward autoregressive flow augmented with a translation term and input from the model's prior. Guided by this theoretical insight, we introduce the Model-Informed Flow (MIF) architecture, which adds the necessary translation mechanism, prior information, and hierarchical ordering. Empirically, MIF delivers tighter posterior approximations and matches or exceeds state-of-the-art performance across a suite of hierarchical and non-hierarchical benchmarks.
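The forward autoregressive flow with a translation term mentioned in the abstract can be sketched on a toy two-level hierarchical model; the functions, constants, and name below are illustrative assumptions, not the paper's MIF architecture.

```python
import numpy as np

def forward_autoregressive_flow(eps, mu, sigma, t):
    """Map base noise eps to z one coordinate at a time:
    z_i = mu_i(z_{<i}) + t_i + sigma_i(z_{<i}) * eps_i,
    where t is an extra translation term (a constant vector here)."""
    z = np.zeros_like(eps)
    for i in range(len(eps)):
        m = mu(z[:i], i)      # shift conditioned on earlier coordinates
        s = sigma(z[:i], i)   # positive scale conditioned on earlier coordinates
        z[i] = m + t[i] + s * eps[i]
    return z

# Toy hierarchical model: z0 ~ N(0, 1), z1 | z0 ~ N(z0, 0.5**2).
# The flow's conditioners mirror the model's prior structure.
mu = lambda prev, i: 0.0 if i == 0 else prev[0]
sigma = lambda prev, i: 1.0 if i == 0 else 0.5
t = np.zeros(2)

rng = np.random.default_rng(0)
samples = np.array([forward_autoregressive_flow(rng.standard_normal(2), mu, sigma, t)
                    for _ in range(20_000)])
```

With these conditioners the flow reproduces the model's prior exactly (the marginal of z1 is N(0, 1.25)), illustrating how feeding prior information into an autoregressive flow lets it capture hierarchical dependence that a mean-field family would miss.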


ReLU Networks as Random Functions: Their Distribution in Probability Space

Chaudhari, Shreyas, Moura, José M. F.

arXiv.org Artificial Intelligence

This paper presents a novel framework for understanding trained ReLU networks as random affine functions, where the randomness is induced by the distribution over the inputs. By characterizing the probability distribution of the network's activation patterns, we derive the discrete probability distribution over the affine functions realizable by the network. We extend this analysis to describe the probability distribution of the network's outputs. Our approach provides explicit, numerically tractable expressions for these distributions in terms of Gaussian orthant probabilities. Additionally, we develop approximation techniques to identify the support of affine functions a trained ReLU network can realize for a given distribution of inputs. Our work provides a framework for understanding the behavior and performance of ReLU networks under stochastic inputs, paving the way for more interpretable and reliable models.
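The correspondence between activation patterns and affine functions can be sketched for a small one-hidden-layer ReLU network (a simplification of the paper's setting, with illustrative weights): each input's binary activation pattern selects a diagonal mask, and the network output equals the affine map induced by that mask, so sampling inputs induces a discrete distribution over patterns.

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 2)), rng.normal(size=4)   # hidden layer
w2, b2 = rng.normal(size=4), rng.normal()              # output layer

def net(x):
    """One-hidden-layer ReLU network R^2 -> R."""
    return w2 @ np.maximum(W1 @ x + b1, 0.0) + b2

def pattern(x):
    """Binary activation pattern of the hidden units at input x."""
    return tuple((W1 @ x + b1 > 0).astype(int))

def affine_for(p):
    """The affine function (a, c) the network computes on the region
    whose activation pattern is p: a diagonal 0/1 mask absorbs the ReLU."""
    D = np.diag(p)
    return w2 @ D @ W1, w2 @ D @ b1 + b2

# Empirical distribution over activation patterns for Gaussian inputs,
# checking that every output matches the region's affine map.
counts = {}
for x in rng.normal(size=(1000, 2)):
    p = pattern(x)
    counts[p] = counts.get(p, 0) + 1
    a, c = affine_for(p)
    assert np.isclose(net(x), a @ x + c)
```

Normalizing `counts` gives the empirical version of the discrete distribution over realizable affine functions that the paper characterizes exactly via Gaussian orthant probabilities.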


Linear-Size Neural Network Representation of Piecewise Affine Functions in $\mathbb{R}^2$

Zanotti, Leo

arXiv.org Machine Learning

It is shown that any continuous piecewise affine (CPA) function $\mathbb{R}^2\to\mathbb{R}$ with $p$ pieces can be represented by a ReLU neural network with two hidden layers and $O(p)$ neurons. Unlike prior work, which focused on convex pieces, this analysis considers CPA functions with connected but potentially non-convex pieces.
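The flavour of such representations can be sketched with the identity max(a, b) = a + relu(b - a): stacking it twice yields a two-hidden-layer ReLU computation of a three-piece CPA function on R^2. This is a toy instance chosen for clarity, not the paper's general O(p) construction.

```python
import numpy as np

def relu(v):
    return np.maximum(v, 0.0)

def max2(a, b):
    """max(a, b) = a + relu(b - a): one ReLU computes a max of two values."""
    return a + relu(b - a)

def cpa(x, y):
    """The CPA function max(x, y, -x - y): three affine pieces,
    computed by two stacked ReLU layers."""
    return max2(max2(x, y), -x - y)

# Sanity check against the direct definition on random points.
pts = np.random.default_rng(0).normal(size=(100, 2))
assert all(np.isclose(cpa(x, y), max(x, y, -x - y)) for x, y in pts)
```

Each ReLU layer absorbs one pairwise max of affine inputs, which is why depth two suffices here; the paper's contribution is keeping the neuron count linear in the number of pieces even when the pieces are non-convex.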