parity function
Provable Guarantees for Neural Networks via Gradient Feature Learning
Neural networks have achieved remarkable empirical performance, yet current theoretical analysis is inadequate for understanding their success: the Neural Tangent Kernel approach fails to capture their key feature-learning ability, and recent analyses of feature learning are typically problem-specific.
This paper presents an efficient algorithm for learning sparse polynomial functions over the Boolean domain. The polynomials are represented as linear combinations of parity functions, so learning reduces to finding the non-zero coefficients. Given a set of observations, sparse polynomial learning is formulated as a compressive sensing problem over a large-scale linear system. Unlike [12][13], this paper presents theoretical results on efficiently finding the active set (the indices of the non-zero coefficients) by using the unique sign pattern property, and also gives an application to graph sketching in social network data.
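Concretely, the compressive-sensing formulation reduces to sparse recovery in the parity (Fourier) basis. Below is a minimal sketch of that view in Python using an off-the-shelf l1 solver; it illustrates the formulation, not the paper's algorithm, and all names and parameters are illustrative.

```python
# Sparse Boolean polynomial recovery in the parity basis via l1 minimization.
# A minimal sketch of the compressive-sensing view, not the paper's method.
import itertools
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, m = 8, 120                        # 8 Boolean variables, 120 samples (< 2**8)
X = rng.integers(0, 2, size=(m, n))

def chi(S, X):
    # Parity character chi_S(x) = (-1)^{sum_{i in S} x_i}
    return (-1.0) ** X[:, list(S)].sum(axis=1)

# Ground truth: a 2-sparse polynomial in the parity basis.
y = 1.5 * chi((1, 3), X) - 2.0 * chi((2, 5, 7), X)

# Sensing matrix: one column per parity function (all 2**n of them),
# so the linear system is underdetermined and recovery relies on sparsity.
subsets = [S for k in range(n + 1) for S in itertools.combinations(range(n), k)]
A = np.column_stack([chi(S, X) for S in subsets])

# l1 minimization finds the sparse coefficient vector (the "active set").
coef = Lasso(alpha=0.05, fit_intercept=False).fit(A, y).coef_
for S, c in zip(subsets, coef):
    if abs(c) > 0.1:
        print(S, round(c, 2))        # expect (1, 3) and (2, 5, 7)
```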
Dimension lower bounds for linear approaches to function approximation
This short note presents a linear algebraic approach to proving dimension lower bounds for linear methods that solve $L^2$ function approximation problems. The basic argument has appeared in the literature before (e.g., Barron, 1993) for establishing lower bounds on Kolmogorov $n$-widths. The argument is applied to give sample size lower bounds for kernel methods.
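For concreteness, here is one standard form of the basic argument, in its orthonormal-family version; the note's exact setting and constants may differ.

```latex
% One standard form of the linear-algebraic lower bound argument
% (orthonormal-family version); the note's exact setting may differ.
Let $f_1, \dots, f_N \in L^2$ be orthonormal, and let $V$ be any fixed
$n$-dimensional subspace with orthogonal projection $P_V$. Extending
$f_1, \dots, f_N$ to an orthonormal basis $(e_j)_j$ gives
\[
  \sum_{i=1}^{N} \|P_V f_i\|^2
  \le \sum_{j} \|P_V e_j\|^2
  = \operatorname{tr}(P_V)
  = n ,
\]
so some index $i$ has $\|P_V f_i\|^2 \le n/N$, and therefore
\[
  \max_{1 \le i \le N} \|f_i - P_V f_i\|^2 \ge 1 - \frac{n}{N} .
\]
Hence any linear method that approximates every $f_i$ to $L^2$ error
$\varepsilon$ must use dimension $n \ge (1 - \varepsilon^2) N$.
```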
Regularizing Towards Permutation Invariance in Recurrent Models
In many machine learning problems the output should not depend on the order of the input. Such "permutation invariant" functions have been studied extensively in recent years. Here we argue that temporal architectures such as RNNs are highly relevant for such problems, despite their inherent dependence on order. We show that RNNs can be regularized towards permutation invariance, and that this can result in compact models compared to non-recurrent architectures. We implement this idea via a novel form of stochastic regularization. Existing solutions mostly restrict the learning problem to hypothesis classes that are permutation invariant by design [Zaheer et al., 2017, Lee et al., 2019, Murphy et al., 2018]. Our approach of enforcing permutation invariance via regularization instead gives rise to models that are semi permutation invariant (e.g., invariant to some permutations but not to others).
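As an illustration, here is a sketch of one way such a stochastic regularizer could look in PyTorch; the GRU encoder, the penalty form, and the weighting are illustrative assumptions, not the paper's exact construction.

```python
# Regularizing an RNN towards permutation invariance: penalize the gap
# between the model's output on a sequence and on a random permutation
# of that sequence. A minimal sketch, assuming a simple GRU encoder.
import torch
import torch.nn as nn

class SetGRU(nn.Module):
    def __init__(self, dim_in=4, dim_hid=32):
        super().__init__()
        self.rnn = nn.GRU(dim_in, dim_hid, batch_first=True)
        self.head = nn.Linear(dim_hid, 1)

    def forward(self, x):                # x: (batch, seq, dim_in)
        _, h = self.rnn(x)
        return self.head(h[-1])          # one output per sequence

model = SetGRU()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x, y = torch.randn(16, 10, 4), torch.randn(16, 1)

for step in range(100):
    perm = torch.randperm(x.size(1))     # fresh random permutation each step
    task_loss = nn.functional.mse_loss(model(x), y)
    inv_loss = nn.functional.mse_loss(model(x), model(x[:, perm]))
    loss = task_loss + 1.0 * inv_loss    # lambda = 1.0 is an arbitrary choice
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Because the permutation is resampled at every step, the penalty stochastically covers the whole permutation group without enumerating it.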
New Statistical and Computational Results for Learning Junta Distributions
We study the problem of learning junta distributions on $\{0, 1\}^n$, where a distribution is a $k$-junta if its probability mass function depends on a subset of at most $k$ variables. We make two main contributions:
- We show that learning $k$-junta distributions is computationally equivalent to learning $k$-parity functions with noise (LPN), a landmark problem in computational learning theory.
- We design an algorithm for learning junta distributions whose statistical complexity is optimal, up to polylogarithmic factors. Computationally, our algorithm matches the complexity of previous (non-sample-optimal) algorithms.

Combined, our two contributions imply that our algorithm cannot be significantly improved, statistically or computationally, barring a breakthrough for LPN.
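To make the definition concrete, here is a naive $O(n^k)$-time baseline sketched in Python. It uses the fact that if the pmf depends only on the junta coordinates, the remaining coordinates are uniform given the junta; this is the brute-force method, not the paper's sample-optimal algorithm.

```python
# Brute-force k-junta learning: score every size-k coordinate subset by
# held-out log-likelihood under (empirical marginal on S) x (uniform rest).
import itertools
from collections import Counter
import numpy as np

rng = np.random.default_rng(1)
n, k, m = 10, 2, 5000

# Ground truth: a 2-junta on coordinates {3, 7} with a biased joint pmf there.
samples = rng.integers(0, 2, size=(m, n))
biased = rng.choice(4, size=m, p=[0.6, 0.2, 0.1, 0.1])   # joint of (x3, x7)
samples[:, 3], samples[:, 7] = biased // 2, biased % 2

def avg_loglik(S, train, test):
    # p_hat(x) = (empirical marginal on S) * uniform on the other n-k coords
    counts = Counter(map(tuple, train[:, S]))
    probs = {v: c / len(train) for v, c in counts.items()}
    eps = 1.0 / (2 * len(train))                          # smoothing
    return np.mean([np.log(probs.get(tuple(x), eps)) for x in test[:, S]]) \
           - (n - len(S)) * np.log(2)

train, test = samples[:4000], samples[4000:]
best = max(itertools.combinations(range(n), k),
           key=lambda S: avg_loglik(list(S), train, test))
print(best)                                               # -> (3, 7)
```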
What exactly has TabPFN learned to do?
TabPFN [Hollmann et al., 2023], a Transformer model pretrained to perform in-context learning on fresh tabular classification problems, was presented at the last ICLR conference. To better understand its behavior, we treat it as a black-box function approximator generator and observe the function approximations it generates on a varied selection of training datasets. Exploring its learned inductive biases in this manner, we observe behavior that is by turns brilliant and baffling. We conclude this post with thoughts on how these results might inform the development, evaluation, and application of prior-data fitted networks (PFNs) in the future.
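A sketch of this black-box probing, assuming the `tabpfn` package's sklearn-style interface (constructor options vary across versions): fit in-context on a small synthetic dataset, then query the generated function approximation on a dense grid.

```python
# Probe TabPFN as a black-box function approximator generator: feed it a
# tiny 2-D training set and inspect the decision surface it generates.
import numpy as np
from tabpfn import TabPFNClassifier

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(30, 2))           # 30 "fresh" training points
y = (X[:, 0] * X[:, 1] > 0).astype(int)        # XOR-like ground truth

clf = TabPFNClassifier()
clf.fit(X, y)                                  # in-context "training"

# Query the generated function approximation on a 50x50 grid.
xs = np.linspace(-1, 1, 50)
grid = np.array([[a, b] for a in xs for b in xs])
p = clf.predict_proba(grid)[:, 1].reshape(50, 50)
print(p.round(2))                              # inspect / plot the surface
```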
Task Generalization With AutoRegressive Compositional Structure: Can Learning From $d$ Tasks Generalize to $d^{T}$ Tasks?
Amirhesam Abedsoltan, Huaqing Zhang, Kaiyue Wen, Hongzhou Lin, Jingzhao Zhang, Mikhail Belkin
Large language models (LLMs) exhibit remarkable task generalization, solving tasks they were never explicitly trained on with only a few demonstrations. This raises a fundamental question: When can learning from a small set of tasks generalize to a large task family? In this paper, we investigate task generalization through the lens of AutoRegressive Compositional (ARC) structure, where each task is a composition of $T$ operations, and each operation is among a finite family of $d$ subtasks. This yields a total class of size $d^{T}$. We first show that generalization to all $d^{T}$ tasks is theoretically achievable by training on only $\tilde{O}(d)$ tasks. Empirically, we demonstrate that Transformers achieve such exponential task generalization on sparse parity functions via in-context learning (ICL) and Chain-of-Thought (CoT) reasoning. We further demonstrate this generalization in arithmetic and language translation, extending beyond parity functions.
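A toy generator makes the counting concrete under one simplified reading of the ARC setup (each of $T$ steps picks one of $d$ coordinates to XOR in, so there are $d^T$ tasks, and the chain of thought exposes running parities); the paper's exact task distribution may differ.

```python
# Toy ARC-style parity tasks: a task composes T operations, each one of d
# subtasks (here: which coordinate to XOR in), giving d**T tasks total.
import numpy as np

rng = np.random.default_rng(0)
d, T = 5, 3                                  # d subtasks per step, T steps

def sample_task():
    return rng.integers(0, d, size=T)        # one of d**T possible tasks

def demo(task, x):
    # Chain-of-thought: the running parity after each composed operation.
    cot, acc = [], 0
    for idx in task:
        acc ^= int(x[idx])
        cot.append(acc)
    return cot                               # final entry is the label

task = sample_task()
x = rng.integers(0, 2, size=d)
print("task:", task, "input:", x, "CoT:", demo(task, x))
```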