Goto

Collaborating Authors

 boolean function


Bounds on the computational complexity of neurons due to dendritic morphology

Neural Information Processing Systems

The simple linear threshold units used in many artificial neural networks have a limited computational capacity. Famously, a single unit cannot handle nonlinearly separable problems like XOR. In contrast, real neurons exhibit complex morphologies as well as active dendritic integration, suggesting that their computational capacities outperform those of simple linear units. Considering specific families of Boolean functions, we empirically examine the computational limits of single units that incorporate more complex dendritic structures. For random Boolean functions, we show that there is a phase transition in learnability as a function of the input dimension, with most random functions below a certain critical dimension being learnable and those above not.


Proper Agnostic Learning of Functions of Halfspaces under Gaussian Marginals

arXiv.org Machine Learning

We study the problem of computationally efficient proper agnostic learning of multidimensional concept classes under the Gaussian distribution. In this setting, given i.i.d. labeled samples from an unknown distribution over $\mathbb{R}^d \times \{\pm 1\}$ whose marginal on $\mathbb{R}^d$ is Gaussian, the goal is to output a hypothesis from a target class $\mathcal{F}$ whose 0-1 loss is within $ε$ of that of the best classifier in $\mathcal{F}$. We give the first efficient proper agnostic learning algorithm for arbitrary Boolean functions of $K$ halfspaces under Gaussian marginals. Our algorithm runs in time $d^{O(K^2 \log(1/ε)/ε^2)} + (K/ε)^{O(K^3/ε^{2.5})}$. Prior to our work, the only known algorithm for $K \geq 2$ was brute-force search, with run-time exponential in $d$. Moreover, the dependence of our run-time on the dimension $d$ matches that of the best known improper learning algorithm, namely $d^{\widetilde{O}(K^2/ε^2)}$. For the special case of a single halfspace ($K=1$), the best previous run-time was $d^{O(1/ε^4)} + (1/ε)^{O(1/ε^6)}$. Our algorithm improves this to $d^{O(1/ε^2)} + (1/ε)^{O(1/ε^{2.5})}$. Once again, the dependence on $d$ matches that of the best known improper algorithm, namely $d^{O(1/ε^2)}$. Furthermore, the dependence of our run-time on the dimension $d$ is essentially optimal in the statistical query model.