$\partial\mathbb{B}$ nets: learning discrete functions by gradient descent