btn
On the Learning Curves of Revenue Maximization
Hanneke, Steve, Kalavasis, Alkis, Moran, Shay, Velegkas, Grigoris
Learning curves are a fundamental primitive in supervised learning, describing how an algorithm's performance improves with more data and providing a quantitative measure of its generalization ability. Formally, a learning curve plots the decay of an algorithm's error for a fixed underlying distribution as a function of the number of training samples. Prior work on revenue-maximizing learning algorithms, starting with the seminal work of Cole and Roughgarden [STOC, 2014], adopts a distribution-free perspective, which parallels the PAC learning framework in learning theory. This approach evaluates performance against the hardest possible sequence of valuation distributions, one for each sample size, effectively defining the upper envelope of learning curves over all possible distributions, thus leading to error bounds that do not capture the shape of the learning curves. In this work we initiate the study of learning curves for revenue maximization and provide a near-complete characterization of their rate of decay in the basic setting of a single item and a single buyer. In the absence of any restriction on the valuation distribution, we show that there exists a Bayes-consistent algorithm, meaning that its learning curve converges to zero for any arbitrary valuation distribution as the number of samples $n \to \infty$. However, this convergence must be arbitrarily slow, even if the optimal revenue is finite. In contrast, if the optimal revenue is achieved by a finite price, then the optimal rate of decay is roughly $1/\sqrt{n}$. Finally, for distributions supported on discrete sets of values, we show that learning curves decay almost exponentially fast, a rate unattainable under the PAC framework.
Bound Tightening Network for Robust Crowd Counting
Crowd Counting is a fundamental topic, aiming to estimate the number of individuals in the crowded images or videos fed from surveillance cameras. Recent works focus on improving counting accuracy, while ignoring the certified robustness of counting models. In this paper, we propose a novel Bound Tightening Network (BTN) for Robust Crowd Counting. It consists of three parts: base model, smooth regularization module and certify bound module. The core idea is to propagate the interval bound through the base model (certify bound module) and utilize the layer weights (smooth regularization module) to guide the network learning. Experiments on different benchmark datasets for counting demonstrate the effectiveness and efficiency of BTN.
On the Compressive Power of Boolean Threshold Autoencoders
Melkman, Avraham A., Guo, Sini, Ching, Wai-Ki, Liu, Pengyu, Akutsu, Tatsuya
An autoencoder is a layered neural network whose structure can be viewed as consisting of an encoder, which compresses an input vector of dimension $D$ to a vector of low dimension $d$, and a decoder which transforms the low-dimensional vector back to the original input vector (or one that is very similar). In this paper we explore the compressive power of autoencoders that are Boolean threshold networks by studying the numbers of nodes and layers that are required to ensure that the numbers of nodes and layers that are required to ensure that each vector in a given set of distinct input binary vectors is transformed back to its original. We show that for any set of $n$ distinct vectors there exists a seven-layer autoencoder with the smallest possible middle layer, (i.e., its size is logarithmic in $n$), but that there is a set of $n$ vectors for which there is no three-layer autoencoder with a middle layer of the same size. In addition we present a kind of trade-off: if a considerably larger middle layer is permissible then a five-layer autoencoder does exist. We also study encoding by itself. The results we obtain suggest that it is the decoding that constitutes the bottleneck of autoencoding. For example, there always is a three-layer Boolean threshold encoder that compresses $n$ vectors into a dimension that is reduced to twice the logarithm of $n$.
Bayesian Tensor Network and Optimization Algorithm for Probabilistic Machine Learning
Describing or calculating the conditional probabilities of multiple events is exponentially expensive. In this work, a natural generalization of Bayesian belief network is proposed by incorporating with tensor network, which is dubbed as Bayesian tensor network (BTN), to efficiently describe the conditional probabilities among multiple sets of events. The complexity of BTN that gives the conditional probabilities of $M$ sets of events scales only polynomially with $M$. To testify its validity, BTN is implemented to capture the conditional probabilities between images and their classifications, where each feature is mapped to a probability distribution of a set of mutually exclusive events. A rotation optimization method is suggested to update BTN, which avoids gradient vanishing problem and exhibits high efficiency. With a simple tree network structures, BTN exhibits competitive performances on fashion-MNIST dataset. Analogous to the tensor network simulations of quantum systems, the validity of BTN implies an "area law" of fluctuations in image recognition problems.