 fse


How Does Overparameterization Affect Features?

arXiv.org Artificial Intelligence

Overparameterization, the condition where models have more parameters than necessary to fit their training loss, is a crucial factor for the success of deep learning. However, the characteristics of the features learned by overparameterized networks are not well understood. In this work, we explore this question by comparing models with the same architecture but different widths. We first examine the expressivity of the features of these models, and show that the feature space of overparameterized networks cannot be spanned by concatenating many underparameterized features, and vice versa. This reveals that both overparameterized and underparameterized networks acquire some distinctive features. We then evaluate the performance of these models, and find that overparameterized networks outperform underparameterized networks, even when many of the latter are concatenated. We corroborate these findings using VGG-16 and ResNet18 on CIFAR-10 and a Transformer on the MNLI classification dataset. Finally, we propose a toy setting to explain how overparameterized networks can learn some important features that the underparameterized networks cannot learn.

Overparameterized neural networks, which have more parameters than necessary to fit the training data, have achieved remarkable success in various tasks, such as image classification (He et al., 2016; Krizhevsky et al., 2017), object detection (Girshick et al., 2014; Redmon et al., 2016), and text classification (Zhang et al., 2015; Johnson & Zhang, 2016). However, the theoretical understanding of why these networks outperform underparameterized ones, which have fewer parameters and less capacity, is still limited.
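The idea of one feature space being "spanned" by another can be illustrated with a small linear-regression check. The sketch below is only an illustration under stated assumptions, not the paper's metric: `span_error` is a hypothetical helper, the feature matrices are random stand-ins rather than real network activations, and the fit is evaluated in-sample for brevity (a held-out split would be more faithful).

```python
import numpy as np

def span_error(source_feats, target_feats):
    """Fraction of target-feature variance NOT explained by a linear map
    from the source features (0 = fully spanned, 1 = nothing explained).

    source_feats: (n_samples, d_source), e.g. concatenated narrow-network features
    target_feats: (n_samples, d_target), e.g. features of one wide network
    """
    # Center both matrices so the fit measures linear span, not constant offsets.
    X = source_feats - source_feats.mean(axis=0)
    Y = target_feats - target_feats.mean(axis=0)
    # Least-squares linear map W minimizing ||X W - Y||_F.
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)
    residual = Y - X @ W
    return float(np.sum(residual ** 2) / np.sum(Y ** 2))

rng = np.random.default_rng(0)
n = 500
# Hypothetical stand-ins for the features of four narrow networks.
narrow = [rng.normal(size=(n, 8)) for _ in range(4)]
concat = np.concatenate(narrow, axis=1)  # shape (n, 32)
# Toy "wide" features: half lie in the span of the narrow ones, half are new directions.
wide = np.concatenate(
    [concat @ rng.normal(size=(32, 16)), rng.normal(size=(n, 16))], axis=1
)

err = span_error(concat, wide)
print(f"unexplained fraction of wide features: {err:.3f}")
```

A value clearly above zero in both directions (narrow-to-wide and wide-to-narrow) would correspond to the abstract's claim that each kind of network holds features the other cannot reproduce linearly.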


Kinetic Energy Plus Penalty Functions for Sparse Estimation

arXiv.org Machine Learning

In this paper we propose and study a family of sparsity-inducing penalty functions. Since the penalty functions are related to the kinetic energy in special relativity, we call them kinetic energy plus (KEP) functions. We construct the KEP function by using the concave conjugate of a $\chi^2$-distance function and present several novel insights into the KEP function with $q=1$. In particular, we derive a thresholding operator based on the KEP function, and prove its mathematical properties and asymptotic properties in sparsity modeling. Moreover, we show that a coordinate descent algorithm is especially appropriate for the KEP function. Additionally, we discuss the relationship of KEP with the penalty functions $\ell_{1/2}$ and MCP. The theoretical and empirical analysis validates that the KEP function is effective and efficient in high-dimensional data modeling.
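How a thresholding operator plugs into coordinate descent can be sketched as follows. The abstract does not give the KEP thresholding operator, so this example substitutes the standard soft-thresholding operator of the $\ell_1$ (lasso) penalty as a stand-in; the function names, the `threshold` parameter, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def soft_threshold(z, lam):
    """Soft-thresholding operator of the l1 penalty (a stand-in, NOT the KEP operator)."""
    return np.sign(z) * max(abs(z) - lam, 0.0)

def coordinate_descent(X, y, lam, threshold=soft_threshold, n_iters=100):
    """Cyclic coordinate descent for (1/2n)||y - X b||^2 + penalty(b).

    `threshold` maps the univariate least-squares statistic to its penalized
    value; a different penalty would supply its own operator here.
    Assumes every column of X is nonzero.
    """
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n  # per-column curvature x_j^T x_j / n
    r = y.astype(float).copy()         # running residual y - X b
    for _ in range(n_iters):
        for j in range(p):
            # Correlation of column j with the residual that excludes b[j].
            z = X[:, j] @ r / n + col_sq[j] * b[j]
            new_bj = threshold(z, lam) / col_sq[j]
            r -= X[:, j] * (new_bj - b[j])  # keep the residual in sync
            b[j] = new_bj
    return b

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
b_true = np.array([3.0, -2.0, 1.5] + [0.0] * 7)  # sparse ground truth
y = X @ b_true + 0.1 * rng.normal(size=200)

b_hat = coordinate_descent(X, y, lam=0.1)
print(np.round(b_hat, 2))
```

Because each coordinate update reduces to a one-dimensional problem solved in closed form by the operator, swapping in another penalty only requires replacing `threshold`, which is presumably why coordinate descent pairs naturally with penalties that admit an explicit thresholding rule.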


Mechanism Design for Federated Sponsored Search Auctions

AAAI Conferences

Recently there has been an increase in smaller, domain-specific search engines that scour the deep web to find information that general-purpose engines are unable to discover. These search engines play a crucial role in the new generation of search paradigms, in which federated search engines (FSEs) integrate search results from heterogeneous sources. In this paper we pose, for the first time, the problem of designing a revenue mechanism that ensures profits both to individual search engines and to FSEs as a mechanism design problem. To this end, we extend the sponsored search auction models and discuss possibility and impossibility results on the implementation of an incentive-compatible mechanism. Specifically, we develop an execution-contingent VCG mechanism (where payments depend on the observed click behavior) that satisfies both individual rationality and weak budget balance in expectation.
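To make "payments depend on the observed click behavior" concrete, here is a minimal sketch of the textbook VCG position auction for a single search engine, converted to pay-per-click form. It is not the federated mechanism the abstract describes; it assumes click probabilities depend only on the slot and are known, and `vcg_position_auction`, the bidder names, and the numbers are hypothetical.

```python
def vcg_position_auction(bids, ctrs):
    """Textbook VCG for a position auction (single engine, NOT the federated setting).

    bids: dict mapping bidder -> reported value per click
    ctrs: click probabilities of the slots, in descending order
    Returns {bidder: (slot_index, price_per_click)} for the bidders who win a slot.
    """
    ranked = sorted(bids, key=bids.get, reverse=True)
    padded = list(ctrs) + [0.0]  # a bidder without a slot has click probability 0
    outcome = {}
    for i, bidder in enumerate(ranked[:len(ctrs)]):
        # Externality: every lower-ranked bidder would move up one slot if `bidder` left.
        expected_payment = sum(
            (padded[j - 1] - padded[j]) * bids[ranked[j]]
            for j in range(i + 1, min(len(ranked), len(ctrs) + 1))
        )
        # Charge only on a click; the price is scaled so the expectation matches VCG.
        outcome[bidder] = (i, expected_payment / ctrs[i])
    return outcome

def charge(price_per_click, clicked):
    """Execution-contingent payment: nothing is owed unless a click is observed."""
    return price_per_click if clicked else 0.0

bids = {"A": 10.0, "B": 6.0, "C": 2.0}  # hypothetical values per click
ctrs = [0.5, 0.3]                        # hypothetical slot click probabilities
outcome = vcg_position_auction(bids, ctrs)
print(outcome)
```

In this toy instance each winner's per-click price stays below their reported value, which is the individual-rationality property in its simplest form; the abstract's contribution concerns whether such guarantees, together with weak budget balance, survive when results are federated across engines.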