
Blessing of Depth in Linear Regression: Deeper Models Have Flatter Landscape Around the True Solution

Neural Information Processing Systems

This work characterizes the effect of depth on the optimization landscape of linear regression, showing that, despite their nonconvexity, deeper models have a more desirable optimization landscape. We consider a robust and over-parameterized setting, where a subset of the measurements is grossly corrupted with noise, and the true linear model is captured via an $N$-layer diagonal linear neural network. On the negative side, we show that this problem does not have a benign landscape: for any $N\geq 1$, with constant probability, there exists a solution corresponding to the ground truth that is neither a local nor a global minimum. However, on the positive side, we prove that, for any $N$-layer model with $N\geq 2$, a simple sub-gradient method becomes oblivious to such "problematic" solutions; instead, it converges to a balanced solution that is not only close to the ground truth but also enjoys a flat local landscape, thereby eschewing the need for "early stopping". Lastly, we empirically verify that the desirable optimization landscape of deeper models extends to other robust learning tasks, including deep matrix recovery and deep ReLU networks with $\ell_1$-loss.
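
To make the setup concrete, here is a minimal sketch (dimensions, initialization scale, and step size are all assumed for illustration) that fits an $N$-layer diagonal linear network to grossly corrupted measurements with a subgradient method on the $\ell_1$ loss; it is a toy reproduction of the setting, not the paper's exact algorithm or analysis.

```python
import numpy as np

rng = np.random.default_rng(0)
m, d, N = 200, 50, 3                          # measurements, dimension, depth (N >= 2)
A = rng.standard_normal((m, d))
w_true = np.zeros(d)
w_true[:5] = 1.0                              # sparse ground truth
y = A @ w_true
bad = rng.choice(m, size=m // 5, replace=False)
y[bad] += 10.0 * rng.standard_normal(bad.size)   # grossly corrupted subset

# N-layer diagonal linear network: the effective weights are the entrywise
# product of the per-layer parameter vectors u_1, ..., u_N. With an identical
# small initialization across layers, the layers stay balanced throughout.
U = [np.full(d, 0.3) for _ in range(N)]       # balanced initialization (assumed)

lr = 1e-3
for _ in range(20000):
    w = np.prod(U, axis=0)
    g_w = A.T @ np.sign(A @ w - y) / m        # subgradient of the robust l1 loss
    grads = []
    for i in range(N):
        others = np.prod([U[j] for j in range(N) if j != i], axis=0)
        grads.append(g_w * others)            # chain rule through the product
    for i in range(N):
        U[i] -= lr * grads[i]

print(np.linalg.norm(np.prod(U, axis=0) - w_true))   # distance to the ground truth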


Non-identifiability and the Blessings of Misspecification in Models of Molecular Fitness

Neural Information Processing Systems

Understanding the consequences of mutation for molecular fitness and function is a fundamental problem in biology. Recently, generative probabilistic models have emerged as a powerful tool for estimating fitness from evolutionary sequence data, with accuracy sufficient to predict both laboratory measurements of function and disease risk in humans, and to design novel functional proteins. Existing techniques rest on an assumed relationship between density estimation and fitness estimation, a relationship that we interrogate in this article. We prove that fitness is not identifiable from observational sequence data alone, placing fundamental limits on our ability to disentangle fitness landscapes from phylogenetic history. We show on real datasets that perfect density estimation in the limit of infinite data would, with high confidence, result in poor fitness estimation; current models perform accurate fitness estimation because of, not despite, misspecification. Our results challenge the conventional wisdom that bigger models trained on bigger datasets will inevitably lead to better fitness estimation, and suggest novel estimation strategies going forward.
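
For concreteness, the assumed density-to-fitness relationship is commonly operationalized by scoring a variant with its log-density ratio to the wild type under a fitted generative model. The sketch below illustrates that computation with a deliberately simple site-independent categorical model (the alignment is synthetic and all sizes are placeholders); real models such as VAEs or Potts models additionally capture dependencies between positions.

```python
import numpy as np

rng = np.random.default_rng(0)
msa = rng.integers(0, 20, size=(500, 30))     # toy alignment: amino-acid ids, not letters

# Per-site amino-acid frequencies with a small pseudocount.
freqs = np.stack([(msa == a).mean(axis=0) for a in range(20)]) + 1e-3
freqs /= freqs.sum(axis=0, keepdims=True)     # shape (20 amino acids, 30 sites)

def log_density(seq):
    """Log-probability of a sequence under the site-independent model."""
    return float(np.log(freqs[seq, np.arange(seq.size)]).sum())

wt = msa[0]
mutant = wt.copy()
mutant[5] = (wt[5] + 1) % 20                  # a single substitution at site 5
print(log_density(mutant) - log_density(wt))  # density-ratio fitness score
```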


Reviews: Fast structure learning with modular regularization

Neural Information Processing Systems

The manuscript proposes a new objective function for learning Gaussian latent factor models. The objective function is based on an information-theoretic characterization of modular latent factor models, on which the objective attains its optimal value. The derivation of the objective function carefully avoids matrix inversion, improving computational complexity over traditional methods. The authors point out that the proposed model enjoys a "blessing of dimension": performance improves as the dimension of the observed variables increases while the dimension of the latent variables remains constant. This is demonstrated both by simulation and by an information-theoretic lower bound on the sample size.
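
For readers unfamiliar with the setting, the following toy generator (all names and sizes assumed) produces data from a modular Gaussian latent factor model, in which each observed variable loads on exactly one latent factor; growing the observed dimension while holding the number of factors fixed is exactly the "blessing of dimension" regime the review refers to.

```python
import numpy as np

rng = np.random.default_rng(0)
m, p, n = 5, 100, 1000                        # latent factors, observed dim, samples
assign = rng.integers(0, m, size=p)           # each variable's single parent factor
load = rng.uniform(0.5, 1.0, size=p)          # loading strengths
Z = rng.standard_normal((n, m))               # latent factors
X = Z[:, assign] * load + 0.3 * rng.standard_normal((n, p))
# As p grows with m fixed, each factor acquires more observed "children",
# which is the regime where the reviewed method's performance improves.
```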


Losses Can Be Blessings: Routing Self-Supervised Speech Representations Towards Efficient Multilingual and Multitask Speech Processing

Neural Information Processing Systems

Self-supervised learning (SSL) for rich speech representations has achieved empirical success in low-resource Automatic Speech Recognition (ASR) and other speech processing tasks. It can mitigate the need for large amounts of transcribed speech and has thus driven a growing demand for on-device ASR and other speech processing. However, advanced speech SSL models have become increasingly large, which conflicts with the limited resources available on-device. This gap is even more severe in multilingual/multitask scenarios that require simultaneously recognizing multiple languages or executing multiple speech processing tasks. Additionally, strongly overparameterized speech SSL models tend to overfit when finetuned on low-resource speech corpora. This work aims to enhance the practical usage of speech SSL models toward a win-win in both efficiency and reduced overfitting via our proposed S$^3$-Router framework, which for the first time discovers that simply discarding no more than 10% of model weights, by finetuning only the model connections of speech SSL models, can achieve better accuracy than standard weight finetuning on downstream speech processing tasks.
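
A minimal sketch of the underlying idea, learning a binary mask over frozen weights with a straight-through estimator, follows; the class name, sizes, and initialization are assumptions for illustration, not the S$^3$-Router implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskedLinear(nn.Module):
    """A linear layer with frozen weights and a learned binary mask over
    connections, in the spirit of mask-based finetuning (details assumed)."""
    def __init__(self, linear: nn.Linear, discard: float = 0.1):
        super().__init__()
        self.weight = nn.Parameter(linear.weight.detach(), requires_grad=False)
        self.score = nn.Parameter(0.01 * torch.randn_like(self.weight))
        self.discard = discard                    # fraction of weights to drop

    def forward(self, x):
        # Discard the `discard` fraction of connections with the lowest scores.
        k = max(1, int(self.weight.numel() * self.discard))
        thresh = self.score.flatten().kthvalue(k).values
        hard = (self.score > thresh).float()
        # Straight-through estimator: binary mask forward, identity backward.
        mask = hard + self.score - self.score.detach()
        return F.linear(x, self.weight * mask)

layer = MaskedLinear(nn.Linear(768, 768))         # sizes are illustrative
out = layer(torch.randn(4, 768))                  # only `score` receives gradients
```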


Reviews: Overfitting or perfect fitting? Risk bounds for classification and regression rules that interpolate

Neural Information Processing Systems

On further reflection and seeing the responses, I have substantially increased my score. I think the takeaway "interpolating classifiers/regressors need not overfit" is quite important, even if the algorithms studied here are fairly different from the ones people are actually concerned about. I would suggest re-emphasizing that this is the main point of the paper in the introduction, and additionally toning down some of the discussion about a "blessing of dimensionality" as mentioned in your response / below. This is related to the recent "controversy" in learning theory, brought to prominence by [43] and continued in [9, 41], that practical deep learning models (and some kernel-based models) lie far outside the regime of performance explained by current learning theory, for example having extremely high Rademacher complexity, and yet perform well in practice. This paper gives bounds for two particular nearest-neighbor-like models, which interpolate the training data and yet are argued to generalize well in high dimensions.
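
For a concrete picture of such a rule, the sketch below implements a generic Nadaraya-Watson smoother with a singular kernel (the exponent and tolerance are arbitrary choices, and this is not necessarily the paper's exact estimator): the weight diverges as a query approaches a training point, so the predictor fits the training data exactly yet can still average sensibly away from it.

```python
import numpy as np

def singular_kernel_regressor(X_train, y_train, X_test, a=2.0, eps=1e-12):
    """Nadaraya-Watson regression with singular kernel weights w(d) = d**(-a).
    Because the weight blows up as d -> 0, the estimator interpolates the
    training data while still smoothing between points."""
    preds = []
    for x in X_test:
        d = np.linalg.norm(X_train - x, axis=1)
        if d.min() < eps:                     # query coincides with a training point
            preds.append(y_train[d.argmin()])
            continue
        w = d ** (-a)
        preds.append(w @ y_train / w.sum())
    return np.array(preds)

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 3))
y = X[:, 0] + 0.1 * rng.standard_normal(50)
print(np.allclose(singular_kernel_regressor(X, y, X), y))   # exact fit on train
```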


GPTVQ: The Blessing of Dimensionality for LLM Quantization

van Baalen, Mart, Kuzmin, Andrey, Nagel, Markus, Couperus, Peter, Bastoul, Cedric, Mahurin, Eric, Blankevoort, Tijmen, Whatmough, Paul

arXiv.org Artificial Intelligence

In this work we show that the size versus accuracy trade-off of neural network quantization can be significantly improved by increasing the quantization dimensionality. We propose GPTVQ, a new fast method for post-training vector quantization (VQ) that scales well to Large Language Models (LLMs). Our method interleaves quantization of one or more columns with updates to the remaining unquantized weights, using information from the Hessian of the per-layer output reconstruction MSE. Quantization codebooks are initialized using an efficient data-aware version of the EM algorithm. The codebooks are then updated, and further compressed using integer quantization and SVD-based compression. GPTVQ establishes a new state of the art in the size versus accuracy trade-off on a wide range of LLMs such as Llama-v2 and Mistral. Furthermore, our method is efficient: on a single H100 it takes between 3 and 11 hours to process a Llama-v2 70B model, depending on the quantization setting. Lastly, with on-device timings for VQ decompression on a mobile CPU, we show that VQ leads to improved latency compared to using a 4-bit integer format.
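
As a toy illustration of the core step, quantizing a weight matrix against a learned vector codebook, consider the sketch below (the group size, codebook size, and the k-means initializer are assumptions; GPTVQ additionally uses Hessian-aware weight updates, EM-based initialization, and codebook compression, none of which are shown).

```python
import numpy as np
from sklearn.cluster import KMeans

def vq_quantize(W, vec_dim=2, codebook_bits=8, seed=0):
    """Toy vector quantization: reshape the weights into vec_dim-dimensional
    vectors, fit a small codebook, and store one index per vector."""
    flat = W.reshape(-1, vec_dim)                 # numel must divide by vec_dim
    k = 2 ** codebook_bits
    km = KMeans(n_clusters=k, n_init=1, random_state=seed).fit(flat)
    codebook = km.cluster_centers_                # k x vec_dim
    idx = km.labels_                              # one index per weight vector
    W_hat = codebook[idx].reshape(W.shape)        # dequantized reconstruction
    return W_hat, codebook, idx

W = np.random.default_rng(0).standard_normal((64, 64))
W_hat, codebook, idx = vq_quantize(W)
print(np.abs(W - W_hat).mean())                   # mean reconstruction error
```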


Length is a Curse and a Blessing for Document-level Semantics

Xiao, Chenghao, Li, Yizhi, Hudson, G Thomas, Lin, Chenghua, Moubayed, Noura Al

arXiv.org Artificial Intelligence

In recent years, contrastive learning (CL) has been extensively utilized to recover sentence- and document-level encoding capability from pre-trained language models. In this work, we question the length generalizability of CL-based models, i.e., their vulnerability to length-induced semantic shift. We verify not only that length vulnerability is a significant yet overlooked research gap, but also that we can devise unsupervised CL methods depending solely on the semantic signal provided by document length. We first derive the theoretical foundations underlying length attacks, showing that elongating a document intensifies the already high intra-document similarity brought by CL. Moreover, we find that the isotropy promised by CL is highly dependent on the length range of the text exposed during training. Inspired by these findings, we introduce a simple yet universal document representation learning framework, LA(SER)$^{3}$: length-agnostic self-reference for semantically robust sentence representation learning, achieving state-of-the-art unsupervised performance on the standard information retrieval benchmark.
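
A quick way to probe length-induced semantic shift, in the spirit of the "length attack" described above, is to compare a document's embedding with that of the same content elongated by repetition; an ideal length-agnostic encoder would report near-identical embeddings for every elongation factor. The model name below is only an example of a CL-based sentence encoder, not the one studied in the paper.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")   # example encoder (assumed)

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

doc = "Contrastive learning recovers sentence-level encoding capability."
base = model.encode(doc)
for k in (1, 2, 4, 8):
    elongated = " ".join([doc] * k)               # "length attack": repeat the content
    # Repetition preserves semantics exactly, so drift here indicates
    # length vulnerability rather than a genuine semantic change.
    print(k, cos(base, model.encode(elongated)))
```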


Facts to Know: How AI Is Helping Small Businesses

#artificialintelligence

Artificial Intelligence (AI) has been transforming the way we live and work for years, and it is now taking over the business world. From improving customer experiences to optimizing operations, AI has become an essential tool for businesses of all sizes. In this blog post, we will explore the ways AI is taking over business and how it is a blessing for the future. One of the primary ways AI is taking over business is by improving customer experiences. AI-powered chatbots are now being used by companies to provide customers with instant support and assistance, eliminating the need for human customer service representatives.