Chen, Yanzhi
On Evaluating LLMs' Capabilities as Functional Approximators: A Bayesian Perspective
Siddiqui, Shoaib Ahmed, Chen, Yanzhi, Heo, Juyeon, Xia, Menglin, Weller, Adrian
Recent works have successfully applied Large Language Models (LLMs) to function modeling tasks. However, the reasons behind this success remain unclear. In this work, we propose a new evaluation framework to comprehensively assess LLMs' function modeling abilities. By adopting a Bayesian perspective of function modeling, we discover that LLMs are relatively weak in understanding patterns in raw data, but excel at utilizing prior knowledge about the domain to develop a strong understanding of the underlying function. Our findings offer new insights about the strengths and limitations of LLMs in the context of function modeling.
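To make the kind of evaluation described above concrete, here is a hedged sketch (not the paper's actual framework) of how one might probe an LLM's function modeling under two conditions: raw (x, y) pairs alone, versus the same pairs accompanied by a domain description, so that understanding of patterns in raw data and use of prior knowledge can be compared. The prompt format, the example function, and the commented-out query_llm hook are illustrative assumptions.

```python
# Hypothetical probe: compare an LLM's prediction from raw (x, y) pairs alone
# vs. the same pairs plus a domain hint. Everything here is illustrative.
import numpy as np

rng = np.random.default_rng(0)

def true_function(x):
    # Example ground-truth function: exponential decay (e.g., drug concentration over time).
    return 10.0 * np.exp(-0.5 * x)

x_obs = np.sort(rng.uniform(0, 8, size=10))
y_obs = true_function(x_obs) + rng.normal(0, 0.1, size=10)
x_query = 5.5

pairs = "\n".join(f"x={x:.2f}, y={y:.2f}" for x, y in zip(x_obs, y_obs))

# Condition A: raw data only -- tests pattern understanding from observations.
prompt_raw = (
    "Below are noisy observations of an unknown function.\n"
    f"{pairs}\n"
    f"Predict y at x={x_query:.2f}. Answer with a single number."
)

# Condition B: same data plus domain context -- tests use of prior knowledge.
prompt_prior = (
    "Below are noisy measurements of a drug's blood concentration (mg/L) over "
    "time (hours); such curves typically decay exponentially.\n"
    f"{pairs}\n"
    f"Predict the concentration at t={x_query:.2f} hours. Answer with a single number."
)

for name, prompt in [("raw-data", prompt_raw), ("with-prior", prompt_prior)]:
    print(f"--- {name} prompt ---\n{prompt}\n")
    # response = query_llm(prompt)  # plug in any LLM client here (hypothetical hook)
```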
Scalable Infomin Learning
Chen, Yanzhi, Sun, Weihao, Li, Yingzhen, Weller, Adrian
The task of infomin learning aims to learn a representation with high utility while being uninformative about a specified target, with the latter achieved by minimising the mutual information between the representation and the target. It has broad applications, ranging from training fair prediction models against protected attributes to unsupervised learning with disentangled representations. Recent works on infomin learning mainly rely on adversarial training, which involves training a neural network to estimate mutual information or a proxy thereof, and is therefore slow and difficult to optimise. Drawing on recent advances in slicing techniques, we propose a new infomin learning approach that uses a novel proxy metric for mutual information. We further derive an accurate, analytically computable approximation to this proxy metric, thereby removing the need to construct neural network-based mutual information estimators. Experiments on algorithmic fairness, disentangled representation learning and domain adaptation verify that our method can effectively remove unwanted information within a limited time budget.
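As a rough illustration of the slicing idea (not the paper's exact proxy metric), the sketch below penalises the correlation between random one-dimensional projections of the representation and of the target, giving a cheap, analytically computable surrogate in place of a neural mutual information estimator. The network sizes and the particular penalty form are assumptions.

```python
# Minimal PyTorch sketch of a slicing-style infomin penalty: project the
# representation and the target onto random 1-D directions and penalise their
# correlation. Illustrates replacing a neural MI estimator with an analytically
# computable proxy; this is NOT the paper's exact proxy metric.
import torch

def sliced_correlation_penalty(z, t, n_slices=50):
    """Mean squared Pearson correlation between random 1-D slices of z and t."""
    dz = torch.randn(z.shape[1], n_slices, device=z.device)
    dt = torch.randn(t.shape[1], n_slices, device=t.device)
    zs = z @ dz                            # (batch, n_slices) sliced representation
    ts = t @ dt                            # (batch, n_slices) sliced target
    zs = (zs - zs.mean(0)) / (zs.std(0) + 1e-8)
    ts = (ts - ts.mean(0)) / (ts.std(0) + 1e-8)
    corr = (zs * ts).mean(0)               # per-slice Pearson correlation
    return (corr ** 2).mean()

# Usage: add the penalty to a task loss so the encoder stays useful for the task
# while becoming uninformative about the protected target t.
encoder = torch.nn.Sequential(torch.nn.Linear(16, 32), torch.nn.ReLU(), torch.nn.Linear(32, 8))
x = torch.randn(128, 16)
t = torch.randn(128, 2)                    # e.g., protected attributes
penalty = sliced_correlation_penalty(encoder(x), t)
print(float(penalty))
```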
Do Concept Bottleneck Models Learn as Intended?
Margeloiu, Andrei, Ashman, Matthew, Bhatt, Umang, Chen, Yanzhi, Jamnik, Mateja, Weller, Adrian
Concept bottleneck models map from raw inputs to concepts, and then from concepts to targets. Such models aim to incorporate pre-specified, high-level concepts into the learning procedure, and have been motivated to meet three desiderata: interpretability, predictability, and intervenability. However, we find that concept bottleneck models struggle to meet these goals. Using post hoc interpretability methods, we demonstrate that concepts do not correspond to anything semantically meaningful in input space, thus calling into question the usefulness of concept bottleneck models in their current form.
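For concreteness, here is a minimal sketch of the input-to-concepts-to-target structure described above; the architecture, loss weighting, and tensor shapes are illustrative assumptions rather than any specific published implementation.

```python
# Minimal PyTorch sketch of a concept bottleneck model: inputs are mapped to a
# vector of pre-specified concepts, and the target is predicted from the
# concepts alone. Shapes and the joint training loss are illustrative.
import torch
import torch.nn as nn

class ConceptBottleneckModel(nn.Module):
    def __init__(self, n_features, n_concepts, n_classes):
        super().__init__()
        self.input_to_concepts = nn.Sequential(
            nn.Linear(n_features, 64), nn.ReLU(), nn.Linear(64, n_concepts)
        )
        self.concepts_to_target = nn.Linear(n_concepts, n_classes)

    def forward(self, x):
        concept_logits = self.input_to_concepts(x)       # predicted concepts
        y_logits = self.concepts_to_target(torch.sigmoid(concept_logits))
        return concept_logits, y_logits

model = ConceptBottleneckModel(n_features=32, n_concepts=10, n_classes=3)
x = torch.randn(8, 32)
c_true = torch.randint(0, 2, (8, 10)).float()            # annotated concepts
y_true = torch.randint(0, 3, (8,))
c_logits, y_logits = model(x)
# Joint objective: concept prediction loss + target prediction loss.
loss = (nn.functional.binary_cross_entropy_with_logits(c_logits, c_true)
        + nn.functional.cross_entropy(y_logits, y_true))
loss.backward()
# Intervenability: at test time, predicted concepts can be replaced by
# expert-corrected values before the concepts-to-target head is applied.
```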
Neural Approximate Sufficient Statistics for Implicit Models
Chen, Yanzhi, Zhang, Dinghuai, Gutmann, Michael, Courville, Aaron, Zhu, Zhanxing
We consider the fundamental problem of how to automatically construct summary statistics for implicit generative models, where evaluating the likelihood function is intractable but sampling or simulating data from the model is possible. The idea is to frame the construction of sufficient statistics as learning a mutual-information-maximizing representation of the data. This representation is computed by a deep neural network trained with a joint statistic-posterior learning strategy. We apply our approach to both traditional approximate Bayesian computation (ABC) and recent neural likelihood approaches, boosting their performance on a range of tasks.
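A hedged sketch of the general recipe follows; the toy simulator, network sizes, and the InfoNCE-style contrastive objective are illustrative assumptions, not the paper's exact joint statistic-posterior strategy. A statistic network is trained to maximise a lower bound on the mutual information between parameters and simulated data, and rejection ABC then uses distances computed on the learned statistics.

```python
# Illustrative sketch: learn a summary-statistic network S(x) with a contrastive
# (InfoNCE-style) MI lower bound between parameters and simulated data, then run
# rejection ABC with distances in statistic space. Details are assumptions.
import torch
import torch.nn as nn

def simulator(theta, n_obs=20):
    # Toy implicit model: Gaussian with unknown mean and log-std, observed only
    # through raw samples (no tractable likelihood used anywhere below).
    mean, log_std = theta[:, :1], theta[:, 1:]
    return mean + log_std.exp() * torch.randn(theta.shape[0], n_obs)

stat_net = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))
theta_net = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 2))
opt = torch.optim.Adam(list(stat_net.parameters()) + list(theta_net.parameters()), lr=1e-3)

for _ in range(500):
    theta = torch.rand(256, 2) * 2 - 1                    # prior draws
    x = simulator(theta)
    logits = stat_net(x) @ theta_net(theta).T             # matched vs. mismatched pairs
    labels = torch.arange(256)
    loss = nn.functional.cross_entropy(logits, labels)    # InfoNCE-style objective
    opt.zero_grad()
    loss.backward()
    opt.step()

# Rejection ABC with the learned statistics: accept prior draws whose simulated
# statistics fall close to the statistics of the observed data.
x_obs = simulator(torch.tensor([[0.5, -0.3]]))
with torch.no_grad():
    theta = torch.rand(10000, 2) * 2 - 1
    dist = (stat_net(simulator(theta)) - stat_net(x_obs)).norm(dim=1)
    posterior_samples = theta[dist < dist.quantile(0.01)]
print(posterior_samples.mean(0))
```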