Can Linear Probes Measure LLM Uncertainty?

arXiv.org Artificial Intelligence

Effective Uncertainty Quantification (UQ) represents a key aspect for reliable deployment of Large Language Models (LLMs) in automated decision-making and beyond. Yet, for LLM generation with multiple choice structure, the state-of-the-art in UQ is still dominated by the naive baseline given by the maximum softmax score. To address this shortcoming, we demonstrate that taking a principled approach via Bayesian statistics leads to improved performance despite leveraging the simplest possible model, namely linear regression. More precisely, we propose to train multiple Bayesian linear models, each predicting the output of a layer given the output of the previous one. Based on the obtained layer-level posterior distributions, we infer the global uncertainty level of the LLM by identifying a sparse combination of distributional features, leading to an efficient UQ scheme. Numerical experiments on various LLMs show consistent improvement over state-of-the-art baselines.
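The maximum softmax score baseline mentioned above can be sketched as follows. This is a minimal illustration in plain Python, assuming the model exposes one logit per answer option; the function names and the example logits are hypothetical and not taken from the paper.

```python
import math


def softmax(logits):
    """Convert raw logits into probabilities (numerically stable form)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def max_softmax_confidence(logits):
    """Return (index of the chosen option, its softmax probability).

    The probability of the top option is used directly as the
    confidence score, which is the naive baseline described above.
    """
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs[best]


# Hypothetical logits for a four-option multiple-choice question.
choice, confidence = max_softmax_confidence([2.0, 0.5, 0.1, -1.0])
```

A score near 1 is read as high confidence and a score near 1/K (for K options) as high uncertainty; the layer-wise Bayesian linear models proposed in the abstract are a different, more involved scheme that this sketch does not attempt to reproduce.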


Efficient Large-Deformation Medical Image Registration via Recurrent Dynamic Correlation

arXiv.org Artificial Intelligence

Deformable image registration estimates voxel-wise correspondences between images through spatial transformations, and plays a key role in medical imaging. While deep learning methods have significantly reduced runtime, efficiently handling large deformations remains a challenging task. Convolutional networks aggregate local features but lack direct modeling of voxel correspondences, prompting recent works to explore explicit feature matching. Among them, voxel-to-region matching is more efficient for direct correspondence modeling by computing local correlation features within neighbourhoods, while region-to-region matching incurs higher redundancy due to excessive correlation pairs across large regions. However, the inherent locality of voxel-to-region matching hinders the capture of long-range correspondences required for large deformations. To address this, we propose a Recurrent Correlation-based framework that dynamically relocates the matching region toward more promising positions. At each step, local matching is performed with low cost, and the estimated offset guides the next search region, supporting efficient convergence toward large deformations. In addition, we use a lightweight recurrent update module with memory capacity and decouple motion-related and texture features to suppress semantic redundancy. We conduct extensive experiments on brain MRI and abdominal CT datasets under two settings: with and without affine pre-registration. Results show that our method exhibits a strong accuracy-computation trade-off, surpassing or matching the state-of-the-art performance. For example, it achieves comparable performance on the non-affine OASIS dataset, while using only 9.5% of the FLOPs and running 96% faster than RDP, a representative high-performing method.


Time-Correlated Video Bridge Matching

arXiv.org Artificial Intelligence

Diffusion models excel in noise-to-data generation tasks, providing a mapping from a Gaussian distribution to a more complex data distribution. However, they struggle to model translations between complex distributions, limiting their effectiveness in data-to-data tasks. While Bridge Matching (BM) models address this by finding the translation between data distributions, their application to time-correlated data sequences remains unexplored. This is a critical limitation for video generation and manipulation tasks, where maintaining temporal coherence is particularly important. To address this gap, we propose Time-Correlated Video Bridge Matching (TCVBM), a framework that extends BM to time-correlated data sequences in the video domain. TCVBM explicitly models inter-sequence dependencies within the diffusion bridge, directly incorporating temporal correlations into the sampling process. We compare our approach to classical methods based on bridge matching and diffusion models for three video-related tasks: frame interpolation, image-to-video generation, and video super-resolution. TCVBM achieves superior performance across multiple quantitative metrics, demonstrating enhanced generation quality and reconstruction fidelity.
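To make the notion of a bridge between two data points concrete, the sketch below samples an intermediate state of a standard Brownian bridge pinned at a source sample `x0` and a target sample `x1`. This is the generic textbook construction, offered as an assumption about the kind of process bridge matching builds on; it applies independent noise per coordinate and therefore does not include the temporal correlations that TCVBM adds. All names are hypothetical.

```python
import math
import random


def brownian_bridge_sample(x0, x1, t, sigma=1.0, rng=None):
    """Sample x_t from a Brownian bridge pinned at x0 (t=0) and x1 (t=1).

    Mean: (1 - t) * x0 + t * x1
    Std:  sigma * sqrt(t * (1 - t)), which vanishes at both endpoints.
    """
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    rng = rng or random.Random()
    std = sigma * math.sqrt(t * (1.0 - t))
    return [
        (1.0 - t) * a + t * b + std * rng.gauss(0.0, 1.0)
        for a, b in zip(x0, x1)
    ]


# Hypothetical two-dimensional endpoints; a fixed seed keeps runs repeatable.
midpoint = brownian_bridge_sample([1.0, 2.0], [3.0, 4.0], t=0.5,
                                  rng=random.Random(0))
```

In bridge matching, intermediate states of this kind typically serve as training inputs for a network that learns to move samples from the source distribution toward the target distribution.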


Supplemental Materials: Re-Examining Linear Embeddings for High-Dimensional Bayesian Optimization

Neural Information Processing Systems

As explained in Sec. 4, with … Within the first embedding, the optimal value of 0.398 can be reached. As described in Sec. 5, we show the importance of the Mahalanobis kernel using models fit to … Fig. S2 compares model predictions for each of these models with the actual test-set outcomes; results … Fig. S3 evaluates the predictive log marginal probabilities for the ARD RBF kernel and the Mahalanobis kernel with posterior sampling across a wide range of training sets with different sizes … Mahalanobis kernel is able to learn as the training set is expanded. This can be seen in the optimization results (Figs. 5 and S7) where ALEBO … The implied kernel on the embedding is thus stationary. The argument follows that of Prop. 1. Linear embedding HDBO requires selecting a dimensionality for the embedding. The nature of the dimensionality vs. iteration budget trade-off is important in all … These same considerations apply to multi-objective optimization.


Explainable assessment of financial experts' credibility by classifying social media forecasts and checking the predictions with actual market data

arXiv.org Artificial Intelligence

Social media include diverse interaction metrics related to user popularity, the most evident example being the number of user followers. The latter has raised concerns about the credibility of the posts by the most popular creators. However, most existing approaches to assess credibility in social media strictly consider this problem a binary classification, often based on a priori information, without checking if actual real-world facts back the users' comments. In addition, they do not provide automatic explanations of their predictions to foster their trustworthiness. In this work, we propose a credibility assessment solution for financial creators in social media that combines Natural Language Processing and Machine Learning. The reputation of the contributors is assessed by automatically classifying their forecasts on asset values by type and verifying these predictions with actual market data to approximate their probability of success. The outcome of this verification is a continuous credibility score instead of a binary result, an entirely novel contribution by this work. Moreover, social media metrics (i.e., user context) are exploited by calculating their correlation with the credibility rankings, providing insights on the interest of the end-users in financial posts and their forecasts (i.e., drop or rise). Finally, the system provides natural language explanations of its decisions based on a model-agnostic analysis of relevant features.
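The idea of a continuous credibility score obtained by checking forecasts against market data can be illustrated with a small sketch. It assumes each forecast has already been classified as "rise" or "drop" and paired with the asset price at posting time and at the forecast horizon; the data layout, function names, and the use of a plain success fraction are assumptions for illustration, not the scoring procedure of the paper.

```python
def forecast_was_correct(direction, price_at_post, price_at_horizon):
    """Check one classified forecast against the observed price move."""
    if direction == "rise":
        return price_at_horizon > price_at_post
    if direction == "drop":
        return price_at_horizon < price_at_post
    raise ValueError(f"unknown forecast direction: {direction!r}")


def credibility_score(forecasts):
    """Fraction of verified forecasts, a value in [0, 1].

    Returns None when there is nothing to verify, so that an
    unassessed user is not confused with a user who was always wrong.
    """
    if not forecasts:
        return None
    hits = sum(forecast_was_correct(*f) for f in forecasts)
    return hits / len(forecasts)


# Hypothetical history: (direction, price at post, price at horizon).
history = [
    ("rise", 100.0, 110.0),  # correct
    ("drop", 50.0, 55.0),    # wrong
    ("rise", 20.0, 25.0),    # correct
    ("drop", 10.0, 8.0),     # correct
]
score = credibility_score(history)  # 3 of 4 forecasts verified: 0.75
```

Unlike a binary credible/not-credible label, a score of this kind lets users be ranked and correlated with social media metrics such as follower counts.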


Heteroscedastic Preferential Bayesian Optimization with Informative Noise Distributions

arXiv.org Machine Learning

Preferential Bayesian optimization (PBO) is a sample-efficient framework for learning human preferences between candidate designs. PBO classically relies on homoscedastic noise models to represent human aleatoric uncertainty. Yet, such noise fails to accurately capture the varying levels of human aleatoric uncertainty, particularly when the user possesses partial knowledge among different pairs of candidates. For instance, a chemist with solid expertise in glucose-related molecules may easily compare two compounds from that family while struggling to compare alcohol-related molecules. Currently, PBO overlooks this uncertainty during the search for a new candidate through the maximization of the acquisition function, consequently underestimating the risk associated with human uncertainty. To address this issue, we propose a heteroscedastic noise model to capture human aleatoric uncertainty. This model adaptively assigns noise levels based on the distance of a specific input to a predefined set of reliable inputs known as anchors provided by the human. Anchors encapsulate partial knowledge and offer insight into the comparative difficulty of evaluating different candidate pairs. Such a model can be seamlessly integrated into the acquisition function, thus leading to candidate design pairs that elegantly trade informativeness and ease of comparison for the human expert. We perform an extensive empirical evaluation of the proposed approach, demonstrating a consistent improvement over homoscedastic PBO.
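A distance-based heteroscedastic noise model of the kind described above can be sketched as follows. The abstract only states that noise depends on the distance to the anchors; the specific functional form used here (a squared-exponential interpolation between a minimum and a maximum noise level, driven by the distance to the nearest anchor) and every parameter name are assumptions made for illustration.

```python
import math


def distance(a, b):
    """Euclidean distance between two points given as sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def noise_level(x, anchors, sigma_min=0.05, sigma_max=1.0, length_scale=1.0):
    """Noise assigned to input x: low near an anchor, high far from all.

    Assumed form: sigma_min + (sigma_max - sigma_min) * (1 - exp(-d^2 / (2 l^2)))
    where d is the distance from x to its nearest anchor.
    """
    nearest = min(distance(x, a) for a in anchors)
    closeness = math.exp(-0.5 * (nearest / length_scale) ** 2)
    return sigma_min + (sigma_max - sigma_min) * (1.0 - closeness)


# Hypothetical anchors marking designs the expert can judge reliably.
anchors = [(0.0, 0.0), (2.0, 2.0)]
near = noise_level((0.1, 0.1), anchors)  # close to an anchor: low noise
far = noise_level((1.0, 1.0), anchors)   # between anchors: higher noise
```

An acquisition function could then penalize candidate pairs whose members both receive a high noise level, which is one way to trade informativeness against ease of comparison.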


Exposing and Explaining Fake News On-the-Fly

arXiv.org Artificial Intelligence

The negative consequence of this openness of social media platforms is the spread of false information disguised as truth, i.e., fake news. Fake news can be defined as deceptive posts intended to mislead consumers in their purchases or, more broadly, as posts that fall within the scope of misinformation and disinformation (Xiao et al., 2020). Specifically, while misinformation is an inadvertent action, disinformation is the deliberate creation or sharing of false information. Authenticity and intention can be combined into four categories: (i) non-factual and misleading, i.e., deceptive news and disinformation; (ii) factual and misleading (cherry-picking); (iii) undefined and misleading (click-bait); and (iv) non-factual and undefined, i.e., misinformation. Misinformation and fake news are characterized by their large volume, uncertainty, and short-lived nature. Furthermore, they spread faster and further on social media sites, with serious impact on politics and economics (Tandoc, 2019). Accordingly, the European Union (EU) report on the digital transformation of media and the rise of disinformation/fake news (Martens et al., 2018) reinforces the need to strengthen trust in digital media.


Self-Supervised Single-Image Deconvolution with Siamese Neural Networks

arXiv.org Artificial Intelligence

Inverse problems in image reconstruction are fundamentally complicated by unknown noise properties. Classical iterative deconvolution approaches amplify noise and require careful parameter selection for an optimal trade-off between sharpness and grain. Deep learning methods allow for flexible parametrization of the noise and learning its properties directly from the data. Recently, self-supervised blind-spot neural networks were successfully adopted for image deconvolution by including a known point-spread function in the end-to-end training. However, their practical application has been limited to 2D images in the biomedical domain because it implies large kernels that are poorly optimized. We tackle this problem with Fast Fourier Transform convolutions that provide training speed-up in 3D microscopy deconvolution tasks. Further, we propose to adopt a Siamese invariance loss for deconvolution and empirically identify its optimal position in the neural network between blind-spot and full image branches. The experimental results show that our improved framework outperforms the previous state-of-the-art deconvolution methods with a known point spread function.
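The reason Fast Fourier Transform convolutions help with large kernels is the convolution theorem: convolution in the spatial domain becomes element-wise multiplication in the frequency domain. The sketch below shows this in one dimension with a small radix-2 FFT written in plain Python and checks it against direct convolution. It is a didactic illustration only; the paper works on 3D volumes inside a neural network, and the function names here are hypothetical.

```python
import cmath


def fft(values, invert=False):
    """Recursive radix-2 FFT; len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2], invert)
    odd = fft(values[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n)
        out[k] = even[k] + twiddle * odd[k]
        out[k + n // 2] = even[k] - twiddle * odd[k]
    return out


def fft_convolve(signal, kernel):
    """Full linear convolution via zero-padded FFTs."""
    out_len = len(signal) + len(kernel) - 1
    size = 1
    while size < out_len:  # pad to a power of two to avoid wrap-around
        size *= 2
    fa = fft([complex(v) for v in signal] + [0j] * (size - len(signal)))
    fb = fft([complex(v) for v in kernel] + [0j] * (size - len(kernel)))
    product = [a * b for a, b in zip(fa, fb)]
    # The unnormalized inverse transform needs a division by size.
    return [v.real / size for v in fft(product, invert=True)[:out_len]]


def direct_convolve(signal, kernel):
    """Reference implementation with cost len(signal) * len(kernel)."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out


# Hypothetical signal blurred by a small point-spread function.
blurred = fft_convolve([1.0, 2.0, 3.0, 4.0], [0.25, 0.5, 0.25])
```

Direct convolution scales with the product of signal and kernel sizes, whereas the FFT route scales as N log N in the padded length regardless of kernel size, which is where the speed-up for large kernels comes from.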


Cost-aware learning of relevant contextual variables within Bayesian optimization

arXiv.org Artificial Intelligence

Contextual Bayesian Optimization (CBO) is a powerful framework for optimizing black-box, expensive-to-evaluate functions with respect to design variables, while simultaneously efficiently integrating relevant contextual information regarding the environment, such as experimental conditions. However, in many practical scenarios, the relevance of contextual variables is not necessarily known beforehand. Moreover, the contextual variables can sometimes be optimized themselves, a setting that current CBO algorithms do not take into account. Optimizing contextual variables may be costly, which raises the question of determining a minimal relevant subset. In this paper, we frame this problem as a cost-aware model selection BO task and address it using a novel method, Sensitivity-Analysis-Driven Contextual BO (SADCBO). We learn the relevance of context variables by sensitivity analysis of the posterior surrogate model at specific input points, whilst minimizing the cost of optimization by leveraging recent developments on early stopping for BO. We empirically evaluate our proposed SADCBO against alternatives on synthetic experiments together with extensive ablation studies, and demonstrate a consistent improvement across examples.