

 Learning Graphical Models


Curiosity-Driven LLM-as-a-judge for Personalized Creative Judgment

arXiv.org Artificial Intelligence

… Creative Thinking (TTCW) benchmark introduced in Chakrabarty et al. (2024). … Rigorous, standardized evaluation has repeatedly catalyzed progress in machine learning: ImageNet (Russakovsky et al., 2015) and GLUE (Wang et al., 2019) drove leaps in computer vision and natural language processing, respectively. The same effect is evident in objective math reasoning, where benchmarks like GSM8K (Cobbe et al., 2021), together with RL-trained reasoning models such as OpenAI's o1 (OpenAI et al., 2024) and DeepSeek-R1 (DeepSeek-AI …) … Models (LLMs) as a judge prefer their own generations, making them unreliable. As shown in Chakrabarty et al. (2024) and in Table 12 and Table 2, even … Specifically, when the model is "surprised" by an expert's explanation, this signals a mismatch between the LLM's prior belief and the expert's … The intuition behind predicting the annotator is that the model can learn which annotator caused the belief shift, allowing it to calibrate the curiosity signal for each annotator individually and thereby improve personalization. In our experiments, we establish a baseline using an SFT model that predicts annotators' binary … More details about the results can be found in Fig. 4.

Figure 1: Overview of the architecture during training for Curiosity-Driven LLM-as-a-judge.
Figure 2: Overview of the architecture during inference for Curiosity-Driven LLM-as-a-judge.
(a) Baseline without using explanations; (b) Baseline using explanations.

… TTCW dataset (Chakrabarty et al., 2024), which is based on the Torrance Test of Creative Thinking (Torrance, 1966) but adapted for LLMs. All the distinct dimensions in the TTCW dataset are listed in Appendix A.1.
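The excerpt describes a curiosity signal that fires when the model is "surprised" by an expert's explanation, but it does not say how surprise is measured. One common formalization is Bayesian surprise: the KL divergence from the model's prior belief to its belief after reading the explanation. The sketch below assumes a binary pass/fail judgment per TTCW dimension; the function names and the KL-based definition are illustrative assumptions, not the authors' method.

```python
import math


def bernoulli_kl(p: float, q: float, eps: float = 1e-12) -> float:
    """KL(Bern(p) || Bern(q)) in nats; probabilities are clipped away from 0 and 1."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def surprise(prior_yes: float, posterior_yes: float) -> float:
    """Hypothetical curiosity signal: the belief shift after reading an expert
    explanation, measured as KL(posterior || prior) over a binary judgment."""
    return bernoulli_kl(posterior_yes, prior_yes)


# A model that was 90% sure a story passes, then drops to 20% after the
# explanation, is far more "surprised" than one that only drops to 85%.
big = surprise(0.9, 0.2)
small = surprise(0.9, 0.85)
```

Under this assumption, a per-annotator curiosity signal could weight these surprise values differently for each annotator, which is the calibration role the excerpt assigns to annotator prediction.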


BrowserArena: Evaluating LLM Agents on Real-World Web Navigation Tasks

arXiv.org Artificial Intelligence

LLM web agents now browse and take actions on the open web, yet current agent evaluations are constrained to sandboxed environments or artificial tasks. We introduce BrowserArena, a live open-web agent evaluation platform that collects user-submitted tasks, runs Arena-style head-to-head comparisons, and uses step-level human feedback to surface failure modes. Collecting and analyzing step-level annotations on the agent traces, we identify three consistent failure modes: captcha resolution, pop-up banner removal, and direct navigation to URLs. By constructing targeted datasets to further study these tasks, we discover variations in how different language models navigate these failure modes. We find, for example, that o4-mini deploys a wider variety of strategies to circumvent captcha resolution than other models and DeepSeek-R1 consistently misleads users about pop-up banner closure. Our findings surface both the diversity and brittleness of current web agents. More broadly, our benchmarking methodology provides an approach to evaluating and understanding web agent failure modes at scale.
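The abstract says BrowserArena runs "Arena-style" head-to-head comparisons but does not specify the rating model. One common way to turn pairwise votes into a leaderboard is an online Elo update. The sketch below assumes that scheme; the K-factor of 32, the initial rating of 1000 and the agent names are illustrative choices, not the platform's settings.

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Probability that A beats B under the logistic Elo model (base 10, scale 400)."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def elo_ratings(battles, k: float = 32.0, init: float = 1000.0) -> dict:
    """Online Elo over (model_a, model_b, outcome) triples.

    outcome is 1.0 if A won, 0.0 if B won, 0.5 for a tie.
    """
    ratings = {}
    for a, b, outcome in battles:
        ra = ratings.setdefault(a, init)
        rb = ratings.setdefault(b, init)
        ea = expected_score(ra, rb)
        # Each side moves by k times (actual - expected); the updates are zero-sum.
        ratings[a] = ra + k * (outcome - ea)
        ratings[b] = rb + k * ((1.0 - outcome) - (1.0 - ea))
    return ratings


ratings = elo_ratings([("agent-x", "agent-y", 1.0)] * 5)
```

Online Elo depends on the order of battles; arena leaderboards often refit a Bradley-Terry model over all votes instead, which removes that order dependence.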


Risk Profiling and Modulation for LLMs

arXiv.org Artificial Intelligence

Large language models (LLMs) are increasingly used for decision-making tasks under uncertainty; however, their risk profiles and how they are influenced by prompting and alignment methods remain underexplored. Existing studies have primarily examined personality prompting or multi-agent interactions, leaving open the question of how post-training influences the risk behavior of LLMs. In this work, we propose a new pipeline for eliciting, steering, and modulating LLMs' risk profiles, drawing on tools from behavioral economics and finance. Using utility-theoretic models, we compare pre-trained, instruction-tuned, and RLHF-aligned LLMs, and find that while instruction-tuned models exhibit behaviors consistent with some standard utility formulations, pre-trained and RLHF-aligned models deviate more from all of the fitted utility models. We further evaluate modulation strategies, including prompt engineering, in-context learning, and post-training, and show that post-training provides the most stable and effective modulation of risk preference. Our findings provide insights into the risk profiles of different classes and stages of LLMs and demonstrate how post-training modulates these profiles, laying the groundwork for future research on behavioral alignment and risk-aware LLM design.
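As an illustration of fitting a utility-theoretic model to elicited choices, the sketch below assumes constant relative risk aversion (CRRA) utility, u(x) = x^(1-ρ)/(1-ρ) (log utility at ρ = 1), and recovers ρ by grid search from certainty equivalents reported for simple lotteries. CRRA, the grid range and the function names are assumptions chosen for illustration; the abstract does not say which utility formulations the paper fits.

```python
import numpy as np


def crra_certainty_equivalent(outcomes, probs, rho: float) -> float:
    """Certainty equivalent of a lottery with positive outcomes under CRRA utility."""
    x = np.asarray(outcomes, dtype=float)
    p = np.asarray(probs, dtype=float)
    if abs(rho - 1.0) < 1e-9:
        # Log utility: the CE is the probability-weighted geometric mean.
        return float(np.exp(np.sum(p * np.log(x))))
    # Invert E[u(X)] back to the sure amount with the same utility.
    return float(np.sum(p * x ** (1.0 - rho)) ** (1.0 / (1.0 - rho)))


def fit_crra(lotteries, observed_ces, grid=None) -> float:
    """Grid-search the rho whose predicted CEs best match the elicited ones (MSE)."""
    if grid is None:
        grid = np.linspace(-1.0, 3.0, 401)  # rho < 0: risk-seeking, rho > 0: risk-averse
    observed = np.asarray(observed_ces, dtype=float)
    errors = [
        np.mean((np.array([crra_certainty_equivalent(x, p, rho) for x, p in lotteries]) - observed) ** 2)
        for rho in grid
    ]
    return float(grid[int(np.argmin(errors))])


lotteries = [([10, 100], [0.5, 0.5]), ([20, 50], [0.5, 0.5]), ([5, 200], [0.5, 0.5])]
# Synthetic "elicited" CEs from an agent with rho = 0.5; in practice these would
# come from the model's stated certainty equivalents.
ces = [crra_certainty_equivalent(x, p, 0.5) for x, p in lotteries]
rho_hat = fit_crra(lotteries, ces)
```

A large residual error at the best ρ is one way to read "deviating from the fitted utility models": no single parameter reproduces the observed choices.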


Understanding Catastrophic Interference: On the Identifiability of Latent Representations

arXiv.org Artificial Intelligence

Catastrophic interference, also known as catastrophic forgetting, is a fundamental challenge in machine learning, where a trained model progressively loses performance on previously learned tasks when adapting to new ones. In this paper, we aim to better understand and model the catastrophic interference problem from a latent representation learning point of view, and propose a novel theoretical framework that formulates catastrophic interference as an identification problem. Our analysis demonstrates that the forgetting phenomenon can be quantified by the distance between partial-task aware (PTA) and all-task aware (ATA) setups. Building upon recent advances in identifiability theory, we prove that this distance can be minimized through identification of shared latent variables between these setups. For learning, we propose a method with a two-stage training strategy: first, we employ maximum likelihood estimation to learn the latent representations from both PTA and ATA configurations; second, we optimize the KL divergence to identify and learn the shared latent variables. Through theoretical guarantees and empirical validation, we establish that identifying and learning these shared representations can effectively mitigate catastrophic interference in machine learning systems. Our approach provides both theoretical guarantees and practical performance improvements across both synthetic and benchmark datasets.
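The second training stage optimizes a KL divergence over latent variables. The abstract does not give the form of the latent distributions, so the sketch below assumes, hypothetically, that the PTA and ATA encoders each output a diagonal Gaussian over the shared latents. Under that assumption the KL has the closed form used here; it is the quantity such a stage could minimize, not the paper's exact objective.

```python
import numpy as np


def kl_diag_gaussians(mu_p, logvar_p, mu_q, logvar_q) -> float:
    """KL( N(mu_p, diag(exp(logvar_p))) || N(mu_q, diag(exp(logvar_q))) ), summed over dims.

    Per dimension: 0.5 * (log var_q - log var_p + (var_p + (mu_p - mu_q)^2) / var_q - 1).
    """
    mu_p, logvar_p = np.asarray(mu_p, dtype=float), np.asarray(logvar_p, dtype=float)
    mu_q, logvar_q = np.asarray(mu_q, dtype=float), np.asarray(logvar_q, dtype=float)
    var_p, var_q = np.exp(logvar_p), np.exp(logvar_q)
    per_dim = 0.5 * (logvar_q - logvar_p + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0)
    return float(np.sum(per_dim))


# Hypothetical encoder outputs for one sample: PTA posterior vs ATA posterior.
kl = kl_diag_gaussians(mu_p=[0.0, 0.5], logvar_p=[0.0, -1.0], mu_q=[0.2, 0.4], logvar_q=[0.0, -0.5])
```

Working with log-variances keeps the variances positive without explicit constraints, which is the usual parameterization when encoders output Gaussian posteriors.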


Sparse Representations Improve Adversarial Robustness of Neural Network Classifiers

arXiv.org Artificial Intelligence

Deep neural networks perform remarkably well on image classification tasks but remain vulnerable to carefully crafted adversarial perturbations. This work revisits linear dimensionality reduction as a simple, data-adapted defense. We empirically compare standard Principal Component Analysis (PCA) with its sparse variant (SPCA) as front-end feature extractors for downstream classifiers, and we complement these experiments with a theoretical analysis. On the theory side, we derive exact robustness certificates for linear heads applied to SPCA features: for both $\ell_\infty$ and $\ell_2$ threat models (binary and multiclass), the certified radius grows as the dual norms of $W^\top u$ shrink, where $W$ is the projection and $u$ the head weights. We further show that for general (non-linear) heads, sparsity reduces operator-norm bounds through a Lipschitz composition argument, predicting lower input sensitivity. Empirically, with a small non-linear network after the projection, SPCA consistently degrades more gracefully than PCA under strong white-box and black-box attacks while maintaining competitive clean accuracy. Taken together, the theory identifies the mechanism (sparser projections reduce adversarial leverage) and the experiments verify that this benefit persists beyond the linear setting. Our code is available at https://github.com/killian31/SPCARobustness.
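For a linear head on linear features, the exact certificate follows from Hölder's inequality: a perturbation with ‖δ‖_p ≤ r cannot flip the top class c while r < min_j (f_c − f_j) / ‖Wᵀ(u_c − u_j)‖_q, where q is the dual exponent of p (q = 1 for ℓ∞, q = 2 for ℓ2). The sketch below computes that radius. It assumes W has shape (k, d), so features are z = Wx and the input-space weight is Wᵀu, matching the abstract's notation; the function name and argument layout are illustrative, not taken from the linked repository.

```python
import numpy as np


def certified_radius(x, W, U, b, threat: str = "l2") -> float:
    """Exact certified radius of a linear multiclass head on linear features.

    Assumed shapes: W (k, d) projection, U (C, k) head weights, b (C,) biases,
    so logits = U @ W @ x + b. The binary case is C = 2.
    """
    logits = U @ W @ x + b
    c = int(np.argmax(logits))
    dual_ord = {"l2": 2, "linf": 1}[threat]  # dual of l2 is l2; dual of l_inf is l1
    radii = []
    for j in range(len(logits)):
        if j == c:
            continue
        w_eff = W.T @ (U[c] - U[j])  # input-space direction of the c-vs-j margin
        norm = np.linalg.norm(w_eff, ord=dual_ord)
        # A zero direction means no perturbation can change this margin.
        radii.append(np.inf if norm == 0 else (logits[c] - logits[j]) / norm)
    return float(min(radii))


W = np.eye(2)
U = np.eye(2)
b = np.zeros(2)
x = np.array([2.0, 0.0])
r2 = certified_radius(x, W, U, b, "l2")
rinf = certified_radius(x, W, U, b, "linf")
```

In this toy case the ℓ∞ radius is 1: the perturbation δ = (−1, +1) makes both logits equal. Shrinking ‖Wᵀ(u_c − u_j)‖_q, which sparse projections can do, directly enlarges the radius for a fixed margin.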


Attribute Fusion-based Classifier on Framework of Belief Structure

arXiv.org Artificial Intelligence

Dempster-Shafer Theory (DST) provides a powerful framework for modeling uncertainty and has been widely applied to multi-attribute classification tasks. However, traditional attribute fusion-based classifiers built on DST suffer from oversimplified membership function modeling and limited exploitation of the belief structure brought by basic probability assignment (BPA), reducing their effectiveness in complex real-world scenarios. This paper presents an enhanced attribute fusion-based classifier that addresses these limitations through two key innovations. First, we adopt a selective modeling strategy that utilizes both single Gaussian and Gaussian Mixture Models (GMMs) for membership function construction, with model selection guided by cross-validation and a tailored evaluation metric. Second, we introduce a novel method to transform the possibility distribution into a BPA by combining simple BPAs derived from normalized possibility distributions, enabling a much richer and more flexible representation of uncertain information. Furthermore, we apply the belief structure-based BPA generation method to the evidential K-Nearest Neighbors (EKNN) classifier, enhancing its ability to incorporate uncertainty information into decision-making. Comprehensive experiments on benchmark datasets are conducted to evaluate the performance of the proposed attribute fusion-based classifier and the enhanced evidential K-Nearest Neighbors classifier in comparison with both evidential classifiers and conventional machine learning classifiers. The results demonstrate that the proposed classifier outperforms the best existing evidential classifier, achieving an average accuracy improvement of 4.86%, while maintaining low variance, thus confirming its superior effectiveness and robustness.
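The abstract builds BPAs by combining simple BPAs. The standard DST tool for combining two BPAs is Dempster's rule: multiply masses of intersecting focal sets, discard mass that lands on the empty set (the conflict), and renormalize. The sketch below implements that textbook rule over frozensets; it does not reproduce the paper's specific possibility-to-BPA transformation, and the helper names are illustrative.

```python
from itertools import product


def simple_bpa(focal, support: float, frame) -> dict:
    """Simple BPA: mass `support` on the focal set, the remainder on the whole frame."""
    focal, frame = frozenset(focal), frozenset(frame)
    if focal == frame:
        return {frame: 1.0}  # avoid two entries for the same key
    return {focal: support, frame: 1.0 - support}


def dempster_combine(m1: dict, m2: dict) -> dict:
    """Dempster's rule: conjunctive combination followed by conflict normalization."""
    combined, conflict = {}, 0.0
    for (a, ma), (b, mb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + ma * mb
        else:
            conflict += ma * mb  # mass assigned to the empty set
    if conflict >= 1.0:
        raise ValueError("total conflict: the sources cannot be combined")
    return {s: v / (1.0 - conflict) for s, v in combined.items()}


frame = {"a", "b", "c"}
m = dempster_combine(simple_bpa({"a"}, 0.6, frame), simple_bpa({"a", "b"}, 0.5, frame))
m_conflict = dempster_combine(simple_bpa({"a"}, 0.8, frame), simple_bpa({"b"}, 0.5, frame))
```

In the first example the focal sets are nested, so there is no conflict and the masses are just products; in the second, 0.4 of the mass is conflicting and the remaining 0.6 is rescaled to sum to one.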


Relative Information Gain and Gaussian Process Regression

arXiv.org Machine Learning

The sample complexity of estimating or maximising an unknown function in a reproducing kernel Hilbert space is known to be linked to both the effective dimension and the information gain associated with the kernel. While the information gain has an attractive information-theoretic interpretation, the effective dimension typically results in better rates. We introduce a new quantity called the relative information gain, which measures the sensitivity of the information gain with respect to the observation noise. We show that the relative information gain smoothly interpolates between the effective dimension and the information gain, and that the relative information gain has the same growth rate as the effective dimension. In the second half of the paper, we prove a new PAC-Bayesian excess risk bound for Gaussian process regression. The relative information gain arises naturally from the complexity term in this PAC-Bayesian bound. We prove bounds on the relative information gain that depend on the spectral properties of the kernel. When these upper bounds are combined with our excess risk bound, we obtain minimax-optimal rates of convergence.
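Why should a noise-sensitivity quantity connect the information gain to the effective dimension? With kernel eigenvalues λ_i and noise variance σ², the information gain is γ = ½ Σ log(1 + λ_i/σ²) and the effective dimension is d_eff = Σ λ_i/(λ_i + σ²); differentiating gives dγ/d log σ² = −d_eff/2. The sketch below checks this identity numerically for an RBF kernel. It is an illustration of that relationship only; the paper's precise definition of relative information gain is not reproduced, and the kernel, inputs and noise level are arbitrary choices.

```python
import numpy as np


def rbf_kernel(x, lengthscale: float = 1.0):
    """Gram matrix of the squared-exponential kernel on 1-D inputs."""
    d2 = (x[:, None] - x[None, :]) ** 2
    return np.exp(-0.5 * d2 / lengthscale ** 2)


def information_gain(eigs, noise_var: float) -> float:
    """gamma = 1/2 log det(I + K / sigma^2), computed from the eigenvalues of K."""
    return 0.5 * float(np.sum(np.log1p(eigs / noise_var)))


def effective_dimension(eigs, noise_var: float) -> float:
    """d_eff = tr(K (K + sigma^2 I)^{-1}) = sum_i lambda_i / (lambda_i + sigma^2)."""
    return float(np.sum(eigs / (eigs + noise_var)))


x = np.linspace(0.0, 5.0, 50)
# Clip tiny negative eigenvalues caused by floating-point round-off.
eigs = np.clip(np.linalg.eigvalsh(rbf_kernel(x)), 0.0, None)

s, h = 0.1, 1e-5
# Central finite difference of gamma with respect to log(sigma^2).
deriv = (information_gain(eigs, s * np.exp(h)) - information_gain(eigs, s * np.exp(-h))) / (2 * h)
d_eff = effective_dimension(eigs, s)
```

Here −2·deriv and d_eff agree to finite-difference precision, which is the sense in which "sensitivity of the information gain to the noise" recovers an effective-dimension-like quantity.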


Embracing Discrete Search: A Reasonable Approach to Causal Structure Learning

arXiv.org Machine Learning

Learning about the directed acyclic graph (DAG) underlying a system's data-generating process from observational data under causal sufficiency is a fundamental causal discovery task (Pearl, 2009). Score-based algorithms address this task by assigning penalized likelihood scores to each DAG and seeking graphs whose scores are optimal. Identifiability theory asks when such score-optimal graphs identify the target graph (or its equivalence class) in the infinite-sample limit, with various results under different assumptions and scores (Chickering, 2002; Nandy et al., 2018). Exact algorithms, which are guaranteed to find a score-optimal graph, have exponential run-time and are feasible up to roughly 30 variables (Koivisto & Sood, 2004; Silander & Myllymäki, 2006). For larger graphs, local search must be employed, which evaluates neighbouring graphs to find graphs with better scores; canonical moves for this hill climbing are single edge insertions, deletions, or reversals (Heckerman et al., 1995). In the large-sample limit, greedy discrete search with a neighbourhood notion that respects score equivalence provably finds a graph with optimal score (Chickering, 2002). In finite samples, scores are inexact and local search may get stuck in local optima or, as we demonstrate, even find graphs with better scores than the true graph. Finite-sample performance is a practical challenge, despite the mature identifiability theory and asymptotic guarantees. Continuous optimization methods have emerged as a popular alternative.
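The hill-climbing procedure described above (single-edge insertions, deletions and reversals, keeping the best-scoring neighbour) can be sketched as follows. The score is a linear-Gaussian BIC, chosen here as one example of a penalized likelihood score; the function names are hypothetical, and the sketch uses the plain DAG neighbourhood rather than the score-equivalence-aware neighbourhood of Chickering (2002).

```python
import itertools

import numpy as np


def node_bic(data, j, parents) -> float:
    """Gaussian BIC contribution of node j given its parents (linear regression)."""
    n = data.shape[0]
    y = data[:, j]
    X = np.column_stack([np.ones(n)] + [data[:, p] for p in sorted(parents)])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    var = max(resid @ resid / n, 1e-12)
    loglik = -0.5 * n * (np.log(2 * np.pi * var) + 1.0)
    k = X.shape[1] + 1  # regression coefficients plus the noise variance
    return loglik - 0.5 * k * np.log(n)


def is_acyclic(parents) -> bool:
    """Kahn's algorithm on the parent-set representation."""
    d = len(parents)
    indeg = [len(parents[j]) for j in range(d)]
    children = [[c for c in range(d) if j in parents[c]] for j in range(d)]
    queue = [j for j in range(d) if indeg[j] == 0]
    seen = 0
    while queue:
        j = queue.pop()
        seen += 1
        for c in children[j]:
            indeg[c] -= 1
            if indeg[c] == 0:
                queue.append(c)
    return seen == d


def neighbours(parents):
    """Yield all DAGs one edge insertion, deletion, or reversal away."""
    for i, j in itertools.permutations(range(len(parents)), 2):
        if i in parents[j]:  # edge i -> j exists
            deleted = [set(p) for p in parents]
            deleted[j].discard(i)
            yield deleted
            reversed_ = [set(p) for p in parents]
            reversed_[j].discard(i)
            reversed_[i].add(j)
            if is_acyclic(reversed_):
                yield reversed_
        elif j not in parents[i]:  # no edge in either direction: try inserting i -> j
            inserted = [set(p) for p in parents]
            inserted[j].add(i)
            if is_acyclic(inserted):
                yield inserted


def hill_climb(data):
    """Greedy search from the empty graph; returns a local optimum of the total BIC."""
    d = data.shape[1]
    parents = [set() for _ in range(d)]
    score = sum(node_bic(data, j, parents[j]) for j in range(d))
    while True:
        best, best_score = None, score
        for cand in neighbours(parents):
            s = sum(node_bic(data, j, cand[j]) for j in range(d))
            if s > best_score + 1e-9:
                best, best_score = cand, s
        if best is None:
            return parents, score
        parents, score = best, best_score
```

Because the score is decomposable, a real implementation would rescore only the nodes whose parent sets changed; the full rescoring here keeps the sketch short. On finite samples the returned graph is only a local optimum, which is exactly the gap the abstract highlights.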


SONA: Learning Conditional, Unconditional, and Mismatching-Aware Discriminator

arXiv.org Machine Learning

Deep generative models have made significant advances in generating complex content, yet conditional generation remains a fundamental challenge. Existing conditional generative adversarial networks often struggle to balance the dual objectives of assessing authenticity and conditional alignment of input samples within their conditional discriminators. To address this, we propose a novel discriminator design that integrates three key capabilities: unconditional discrimination, matching-aware supervision to enhance alignment sensitivity, and adaptive weighting to dynamically balance all objectives. Specifically, we introduce Sum of Naturalness and Alignment (SONA), which employs separate projections for naturalness (authenticity) and alignment in the final layer with an inductive bias, supported by dedicated objective functions and an adaptive weighting mechanism. Extensive experiments on class-conditional generation tasks show that SONA achieves superior sample quality and conditional alignment compared to state-of-the-art methods. Furthermore, we demonstrate its effectiveness in text-to-image generation, confirming the versatility and robustness of our approach.
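To make "separate projections for naturalness and alignment in the final layer" concrete, the sketch below assumes a projection-discriminator-style head: a linear naturalness score on the shared features φ(x) plus an inner product between φ(x) and a learned class embedding, summed into one logit. The class name, random initialization and shapes are hypothetical, and the sketch omits SONA's objective functions and adaptive weighting.

```python
import numpy as np


class SumHeadSketch:
    """Hypothetical final layer in the spirit of SONA: naturalness + alignment."""

    def __init__(self, feat_dim: int, num_classes: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.w_nat = rng.normal(size=feat_dim)                 # unconditional direction
        self.embed = rng.normal(size=(num_classes, feat_dim))  # one vector per class

    def naturalness(self, phi):
        """Label-independent score of how real the features look."""
        return phi @ self.w_nat

    def alignment(self, phi, y):
        """Inner product of each feature vector with its label's embedding."""
        return np.sum(self.embed[y] * phi, axis=-1)

    def __call__(self, phi, y):
        return self.naturalness(phi) + self.alignment(phi, y)


head = SumHeadSketch(feat_dim=8, num_classes=3)
phi = np.random.default_rng(1).normal(size=(4, 8))  # stand-in for backbone features
y = np.array([0, 1, 2, 0])
y_mismatched = np.array([1, 2, 0, 1])
logits = head(phi, y)
```

Keeping the two terms separate means a mismatched label changes only the alignment term, which is what lets matching-aware supervision target alignment without disturbing the naturalness score.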


Modular and Adaptive Conformal Prediction for Sequential Models via Residual Decomposition

arXiv.org Machine Learning

Conformal prediction offers finite-sample coverage guarantees under minimal assumptions. However, existing methods treat the entire modeling process as a black box, overlooking opportunities to exploit modular structure. We introduce a conformal prediction framework for two-stage sequential models, where an upstream predictor generates intermediate representations for a downstream model. By decomposing the overall prediction residual into stage-specific components, our method enables practitioners to attribute uncertainty to specific pipeline stages. We develop a risk-controlled parameter selection procedure using family-wise error rate (FWER) control to calibrate stage-wise scaling parameters, and propose an adaptive extension for non-stationary settings that preserves long-run coverage guarantees. Experiments on synthetic distribution shifts, as well as real-world supply chain and stock market data, demonstrate that our approach maintains coverage under conditions that degrade standard conformal methods, while providing interpretable stage-wise uncertainty attribution. This framework offers diagnostic advantages and robust coverage that standard conformal methods lack.
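For reference, the black-box baseline here is split conformal prediction: take the ⌈(n+1)(1−α)⌉-th smallest calibration residual as the interval half-width. The sketch below shows that threshold, plus one hypothetical way to split a two-stage residual when the true intermediate value z is observed on calibration data: y − g(ẑ) = (y − g(z)) + (g(z) − g(ẑ)). The decomposition, the helper names and the synthetic data are assumptions for illustration; the paper's FWER-controlled calibration of stage-wise scaling parameters is not reproduced.

```python
import math

import numpy as np


def conformal_quantile(scores, alpha: float) -> float:
    """Split-conformal threshold: the ceil((n+1)(1-alpha))-th smallest calibration score."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf  # too few calibration points for this alpha
    return float(np.sort(np.asarray(scores, dtype=float))[k - 1])


def stage_residuals(y, z, z_hat, g):
    """Hypothetical split, assuming the true intermediate z is observed:
    y - g(z_hat) = (y - g(z)) [downstream] + (g(z) - g(z_hat)) [upstream]."""
    return y - g(z), g(z) - g(z_hat)


# Synthetic two-stage pipeline: noisy upstream estimate z_hat feeds downstream g.
rng = np.random.default_rng(0)
g = lambda z: 2.0 * z
z = rng.normal(size=4000)
z_hat = z + 0.3 * rng.normal(size=4000)
y = g(z) + 0.5 * rng.normal(size=4000)

down, up = stage_residuals(y, z, z_hat, g)
total = np.abs(y - g(z_hat))
q = conformal_quantile(total[:2000], alpha=0.1)  # calibrate on the first half
coverage = float(np.mean(total[2000:] <= q))     # check on the second half
```

Comparing the spread of `down` and `up` gives a rough, stage-by-stage view of where the uncertainty comes from, which is the kind of attribution the abstract describes; the marginal coverage guarantee itself comes only from the conformal quantile.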