Plotting

Agentic Large Language Models, a survey

arXiv.org Artificial Intelligence

There is great interest in agentic LLMs, large language models that act as agents. We review the growing body of work in this area and provide a research agenda. Agentic LLMs are LLMs that (1) reason, (2) act, and (3) interact. We organize the literature according to these three categories. The research in the first category focuses on reasoning, reflection, and retrieval, aiming to improve decision making; the second category focuses on action models, robots, and tools, aiming for agents that act as useful assistants; the third category focuses on multi-agent systems, aiming for collaborative task solving and simulating interaction to study emergent social behavior. We find that works mutually benefit from results in other categories: retrieval enables tool use, reflection improves multi-agent collaboration, and reasoning benefits all categories. We discuss applications of agentic LLMs and provide an agenda for further research. Important applications are in medical diagnosis, logistics and financial market analysis. Meanwhile, self-reflective agents playing roles and interacting with one another augment the process of scientific research itself. Further, agentic LLMs may provide a solution for the problem of LLMs running out of training data: inference-time behavior generates new training states, such that LLMs can keep learning without needing ever larger datasets. We note that there is risk associated with LLM assistants taking action in the real world, while agentic LLMs are also likely to benefit society.


Token-Driven GammaTune: Adaptive Calibration for Enhanced Speculative Decoding

arXiv.org Artificial Intelligence

Speculative decoding accelerates large language model (LLM) inference by using a smaller draft model to propose tokens, which are then verified by a larger target model. However, selecting an optimal speculation length is critical for maximizing speedup while minimizing wasted computation. We introduce \textit{GammaTune} and \textit{GammaTune+}, training-free adaptive algorithms that dynamically adjust speculation length based on token acceptance rates using a heuristic-based switching mechanism. Evaluated on SpecBench across multiple tasks and model pairs, our method outperforms other heuristic-based approaches and fixed-length speculative decoding, achieving an average speedup of 15\% ($\pm$5\%) with \textit{GammaTune} and 16\% ($\pm$3\%) with \textit{GammaTune+}, while reducing performance variance. This makes \textit{GammaTune} a robust and efficient solution for real-world deployment.


A computational transition for detecting multivariate shuffled linear regression by low-degree polynomials

arXiv.org Machine Learning

In this paper, we study the problem of multivariate shuffled linear regression, where the correspondence between predictors and responses in a linear model is obfuscated by a latent permutation. Specifically, we investigate the model $Y=\tfrac{1}{\sqrt{1+\sigma^2}}(\Pi_* X Q_* + \sigma Z)$, where $X$ is an $n*d$ standard Gaussian design matrix, $Z$ is an $n*m$ Gaussian noise matrix, $\Pi_*$ is an unknown $n*n$ permutation matrix, and $Q_*$ is an unknown $d*m$ on the Grassmanian manifold satisfying $Q_*^{\top} Q_* = \mathbb I_m$. Consider the hypothesis testing problem of distinguishing this model from the case where $X$ and $Y$ are independent Gaussian random matrices of sizes $n*d$ and $n*m$, respectively. Our results reveal a phase transition phenomenon in the performance of low-degree polynomial algorithms for this task. (1) When $m=o(d)$, we show that all degree-$D$ polynomials fail to distinguish these two models even when $\sigma=0$, provided with $D^4=o\big( \tfrac{d}{m} \big)$. (2) When $m=d$ and $\sigma=\omega(1)$, we show that all degree-$D$ polynomials fail to distinguish these two models provided with $D=o(\sigma)$. (3) When $m=d$ and $\sigma=o(1)$, we show that there exists a constant-degree polynomial that strongly distinguish these two models. These results establish a smooth transition in the effectiveness of low-degree polynomial algorithms for this problem, highlighting the interplay between the dimensions $m$ and $d$, the noise level $\sigma$, and the computational complexity of the testing task.


Prompt Optimization with Logged Bandit Data

arXiv.org Machine Learning

We study how to use naturally available user feedback, such as clicks, to optimize large language model (LLM) pipelines for generating personalized sentences using prompts. Naive approaches, which estimate the policy gradient in the prompt space, suffer either from variance caused by the large action space of prompts or bias caused by inaccurate reward predictions. To circumvent these challenges, we propose a novel kernel-based off-policy gradient method, which estimates the policy gradient by leveraging similarity among generated sentences, substantially reducing variance while suppressing the bias. Empirical results on our newly established suite of benchmarks demonstrate the effectiveness of the proposed approach in generating personalized descriptions for movie recommendations, particularly when the number of candidate prompts is large.


Multi-Modal Framing Analysis of News

arXiv.org Artificial Intelligence

Automated frame analysis of political communication is a popular task in computational social science that is used to study how authors select aspects of a topic to frame its reception. So far, such studies have been narrow, in that they use a fixed set of pre-defined frames and focus only on the text, ignoring the visual contexts in which those texts appear. Especially for framing in the news, this leaves out valuable information about editorial choices, which include not just the written article but also accompanying photographs. To overcome such limitations, we present a method for conducting multi-modal, multi-label framing analysis at scale using large (vision-)language models. Grounding our work in framing theory, we extract latent meaning embedded in images used to convey a certain point and contrast that to the text by comparing the respective frames used. We also identify highly partisan framing of topics with issue-specific frame analysis found in prior qualitative work. We demonstrate a method for doing scalable integrative framing analysis of both text and image in news, providing a more complete picture for understanding media bias.


Graph Network Modeling Techniques for Visualizing Human Mobility Patterns

arXiv.org Machine Learning

Human mobility analysis at urban-scale requires models to represent the complex nature of human movements, which in turn are affected by accessibility to nearby points of interest, underlying socioeconomic factors of a place, and local transport choices for people living in a geographic region. In this work, we represent human mobility and the associated flow of movements as a grapyh. Graph-based approaches for mobility analysis are still in their early stages of adoption and are actively being researched. The challenges of graph-based mobility analysis are multifaceted - the lack of sufficiently high-quality data to represent flows at high spatial and teporal resolution whereas, limited computational resources to translate large voluments of mobility data into a network structure, and scaling issues inherent in graph models etc. The current study develops a methodology by embedding graphs into a continuous space, which alleviates issues related to fast graph matching, graph time-series modeling, and visualization of mobility dynamics. Through experiments, we demonstrate how mobility data collected from taxicab trajectories could be transformed into network structures and patterns of mobility flow changes, and can be used for downstream tasks reporting approx 40% decrease in error on average in matched graphs vs unmatched ones.


ConfEviSurrogate: A Conformalized Evidential Surrogate Model for Uncertainty Quantification

arXiv.org Machine Learning

Surrogate models, crucial for approximating complex simulation data across sciences, inherently carry uncertainties that range from simulation noise to model prediction errors. Without rigorous uncertainty quantification, predictions become unreliable and hence hinder analysis. While methods like Monte Carlo dropout and ensemble models exist, they are often costly, fail to isolate uncertainty types, and lack guaranteed coverage in prediction intervals. To address this, we introduce ConfEviSurrogate, a novel Conformalized Evidential Surrogate Model that can efficiently learn high-order evidential distributions, directly predict simulation outcomes, separate uncertainty sources, and provide prediction intervals. A conformal prediction-based calibration step further enhances interval reliability to ensure coverage and improve efficiency. Our ConfEviSurrogate demonstrates accurate predictions and robust uncertainty estimates in diverse simulations, including cosmology, ocean dynamics, and fluid dynamics.


STOOD-X methodology: using statistical nonparametric test for OOD Detection Large-Scale datasets enhanced with explainability

arXiv.org Machine Learning

Out-of-Distribution (OOD) detection is a critical task in machine learning, particularly in safety-sensitive applications where model failures can have serious consequences. However, current OOD detection methods often suffer from restrictive distributional assumptions, limited scalability, and a lack of interpretability. To address these challenges, we propose STOOD-X, a two-stage methodology that combines a Statistical nonparametric Test for OOD Detection with eXplainability enhancements. In the first stage, STOOD-X uses feature-space distances and a Wilcoxon-Mann-Whitney test to identify OOD samples without assuming a specific feature distribution. In the second stage, it generates user-friendly, concept-based visual explanations that reveal the features driving each decision, aligning with the BLUE XAI paradigm. Through extensive experiments on benchmark datasets and multiple architectures, STOOD-X achieves competitive performance against state-of-the-art post hoc OOD detectors, particularly in high-dimensional and complex settings. In addition, its explainability framework enables human oversight, bias detection, and model debugging, fostering trust and collaboration between humans and AI systems. The STOOD-X methodology therefore offers a robust, explainable, and scalable solution for real-world OOD detection tasks.


IMPACT: A Generic Semantic Loss for Multimodal Medical Image Registration

arXiv.org Artificial Intelligence

Image registration is fundamental in medical imaging, enabling precise alignment of anatomical structures for diagnosis, treatment planning, image-guided interventions, and longitudinal monitoring. This work introduces IMPACT (Image Metric with Pretrained model-Agnostic Comparison for Transmodality registration), a novel similarity metric designed for robust multimodal image registration. Rather than relying on raw intensities, handcrafted descriptors, or task-specific training, IMPACT defines a semantic similarity measure based on the comparison of deep features extracted from large-scale pretrained segmentation models. By leveraging representations from models such as TotalSegmentator, Segment Anything (SAM), and other foundation networks, IMPACT provides a task-agnostic, training-free solution that generalizes across imaging modalities. These features, originally trained for segmentation, offer strong spatial correspondence and semantic alignment capabilities, making them naturally suited for registration. The method integrates seamlessly into both algorithmic (Elastix) and learning-based (VoxelMorph) frameworks, leveraging the strengths of each. IMPACT was evaluated on five challenging 3D registration tasks involving thoracic CT/CBCT and pelvic MR/CT datasets. Quantitative metrics, including Target Registration Error and Dice Similarity Coefficient, demonstrated consistent improvements in anatomical alignment over baseline methods. Qualitative analyses further highlighted the robustness of the proposed metric in the presence of noise, artifacts, and modality variations. With its versatility, efficiency, and strong performance across diverse tasks, IMPACT offers a powerful solution for advancing multimodal image registration in both clinical and research settings.


Cognitive Prompts Using Guilford's Structure of Intellect Model

arXiv.org Artificial Intelligence

Large language models (LLMs) demonstrate strong language generation capabilities but often struggle with structured reasoning, leading to inconsistent or suboptimal problem-solving. To mitigate this limitation, Guilford's Structure of Intellect (SOI) model - a foundational framework from intelligence theory - is leveraged as the basis for cognitive prompt engineering. The SOI model categorizes cognitive operations such as pattern recognition, memory retrieval, and evaluation, offering a systematic approach to enhancing LLM reasoning and decision-making. This position paper presents a novel cognitive prompting approach for enforcing SOI-inspired reasoning for improving clarity, coherence, and adaptability in model responses.