Goto

Collaborating Authors

 Zug



From Task Executors to Research Partners: Evaluating AI Co-Pilots Through Workflow Integration in Biomedical Research

Weidener, Lukas, Brkić, Marko, Bacci, Chiara, Jovanović, Mihailo, Ulgac, Emre, Dobrin, Alex, Weniger, Johannes, Vlas, Martin, Singh, Ritvik, Meduri, Aakaash

arXiv.org Artificial Intelligence

Artificial intelligence systems are increasingly deployed in biomedical research. However, current evaluation frameworks may inadequately assess their effectiveness as research collaborators. This rapid review examines benchmarking practices for AI systems in preclinical biomedical research. Three major databases and two preprint servers were searched from January 1, 2018 to October 31, 2025, identifying 14 benchmarks that assess AI capabilities in literature understanding, experimental design, and hypothesis generation. The results revealed that all current benchmarks assess isolated component capabilities, including data analysis quality, hypothesis validity, and experimental protocol design. However, authentic research collaboration requires integrated workflows spanning multiple sessions, with contextual memory, adaptive dialogue, and constraint propagation. This gap implies that systems excelling on component benchmarks may fail as practical research co-pilots. A process-oriented evaluation framework is proposed that addresses four critical dimensions absent from current benchmarks: dialogue quality, workflow orchestration, session continuity, and researcher experience. These dimensions are essential for evaluating AI systems as research co-pilots rather than as isolated task executors.


Human-aligned Quantification of Numerical Data

Kolonin, Anton

arXiv.org Artificial Intelligence

Quantifying numerical data involves addressing two key challenges: first, determining whether the data can be naturally quantified, and second, identifying the numerical intervals or ranges of values that correspond to specific value classes, referred to as "quantums," which represent statistically meaningful states. If such quantification is feasible, continuous streams of numerical data can be transformed into sequences of "symbols" that reflect the states of the system described by the measured parameter. People often perform this task intuitively, relying on common sense or practical experience, while information theory and computer science offer computable metrics for this purpose. In this study, we assess the applicability of metrics based on information compression and the Silhouette coefficient for quantifying numerical data. We also investigate the extent to which these metrics correlate with one another and with what is commonly referred to as "human intuition." Our findings suggest that the ability to classify numeric data values into distinct categories is associated with a Silhouette coefficient above 0.65 and a Dip Test below 0.5; otherwise, the data can be treated as following a unimodal normal distribution. Furthermore, when quantification is possible, the Silhouette coefficient appears to align more closely with human intuition than the "normalized centroid distance" method derived from information compression perspective.




DiffInfinite: Large Mask-Image Synthesis via Parallel Random Patch Diffusion in Histopathology

Neural Information Processing Systems

We present DiffInfinite, a hierarchical diffusion model that generates arbitrarily large histological images while preserving long-range correlation structural information. Our approach first generates synthetic segmentation masks, subsequently used as conditions for the high-fidelity generative diffusion process.


Autoguided Online Data Curation for Diffusion Model Training

Pais, Valeria, Oala, Luis, Faccio, Daniele, Aversa, Marco

arXiv.org Artificial Intelligence

The costs of generative model compute rekindled promises and hopes for efficient data curation. In this work, we investigate whether recently developed autoguidance and online data selection methods can improve the time and sample efficiency of training generative diffusion models. We integrate joint example selection (JEST) and autoguidance into a unified code base for fast ablation and benchmarking. We evaluate combinations of data curation on a controlled 2-D synthetic data generation task as well as (3x64x64)-D image generation. Our comparisons are made at equal wall-clock time and equal number of samples, explicitly accounting for the overhead of selection. Across experiments, autoguidance consistently improves sample quality and diversity. Early AJEST (applying selection only at the beginning of training) can match or modestly exceed autoguidance alone in data efficiency on both tasks. However, its time overhead and added complexity make autoguidance or uniform random data selection preferable in most situations. These findings suggest that while targeted online selection can yield efficiency gains in early training, robust sample quality improvements are primarily driven by autoguidance. We discuss limitations and scope, and outline when data selection may be beneficial.


EconAgentic in DePIN Markets: A Large Language Model Approach to the Sharing Economy of Decentralized Physical Infrastructure

Liu, Yulin, Schweitzer, Mocca

arXiv.org Artificial Intelligence

The Decentralized Physical Infrastructure (DePIN) market is revolutionizing the sharing economy through token-based economics and smart contracts that govern decentralized operations. By 2024, DePIN projects have exceeded \$10 billion in market capitalization, underscoring their rapid growth. However, the unregulated nature of these markets, coupled with the autonomous deployment of AI agents in smart contracts, introduces risks such as inefficiencies and potential misalignment with human values. To address these concerns, we introduce EconAgentic, a Large Language Model (LLM)-powered framework designed to mitigate these challenges. Our research focuses on three key areas: 1) modeling the dynamic evolution of DePIN markets, 2) evaluating stakeholders' actions and their economic impacts, and 3) analyzing macroeconomic indicators to align market outcomes with societal goals. Through EconAgentic, we simulate how AI agents respond to token incentives, invest in infrastructure, and adapt to market conditions, comparing AI-driven decisions with human heuristic benchmarks. Our results show that EconAgentic provides valuable insights into the efficiency, inclusion, and stability of DePIN markets, contributing to both academic understanding and practical improvements in the design and governance of decentralized, tokenized economies.


Contrastive Learning for Efficient Transaction Validation in UTXO-based Blockchains

Attar, Hamid, Lunardon, Luigi, Pagani, Alessio

arXiv.org Artificial Intelligence

This paper introduces a Machine Learning (ML) approach for scalability of UTXO-based blockchains, such as Bitcoin. Prior approaches to UTXO set sharding struggle with distributing UTXOs effectively across validators, creating substantial communication overhead due to child-parent transaction dependencies. This overhead, which arises from the need to locate parent UTXOs, significantly hampers transaction processing speeds. Our solution uses ML to optimize not only UTXO set sharding but also the routing of incoming transactions, ensuring that transactions are directed to shards containing their parent UTXOs. At the heart of our approach is a framework that combines contrastive and unsupervised learning to create an embedding space for transaction outputs. This embedding allows the model to group transaction outputs based on spending relationships, making it possible to route transactions efficiently to the correct validation microservices. Trained on historical transaction data with triplet loss and online semi-hard negative mining, the model embeds parent-child spending patterns directly into its parameters, thus eliminating the need for costly, real-time parent transaction lookups. This significantly reduces cross-shard communication overhead, boosting throughput and scalability.


In-silico biological discovery with large perturbation models

Miladinovic, Djordje, Höppe, Tobias, Chevalley, Mathieu, Georgiou, Andreas, Stuart, Lachlan, Mehrjou, Arash, Bantscheff, Marcus, Schölkopf, Bernhard, Schwab, Patrick

arXiv.org Artificial Intelligence

Data generated in perturbation experiments link perturbations to the changes they elicit and therefore contain information relevant to numerous biological discovery tasks -- from understanding the relationships between biological entities to developing therapeutics. However, these data encompass diverse perturbations and readouts, and the complex dependence of experimental outcomes on their biological context makes it challenging to integrate insights across experiments. Here, we present the Large Perturbation Model (LPM), a deep-learning model that integrates multiple, heterogeneous perturbation experiments by representing perturbation, readout, and context as disentangled dimensions. LPM outperforms existing methods across multiple biological discovery tasks, including in predicting post-perturbation transcriptomes of unseen experiments, identifying shared molecular mechanisms of action between chemical and genetic perturbations, and facilitating the inference of gene-gene interaction networks.