Rodrigues, Pedro
AIDetx: a compression-based method for identification of machine-learning generated text
Almeida, Leonardo, Rodrigues, Pedro, Magalhães, Diogo, Pinho, Armando J., Pratas, Diogo
This paper introduces AIDetx, a novel method for detecting machine-generated text using data compression techniques. Traditional approaches, such as deep learning classifiers, often suffer from high computational costs and limited interpretability. To address these limitations, we propose a compression-based classification framework that leverages finite-context models (FCMs). AIDetx constructs distinct compression models for human-written and AI-generated text, classifying new inputs based on which model achieves a higher compression ratio. We evaluated AIDetx on two benchmark datasets, achieving F1 scores exceeding 97% and 99%, respectively, highlighting its high accuracy. Compared to current methods, such as large language models (LLMs), AIDetx offers a more interpretable and computationally efficient solution, significantly reducing both training time and hardware requirements (e.g., no GPUs needed). The full implementation is publicly available at https://github.com/AIDetx/AIDetx.
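The classification rule described in this abstract can be sketched in a few lines. This is a minimal illustration, not the AIDetx implementation: the class `FiniteContextModel`, the function `classify`, the model order, the smoothing constant `alpha`, and the toy corpora are all assumptions made for the example. Each model estimates how many bits it would need to encode a text, and the text is assigned to the class whose model needs fewer bits (that is, compresses it better).

```python
import math
from collections import defaultdict


class FiniteContextModel:
    """Order-k character model with additive smoothing (illustrative only)."""

    def __init__(self, order=3, alpha=0.5):
        self.order = order    # number of preceding characters used as context
        self.alpha = alpha    # smoothing constant (assumed value)
        self.counts = defaultdict(lambda: defaultdict(int))
        self.alphabet = set()

    def train(self, text):
        """Count how often each symbol follows each length-k context."""
        self.alphabet.update(text)
        for i in range(self.order, len(text)):
            context = text[i - self.order:i]
            self.counts[context][text[i]] += 1

    def bits(self, text):
        """Estimate the number of bits needed to encode `text` with this model."""
        # One extra slot so symbols absent from training still get a nonzero
        # probability (a simplification of proper escape handling).
        size = len(self.alphabet) + 1
        total = 0.0
        for i in range(self.order, len(text)):
            context = text[i - self.order:i]
            seen = self.counts.get(context, {})
            count = seen.get(text[i], 0)
            prob = (count + self.alpha) / (sum(seen.values()) + self.alpha * size)
            total += -math.log2(prob)
        return total


def classify(text, human_model, ai_model):
    """Return the label of the model that encodes `text` in fewer bits."""
    return "human" if human_model.bits(text) < ai_model.bits(text) else "ai"


# Toy corpora, invented for this example only.
human_model = FiniteContextModel()
human_model.train("the quick brown fox jumps over the lazy dog. " * 20)

ai_model = FiniteContextModel()
ai_model.train("as an ai language model, i cannot provide that. " * 20)

label = classify("the lazy dog jumps over the quick brown fox", human_model, ai_model)
print(label)
```

Because the decision reduces to comparing two bit counts built from explicit symbol frequencies, the contexts that drive a classification can be inspected directly, which is one way to read the interpretability claim above.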
sbi reloaded: a toolkit for simulation-based inference workflows
Boelts, Jan, Deistler, Michael, Gloeckler, Manuel, Tejero-Cantero, Álvaro, Lueckmann, Jan-Matthis, Moss, Guy, Steinbach, Peter, Moreau, Thomas, Muratore, Fabio, Linhart, Julia, Durkan, Conor, Vetter, Julius, Miller, Benjamin Kurt, Herold, Maternus, Ziaeemehr, Abolfazl, Pals, Matthijs, Gruner, Theo, Bischoff, Sebastian, Krouglova, Nastya, Gao, Richard, Lappalainen, Janne K., Mucsányi, Bálint, Pei, Felix, Schulz, Auguste, Stefanidi, Zinovia, Rodrigues, Pedro, Schröder, Cornelius, Zaid, Faried Abu, Beck, Jonas, Kapoor, Jaivardhan, Greenberg, David S., Gonçalves, Pedro J., Macke, Jakob H.
Scientists and engineers use simulators to model empirically observed phenomena. However, tuning the parameters of a simulator to ensure its outputs match observed data presents a significant challenge. Simulation-based inference (SBI) addresses this by enabling Bayesian inference for simulators, identifying parameters that match observed data and align with prior knowledge. Unlike traditional Bayesian inference, SBI only needs access to simulations from the model and does not require evaluations of the likelihood function. In addition, SBI algorithms do not require gradients through the simulator, allow for massive parallelization of simulations, and can perform inference for different observations without further simulations or training, thereby amortizing inference. Over the past years, we have developed, maintained, and extended $\texttt{sbi}$, a PyTorch-based package that implements Bayesian SBI algorithms based on neural networks. The $\texttt{sbi}$ toolkit implements a wide range of inference methods, neural network architectures, sampling methods, and diagnostic tools. It provides well-tested default settings while offering the flexibility to fully customize every step of the simulation-based inference workflow. Taken together, the $\texttt{sbi}$ toolkit enables scientists and engineers to apply state-of-the-art SBI methods to black-box simulators, opening up new possibilities for aligning simulations with empirically observed data.
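The idea that inference needs only simulations, and no likelihood evaluations, can be illustrated with a deliberately simple sketch. Note that this is classical rejection ABC, swapped in because it fits in a few lines of standard-library Python; it is not one of the neural-network-based algorithms the toolkit implements, it does not use the toolkit's API, and it is not amortized. The simulator, the uniform prior range, the tolerance, and all function names are assumptions made for the example.

```python
import random


def simulator(theta, rng, n=20):
    """Black-box simulator: mean of n noisy draws centred on theta."""
    return sum(rng.gauss(theta, 1.0) for _ in range(n)) / n


def rejection_abc(x_obs, num_simulations=20000, tolerance=0.05, seed=0):
    """Approximate posterior samples using only prior draws and simulations."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(num_simulations):
        theta = rng.uniform(-5.0, 5.0)       # draw a parameter from the prior
        x_sim = simulator(theta, rng)        # run the simulator, no likelihood used
        if abs(x_sim - x_obs) < tolerance:   # keep parameters whose output matches
            accepted.append(theta)
    return accepted


# Hypothetical observation; the accepted parameters approximate the posterior.
posterior_samples = rejection_abc(x_obs=1.5)
posterior_mean = sum(posterior_samples) / len(posterior_samples)
print(len(posterior_samples), round(posterior_mean, 2))
```

Each new observation requires rerunning the whole loop here, which is the cost that amortized neural methods, as described in the abstract, are designed to avoid.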