SafeProtein: Red-Teaming Framework and Benchmark for Protein Foundation Models
Fan, Jigang, Zhou, Zhenghong, Jin, Ruofan, Cong, Le, Wang, Mengdi, Zhang, Zaixi
Proteins play crucial roles in almost all biological processes. The advancement of deep learning has greatly accelerated the development of protein foundation models, leading to significant successes in protein understanding and design. However, the lack of systematic red-teaming for these models has raised serious concerns about their potential misuse, such as generating proteins with biological safety risks. This paper introduces SafeProtein, to the best of our knowledge the first red-teaming framework designed for protein foundation models. SafeProtein combines multimodal prompt engineering and heuristic beam search to systematically design red-teaming methods and conduct tests on protein foundation models. We also curated SafeProtein-Bench, which includes a manually constructed red-teaming benchmark dataset and a comprehensive evaluation protocol. SafeProtein achieved continuous jailbreaks on state-of-the-art protein foundation models (up to 70% attack success rate for ESM3), revealing potential biological safety risks in current protein foundation models and providing insights for the development of robust security protection technologies for frontier models. The code will be made publicly available at https://github.com/jigang-fan/SafeProtein.
MCTS-RAG: Enhancing Retrieval-Augmented Generation with Monte Carlo Tree Search
Hu, Yunhai, Zhao, Yilun, Zhao, Chen, Cohan, Arman
We introduce MCTS-RAG, a novel approach that enhances the reasoning capabilities of small language models on knowledge-intensive tasks by leveraging retrieval-augmented generation (RAG) to provide relevant context and Monte Carlo Tree Search (MCTS) to refine reasoning paths. MCTS-RAG dynamically integrates retrieval and reasoning through an iterative decision-making process. Unlike standard RAG methods, which typically retrieve information independently from reasoning and thus integrate knowledge suboptimally, or conventional MCTS reasoning, which depends solely on internal model knowledge without external facts, MCTS-RAG combines structured reasoning with adaptive retrieval. This integrated approach enhances decision-making, reduces hallucinations, and improves factual accuracy and response consistency. The experimental results on multiple reasoning and knowledge-intensive datasets (i.e., ComplexWebQA, GPQA, and FoolMeTwice) show that our method enables small-scale LMs to achieve performance comparable to frontier LLMs like GPT-4o by effectively scaling inference-time compute, setting a new standard for reasoning in small-scale models.
A Multi-Agent Framework for Stateful Inference-Time Search
Lalan, Arshika, Ghosh, Rajat, Kolsur, Aditya, Dutta, Debojyoti
Recent work explores agentic inference-time techniques to perform structured, multi-step reasoning. However, stateless inference often struggles on multi-step tasks due to the absence of persistent state. Moreover, task-specific fine-tuning or instruction-tuning often achieves surface-level code generation but remains brittle on tasks requiring deeper reasoning and long-horizon dependencies. To address these limitations, we propose stateful multi-agent evolutionary search, a training-free framework that departs from prior stateless approaches by combining (i) persistent inference-time state, (ii) adversarial mutation, and (iii) evolutionary preservation. We demonstrate its effectiveness in automated unit test generation through the generation of edge cases. We generate robust edge cases using an evolutionary search process, where specialized agents sequentially propose, mutate, and score candidates. A controller maintains persistent state across generations, while evolutionary preservation ensures diversity and exploration across all possible cases. This yields a generalist agent capable of discovering robust, high-coverage edge cases across unseen codebases. Experiments show our stateful multi-agent inference framework achieves substantial gains in coverage over stateless single-step baselines, evaluated on prevalent unit-testing benchmarks such as HumanEval and TestGenEvalMini and using three diverse LLM families - Llama, Gemma, and GPT. These results indicate that combining persistent inference-time state with evolutionary search materially improves unit-test generation.
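The propose, mutate, and score loop with a controller that keeps state across generations can be illustrated with a toy sketch. This is not the authors' implementation: in the paper the agents are LLMs, whereas here `propose`, `mutate`, and `score` are hypothetical deterministic stand-ins, `classify` is a made-up function under test, and the rule "keep the most extreme input per branch" is an assumed form of evolutionary preservation chosen only to keep the example small.

```python
def classify(x):
    """Hypothetical function under test with four branches."""
    if x < 0:
        return "neg"
    if x == 0:
        return "zero"
    if x > 1000:
        return "huge"
    return "pos"


def propose(generation):
    """Stand-in proposer agent: suggests small inputs near the generation index."""
    return [generation, -generation]


def mutate(x):
    """Stand-in adversarial mutator: scales, nudges, and negates an input."""
    return [x * 10, x + 1, x - 1, -x]


def score(x):
    """Stand-in scorer agent: reports which branch the input exercises."""
    return classify(x)


def search(generations):
    """Controller: keeps one elite input per branch across generations."""
    elites = {}  # persistent state: branch label -> most extreme input so far
    for g in range(generations):
        candidates = propose(g)
        # Mutate the elites preserved from earlier generations.
        for elite in list(elites.values()):
            candidates.extend(mutate(elite))
        for x in candidates:
            branch = score(x)
            # Assumed preservation rule: keep the largest-magnitude input.
            if branch not in elites or abs(x) > abs(elites[branch]):
                elites[branch] = x
    return elites
```

In this toy setting the proposer alone never produces an input above 1000, so the "huge" branch is reached only because mutated elites are carried from one generation to the next; calling `search(6)` covers all four branches, while `search(5)` does not yet reach "huge".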
Quantum Sparse Recovery and Quantum Orthogonal Matching Pursuit
Bellante, Armando, Vanerio, Stefano, Zanero, Stefano
We study quantum sparse recovery in non-orthogonal, overcomplete dictionaries: given coherent quantum access to a state and a dictionary of vectors, the goal is to reconstruct the state up to $\ell_2$ error using as few vectors as possible. We first show that the general recovery problem is NP-hard, ruling out efficient exact algorithms in full generality. To overcome this, we introduce Quantum Orthogonal Matching Pursuit (QOMP), the first quantum analogue of the classical OMP greedy algorithm. QOMP combines quantum subroutines for inner product estimation, maximum finding, and block-encoded projections with an error-resetting design that avoids iteration-to-iteration error accumulation. Under standard mutual incoherence and well-conditioned sparsity assumptions, QOMP provably recovers the exact support of a $K$-sparse state in polynomial time. As an application, we give the first framework for sparse quantum tomography with non-orthogonal dictionaries in $\ell_2$ norm, achieving query complexity $\widetilde{O}(\sqrt{N}/\varepsilon)$ in favorable regimes and reducing tomography to estimating only $K$ coefficients instead of $N$ amplitudes. In particular, for pure-state tomography with $m=O(N)$ dictionary vectors and sparsity $K=\widetilde{O}(1)$ on a well-conditioned subdictionary, this circumvents the $\widetilde{\Omega}(N/\varepsilon)$ lower bound that holds in the dense, orthonormal-dictionary setting, without contradiction, by leveraging sparsity together with non-orthogonality. Beyond tomography, we analyze QOMP in the QRAM model, where it yields polynomial speedups over classical OMP implementations, and provide a quantum algorithm to estimate the mutual incoherence of a dictionary of $m$ vectors in $O(m/\varepsilon)$ queries, improving over both deterministic and quantum-inspired classical methods.
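For orientation, the classical OMP greedy algorithm that QOMP is an analogue of can be sketched in a few lines of NumPy. This is a minimal classical sketch, not the quantum algorithm from the abstract: the function names `omp` and `mutual_coherence` are illustrative, the dictionary is assumed to be a real matrix `D` whose columns are the dictionary vectors, and `mutual_coherence` computes the standard classical quantity (largest absolute inner product between distinct normalized columns) on which incoherence conditions are usually stated.

```python
import numpy as np


def omp(D, y, K):
    """Classical Orthogonal Matching Pursuit (illustrative sketch, not QOMP).

    D: (n, m) dictionary with one vector per column.
    y: (n,) signal to approximate.
    K: number of greedy iterations (target sparsity).
    Returns the sorted support and the length-m coefficient vector.
    """
    residual = y.astype(float)
    support = []
    coeffs = np.zeros(0)
    for _ in range(K):
        # Greedy step: pick the column most correlated with the residual.
        correlations = np.abs(D.T @ residual)
        if support:
            correlations[support] = 0.0  # never reselect a chosen column
        support.append(int(np.argmax(correlations)))
        # Projection step: least-squares fit of y on the selected columns,
        # recomputed from scratch at every iteration.
        coeffs, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coeffs
    x = np.zeros(D.shape[1])
    x[support] = coeffs
    return sorted(support), x


def mutual_coherence(D):
    """Largest absolute inner product between distinct normalized columns."""
    Dn = D / np.linalg.norm(D, axis=0)
    gram = np.abs(Dn.T @ Dn)
    np.fill_diagonal(gram, 0.0)
    return float(gram.max())
```

As a usage example, take the overcomplete dictionary formed by the four standard basis vectors of $\mathbb{R}^4$ plus the unit vector along $(1,1,0,0)$, and the signal $y = 2e_3 - e_4$. Two OMP iterations select the third and fourth columns and recover the coefficients $2$ and $-1$.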
Visualizing Multimodality in Combinatorial Search Landscapes
Sánchez-Díaz, Xavier F. C., Mengshoel, Ole Jakob
This work walks through different visualization techniques for combinatorial search landscapes, focusing on multimodality. We discuss different techniques from the landscape analysis literature, and how they can be combined to provide a more comprehensive view of the search landscape. We also include examples and discuss relevant work to show how others have used these techniques in practice, based on the geometric and aesthetic elements of the Grammar of Graphics. We conclude that there is no free lunch in visualization, and provide recommendations for future work as there are several paths to continue the work in this field.