TRINITY: An Evolved LLM Coordinator

Jinglue Xu, Qi Sun, Peter Schwendeman, Stefan Nielsen, Edoardo Cetin, Yujin Tang

arXiv.org Artificial Intelligence 

Combining diverse foundation models is promising, but weight merging is limited by mismatched architectures and closed APIs. TRINITY addresses this with a coordinator: a compact language model (~0.6B parameters) and a lightweight head (~10K parameters), optimized with an evolutionary strategy for efficient and adaptive delegation. Theoretical and empirical analyses highlight two key factors driving this success: (1) the coordinator's hidden-state representations provide rich contextualization of inputs, and (2) under high dimensionality and strict budget constraints, the separable Covariance Matrix Adaptation Evolution Strategy (sep-CMA-ES) offers substantial advantages over RL, imitation learning, and random search by exploiting potential block-ε-separability.

A prominent line of work on large language models (LLMs) aspires to scale in line with empirical scaling laws, targeting gains by enlarging model size, training tokens, and compute (Kaplan et al., 2020; Hoffmann et al., 2022). Yet the extent to which such scaling remains efficient and yields sustained returns is uncertain, and it is often resource intensive. An alternative at the micro level is model merging (Akiba et al., 2025; Wortsman et al., 2022; Yang et al., 2024; Kuroki et al., 2024), which seeks parameter-level integration. However, this approach is frequently impractical due to architectural incompatibilities and the closed-source nature of many high-performing models.

In light of these limitations, we adopt a macro-level approach: test-time model composition via coordination, which fuses the complementary strengths of multiple state-of-the-art models from diverse providers without modifying their weights. By leveraging prior data and training investments, this coordination can deliver performance improvements without retraining individual models.

The central challenge for such a coordinator is to acquire a rich contextual understanding of a given query in order to make an effective decision. We posit that this signal can be extracted efficiently from the internal representation of a compact language model, specifically its hidden states (Allen-Zhu & Li, 2023). In a self-attention-based transformer, hidden states encode contextual representations of the input (and, after generation, the output) sequence.
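
To make the delegation mechanism concrete, below is a minimal sketch (not the authors' released code) of a coordinator that pools the hidden states of a compact backbone and maps them through a lightweight linear head to logits over a pool of candidate experts. The backbone name, the mean-pooling choice, and the expert count are illustrative assumptions, not details from the paper.

```python
# Minimal coordinator sketch: compact backbone -> pooled hidden state -> small head.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE = "Qwen/Qwen2.5-0.5B"   # placeholder ~0.5B backbone (assumption)
NUM_EXPERTS = 8              # size of the candidate-model pool (assumption)

tokenizer = AutoTokenizer.from_pretrained(BASE)
backbone = AutoModelForCausalLM.from_pretrained(BASE, output_hidden_states=True)
backbone.eval()

hidden_size = backbone.config.hidden_size
# Lightweight head: a single linear map, on the order of 10K parameters.
head = torch.nn.Linear(hidden_size, NUM_EXPERTS, bias=True)

@torch.no_grad()
def delegate(query: str) -> int:
    """Return the index of the expert the coordinator routes `query` to."""
    inputs = tokenizer(query, return_tensors="pt")
    out = backbone(**inputs)
    # Mean-pool the last layer's hidden states as the query representation
    # (pooling is an assumption; the text only states hidden states are used).
    h = out.hidden_states[-1].mean(dim=1)   # (1, hidden_size)
    logits = head(h)                        # (1, NUM_EXPERTS)
    return int(logits.argmax(dim=-1))

print(delegate("Prove that the sum of two even integers is even."))
```

In this sketch the backbone stays frozen and only the head's parameters are searched, which is what makes the evolutionary optimization described next tractable.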

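The head's parameter vector is small enough (on the order of 10^4 dimensions) to be searched directly with sep-CMA-ES, which maintains only a diagonal covariance and therefore scales linearly with dimensionality. Below is a sketch using the pycma library's diagonal mode on a toy separable objective; in TRINITY the fitness would instead be the (negated) task reward obtained by loading a candidate vector into the head and scoring the delegated models' answers under the evaluation budget. The dimensionality, population size, and objective are placeholder assumptions.

```python
# Separable CMA-ES sketch with pycma's diagonal-covariance mode.
import cma
import numpy as np

DIM = 7000   # rough order of the head's parameter count (assumption)

def fitness(flat: np.ndarray) -> float:
    """Placeholder fitness: stands in for the negative task reward of a
    coordinator head parameterized by `flat`."""
    return float(np.sum(flat ** 2))   # toy separable objective for illustration

es = cma.CMAEvolutionStrategy(
    np.ones(DIM), 0.5,
    {"CMA_diagonal": True, "popsize": 16, "maxiter": 50},
)
while not es.stop():
    candidates = es.ask()                       # sample a population of heads
    es.tell(candidates, [fitness(c) for c in candidates])   # update the search
print(es.result.fbest)
```

A full-covariance update would cost quadratic time and memory in the dimension, which is why the separable variant is the practical choice at this parameter scale and evaluation budget.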