Model Composition


CREPE: Controlling Diffusion with Replica Exchange

He, Jiajun, Jeha, Paul, Potaptchik, Peter, Zhang, Leo, Hernández-Lobato, José Miguel, Du, Yuanqi, Syed, Saifuddin, Vargas, Francisco

arXiv.org Artificial Intelligence

Inference-time control of diffusion models aims to steer model outputs to satisfy new constraints without retraining. Previous approaches have mostly relied on heuristic guidance or have been coupled with Sequential Monte Carlo (SMC) for bias correction. In this paper, we propose a flexible alternative based on replica exchange, an algorithm originally designed for sampling problems. We refer to this method as CREPE (Controlling with REPlica Exchange). Unlike SMC, CREPE: (1) generates particles sequentially, (2) maintains high diversity in the generated samples after a burn-in period, and (3) enables online refinement or early termination. We demonstrate its versatility across various tasks, including temperature annealing, reward-tilting, model composition and classifier-free guidance debiasing, with competitive performance compared to prior SMC methods.
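
To make the replica-exchange idea concrete, here is a minimal parallel-tempering sketch on a toy 1-D reward-tilted density. It is not the CREPE algorithm itself (which operates on diffusion model trajectories); the base density, reward, step size, and burn-in length are all illustrative assumptions. It does show the two abstract properties named above: samples are generated sequentially, and diversity is maintained after burn-in because the easy replica keeps feeding fresh states to the target replica.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in (not CREPE): sample the reward-tilted density
# p_beta(x) ∝ p(x) * exp(beta * r(x)), with replica exchange between
# an "easy" replica (beta=0, the base model) and the target (beta=1).
log_p = lambda x: -0.5 * x**2            # base density: standard normal
reward = lambda x: -0.5 * (x - 3.0)**2   # hypothetical reward, peaked at x=3

def log_target(x, beta):
    return log_p(x) + beta * reward(x)

betas = [0.0, 1.0]                       # per-replica reward strengths
x = [rng.normal() for _ in betas]        # one particle per replica

samples = []
for step in range(20000):
    # Local move: random-walk Metropolis within each replica.
    for i, beta in enumerate(betas):
        prop = x[i] + 0.5 * rng.normal()
        if np.log(rng.uniform()) < log_target(prop, beta) - log_target(x[i], beta):
            x[i] = prop
    # Exchange move: propose swapping the two replicas' states.
    log_acc = (log_target(x[0], betas[1]) + log_target(x[1], betas[0])
               - log_target(x[0], betas[0]) - log_target(x[1], betas[1]))
    if np.log(rng.uniform()) < log_acc:
        x[0], x[1] = x[1], x[0]
    if step > 2000:                      # keep target-replica samples post burn-in
        samples.append(x[1])

print(np.mean(samples))                  # ≈ 1.5, the mean of the tilted density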


Model Composition for Multimodal Large Language Models

Chen, Chi, Du, Yiyang, Fang, Zheng, Wang, Ziyue, Luo, Fuwen, Li, Peng, Yan, Ming, Zhang, Ji, Huang, Fei, Sun, Maosong, Liu, Yang

arXiv.org Artificial Intelligence

Recent developments in Multimodal Large Language Models (MLLMs) have shown rapid progress, moving towards the goal of creating versatile MLLMs that understand inputs from various modalities. However, existing methods typically rely on joint training with paired multimodal instruction data, which is resource-intensive and challenging to extend to new modalities. In this paper, we propose a new paradigm through the model composition of existing MLLMs to create a new model that retains the modal understanding capabilities of each original model. Our basic implementation, NaiveMC, demonstrates the effectiveness of this paradigm by reusing modality encoders and merging LLM parameters. Furthermore, we introduce DAMC to address parameter interference and mismatch issues during the merging process, thereby enhancing model performance. To facilitate research in this area, we propose MCUB, a benchmark for assessing the ability of MLLMs to understand inputs from diverse modalities. Experiments on this benchmark and four other multimodal understanding tasks show significant improvements over baselines, proving that model composition can create a versatile model capable of processing inputs from multiple modalities.
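
A minimal sketch of the "merging LLM parameters" step that NaiveMC describes, using plain dicts of numpy arrays as stand-ins for checkpoint state dicts. The uniform weighting is an assumption, and DAMC's parameter-decoupling step (which targets the interference problem mentioned above) is not modeled here.

```python
import numpy as np

def merge_state_dicts(state_dicts, weights=None):
    """Weighted average of parameter tensors that share a key.

    A toy stand-in for merging LLM backbones as in NaiveMC; real MLLM
    checkpoints would be framework state dicts, and DAMC's decoupling
    of interfering parameters is not shown.
    """
    if weights is None:
        weights = [1.0 / len(state_dicts)] * len(state_dicts)
    keys = set(state_dicts[0])
    assert all(set(sd) == keys for sd in state_dicts), "mismatched architectures"
    return {k: sum(w * sd[k] for w, sd in zip(weights, state_dicts)) for k in keys}

# Two hypothetical LLM backbones fine-tuned for different modalities.
llm_image = {"layer0.weight": np.ones((4, 4)), "layer0.bias": np.zeros(4)}
llm_audio = {"layer0.weight": 3 * np.ones((4, 4)), "layer0.bias": np.ones(4)}

merged = merge_state_dicts([llm_image, llm_audio])
print(merged["layer0.weight"][0, 0])  # 2.0: the element-wise average
# The original modality encoders are reused as-is and routed into the
# merged backbone, so no paired multimodal training data is needed.
```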


Tangent Model Composition for Ensembling and Continual Fine-tuning

Liu, Tian Yu, Soatto, Stefano

arXiv.org Artificial Intelligence

Tangent Model Composition (TMC) is a method to combine component models independently fine-tuned around a pre-trained point. Component models are tangent vectors to the pre-trained model that can be added, scaled, or subtracted to support incremental learning, ensembling, or unlearning. Component models are composed at inference time via scalar combination, reducing the cost of ensembling to that of a single model. TMC improves accuracy by 4.2% compared to ensembling non-linearly fine-tuned models, at a 2.5× to 10× reduction of inference cost that grows linearly with the number of component models.

The computational architecture of Transformers [52] has been leveraged extensively to co-opt the compositional structure of data through prompts or tokens [29, 55], but the activations of trained models still do not appear to be meaningfully composable. Compositionality of neural activity would allow one to combine activations from different models to capture novel concepts, or to incorporate knowledge from different data without having to re-train or fine-tune the core models. This would enable open-universe classification and, more generally, combinatorial expansion of the hypothesis space. Continual learning could be performed simply by composing models trained on…
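
The "scalar combination at inference time" is just weight-space arithmetic around the pre-trained point: theta = theta_0 + sum_i alpha_i (theta_i − theta_0). A minimal sketch of that arithmetic follows, with toy dicts standing in for model weights; note it elides the part that makes TMC exact, namely that the components are fine-tuned as *linearized* (tangent) models rather than ordinary fine-tunes.

```python
import numpy as np

def compose(pretrained, components, alphas):
    """theta = theta_0 + sum_i alpha_i * (theta_i - theta_0).

    Scalar combination of component models around a pre-trained point:
    positive alphas accumulate tasks (ensembling/incremental learning);
    a negative alpha subtracts a component (unlearning). Sketch only --
    TMC fine-tunes the linearized model, which this delta arithmetic
    does not capture.
    """
    return {k: pretrained[k] + sum(a * (c[k] - pretrained[k])
                                   for a, c in zip(alphas, components))
            for k in pretrained}

theta0 = {"w": np.zeros(3)}
task_a = {"w": np.array([1.0, 0.0, 0.0])}  # hypothetical fine-tune on task A
task_b = {"w": np.array([0.0, 2.0, 0.0])}  # hypothetical fine-tune on task B

print(compose(theta0, [task_a, task_b], [0.5, 0.5])["w"])  # [0.5 1.  0. ]
print(compose(theta0, [task_a], [-1.0])["w"])              # "unlearn" task A
```

Because composition happens in weight space before the forward pass, inference costs the same as running a single model, which is where the claimed 2.5× to 10× saving over conventional ensembling comes from.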


Model Composition: Can Multiple Neural Networks Be Combined into a Single Network Using Only Unlabeled Data?

Banitalebi-Dehkordi, Amin, Kang, Xinyu, Zhang, Yong

arXiv.org Artificial Intelligence

The diversity of deep learning applications, datasets, and neural network architectures necessitates a careful selection of the architecture and data that best match a target application. In an attempt to mitigate this dilemma, this paper investigates the idea of combining multiple trained neural networks using unlabeled data. In addition, combining multiple models into one can speed up inference, yield stronger, more capable models, and allow us to select efficient, device-friendly target network architectures. To this end, the proposed method makes use of generation, filtering, and aggregation of reliable pseudo-labels collected from unlabeled data. Our method supports an arbitrary number of input models with arbitrary architectures and categories. Extensive performance evaluations demonstrate that our method is very effective. For example, for the task of object detection and without using any ground-truth labels, an EfficientDet-D0 trained on Pascal-VOC and an EfficientDet-D1 trained on COCO can be combined into a RetinaNet-ResNet50 model with mAP similar to that of supervised training. If fine-tuned in a semi-supervised setting, the combined model achieves +18.6%, +12.6%, and +8.1% mAP improvements over supervised training with 1%, 5%, and 10% of labels.
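
A minimal sketch of the generation → filtering → aggregation loop for the simpler classification case (the paper also handles detection, and supports teachers with differing category sets, which this sketch does not). The probability-averaging rule and the 0.9 confidence threshold are illustrative assumptions, not the paper's exact scheme.

```python
import numpy as np

def pseudo_label(teacher_probs, conf_thresh=0.9):
    """Generate, filter, and aggregate pseudo-labels from several teachers.

    teacher_probs: list of (N, C) arrays of class probabilities that each
    trained input model assigns to the same N unlabeled samples. Assumes
    shared classes across teachers; threshold and averaging rule are
    illustrative, not the paper's exact filtering scheme.
    """
    agg = np.mean(teacher_probs, axis=0)   # aggregate teacher beliefs
    conf = agg.max(axis=1)                 # confidence per unlabeled sample
    keep = conf >= conf_thresh             # filter out unreliable labels
    return np.flatnonzero(keep), agg[keep].argmax(axis=1)

# Two hypothetical teachers scoring 4 unlabeled samples over 3 classes.
t1 = np.array([[0.98, 0.01, 0.01], [0.4, 0.3, 0.3],
               [0.05, 0.90, 0.05], [0.1, 0.1, 0.8]])
t2 = np.array([[0.95, 0.03, 0.02], [0.5, 0.4, 0.1],
               [0.02, 0.95, 0.03], [0.2, 0.2, 0.6]])

idx, labels = pseudo_label([t1, t2])
print(idx, labels)  # [0 2] [0 1]: the reliable samples and their labels,
                    # which would then train the target (student) network
```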


MLJ: A Julia package for composable machine learning

Blaom, Anthony D., Kiraly, Franz, Lienart, Thibaut, Simillides, Yiannis, Arenas, Diego, Vollmer, Sebastian J.

arXiv.org Machine Learning

Statistical modeling, and the building of complex modeling pipelines, is a cornerstone of modern data science. Most experienced data scientists rely on high-level open source modeling toolboxes - such as scikit-learn [1]; [2] (Python); Weka [3] (Java); mlr [4] and caret [5] (R) - for quick blueprinting, testing, and creation of deployment-ready models. They do this by providing a common interface to atomic components from an ever-growing model zoo, and by providing the means to incorporate these into complex workflows. Practitioners increasingly want to build sophisticated composite models, as exemplified by the strategies of top contestants in machine learning competitions such as those hosted on Kaggle. MLJ (Machine Learning in Julia) [18] is a toolbox written in Julia that provides a common interface and meta-algorithms for selecting, tuning, evaluating, composing and comparing machine learning model implementations written in Julia and other languages. More broadly, the MLJ project hopes to bring cohesion and focus to a number of emerging and existing, but previously disconnected, high-quality machine learning algorithms and tools written in Julia. A welcome corollary of this activity will be increased cohesion and synergy within the talent-rich communities developing these tools. In addition to other novelties outlined below, MLJ aims to provide first-in-class model composition capabilities. Guiding goals of the MLJ project have been usability, interoperability, extensibility, code transparency, and reproducibility.


Environmental Modeling Framework using Stacked Gaussian Processes

Abdelfatah, Kareem, Bao, Junshu, Terejanu, Gabriel

arXiv.org Machine Learning

A network of independently trained Gaussian processes (StackedGP) is introduced to obtain predictions of quantities of interest with quantified uncertainties. The main applications of the StackedGP framework are to integrate different datasets through model composition, to enhance predictions of quantities of interest through a cascade of intermediate predictions, and to propagate uncertainties through emulated dynamical systems driven by uncertain forcing variables. Using analytical first- and second-order moments of a Gaussian process with uncertain inputs, under squared exponential and polynomial kernels, approximate expectations of quantities of interest that require an arbitrary composition of functions can be obtained. The StackedGP model is extended to any number of layers and nodes per layer, and it provides flexibility in kernel selection for the input nodes. The proposed nonparametric stacked model is validated using synthetic datasets, and its performance in model composition and cascading predictions is measured in two applications using real data.
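
A minimal sketch of a two-node cascade x → g → f with uncertainty propagated through the intermediate prediction. For simplicity it uses scikit-learn GPs and Monte Carlo sampling in place of the paper's analytical first- and second-order moments; the toy data and RBF (squared exponential) kernels are assumptions.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)

# Two independently trained GPs stacked in a cascade: x -> g -> f.
# (Toy data; StackedGP propagates moments analytically, here we push
# Monte Carlo samples through the intermediate node instead.)
x = np.linspace(0, 5, 25)[:, None]
g = GaussianProcessRegressor(RBF(1.0)).fit(x, np.sin(x).ravel())
z = np.linspace(-1, 1, 25)[:, None]
f = GaussianProcessRegressor(RBF(1.0)).fit(z, (z**2).ravel())

x_star = np.array([[2.3]])                           # query point
mu_g, sd_g = g.predict(x_star, return_std=True)
z_samples = rng.normal(mu_g, sd_g, size=(2000, 1))   # sample the hidden node
mu_f, sd_f = f.predict(z_samples, return_std=True)
y_samples = rng.normal(mu_f, sd_f)                   # push through layer two

print(y_samples.mean(), y_samples.std())  # cascaded prediction + uncertainty
```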


Robustness, Adaptivity, and Resiliency Analysis

Bankes, Steven Carl (BAE Systems)

AAAI Conferences

In order to better understand the mechanisms that lead to resiliency in natural systems, to support decisions that lead to greater resiliency in systems we affect, and to create models that will be utilized in highly resilient systems, methods for resiliency analysis will be required. Existing methods and technology for robustness analysis provide a foundation for a rigorous approach to resiliency analysis, but extensions are necessary to address the multiple time scales that must be modeled to understand highly adaptive systems. Further, if resiliency modeling is to be effective, it must be contextualized, requiring that the supporting software mirror the systems being modeled by being pace-layered and adaptive.