Model Composition


CREPE: Controlling Diffusion with Replica Exchange

He, Jiajun, Jeha, Paul, Potaptchik, Peter, Zhang, Leo, Hernández-Lobato, José Miguel, Du, Yuanqi, Syed, Saifuddin, Vargas, Francisco

arXiv.org Artificial Intelligence

Inference-time control of diffusion models aims to steer model outputs to satisfy new constraints without retraining. Previous approaches have mostly relied on heuristic guidance or have been coupled with Sequential Monte Carlo (SMC) for bias correction. In this paper, we propose a flexible alternative based on replica exchange, an algorithm originally designed for sampling problems. We refer to this method as CREPE (Controlling with REPlica Exchange). Unlike SMC, CREPE: (1) generates particles sequentially, (2) maintains high diversity in the generated samples after a burn-in period, and (3) enables online refinement or early termination. We demonstrate its versatility across various tasks, including temperature annealing, reward-tilting, model composition and classifier-free guidance debiasing, with competitive performance compared to prior SMC methods.
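
To make the replica-exchange idea concrete, here is a minimal parallel-tempering sketch on a toy 1-D reward-tilted density. It is not the CREPE algorithm itself (which operates on diffusion model trajectories); the base density, reward, step size, and burn-in length are all illustrative assumptions. It does show the two abstract properties named above: samples are generated sequentially, and diversity is maintained after burn-in because the easy replica keeps feeding fresh states to the target replica.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in (not CREPE): sample the reward-tilted density
# p_beta(x) ∝ p(x) * exp(beta * r(x)), with replica exchange between
# an "easy" replica (beta=0, the base model) and the target (beta=1).
log_p = lambda x: -0.5 * x**2            # base density: standard normal
reward = lambda x: -0.5 * (x - 3.0)**2   # hypothetical reward, peaked at x=3

def log_target(x, beta):
    return log_p(x) + beta * reward(x)

betas = [0.0, 1.0]                       # per-replica reward strengths
x = [rng.normal() for _ in betas]        # one particle per replica

samples = []
for step in range(20000):
    # Local move: random-walk Metropolis within each replica.
    for i, beta in enumerate(betas):
        prop = x[i] + 0.5 * rng.normal()
        if np.log(rng.uniform()) < log_target(prop, beta) - log_target(x[i], beta):
            x[i] = prop
    # Exchange move: propose swapping the two replicas' states.
    log_acc = (log_target(x[0], betas[1]) + log_target(x[1], betas[0])
               - log_target(x[0], betas[0]) - log_target(x[1], betas[1]))
    if np.log(rng.uniform()) < log_acc:
        x[0], x[1] = x[1], x[0]
    if step > 2000:                      # keep target-replica samples post burn-in
        samples.append(x[1])

print(np.mean(samples))                  # ≈ 1.5, the mean of the tilted density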


Model Composition for Multimodal Large Language Models

Chen, Chi, Du, Yiyang, Fang, Zheng, Wang, Ziyue, Luo, Fuwen, Li, Peng, Yan, Ming, Zhang, Ji, Huang, Fei, Sun, Maosong, Liu, Yang

arXiv.org Artificial Intelligence

Recent developments in Multimodal Large Language Models (MLLMs) have shown rapid progress, moving towards the goal of creating versatile MLLMs that understand inputs from various modalities. However, existing methods typically rely on joint training with paired multimodal instruction data, which is resource-intensive and challenging to extend to new modalities. In this paper, we propose a new paradigm through the model composition of existing MLLMs to create a new model that retains the modal understanding capabilities of each original model. Our basic implementation, NaiveMC, demonstrates the effectiveness of this paradigm by reusing modality encoders and merging LLM parameters. Furthermore, we introduce DAMC to address parameter interference and mismatch issues during the merging process, thereby enhancing model performance. To facilitate research in this area, we propose MCUB, a benchmark for assessing the ability of MLLMs to understand inputs from diverse modalities. Experiments on this benchmark and four other multimodal understanding tasks show significant improvements over baselines, proving that model composition can create a versatile model capable of processing inputs from multiple modalities.
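
A minimal sketch of the "merging LLM parameters" step that NaiveMC describes, using plain dicts of numpy arrays as stand-ins for checkpoint state dicts. The uniform weighting is an assumption, and DAMC's parameter-decoupling step (which targets the interference problem mentioned above) is not modeled here.

```python
import numpy as np

def merge_state_dicts(state_dicts, weights=None):
    """Weighted average of parameter tensors that share a key.

    A toy stand-in for merging LLM backbones as in NaiveMC; real MLLM
    checkpoints would be framework state dicts, and DAMC's decoupling
    of interfering parameters is not shown.
    """
    if weights is None:
        weights = [1.0 / len(state_dicts)] * len(state_dicts)
    keys = set(state_dicts[0])
    assert all(set(sd) == keys for sd in state_dicts), "mismatched architectures"
    return {k: sum(w * sd[k] for w, sd in zip(weights, state_dicts)) for k in keys}

# Two hypothetical LLM backbones fine-tuned for different modalities.
llm_image = {"layer0.weight": np.ones((4, 4)), "layer0.bias": np.zeros(4)}
llm_audio = {"layer0.weight": 3 * np.ones((4, 4)), "layer0.bias": np.ones(4)}

merged = merge_state_dicts([llm_image, llm_audio])
print(merged["layer0.weight"][0, 0])  # 2.0: the element-wise average
# The original modality encoders are reused as-is and routed into the
# merged backbone, so no paired multimodal training data is needed.
```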


Tangent Model Composition for Ensembling and Continual Fine-tuning

Liu, Tian Yu, Soatto, Stefano

arXiv.org Artificial Intelligence

Tangent Model Composition (TMC) is a method to combine component models independently fine-tuned around a pre-trained point. Component models are tangent vectors to the pre-trained model that can be added, scaled, or subtracted to support incremental learning, ensembling, or unlearning. Component models are composed at inference time via scalar combination, reducing the cost of ensembling to that of a single model. TMC improves accuracy by 4.2% compared to ensembling non-linearly fine-tuned models, at a 2.5× to 10× reduction of inference cost that grows linearly with the number of component models.

The computational architecture of Transformers [52] has been leveraged extensively to co-opt the compositional structure of data through prompts or tokens [29, 55], but the activations of trained models still do not appear to be meaningfully composable. Compositionality of neural activity would allow one to combine activations from different models to capture novel concepts, or to incorporate knowledge from different data without having to re-train or fine-tune the core models. This would enable open-universe classification and, more generally, combinatorial expansion of the hypothesis space. Continual learning could be performed simply by composing models trained on…
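
The "scalar combination at inference time" is just weight-space arithmetic around the pre-trained point: theta = theta_0 + sum_i alpha_i (theta_i − theta_0). A minimal sketch of that arithmetic follows, with toy dicts standing in for model weights; note it elides the part that makes TMC exact, namely that the components are fine-tuned as *linearized* (tangent) models rather than ordinary fine-tunes.

```python
import numpy as np

def compose(pretrained, components, alphas):
    """theta = theta_0 + sum_i alpha_i * (theta_i - theta_0).

    Scalar combination of component models around a pre-trained point:
    positive alphas accumulate tasks (ensembling/incremental learning);
    a negative alpha subtracts a component (unlearning). Sketch only --
    TMC fine-tunes the linearized model, which this delta arithmetic
    does not capture.
    """
    return {k: pretrained[k] + sum(a * (c[k] - pretrained[k])
                                   for a, c in zip(alphas, components))
            for k in pretrained}

theta0 = {"w": np.zeros(3)}
task_a = {"w": np.array([1.0, 0.0, 0.0])}  # hypothetical fine-tune on task A
task_b = {"w": np.array([0.0, 2.0, 0.0])}  # hypothetical fine-tune on task B

print(compose(theta0, [task_a, task_b], [0.5, 0.5])["w"])  # [0.5 1.  0. ]
print(compose(theta0, [task_a], [-1.0])["w"])              # "unlearn" task A
```

Because composition happens in weight space before the forward pass, inference costs the same as running a single model, which is where the claimed 2.5× to 10× saving over conventional ensembling comes from.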


Model Composition: Can Multiple Neural Networks Be Combined into a Single Network Using Only Unlabeled Data?

Banitalebi-Dehkordi, Amin, Kang, Xinyu, Zhang, Yong

arXiv.org Artificial Intelligence

The diversity of deep learning applications, datasets, and neural network architectures necessitates a careful selection of the architecture and data that best match a target application. In an attempt to mitigate this dilemma, this paper investigates the idea of combining multiple trained neural networks using unlabeled data. In addition, combining multiple models into one can speed up inference, yield stronger, more capable models, and allow us to select efficient, device-friendly target network architectures. To this end, the proposed method makes use of generation, filtering, and aggregation of reliable pseudo-labels collected from unlabeled data. Our method supports an arbitrary number of input models with arbitrary architectures and categories. Extensive performance evaluations demonstrate that our method is very effective. For example, for the task of object detection and without using any ground-truth labels, an EfficientDet-D0 trained on Pascal-VOC and an EfficientDet-D1 trained on COCO can be combined into a RetinaNet-ResNet50 model with mAP similar to that of supervised training. If fine-tuned in a semi-supervised setting, the combined model achieves +18.6%, +12.6%, and +8.1% mAP improvements over supervised training with 1%, 5%, and 10% of labels.
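
A minimal sketch of the generation → filtering → aggregation loop for the simpler classification case (the paper also handles detection, and supports teachers with differing category sets, which this sketch does not). The probability-averaging rule and the 0.9 confidence threshold are illustrative assumptions, not the paper's exact scheme.

```python
import numpy as np

def pseudo_label(teacher_probs, conf_thresh=0.9):
    """Generate, filter, and aggregate pseudo-labels from several teachers.

    teacher_probs: list of (N, C) arrays of class probabilities that each
    trained input model assigns to the same N unlabeled samples. Assumes
    shared classes across teachers; threshold and averaging rule are
    illustrative, not the paper's exact filtering scheme.
    """
    agg = np.mean(teacher_probs, axis=0)   # aggregate teacher beliefs
    conf = agg.max(axis=1)                 # confidence per unlabeled sample
    keep = conf >= conf_thresh             # filter out unreliable labels
    return np.flatnonzero(keep), agg[keep].argmax(axis=1)

# Two hypothetical teachers scoring 4 unlabeled samples over 3 classes.
t1 = np.array([[0.98, 0.01, 0.01], [0.4, 0.3, 0.3],
               [0.05, 0.90, 0.05], [0.1, 0.1, 0.8]])
t2 = np.array([[0.95, 0.03, 0.02], [0.5, 0.4, 0.1],
               [0.02, 0.95, 0.03], [0.2, 0.2, 0.6]])

idx, labels = pseudo_label([t1, t2])
print(idx, labels)  # [0 2] [0 1]: the reliable samples and their labels,
                    # which would then train the target (student) network
```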


MLJ: A Julia package for composable machine learning

Blaom, Anthony D., Kiraly, Franz, Lienart, Thibaut, Simillides, Yiannis, Arenas, Diego, Vollmer, Sebastian J.

arXiv.org Machine Learning

Statistical modeling, and the building of complex modeling pipelines, is a cornerstone of modern data science. Most experienced data scientists rely on high-level open source modeling toolboxes - such as scikit-learn [1]; [2] (Python); Weka [3] (Java); mlr [4] and caret [5] (R) - for quick blueprinting, testing, and creation of deployment-ready models. They do this by providing a common interface to atomic components from an ever-growing model zoo, and by providing the means to incorporate these into complex workflows. Practitioners increasingly want to build sophisticated composite models, as exemplified by the strategies of top contestants in machine learning competitions such as those hosted on Kaggle. MLJ (Machine Learning in Julia) [18] is a toolbox written in Julia that provides a common interface and meta-algorithms for selecting, tuning, evaluating, composing and comparing machine learning model implementations written in Julia and other languages. More broadly, the MLJ project hopes to bring cohesion and focus to a number of emerging and existing, but previously disconnected, high-quality machine learning algorithms and tools written in Julia. A welcome corollary of this activity will be increased cohesion and synergy within the talent-rich communities developing these tools. In addition to other novelties outlined below, MLJ aims to provide first-in-class model composition capabilities. Guiding goals of the MLJ project have been usability, interoperability, extensibility, code transparency, and reproducibility.


Environmental Modeling Framework using Stacked Gaussian Processes

Abdelfatah, Kareem, Bao, Junshu, Terejanu, Gabriel

arXiv.org Machine Learning

A network of independently trained Gaussian processes (StackedGP) is introduced to obtain predictions of quantities of interest with quantified uncertainties. The main applications of the StackedGP framework are to integrate different datasets through model composition, to enhance predictions of quantities of interest through a cascade of intermediate predictions, and to propagate uncertainties through emulated dynamical systems driven by uncertain forcing variables. Using analytical first- and second-order moments of a Gaussian process with uncertain inputs, under squared exponential and polynomial kernels, approximate expectations of quantities of interest that require an arbitrary composition of functions can be obtained. The StackedGP model is extended to any number of layers and nodes per layer, and it provides flexibility in kernel selection for the input nodes. The proposed nonparametric stacked model is validated using synthetic datasets, and its performance in model composition and cascading predictions is measured in two applications using real data.
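
A minimal sketch of a two-node cascade x → g → f with uncertainty propagated through the intermediate prediction. For simplicity it uses scikit-learn GPs and Monte Carlo sampling in place of the paper's analytical first- and second-order moments; the toy data and RBF (squared exponential) kernels are assumptions.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)

# Two independently trained GPs stacked in a cascade: x -> g -> f.
# (Toy data; StackedGP propagates moments analytically, here we push
# Monte Carlo samples through the intermediate node instead.)
x = np.linspace(0, 5, 25)[:, None]
g = GaussianProcessRegressor(RBF(1.0)).fit(x, np.sin(x).ravel())
z = np.linspace(-1, 1, 25)[:, None]
f = GaussianProcessRegressor(RBF(1.0)).fit(z, (z**2).ravel())

x_star = np.array([[2.3]])                           # query point
mu_g, sd_g = g.predict(x_star, return_std=True)
z_samples = rng.normal(mu_g, sd_g, size=(2000, 1))   # sample the hidden node
mu_f, sd_f = f.predict(z_samples, return_std=True)
y_samples = rng.normal(mu_f, sd_f)                   # push through layer two

print(y_samples.mean(), y_samples.std())  # cascaded prediction + uncertainty
```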


Robustness, Adaptivity, and Resiliency Analysis

Bankes, Steven Carl (BAE Systems)

AAAI Conferences

In order to better understand the mechanisms that lead to resiliency in natural systems, to support decisions that lead to greater resiliency in systems we affect, and to create models that will be utilized in highly resilient systems, methods for resiliency analysis will be required. Existing methods and technology for robustness analysis provide a foundation for a rigorous approach to resiliency analysis, but extensions are necessary to address the multiple time scales that must be modeled to understand highly adaptive systems. Further, if resiliency modeling is to be effective, it must be contextualized, requiring that the supporting software mirror the systems being modeled by being pace-layered and adaptive.