Collection
#AAAI2025 workshops round-up 3: Neural reasoning and mathematical discovery, and AI to accelerate science and engineering
In this series of articles, we're publishing summaries with some of the key takeaways from a few of the workshops held at the 39th Annual AAAI Conference on Artificial Intelligence (AAAI 2025). This round-up covers two workshops. The first, on neural reasoning and mathematical discovery, was motivated by recent progress in Sphere Neural Networks, which demonstrates various possibilities for neural networks to achieve symbolic-level reasoning; it aimed to reconsider open problems and discuss workaround solutions in the two-way exchange between neural networks and mathematics. The second workshop brought together researchers from artificial intelligence and diverse scientific domains to address new challenges in accelerating scientific discovery and engineering design. This was the fourth iteration of that workshop, with the theme of AI for biological sciences, following the previous three years' themes of AI for chemistry, earth sciences, and materials/manufacturing.
Model Assembly Learning with Heterogeneous Layer Weight Merging
Zhang, Yi-Kai, Wang, Jin, Zhong, Xu-Xiang, Zhan, De-Chuan, Ye, Han-Jia
Model merging acquires general capabilities without extra data or training by combining multiple models' parameters. Previous approaches achieve linear mode connectivity by aligning parameters into the same loss basin using permutation invariance. In this paper, we introduce Model Assembly Learning (MAL), a novel paradigm for model merging that iteratively integrates parameters from diverse models in an open-ended model zoo to enhance the base model's capabilities. Unlike previous works that require identical architectures, MAL allows the merging of heterogeneous architectures and selective parameters across layers. Specifically, the base model can incorporate parameters from different layers of multiple pre-trained models. We systematically investigate the conditions and fundamental settings of heterogeneous parameter merging, addressing all possible mismatches in layer widths between the base and target models. Furthermore, we establish key laws and provide practical guidelines for effectively implementing MAL.
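As a rough illustration of the kind of operation heterogeneous merging involves, here is a minimal sketch, assuming simple linear interpolation over the overlapping sub-block of two weight matrices; the function name, the interpolation rule, and the mismatch handling are illustrative choices, not MAL's actual procedure.

```python
import torch

def merge_layer(base_w: torch.Tensor, donor_w: torch.Tensor, alpha: float = 0.5) -> torch.Tensor:
    """Interpolate a donor layer's weights into a base layer.

    Width mismatches are handled here by merging only the overlapping
    sub-block and keeping the base model's remaining parameters as-is;
    this is an illustrative choice, not the paper's method.
    """
    merged = base_w.clone()
    rows = min(base_w.shape[0], donor_w.shape[0])
    cols = min(base_w.shape[1], donor_w.shape[1])
    merged[:rows, :cols] = (1 - alpha) * base_w[:rows, :cols] + alpha * donor_w[:rows, :cols]
    return merged

# Usage: a 512x256 base layer absorbing a 384x256 donor layer.
base, donor = torch.randn(512, 256), torch.randn(384, 256)
print(merge_layer(base, donor).shape)  # torch.Size([512, 256])
```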
AI for Just Work: Constructing Diverse Imaginations of AI beyond "Replacing Humans"
Jin, Weina, Vincent, Nicholas, Hamarneh, Ghassan
The AI community usually focuses on "how" to develop AI techniques, but lacks thorough open discussions on "why" we develop AI. Lacking critical reflection on the general visions and purposes of AI may make the community vulnerable to manipulation. In this position paper, we explore the "why" question of AI. We refer to answers to the "why" question as the imaginations of AI, which depict our general visions, frames, and mindsets for the prospects of AI. We identify that the prevailing vision in the AI community is largely a monoculture that emphasizes objectives such as replacing humans and improving productivity. Our critical examination of this mainstream imagination highlights its underpinning and potentially unjust assumptions. We then call on the community to diversify our collective imaginations of AI, embedding ethical assumptions in them from the outset. To facilitate this pursuit, we demonstrate one process for constructing a new imagination of "AI for just work," and showcase its application to the medical image synthesis task to make it more ethical. We hope this work will help the AI community open dialogues with civil society on the visions and purposes of AI, and inspire more technical work and advocacy in pursuit of diverse and ethical imaginations that restore the value of AI for the public good.
Natural Language Generation
This book provides a broad overview of Natural Language Generation (NLG), including technology, user requirements, evaluation, and real-world applications. The focus is on concepts and insights which hopefully will remain relevant for many years, not on the latest LLM innovations. It draws on decades of work by the author and others on NLG. The book has the following chapters: Introduction to NLG; Rule-Based NLG; Machine Learning and Neural NLG; Requirements; Evaluation; Safety, Maintenance, and Testing; and Applications. All chapters include examples and anecdotes from the author's personal experiences, and end with a Further Reading section. The book should be especially useful to people working on applied NLG, including NLG researchers, people in other fields who want to use NLG, and commercial developers. It will not however be useful to people who want to understand the latest LLM technology. There is a companion site with more information at https://ehudreiter.com/book/
Table of Contents of Appendix
A Detailed Related Work
B Limitations and Potential Negative Societal Impacts
C Discussions and Details about Experiments in Figure 1
C.1 Summary
Distance-based methods rest on the assumption that OOD data should lie relatively far from the centroids of the ID classes [9]; examples include the Mahalanobis distance [9, 45], cosine similarity [74], and kernel similarity [75]. Early works use the maximum softmax score to express ID-ness [7]. Temperature scaling functions are then used to amplify the separation between ID and OOD data [14]. More recently, researchers have proposed hyperparameter-free energy scores to improve OOD uncertainty estimation [23, 71], and have also considered using the information contained in gradients to improve OOD detection performance [18].
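To make the score functions above concrete, here is a minimal sketch, assuming raw classifier logits as input; the temperature values and the batch in the usage example are illustrative, not taken from the cited papers.

```python
import torch
import torch.nn.functional as F

def msp_score(logits: torch.Tensor) -> torch.Tensor:
    """Maximum softmax probability; higher means more ID-like [7]."""
    return F.softmax(logits, dim=-1).max(dim=-1).values

def temperature_scaled_score(logits: torch.Tensor, T: float = 1000.0) -> torch.Tensor:
    """Softmax score with temperature scaling to widen the ID/OOD gap [14]."""
    return F.softmax(logits / T, dim=-1).max(dim=-1).values

def energy_score(logits: torch.Tensor, T: float = 1.0) -> torch.Tensor:
    """Negative free energy; higher means more ID-like [23]."""
    return T * torch.logsumexp(logits / T, dim=-1)

# Usage: score a batch of 4 samples over 10 classes, then threshold on the score.
logits = torch.randn(4, 10)
print(msp_score(logits), energy_score(logits))
```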
Appendix Table of contents
A Empirical measurement of SCN's equivariance to rotations
B OC20 IS2RE Direct Results
C Implementation details
D Note on spherical harmonics properties
E Overfitting on the training dataset
F Impact of model size

The SCN is not strictly equivariant to rotations, but depending on the design choices, approximate equivariance may be achieved. We begin by empirically measuring the network's invariance and equivariance to rotation for energy and forces, respectively. Mean Absolute Difference (MAD) results for various model choices are shown in Table 3 for models with 12 layers and L = 6. Differences are averaged over a model's outputs for 1,000 random atomic structures. There are four sources that may lead the network to predict different values for rotated versions of the input: 1) the use of the $m \neq 0$ coefficients during message passing, 2) the non-linear message aggregation of Equation (3), 3) the energy and force output blocks, Equations (4) and (5), and 4) limits to numerical precision, especially when using Automatic Mixed Precision (AMP).
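As a rough illustration of how such a measurement can be set up, here is a minimal sketch, assuming a hypothetical `model` callable that maps atomic positions to a scalar energy (this is not SCN's actual interface); the analogous equivariance check for forces would compare the rotated prediction `model_forces(pos @ q.T)` against `model_forces(pos) @ q.T`.

```python
import torch

def rotation_invariance_mad(model, pos: torch.Tensor, n_rot: int = 100) -> float:
    """Mean absolute difference of a scalar prediction under random rotations.

    `model` is any callable mapping atomic positions (N, 3) to a scalar
    energy; a hypothetical interface used here for illustration.
    """
    e0 = model(pos)
    diffs = []
    for _ in range(n_rot):
        q, _ = torch.linalg.qr(torch.randn(3, 3))  # random orthogonal matrix
        if torch.linalg.det(q) < 0:                # flip one column to get a proper rotation
            q[:, 0] = -q[:, 0]
        diffs.append((model(pos @ q.T) - e0).abs())
    return torch.stack(diffs).mean().item()

# Toy check with an exactly invariant "model": sum of distances to the origin.
toy_model = lambda pos: pos.norm(dim=-1).sum()
print(rotation_invariance_mad(toy_model, torch.randn(32, 3)))  # ~0 up to float error
```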
Appendix - An Image is Worth More Than a Thousand Words: Towards Disentanglement in The Wild
Table of Contents
We use the images at 256×256 resolution. We follow [21] and use all the images for training. The images used for the qualitative visualizations are random images from the web and samples from CelebA-HQ. AFHQ [8] contains 15,000 high-quality images categorized into three domains: cat, dog, and wildlife. We use the images at 128×128 resolution, holding out 500 images from each domain for testing.
Supplementary Material for "Neural Auto-Curricula in Two-player Zero-sum Games"
Table of Contents
A.1 MLP-based Meta-Solver
A.2 Conv1D-based Meta-Solver
In this section, we recap the meta-solver properties that we need and illustrate how we designed models to achieve them. The model should satisfy two properties: it should handle a variable-length matrix input, and it should be row-permutation equivariant and column-permutation invariant. Three different techniques can be utilised to achieve the first property, corresponding to the three different models we propose: MLP-based, Conv1D-based, and GRU-based models. Unless specifically mentioned otherwise, we utilise ReLU as the activation function for all MLPs used in our meta-solvers.
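To make the two properties concrete, here is a minimal sketch of a meta-solver head, assuming the input is a payoff matrix of shape (n, m); the pooling features and layer sizes are illustrative and are not the architectures proposed in the paper.

```python
import torch
import torch.nn as nn

class SimpleMetaSolver(nn.Module):
    """Maps a variable-size payoff matrix (n, m) to a distribution over rows.

    Mean/max pooling over columns gives column-permutation invariance;
    applying a shared MLP to each row followed by a softmax over rows
    gives row-permutation equivariance.
    """

    def __init__(self, hidden: int = 64):
        super().__init__()
        self.row_mlp = nn.Sequential(
            nn.Linear(2, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def forward(self, payoff: torch.Tensor) -> torch.Tensor:  # payoff: (n, m)
        row_mean = payoff.mean(dim=1, keepdim=True)       # (n, 1), column-invariant
        row_max = payoff.max(dim=1, keepdim=True).values  # (n, 1), column-invariant
        feats = torch.cat([row_mean, row_max], dim=1)     # (n, 2)
        logits = self.row_mlp(feats).squeeze(-1)          # (n,), shared across rows
        return torch.softmax(logits, dim=0)               # distribution over rows

solver = SimpleMetaSolver()
print(solver(torch.randn(5, 7)).shape)  # works for any (n, m): torch.Size([5])
```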
Appendix for "Unifying Behavioral and Response Diversity for Open-ended Learning in Zero-sum Games"
Table of Contents
A.1 Proof of Theorem 1
A.1 Proof of Theorem 1
To prove Theorem 1, we need the following lemma.
Lemma 1. See Proposition 7.1 in [3].
Now we can prove our Theorem 1.
Proof. For games with only one step (normal-form games, functional-form games), there is only one fixed state. Therefore, the distribution over state-action pairs is equivalent to the distribution over actions.
A.2 Proof of Theorem 2
Let us restate our Theorem 2.
Theorem 2. For a given empirical payoff matrix $\mathcal{A} \in \mathbb{R}$
A.3 Proof of Theorem 3
Now let us first restate the propositions.
DELE: Deductive $\mathcal{EL}^{++}$ Embeddings for Knowledge Base Completion
Mashkova, Olga, Zhapa-Camacho, Fernando, Hoehndorf, Robert
Ontology embeddings map classes, relations, and individuals in ontologies into $\mathbb{R}^n$, where similarity between entities can be computed and new axioms inferred. For ontologies in the Description Logic $\mathcal{EL}^{++}$, several embedding methods have been developed that explicitly generate models of an ontology. However, these methods suffer from some limitations: they do not distinguish between statements that are unprovable and statements that are provably false, and may therefore use entailed statements as negatives. Furthermore, they do not utilize the deductive closure of an ontology to identify statements that are inferred but not asserted. We evaluated a set of embedding methods for $\mathcal{EL}^{++}$ ontologies, incorporating several modifications that aim to make use of the ontology's deductive closure. In particular, we designed novel negative losses that account both for the deductive closure and for different types of negatives, and we formulated evaluation methods for knowledge base completion. We demonstrate that our embedding methods improve over the baseline ontology embeddings in the task of knowledge base or ontology completion.
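As a sketch of the idea that entailed statements should never be drawn as negatives, here is a minimal example, assuming the deductive closure is available as a set of axiom tuples; the function and the toy class names are hypothetical and do not reflect DELE's actual loss construction.

```python
import random

def sample_negatives(candidates, deductive_closure, k=10, seed=0):
    """Sample negative axioms, filtering out anything the ontology entails.

    `candidates` is an iterable of candidate axioms, e.g. (subclass,
    superclass) pairs; `deductive_closure` is the set of all entailed
    axioms. Both are hypothetical representations for illustration.
    """
    rng = random.Random(seed)
    pool = [ax for ax in candidates if ax not in deductive_closure]
    return rng.sample(pool, min(k, len(pool)))

# Toy example: ("Dog", "Animal") is entailed, so it is never used as a negative.
closure = {("Dog", "Animal"), ("Cat", "Animal")}
candidates = [("Dog", "Animal"), ("Dog", "Plant"), ("Cat", "Mineral")]
print(sample_negatives(candidates, closure, k=2))
```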