Collection
#AAAI2025 workshops round-up 3: Neural reasoning and mathematical discovery, and AI to accelerate science and engineering
In this series of articles, we're publishing summaries with some of the key takeaways from a few of the workshops held at the 39th Annual AAAI Conference on Artificial Intelligence (AAAI 2025). This round-up covers two workshops. The first, on neural reasoning and mathematical discovery, was motivated by recent progress in Sphere Neural Networks, which demonstrates various possibilities for neural networks to achieve symbolic-level reasoning; it aimed to reconsider open problems and discuss workaround solutions in the two-way exchange between neural networks and mathematics. The second workshop brought together researchers from artificial intelligence and diverse scientific domains to address new challenges in accelerating scientific discovery and engineering design. This was the fourth iteration of that workshop, with the theme of AI for biological sciences, following the previous three years' themes of AI for chemistry, earth sciences, and materials/manufacturing.
Model Assembly Learning with Heterogeneous Layer Weight Merging
Zhang, Yi-Kai, Wang, Jin, Zhong, Xu-Xiang, Zhan, De-Chuan, Ye, Han-Jia
Model merging acquires general capabilities without extra data or training by combining multiple models' parameters. Previous approaches achieve linear mode connectivity by aligning parameters into the same loss basin using permutation invariance. In this paper, we introduce Model Assembly Learning (MAL), a novel paradigm for model merging that iteratively integrates parameters from diverse models in an open-ended model zoo to enhance the base model's capabilities. Unlike previous works that require identical architectures, MAL allows the merging of heterogeneous architectures and selective parameters across layers. Specifically, the base model can incorporate parameters from different layers of multiple pre-trained models. We systematically investigate the conditions and fundamental settings of heterogeneous parameter merging, addressing all possible mismatches in layer widths between the base and target models. Furthermore, we establish key laws and provide practical guidelines for effectively implementing MAL.
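As a rough illustration of the kind of operation heterogeneous merging involves, here is a minimal sketch, assuming simple linear interpolation over the overlapping sub-block of two weight matrices; the function name, the interpolation rule, and the mismatch handling are illustrative choices, not MAL's actual procedure.

```python
import torch

def merge_layer(base_w: torch.Tensor, donor_w: torch.Tensor, alpha: float = 0.5) -> torch.Tensor:
    """Interpolate a donor layer's weights into a base layer.

    Width mismatches are handled here by merging only the overlapping
    sub-block and keeping the base model's remaining parameters as-is;
    this is an illustrative choice, not the paper's method.
    """
    merged = base_w.clone()
    rows = min(base_w.shape[0], donor_w.shape[0])
    cols = min(base_w.shape[1], donor_w.shape[1])
    merged[:rows, :cols] = (1 - alpha) * base_w[:rows, :cols] + alpha * donor_w[:rows, :cols]
    return merged

# Usage: a 512x256 base layer absorbing a 384x256 donor layer.
base, donor = torch.randn(512, 256), torch.randn(384, 256)
print(merge_layer(base, donor).shape)  # torch.Size([512, 256])
```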
AI for Just Work: Constructing Diverse Imaginations of AI beyond "Replacing Humans"
Jin, Weina, Vincent, Nicholas, Hamarneh, Ghassan
The AI community usually focuses on "how" to develop AI techniques, but lacks thorough open discussions on "why" we develop AI. Lacking critical reflection on the general visions and purposes of AI may make the community vulnerable to manipulation. In this position paper, we explore the "why" question of AI. We refer to answers to the "why" question as the imaginations of AI, which depict our general visions, frames, and mindsets for the prospects of AI. We identify that the prevailing vision in the AI community is largely a monoculture that emphasizes objectives such as replacing humans and improving productivity. Our critical examination of this mainstream imagination highlights its underpinning and potentially unjust assumptions. We then call on the community to diversify our collective imaginations of AI, embedding ethical assumptions in them from the outset. To facilitate this pursuit, we demonstrate one process for constructing a new imagination of "AI for just work," and showcase its application to the medical image synthesis task to make it more ethical. We hope this work will help the AI community open dialogues with civil society on the visions and purposes of AI, and inspire more technical work and advocacy in pursuit of diverse and ethical imaginations that restore the value of AI for the public good.
Natural Language Generation
This book provides a broad overview of Natural Language Generation (NLG), including technology, user requirements, evaluation, and real-world applications. The focus is on concepts and insights which hopefully will remain relevant for many years, not on the latest LLM innovations. It draws on decades of work by the author and others on NLG. The book has the following chapters: Introduction to NLG; Rule-Based NLG; Machine Learning and Neural NLG; Requirements; Evaluation; Safety, Maintenance, and Testing; and Applications. All chapters include examples and anecdotes from the author's personal experiences, and end with a Further Reading section. The book should be especially useful to people working on applied NLG, including NLG researchers, people in other fields who want to use NLG, and commercial developers. It will not however be useful to people who want to understand the latest LLM technology. There is a companion site with more information at https://ehudreiter.com/book/
Table of Contents of Appendix
A Detailed Related Work
B Limitations and Potential Negative Societal Impacts
C Discussions and Details about Experiments in Figure 1
C.1 Summary
Distance-based methods rest on the assumption that OOD data should lie relatively far from the centroids of the ID classes [9]; examples include the Mahalanobis distance [9, 45], cosine similarity [74], and kernel similarity [75]. Early works use the maximum softmax score to express ID-ness [7]. Temperature scaling functions are then used to amplify the separation between ID and OOD data [14]. More recently, researchers have proposed hyperparameter-free energy scores to improve OOD uncertainty estimation [23, 71], and have also considered using the information contained in gradients to improve OOD detection performance [18].
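To make the score functions above concrete, here is a minimal sketch, assuming raw classifier logits as input; the temperature values and the batch in the usage example are illustrative, not taken from the cited papers.

```python
import torch
import torch.nn.functional as F

def msp_score(logits: torch.Tensor) -> torch.Tensor:
    """Maximum softmax probability; higher means more ID-like [7]."""
    return F.softmax(logits, dim=-1).max(dim=-1).values

def temperature_scaled_score(logits: torch.Tensor, T: float = 1000.0) -> torch.Tensor:
    """Softmax score with temperature scaling to widen the ID/OOD gap [14]."""
    return F.softmax(logits / T, dim=-1).max(dim=-1).values

def energy_score(logits: torch.Tensor, T: float = 1.0) -> torch.Tensor:
    """Negative free energy; higher means more ID-like [23]."""
    return T * torch.logsumexp(logits / T, dim=-1)

# Usage: score a batch of 4 samples over 10 classes, then threshold on the score.
logits = torch.randn(4, 10)
print(msp_score(logits), energy_score(logits))
```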
Appendix Table of contents
A Empirical measurement of SCN's equivariance to rotations
B OC20 IS2RE Direct Results
C Implementation details
D Note on spherical harmonics properties
E Overfitting on the training dataset
F Impact of model size

The SCN is not strictly equivariant to rotations, but depending on the design choices, approximate equivariance may be achieved. We begin by empirically measuring the network's invariance and equivariance to rotation for energy and forces, respectively. Mean Absolute Difference (MAD) results for various model choices are shown in Table 3 for models with 12 layers and L = 6. Differences are averaged over a model's outputs for 1,000 random atomic structures. There are four sources that may lead the network to predict different values for rotated versions of the input: 1) the use of the $m \neq 0$ coefficients during message passing, 2) the non-linear message aggregation of Equation (3), 3) the energy and force output blocks, Equations (4) and (5), and 4) limits to numerical precision, especially when using Automatic Mixed Precision (AMP).
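As a rough illustration of how such a measurement can be set up, here is a minimal sketch, assuming a hypothetical `model` callable that maps atomic positions to a scalar energy (this is not SCN's actual interface); the analogous equivariance check for forces would compare the rotated prediction `model_forces(pos @ q.T)` against `model_forces(pos) @ q.T`.

```python
import torch

def rotation_invariance_mad(model, pos: torch.Tensor, n_rot: int = 100) -> float:
    """Mean absolute difference of a scalar prediction under random rotations.

    `model` is any callable mapping atomic positions (N, 3) to a scalar
    energy; a hypothetical interface used here for illustration.
    """
    e0 = model(pos)
    diffs = []
    for _ in range(n_rot):
        q, _ = torch.linalg.qr(torch.randn(3, 3))  # random orthogonal matrix
        if torch.linalg.det(q) < 0:                # flip one column to get a proper rotation
            q[:, 0] = -q[:, 0]
        diffs.append((model(pos @ q.T) - e0).abs())
    return torch.stack(diffs).mean().item()

# Toy check with an exactly invariant "model": sum of distances to the origin.
toy_model = lambda pos: pos.norm(dim=-1).sum()
print(rotation_invariance_mad(toy_model, torch.randn(32, 3)))  # ~0 up to float error
```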
Appendix - An Image is Worth More Than a Thousand Words: Towards Disentanglement in The Wild
Table of Contents
We use the images at 256×256 resolution. We follow [21] and use all the images for training. The images used for the qualitative visualizations are random images from the web and samples from CelebA-HQ. AFHQ [8] contains 15,000 high-quality images categorized into three domains: cat, dog, and wildlife. We use the images at 128×128 resolution, holding out 500 images from each domain for testing.
Supplementary Material for "Neural Auto-Curricula in Two-player Zero-sum Games"
Table of Contents
A.1 MLP-based Meta-Solver
A.2 Conv1D-based Meta-Solver
In this section, we recap the meta-solver properties that we need and illustrate how we designed models to achieve them. The model should satisfy two properties: it should handle a variable-length matrix input, and it should be row-permutation equivariant and column-permutation invariant. Three different techniques can be utilised to achieve the first property, corresponding to the three different models we propose: MLP-based, Conv1D-based, and GRU-based models. Unless specifically mentioned otherwise, we utilise ReLU as the activation function for all MLPs used in our meta-solvers.
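To make the two properties concrete, here is a minimal sketch of a meta-solver head, assuming the input is a payoff matrix of shape (n, m); the pooling features and layer sizes are illustrative and are not the architectures proposed in the paper.

```python
import torch
import torch.nn as nn

class SimpleMetaSolver(nn.Module):
    """Maps a variable-size payoff matrix (n, m) to a distribution over rows.

    Mean/max pooling over columns gives column-permutation invariance;
    applying a shared MLP to each row followed by a softmax over rows
    gives row-permutation equivariance.
    """

    def __init__(self, hidden: int = 64):
        super().__init__()
        self.row_mlp = nn.Sequential(
            nn.Linear(2, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def forward(self, payoff: torch.Tensor) -> torch.Tensor:  # payoff: (n, m)
        row_mean = payoff.mean(dim=1, keepdim=True)       # (n, 1), column-invariant
        row_max = payoff.max(dim=1, keepdim=True).values  # (n, 1), column-invariant
        feats = torch.cat([row_mean, row_max], dim=1)     # (n, 2)
        logits = self.row_mlp(feats).squeeze(-1)          # (n,), shared across rows
        return torch.softmax(logits, dim=0)               # distribution over rows

solver = SimpleMetaSolver()
print(solver(torch.randn(5, 7)).shape)  # works for any (n, m): torch.Size([5])
```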
Appendix for "Unifying Behavioral and Response Diversity for Open-ended Learning in Zero-sum Games"
Table of Contents
A.1 Proof of Theorem 1
A.1 Proof of Theorem 1
To prove Theorem 1, we need the following lemma.
Lemma 1. See Proposition 7.1 in [3].
Now we can prove our Theorem 1.
Proof. For games with only one step (normal-form games, functional-form games), there is only one fixed state. Therefore, the distribution over state-action pairs is equivalent to the distribution over actions.
A.2 Proof of Theorem 2
Let us restate our Theorem 2.
Theorem 2. For a given empirical payoff matrix $\mathcal{A} \in \mathbb{R}$
A.3 Proof of Theorem 3
Now let us first restate the propositions.
DELE: Deductive $\mathcal{EL}^{++}$ Embeddings for Knowledge Base Completion
Mashkova, Olga, Zhapa-Camacho, Fernando, Hoehndorf, Robert
Ontology embeddings map classes, relations, and individuals in ontologies into $\mathbb{R}^n$, where similarity between entities can be computed and new axioms inferred. For ontologies in the Description Logic $\mathcal{EL}^{++}$, several embedding methods have been developed that explicitly generate models of an ontology. However, these methods suffer from some limitations: they do not distinguish between statements that are unprovable and statements that are provably false, and may therefore use entailed statements as negatives. Furthermore, they do not utilize the deductive closure of an ontology to identify statements that are inferred but not asserted. We evaluated a set of embedding methods for $\mathcal{EL}^{++}$ ontologies, incorporating several modifications that aim to make use of the ontology's deductive closure. In particular, we designed novel negative losses that account both for the deductive closure and for different types of negatives, and we formulated evaluation methods for knowledge base completion. We demonstrate that our embedding methods improve over the baseline ontology embeddings in the task of knowledge base or ontology completion.
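As a sketch of the idea that entailed statements should never be drawn as negatives, here is a minimal example, assuming the deductive closure is available as a set of axiom tuples; the function and the toy class names are hypothetical and do not reflect DELE's actual loss construction.

```python
import random

def sample_negatives(candidates, deductive_closure, k=10, seed=0):
    """Sample negative axioms, filtering out anything the ontology entails.

    `candidates` is an iterable of candidate axioms, e.g. (subclass,
    superclass) pairs; `deductive_closure` is the set of all entailed
    axioms. Both are hypothetical representations for illustration.
    """
    rng = random.Random(seed)
    pool = [ax for ax in candidates if ax not in deductive_closure]
    return rng.sample(pool, min(k, len(pool)))

# Toy example: ("Dog", "Animal") is entailed, so it is never used as a negative.
closure = {("Dog", "Animal"), ("Cat", "Animal")}
candidates = [("Dog", "Animal"), ("Dog", "Plant"), ("Cat", "Mineral")]
print(sample_negatives(candidates, closure, k=2))
```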