Goto

Collaborating Authors

 Problem-Independent Architectures


Bridging the Gap between Sample-based and One-shot Neural Architecture Search with BONAS

Neural Information Processing Systems

Neural Architecture Search (NAS) has shown great potentials in finding better neural network designs. Sample-based NAS is the most reliable approach which aims at exploring the search space and evaluating the most promising architectures. However, it is computationally very costly. As a remedy, the one-shot approach has emerged as a popular technique for accelerating NAS using weight-sharing. However, due to the weight-sharing of vastly different networks, the one-shot approach is less reliable than the sample-based approach. In this work, we propose BONAS (Bayesian Optimized Neural Architecture Search), a sample-based NAS framework which is accelerated using weight-sharing to evaluate multiple related architectures simultaneously.


GraphMETRO: Mitigating Complex Graph Distribution Shifts via Mixture of Aligned Experts

Neural Information Processing Systems

Graph data are inherently complex and heterogeneous, leading to a high natural diversity of distributional shifts. However, it remains unclear how to build machine learning architectures that generalize to the complex distributional shifts naturally occurring in the real world. Here, we develop GraphMETRO, a Graph Neural Network architecture that models natural diversity and captures complex distributional shifts. GraphMETRO employs a Mixture-of-Experts (MoE) architecture with a gating model and multiple expert models, where each expert model targets a specific distributional shift to produce a referential representation w.r.t. a reference model, and the gating model identifies shift components. Additionally, we design a novel objective that aligns the representations from different expert models to ensure reliable optimization. GraphMETRO achieves state-of-the-art results on four datasets from the GOOD benchmark, which is comprised of complex and natural real-world distribution shifts, improving by 67% and 4.2% on the WebKB and Twitch datasets.


einspace: Searching for Neural Architectures from Fundamental Operations

Neural Information Processing Systems

Neural architecture search (NAS) finds high performing networks for a given task. Yet the results of NAS are fairly prosaic; they did not e.g. This is not least because the search spaces in NAS often aren't diverse enough to include such transformations a priori. Instead, for NAS to provide greater potential for fundamental design shifts, we need a novel expressive search space design which is built from more fundamental operations. To this end, we introduce einspace, a search space based on a parameterised probabilistic context-free grammar.


A Path to Universal Neural Cellular Automata

arXiv.org Artificial Intelligence

Cellular automata have long been celebrated for their ability to generate complex behaviors from simple, local rules, with well-known discrete models like Conway's Game of Life proven capable of universal computation. Recent advancements have extended cellular automata into continuous domains, raising the question of whether these systems retain the capacity for universal computation. In parallel, neural cellular automata have emerged as a powerful paradigm where rules are learned via gradient descent rather than manually designed. This work explores the potential of neural cellular automata to develop a continuous Universal Cellular Automaton through training by gradient descent. We introduce a cellular automaton model, objective functions and training strategies to guide neural cellular automata toward universal computation in a continuous setting. Our experiments demonstrate the successful training of fundamental computational primitives - such as matrix multiplication and transposition - culminating in the emulation of a neural network solving the MNIST digit classification task directly within the cellular automata state. These results represent a foundational step toward realizing analog general-purpose computers, with implications for understanding universal computation in continuous dynamics and advancing the automated discovery of complex cellular automata behaviors via machine learning.


EngramNCA: a Neural Cellular Automaton Model of Memory Transfer

arXiv.org Artificial Intelligence

This study introduces EngramNCA, a neural cellular automaton (NCA) that integrates both publicly visible states and private, cell-internal memory channels, drawing inspiration from emerging biological evidence suggesting that memory storage extends beyond synaptic modifications to include intracellular mechanisms. The proposed model comprises two components: GeneCA, an NCA trained to develop distinct morphologies from seed cells containing immutable "gene" encodings, and GenePropCA, an auxiliary NCA that modulates the private "genetic" memory of cells without altering their visible states. This architecture enables the encoding and propagation of complex morphologies through the interaction of visible and private channels, facilitating the growth of diverse structures from a shared "genetic" substrate. EngramNCA supports the emergence of hierarchical and coexisting morphologies, offering insights into decentralized memory storage and transfer in artificial systems. These findings have potential implications for the development of adaptive, self-organizing systems and may contribute to the broader understanding of memory mechanisms in both biological and synthetic contexts. Data/Code: A web version of this article with videos is available here, while the Github repository is available here and the code is available on Colab here. Images that represent videos are hyperlinked to their respective video in the web version.


Cellular Automaton With CNN

arXiv.org Artificial Intelligence

Cellular automata (CA) models are widely used to simulate complex systems with emergent behaviors, but identifying hidden parameters that govern their dynamics remains a significant challenge. This study explores the use of Convolutional Neural Networks (CNN) to identify jump parameters in a two-dimensional CA model. We propose a custom CNN architecture trained on CA-generated data to classify jump parameters, which dictates the neighborhood size and movement rules of cells within the CA. Experiments were conducted across varying domain sizes (25 x 25 to 150 x 150) and CA iterations (0 to 50), demonstrating that the accuracy improves with larger domain sizes, as they provide more spatial information for parameter estimation. Interestingly, while initial CA iterations enhance the performance, increasing the number of iterations beyond a certain threshold does not significantly improve accuracy, suggesting that only specific temporal information is relevant for parameter identification. The proposed CNN achieves competitive accuracy (89.31) compared to established architectures like LeNet-5 and AlexNet, while offering significantly faster inference times, making it suitable for real-time applications. This study highlights the potential of CNNs as a powerful tool for fast and accurate parameter estimation in CA models, paving the way for their use in more complex systems and higher-dimensional domains. Future work will explore the identification of multiple hidden parameters and extend the approach to three-dimensional CA models.


Review for NeurIPS paper: A Study on Encodings for Neural Architecture Search

Neural Information Processing Systems

This paper thoroughly studies both the empirical and theoretical aspects of what encodings to use for representing architectures for the task of predicting their downstream final performance. This step is fundamental to many NAS pipelines. This is a fundamental contribution to NAS literature and should become the go-to paper to read for others trying to design their own NAS pipeline. The authors are encouraged to incorporate all reviewer comments to further improve their paper.


Temporal Reasoning in AI systems

arXiv.org Artificial Intelligence

Commonsense temporal reasoning at scale is a core problem for cognitive systems. The correct inference of the duration for which fluents hold is required by many tasks, including natural language understanding and planning. Many AI systems have limited deductive closure because they cannot extrapolate information correctly regarding existing fluents and events. In this study, we discuss the knowledge representation and reasoning schemes required for robust temporal projection in the Cyc Knowledge Base. We discuss how events can start and end risk periods for fluents. We then use discrete survival functions, which represent knowledge of the persistence of facts, to extrapolate a given fluent. The extrapolated intervals can be truncated by temporal constraints and other types of commonsense knowledge. Finally, we present the results of experiments to demonstrate that these methods obtain significant improvements in terms of Q/A performance.


Reviews: Efficient Neural Architecture Transformation Search in Channel-Level for Object Detection

Neural Information Processing Systems

The paper reads very well and manages to present both the challenges of NAS and the proposed idea in a very understandable form (although English grammar and spelling could be improved). The paper's main idea is to constrain the search space of NAS to the dilation factor of convolutions, such that the effective receptive field of units in the network can be varied, while keeping the network weights fixed (or at least allowing the weights to be re-used and smoothly varied during the optimization). This idea is very attractive from a computational point of view, since it allows the notoriously expensive NAS process to achieve faster progress by avoiding the need for ImageNet pre-training after every architecture change. On the flip side, the proposed NATS method only explores part of the potential search space of neural architecture variations. So, the longer-term effect will depend on how restrictive this choice of search space is.


Reviews: Efficient Neural Architecture Transformation Search in Channel-Level for Object Detection

Neural Information Processing Systems

This paper proposes a neural architecture search method specifically for object detection tasks. Although the review scores were initially borderline, the feedback and the subsequent discussion swayed the reviewers into a jointly and consistently positive opinion of the paper. Although the concerns of R5 remain, even this reviewer agrees that they are not sufficient to criticise this work as a whole. I thus recommend to accept this paper.