Problem-Independent Architectures
AutomataGPT: Forecasting and Ruleset Inference for Two-Dimensional Cellular Automata
Berkovich, Jaime A., David, Noah S., Buehler, Markus J.
Cellular automata (CA) provide a minimal formalism for investigating how simple local interactions generate rich spatiotemporal behavior in domains as diverse as traffic flow, ecology, tissue morphogenesis and crystal growth. However, automatically discovering the local update rules for a given phenomenon and using them for quantitative prediction remains challenging. Here we present AutomataGPT, a decoder-only transformer pretrained on around 1 million simulated trajectories that span 100 distinct two-dimensional binary deterministic CA rules on toroidal grids. When evaluated on previously unseen rules drawn from the same CA family, AutomataGPT attains 98.5% perfect one-step forecasts and reconstructs the governing update rule with up to 96% functional (application) accuracy and 82% exact rule-matrix match. These results demonstrate that large-scale pretraining over wider regions of rule space yields substantial generalization in both the forward (state forecasting) and inverse (rule inference) problems, without hand-crafted priors. By showing that transformer models can faithfully infer and execute CA dynamics from data alone, our work lays the groundwork for abstracting real-world dynamical phenomena into data-efficient CA surrogates, opening avenues in biology, tissue engineering, physics and AI-driven scientific discovery.
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > California > Santa Clara County > Santa Clara (0.04)
- (6 more...)
- Law > Intellectual Property & Technology Law (0.46)
- Health & Medicine > Pharmaceuticals & Biotechnology (0.46)
- Government > Regional Government (0.46)
- Health & Medicine > Health Care Technology (0.34)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.93)
- Information Technology > Artificial Intelligence > Systems & Languages > Problem-Independent Architectures (0.86)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (0.67)
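The forward problem AutomataGPT learns, one synchronous update of a two-dimensional binary deterministic CA on a toroidal grid, can be sketched directly. The paper's exact rule encoding is not given in the abstract, so this minimal sketch assumes an outer-totalistic rule family and uses Conway's Game of Life (B3/S23) as the concrete instance; `np.roll` supplies the toroidal boundary.

```python
import numpy as np

def ca_step(grid, birth={3}, survive={2, 3}):
    """One synchronous update of a binary outer-totalistic CA.

    `birth` / `survive` are live-neighbor counts; the defaults give
    Conway's Game of Life (B3/S23). np.roll wraps the 8-neighborhood
    around the edges, i.e. the grid is a torus."""
    neighbors = sum(
        np.roll(np.roll(grid, dy, axis=0), dx, axis=1)
        for dy in (-1, 0, 1) for dx in (-1, 0, 1)
        if (dy, dx) != (0, 0)
    )
    born = (grid == 0) & np.isin(neighbors, list(birth))
    stays = (grid == 1) & np.isin(neighbors, list(survive))
    return (born | stays).astype(grid.dtype)

# A glider on a 6x6 torus: it translates one cell diagonally every
# 4 steps, so after 24 steps it wraps back to its starting state.
glider = np.zeros((6, 6), dtype=np.uint8)
glider[0, 1] = glider[1, 2] = 1
glider[2, 0] = glider[2, 1] = glider[2, 2] = 1
state = glider.copy()
for _ in range(24):
    state = ca_step(state)
```

The glider's return to its starting configuration after 24 steps is a convenient correctness check for the periodic boundary.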
Differentiable Logic Cellular Automata: From Game of Life to Pattern Generation
Miotti, Pietro, Niklasson, Eyvind, Randazzo, Ettore, Mordvintsev, Alexander
This paper introduces Differentiable Logic Cellular Automata (DiffLogic CA), a novel combination of Neural Cellular Automata (NCA) and Differentiable Logic Gate Networks (DLGNs). The fundamental computational units of the model are differentiable logic gates, combined into a circuit. During training, the model is fully end-to-end differentiable, allowing gradient-based training, and at inference time it operates in a fully discrete state space. This enables learning local update rules for cellular automata while preserving their inherent discrete nature. We demonstrate the versatility of our approach through a series of milestones: (1) fully learning the rules of Conway's Game of Life, (2) generating checkerboard patterns that exhibit resilience to noise and damage, (3) growing a lizard shape, and (4) multi-color pattern generation. Our model successfully learns recurrent circuits capable of generating desired target patterns. For simpler patterns, we observe success with both synchronous and asynchronous updates, demonstrating significant generalization capabilities and robustness to perturbations. We make the case that this combination of DLGNs and NCA represents a step toward programmable matter and robust computing systems that combine binary logic, neural network adaptability, and localized processing. This work, to the best of our knowledge, is the first successful application of differentiable logic gate networks in recurrent architectures.
- Information Technology > Artificial Intelligence > Systems & Languages > Problem-Independent Architectures (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
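The abstract's "differentiable logic gates" can be illustrated with the standard DLGN relaxation: each gate holds a learnable categorical distribution over the 16 two-input Boolean functions, outputs the expectation under probabilistic semantics during training, and hardens to the single argmax gate at inference. This is a hedged sketch of that one building block, not of the full DiffLogic CA model.

```python
import numpy as np

# Real-valued relaxations of the 16 two-input Boolean functions,
# evaluated on soft inputs a, b in [0, 1] (probabilistic semantics).
OPS = [
    lambda a, b: np.zeros_like(a),         # FALSE
    lambda a, b: a * b,                    # AND
    lambda a, b: a - a * b,                # A AND NOT B
    lambda a, b: a,                        # A
    lambda a, b: b - a * b,                # NOT A AND B
    lambda a, b: b,                        # B
    lambda a, b: a + b - 2 * a * b,        # XOR
    lambda a, b: a + b - a * b,            # OR
    lambda a, b: 1 - (a + b - a * b),      # NOR
    lambda a, b: 1 - (a + b - 2 * a * b),  # XNOR
    lambda a, b: 1 - b,                    # NOT B
    lambda a, b: 1 - b + a * b,            # A OR NOT B
    lambda a, b: 1 - a,                    # NOT A
    lambda a, b: 1 - a + a * b,            # NOT A OR B
    lambda a, b: 1 - a * b,                # NAND
    lambda a, b: np.ones_like(a),          # TRUE
]

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def soft_gate(a, b, logits):
    """Training-time gate: expectation over the 16 ops under softmax(logits)."""
    w = softmax(logits)
    return sum(wi * op(a, b) for wi, op in zip(w, OPS))

def hard_gate(a, b, logits):
    """Inference-time gate: fully discrete, uses only the most probable op."""
    return OPS[int(np.argmax(logits))](a, b)
```

As the logits sharpen during training, the soft output converges to the hard gate's discrete output, which is what lets the model operate in a fully discrete state space at inference.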
Reviews: Deep Active Learning with a Neural Architecture Search
This paper proposes a method for active learning (AL) in which each AL iteration optimizes over both the network architecture and the underlying parameters, as opposed to other methods, which fix the architecture and optimize only the parameters. These two optimizations are performed separately, by first carrying out a local search among models of monotonically increasing complexity and then optimizing the parameters of the obtained architecture. The authors used this method with three different active learning algorithms and showed that it improved the performance of each. The paper is very well written and clear. The problem of architectural optimization is also of great importance in the field.
- Information Technology > Artificial Intelligence > Systems & Languages > Problem-Independent Architectures (0.40)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.40)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.39)
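The two-stage loop the review describes, a local search over models of monotonically increasing complexity followed by parameter fitting inside each AL iteration, can be sketched on a toy regression task. Here "architecture" is just polynomial degree, leave-one-out error drives the complexity search, and the acquisition step (disagreement between adjacent-complexity models) is an illustrative stand-in for the paper's actual AL criteria.

```python
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: x**3 + x**2 - x            # hidden target to be learned
pool = np.linspace(-2.0, 2.0, 201)       # unlabeled pool of inputs
idx = list(rng.choice(len(pool), 6, replace=False))  # initial labeled set

def loo_error(deg):
    """Leave-one-out error of a degree-`deg` polynomial on the labeled set."""
    errs = []
    for k in range(len(idx)):
        tr = [i for j, i in enumerate(idx) if j != k]
        c = np.polyfit(pool[tr], f(pool[tr]), deg)
        errs.append((np.polyval(c, pool[idx[k]]) - f(pool[idx[k]])) ** 2)
    return float(np.mean(errs))

for _ in range(10):                      # active-learning iterations
    # (1) Architecture search: grow complexity while LOO error improves.
    deg, err = 1, loo_error(1)
    while deg + 2 < len(idx) and deg < 5:
        e = loo_error(deg + 1)
        if e >= err:
            break
        deg, err = deg + 1, e
    # (2) Parameter optimization: fit the chosen degree on all labels.
    coef = np.polyfit(pool[idx], f(pool[idx]), deg)
    # (3) Acquisition: query where adjacent-complexity models disagree.
    alt = np.polyfit(pool[idx], f(pool[idx]), deg + 1)
    gap = np.abs(np.polyval(coef, pool) - np.polyval(alt, pool))
    gap[idx] = -1.0                      # never re-query a labeled point
    idx.append(int(np.argmax(gap)))
```

With a noiseless cubic target, the complexity search settles on degree 3 or higher once enough labels accumulate, at which point the fit becomes essentially exact.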
Review for NeurIPS paper: CryptoNAS: Private Inference on a ReLU Budget
The authors argue that when using MiniONN, multiplication and addition are nearly free while ReLU operations are expensive; this is very different from inference on non-encrypted data, where multiply-adds tend to dominate the total runtime. They propose a combination of manual network modifications and Neural Architecture Search to find network architectures with good trade-offs between accuracy and the number of ReLUs. The techniques are: 1) "ReLU shuffling": manually changing the positions of certain ReLU layers so that ReLUs are applied to layers with fewer channels.
- Information Technology > Artificial Intelligence > Cognitive Science (0.62)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.57)
- Information Technology > Artificial Intelligence > Systems & Languages > Problem-Independent Architectures (0.41)
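The trade-off behind "ReLU shuffling" is easy to make concrete: under MiniONN-style private inference, a ReLU applied to an H x W x C activation map costs H * W * C expensive operations, so moving a ReLU from a wide layer to a narrow one cuts the budget directly. The layer shapes below are illustrative, not from the paper.

```python
def relu_cost(layers):
    """Count ReLU operations: one per activation in each H x W x C map
    at which a ReLU is applied (the expensive op under MiniONN-style
    private inference)."""
    return sum(h * w * c for h, w, c, has_relu in layers if has_relu)

# Toy 3-stage network, as (height, width, channels, relu_applied_here).
before = [(32, 32, 64, False), (32, 32, 256, True), (16, 16, 256, True)]
# "ReLU shuffling": move the first ReLU onto the 64-channel layer,
# where each activation map is 4x cheaper in ReLUs.
after = [(32, 32, 64, True), (32, 32, 256, False), (16, 16, 256, True)]
```

In this toy case the shuffle drops the ReLU count from 327,680 to 131,072 while leaving the multiply-add structure of the network untouched.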
MOTE-NAS: Multi-Objective Training-based Estimate for Efficient Neural Architecture Search
Neural Architecture Search (NAS) methods seek effective optimization toward performance metrics such as model accuracy and generalization while facing challenges in search cost and GPU resources. Recent Neural Tangent Kernel (NTK) NAS methods achieve remarkable search efficiency based on a training-free model estimate; however, they overlook the non-convex nature of DNNs in the search process. In this paper, we develop the Multi-Objective Training-based Estimate (MOTE) for efficient NAS, retaining search effectiveness and achieving a new state of the art in the accuracy-cost trade-off. To improve on NTK, and inspired by the Training Speed Estimation (TSE) method, MOTE is designed to model the actual performance of DNNs from a macro to a micro perspective by capturing the loss landscape and convergence speed simultaneously. Using two reduction strategies, MOTE is generated from a reduced architecture and a reduced dataset.
- Information Technology > Artificial Intelligence > Cognitive Science (0.78)
- Information Technology > Artificial Intelligence > Systems & Languages > Problem-Independent Architectures (0.64)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.64)
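MOTE's exact estimator is not spelled out in the abstract, but its TSE ingredient is simple to sketch: the Training Speed Estimation proxy ranks architectures by the sum of their early training losses, so faster-converging models score lower (better). The loss curves below are synthetic.

```python
import numpy as np

def training_speed_estimate(losses, burn_in=0):
    """TSE-style proxy: sum of training losses over early epochs.

    Architectures whose loss falls faster accumulate less area under
    the loss curve, so a LOWER score predicts a BETTER final model;
    `burn_in` skips the first epochs to reduce initialization noise."""
    return float(np.sum(losses[burn_in:]))

# Synthetic loss curves for two hypothetical architectures:
epochs = np.arange(10)
loss_fast = 2.0 * np.exp(-0.8 * epochs)   # converges quickly
loss_slow = 2.0 * np.exp(-0.2 * epochs)   # converges slowly
```

The appeal for NAS is that this requires only a few epochs of ordinary training per candidate, far cheaper than training to convergence yet, unlike a training-free NTK score, sensitive to actual optimization behavior.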
Vision Transformer Neural Architecture Search for Out-of-Distribution Generalization: Benchmark and Insights
While Vision Transformers (ViTs) have achieved success across various machine learning tasks, deploying them in real-world scenarios faces a critical challenge: generalizing under Out-of-Distribution (OoD) shifts. A crucial research gap remains in understanding how to design ViT architectures, both manually and automatically, that excel in OoD generalization. To address this gap, we introduce OoD-ViT-NAS, the first systematic benchmark for ViT Neural Architecture Search (NAS) focused on OoD generalization. This comprehensive benchmark includes 3,000 ViT architectures spanning a range of computational budgets, evaluated on common large-scale OoD datasets. With this benchmark at hand, we analyze the factors that contribute to the OoD generalization of ViT architectures. First, we show that ViT architecture designs have a considerable impact on OoD generalization.
- Information Technology > Artificial Intelligence > Systems & Languages > Problem-Independent Architectures (0.63)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.63)
CE-NAS: An End-to-End Carbon-Efficient Neural Architecture Search Framework
This work presents a novel approach to neural architecture search (NAS) that aims to increase the carbon efficiency of the model design process. The proposed framework, CE-NAS, addresses the key challenge of the high carbon cost of NAS by exploiting variations in the carbon intensity of available energy and the differing energy demands of different NAS algorithms. At a high level, CE-NAS uses a reinforcement-learning agent to dynamically adjust GPU resources based on carbon intensity, predicted by a time-series transformer, balancing energy-efficient sampling tasks against energy-intensive evaluation tasks. Furthermore, CE-NAS leverages a recently proposed multi-objective optimizer to effectively reduce the NAS search space. We demonstrate the efficacy of CE-NAS in lowering carbon emissions while achieving SOTA results on both NAS datasets and open-domain NAS tasks. For example, on the HW-NasBench dataset, CE-NAS reduces carbon emissions by up to 7.22X while maintaining search efficiency comparable to vanilla NAS.
- Information Technology > Artificial Intelligence > Cognitive Science (0.97)
- Information Technology > Artificial Intelligence > Systems & Languages > Problem-Independent Architectures (0.64)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.64)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.61)
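CE-NAS learns its GPU-allocation policy with a reinforcement-learning agent; the hand-written heuristic below only illustrates the decision that policy has to make, shifting GPUs from energy-intensive evaluation toward energy-efficient sampling as predicted carbon intensity rises. The thresholds are assumptions, not values from the paper.

```python
def allocate_gpus(total_gpus, carbon_intensity, low=100.0, high=300.0):
    """Split GPUs between the two NAS workload types by carbon signal.

    High grid carbon intensity (gCO2/kWh) -> favor the energy-efficient
    architecture-sampling task; low intensity -> favor the
    energy-intensive evaluation (training) task. Linear interpolation
    between the two illustrative thresholds `low` and `high`."""
    frac_eval = min(1.0, max(0.0, (high - carbon_intensity) / (high - low)))
    eval_gpus = round(total_gpus * frac_eval)
    return eval_gpus, total_gpus - eval_gpus
```

The RL formulation replaces this fixed rule with a policy that also accounts for search progress, which is how CE-NAS keeps search efficiency close to vanilla NAS while deferring heavy training to low-carbon periods.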
AdaNCA: Neural Cellular Automata As Adaptors For More Robust Vision Transformer
Vision Transformers (ViTs) demonstrate remarkable performance in image classification through visual-token interaction learning, particularly when equipped with local information via region attention or convolutions. Although such architectures improve feature aggregation across different granularities, they often fail to contribute to the robustness of the networks. Neural Cellular Automata (NCA) enable the modeling of global visual-token representations through local interactions, with their training strategies and architecture design conferring strong generalization ability and robustness against noisy input. In this paper, we propose Adaptor Neural Cellular Automata (AdaNCA) for Vision Transformers, which uses NCA as plug-and-play adaptors between ViT layers, thus enhancing ViT's performance and robustness against adversarial samples as well as out-of-distribution inputs. To overcome the large computational overhead of standard NCAs, we propose Dynamic Interaction for more efficient interaction learning.
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Systems & Languages > Problem-Independent Architectures (0.89)
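The adaptor idea, tokens between ViT blocks laid out on a grid and evolved by a few local NCA steps applied as a stochastically masked residual update, can be sketched in NumPy. The perception rule (self plus 4-neighbor mean), the tiny shared MLP, and all dimensions here are illustrative assumptions, not AdaNCA's actual design.

```python
import numpy as np

rng = np.random.default_rng(0)

def nca_adaptor(tokens, w1, w2, steps=2, fire_rate=0.5):
    """Toy NCA adaptor between ViT blocks: tokens on an H x W grid
    (shape (H, W, C)) evolve via local perception plus a shared MLP,
    applied as a stochastically masked residual update."""
    h, w, c = tokens.shape
    x = tokens.copy()
    for _ in range(steps):
        # Perception: each cell sees itself and its 4-neighborhood mean.
        neigh = (np.roll(x, 1, 0) + np.roll(x, -1, 0) +
                 np.roll(x, 1, 1) + np.roll(x, -1, 1)) / 4.0
        percept = np.concatenate([x, neigh], axis=-1)  # (h, w, 2c)
        hid = np.maximum(percept @ w1, 0.0)            # shared MLP, ReLU
        delta = hid @ w2                               # (h, w, c)
        mask = rng.random((h, w, 1)) < fire_rate       # asynchronous "firing"
        x = x + delta * mask                           # residual update
    return x

h, w, c, hidden = 4, 4, 8, 16
tokens = rng.standard_normal((h, w, c))                # one block's tokens
w1 = rng.standard_normal((2 * c, hidden)) * 0.1
w2 = rng.standard_normal((hidden, c)) * 0.1
out = nca_adaptor(tokens, w1, w2)
```

Because the update is residual and shape-preserving, such a module can be dropped between existing ViT layers without changing the rest of the network, which is the plug-and-play property the abstract emphasizes.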
Bridging the Gap between Sample-based and One-shot Neural Architecture Search with BONAS
Neural Architecture Search (NAS) has shown great potential in finding better neural network designs. Sample-based NAS, which explores the search space and evaluates the most promising architectures, is the most reliable approach, but it is computationally very costly. As a remedy, the one-shot approach has emerged as a popular technique for accelerating NAS via weight-sharing. However, because vastly different networks share weights, the one-shot approach is less reliable than the sample-based approach. In this work, we propose BONAS (Bayesian Optimized Neural Architecture Search), a sample-based NAS framework that is accelerated using weight-sharing to evaluate multiple related architectures simultaneously.
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
- Information Technology > Artificial Intelligence > Cognitive Science (1.00)
- Information Technology > Artificial Intelligence > Systems & Languages > Problem-Independent Architectures (0.89)
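The sample-based-plus-surrogate pattern behind BONAS can be sketched on a toy search space: fit a cheap surrogate to the architectures evaluated so far, rank the rest, and evaluate the top candidates as one batch (the batch is where BONAS applies weight-sharing to related architectures). The linear surrogate and 6-bit encoding are stand-ins for the paper's actual surrogate and search space.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy search space: 64 architectures encoded as 6-bit option vectors,
# with a hidden "true accuracy" the search tries to maximize.
space = np.array([[int(b) for b in f"{i:06b}"] for i in range(64)], float)
true_acc = space @ np.array([0.05, 0.02, -0.03, 0.04, 0.01, 0.03]) + 0.7

seen = list(rng.choice(64, 4, replace=False))    # initial random samples
scores = [true_acc[i] for i in seen]

for _ in range(6):
    # Surrogate: least-squares map from encodings to observed accuracy.
    X = np.c_[space[seen], np.ones(len(seen))]
    coef, *_ = np.linalg.lstsq(X, np.array(scores), rcond=None)
    pred = np.c_[space, np.ones(len(space))] @ coef
    # Acquisition: evaluate the top-ranked unseen candidates as one
    # batch -- the related architectures BONAS trains with shared weights.
    batch = [i for i in np.argsort(-pred) if i not in seen][:4]
    seen += batch
    scores += [true_acc[i] for i in batch]

best = seen[int(np.argmax(scores))]
```

Evaluating each surrogate-proposed batch jointly is what recovers most of one-shot NAS's speed while keeping the reliability of sample-based evaluation.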
GraphMETRO: Mitigating Complex Graph Distribution Shifts via Mixture of Aligned Experts
Graph data are inherently complex and heterogeneous, leading to a high natural diversity of distributional shifts. However, it remains unclear how to build machine learning architectures that generalize to the complex distributional shifts naturally occurring in the real world. Here, we develop GraphMETRO, a Graph Neural Network architecture that models natural diversity and captures complex distributional shifts. GraphMETRO employs a Mixture-of-Experts (MoE) architecture with a gating model and multiple expert models, where each expert model targets a specific distributional shift to produce a referential representation with respect to a reference model, and the gating model identifies shift components. Additionally, we design a novel objective that aligns the representations from different expert models to ensure reliable optimization. GraphMETRO achieves state-of-the-art results on four datasets from the GOOD benchmark, which comprises complex and natural real-world distribution shifts, improving by 67% and 4.2% on the WebKB and Twitch datasets, respectively.
- Information Technology > Artificial Intelligence > Systems & Languages > Problem-Independent Architectures (0.64)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.64)
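The mixture-of-experts computation described above, a gating model scoring which shift components are present and expert outputs mixed accordingly, can be sketched on plain vectors. GraphMETRO operates on graphs with GNN experts and additionally aligns expert representations to a reference model, which this sketch omits; all shapes and weights here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def moe_forward(x, gate_w, experts):
    """The gate scores which shift components are present in `x`, then
    the experts' representations are mixed by those scores."""
    gates = softmax(gate_w @ x)                  # (n_experts,)
    reps = np.stack([w @ x for w in experts])    # (n_experts, d_out)
    return gates @ reps, gates

d_in, d_out, n_experts = 8, 4, 3
x = rng.standard_normal(d_in)                    # one input embedding
gate_w = rng.standard_normal((n_experts, d_in))  # gating model
experts = [rng.standard_normal((d_out, d_in)) for _ in range(n_experts)]
rep, gates = moe_forward(x, gate_w, experts)
```

Because each expert specializes in one shift type, the gate's soft weights let the model compose a representation for inputs affected by several shifts at once, which is the core of how GraphMETRO handles complex, mixed distribution shifts.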