Problem-Independent Architectures
Vision Transformer Neural Architecture Search for Out-of-Distribution Generalization: Benchmark and Insights
While Vision Transformers (ViTs) have achieved success across various machine learning tasks, deploying them in real-world scenarios faces a critical challenge: generalizing under Out-of-Distribution (OoD) shifts. A crucial research gap remains in understanding how to design ViT architectures, both manually and automatically, to excel in OoD generalization. To address this gap, we introduce OoD-ViT-NAS, the first systematic benchmark for ViT Neural Architecture Search (NAS) focused on OoD generalization. This comprehensive benchmark includes 3,000 ViT architectures spanning a range of computational budgets, evaluated on common large-scale OoD datasets. With this benchmark at hand, we analyze the factors that contribute to the OoD generalization of ViT architectures. First, we show that ViT architecture designs have a considerable impact on OoD generalization.
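A minimal sketch of how one might query a tabular NAS benchmark such as OoD-ViT-NAS. The record layout, field names, and accuracy values below are illustrative assumptions, not the benchmark's actual interface or data.

```python
# Hypothetical sketch of querying a tabular ViT-NAS benchmark for OoD accuracy.
# Field names and accuracy values are made up for illustration.
records = {
    # arch_id -> architecture config plus clean and OoD accuracies
    "vit_0001": {"embed_dim": 192, "depth": 12, "num_heads": 3,
                 "clean_acc": 78.1, "ood_acc": {"imagenet-c": 48.3}},
    "vit_0002": {"embed_dim": 256, "depth": 10, "num_heads": 4,
                 "clean_acc": 77.6, "ood_acc": {"imagenet-c": 51.0}},
    # ... in the real benchmark, 3,000 such entries
}

def query(arch_id, dataset="imagenet-c"):
    rec = records[arch_id]
    return rec["clean_acc"], rec["ood_acc"][dataset]

# Rank architectures by OoD accuracy rather than clean accuracy: the two
# orderings can differ, which is exactly what the benchmark lets one study.
best = max(records, key=lambda a: records[a]["ood_acc"]["imagenet-c"])
print(best, query(best))
```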
CE-NAS: An End-to-End Carbon-Efficient Neural Architecture Search Framework
This work presents a novel approach to neural architecture search (NAS) that aims to increase the carbon efficiency of the model design process. The proposed framework, CE-NAS, addresses the key challenge of the high carbon cost associated with NAS by exploiting temporal variations in the carbon intensity of energy and the differing energy demands of different NAS algorithms. At a high level, CE-NAS uses a reinforcement-learning agent to dynamically adjust GPU resources based on carbon intensity, predicted by a time-series transformer, balancing energy-efficient sampling against energy-intensive evaluation tasks. Furthermore, CE-NAS leverages a recently proposed multi-objective optimizer to effectively reduce the NAS search space. We demonstrate the efficacy of CE-NAS in lowering carbon emissions while achieving state-of-the-art results on both NAS datasets and open-domain NAS tasks. For example, on the HW-NAS-Bench dataset, CE-NAS reduces carbon emissions by up to 7.22x while maintaining search efficiency comparable to vanilla NAS.
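To make the scheduling idea concrete, here is a simplified sketch of carbon-aware GPU allocation in the spirit of CE-NAS. A threshold heuristic stands in for the paper's reinforcement-learning agent, and the carbon forecast would come from something like the time-series transformer mentioned above; all names and thresholds are illustrative.

```python
# Simplified sketch: split GPUs between cheap sampling and costly evaluation
# per time slot, based on a forecast of grid carbon intensity (gCO2/kWh).
def allocate_gpus(num_gpus, carbon_forecast, low_thresh, high_thresh):
    schedule = []
    for intensity in carbon_forecast:
        if intensity >= high_thresh:
            eval_gpus = 0                    # dirty grid: only light sampling
        elif intensity <= low_thresh:
            eval_gpus = num_gpus             # clean grid: evaluate at full tilt
        else:
            frac = (high_thresh - intensity) / (high_thresh - low_thresh)
            eval_gpus = round(frac * num_gpus)
        schedule.append({"eval": eval_gpus, "sample": num_gpus - eval_gpus})
    return schedule

print(allocate_gpus(8, [120, 300, 520, 80], low_thresh=100, high_thresh=500))
```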
AdaNCA: Neural Cellular Automata as Adaptors for More Robust Vision Transformers
Vision Transformers (ViTs) demonstrate remarkable performance in image classification through visual-token interaction learning, particularly when equipped with local information via region attention or convolutions. Although such architectures improve feature aggregation across granularities, they often fail to contribute to the robustness of the network. Neural Cellular Automata (NCA) enable the modeling of global visual-token representations through local interactions, with training strategies and architecture designs that confer strong generalization ability and robustness against noisy input. In this paper, we propose Adaptor Neural Cellular Automata (AdaNCA) for Vision Transformers, which uses NCA as plug-and-play adaptors between ViT layers, enhancing ViT's performance and robustness against adversarial samples as well as out-of-distribution inputs. To overcome the large computational overhead of standard NCAs, we propose Dynamic Interaction for more efficient interaction learning.
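A minimal PyTorch sketch of the plug-and-play idea: an NCA-style module that iteratively updates the ViT token grid through local interactions, inserted between transformer blocks. The module name, update rule, and stochastic masking are illustrative simplifications; the paper's AdaNCA and its Dynamic Interaction mechanism are more elaborate.

```python
import torch
import torch.nn as nn

class NCAAdaptor(nn.Module):
    def __init__(self, dim, steps=3):
        super().__init__()
        self.steps = steps
        # local "perception": depthwise conv over the spatial token grid
        self.perceive = nn.Conv2d(dim, dim, kernel_size=3, padding=1, groups=dim)
        self.update = nn.Sequential(nn.Conv2d(dim, dim, 1), nn.GELU(),
                                    nn.Conv2d(dim, dim, 1))

    def forward(self, tokens, h, w):
        # tokens: (B, N, C) -> cell grid (B, C, H, W)
        x = tokens.transpose(1, 2).reshape(tokens.size(0), -1, h, w)
        for _ in range(self.steps):
            # stochastic update: only a random subset of cells fires per step,
            # a hallmark of NCA training that aids robustness
            mask = (torch.rand(x.size(0), 1, h, w, device=x.device) > 0.5).float()
            x = x + mask * self.update(self.perceive(x))
        return x.flatten(2).transpose(1, 2)   # back to (B, N, C)

adaptor = NCAAdaptor(dim=192)
out = adaptor(torch.randn(2, 196, 192), h=14, w=14)  # e.g. between two ViT blocks
```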
GraphMETRO: Mitigating Complex Graph Distribution Shifts via Mixture of Aligned Experts
Graph data are inherently complex and heterogeneous, leading to a high natural diversity of distribution shifts. However, it remains unclear how to build machine learning architectures that generalize to the complex distribution shifts that naturally occur in the real world. Here, we develop GraphMETRO, a Graph Neural Network architecture that models natural diversity and captures complex distribution shifts. GraphMETRO employs a Mixture-of-Experts (MoE) architecture with a gating model and multiple expert models, where each expert model targets a specific distribution shift and produces a representation relative to a shared reference model, while the gating model identifies the shift components present in the input. Additionally, we design a novel objective that aligns the representations from the different expert models to ensure reliable optimization. GraphMETRO achieves state-of-the-art results on four datasets from the GOOD benchmark, which comprises complex and natural real-world distribution shifts, improving by 67% and 4.2% on the WebKB and Twitch datasets respectively.
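A schematic PyTorch sketch of the mixture-of-experts inference described above. The encoders are stubbed with linear layers and the alignment term is a plain squared distance; in the paper, the gating model predicts which shift components act on the input graph and each expert produces a representation aligned to a shared reference model.

```python
import torch
import torch.nn as nn

class ShiftMoE(nn.Module):
    def __init__(self, in_dim, hid_dim, num_shifts):
        super().__init__()
        self.gate = nn.Linear(in_dim, num_shifts)           # scores shift components
        self.experts = nn.ModuleList(
            nn.Linear(in_dim, hid_dim) for _ in range(num_shifts))
        self.reference = nn.Linear(in_dim, hid_dim)         # alignment target

    def forward(self, g):                                   # g: pooled graph feature
        weights = torch.softmax(self.gate(g), dim=-1)       # (B, num_shifts)
        reps = torch.stack([e(g) for e in self.experts], 1) # (B, num_shifts, hid)
        mixed = (weights.unsqueeze(-1) * reps).sum(1)       # shift-aware mixture
        # alignment loss pulls each expert toward the reference representation
        align = ((reps - self.reference(g).unsqueeze(1)) ** 2).mean()
        return mixed, align

moe = ShiftMoE(in_dim=64, hid_dim=32, num_shifts=4)
rep, align_loss = moe(torch.randn(8, 64))
```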
Bridging the Gap between Sample-based and One-shot Neural Architecture Search with BONAS
Neural Architecture Search (NAS) has shown great potential for finding better neural network designs. Sample-based NAS is the most reliable approach: it explores the search space and evaluates the most promising architectures, but it is computationally very costly. As a remedy, the one-shot approach has emerged as a popular technique for accelerating NAS via weight sharing. However, because weights are shared across vastly different networks, the one-shot approach is less reliable than the sample-based approach. In this work, we propose BONAS (Bayesian Optimized Neural Architecture Search), a sample-based NAS framework that is accelerated with weight sharing to evaluate multiple related architectures simultaneously.
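A toy, self-contained sketch of the loop BONAS describes: a cheap surrogate ranks sampled architectures, and only the top batch is evaluated, with a synthetic score standing in for shared-weight supernet training. The search space, surrogate, and scoring function here are illustrative stand-ins, not the paper's actual components.

```python
import random

def sample_architecture():
    # architecture = one op per layer; purely illustrative search space
    return tuple(random.choice(["conv3", "conv5", "skip"]) for _ in range(6))

def true_score(arch):
    # stands in for evaluating the subnet inside a shared-weight supernet
    return sum(op == "conv5" for op in arch) + random.gauss(0, 0.1)

def surrogate_score(arch, history):
    # nearest-neighbour surrogate: score of the most similar evaluated arch
    if not history:
        return random.random()
    sim = lambda a, b: sum(x == y for x, y in zip(a, b))
    best = max(history, key=lambda pair: sim(arch, pair[0]))
    return best[1]

history = []
for _ in range(10):                                  # Bayesian-optimization rounds
    pool = [sample_architecture() for _ in range(200)]
    batch = sorted(pool, key=lambda a: surrogate_score(a, history),
                   reverse=True)[:8]                 # acquisition picks a batch
    history += [(a, true_score(a)) for a in batch]   # one shared-weight pass
print(max(history, key=lambda p: p[1]))
```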
einspace: Searching for Neural Architectures from Fundamental Operations
Neural architecture search (NAS) finds high-performing networks for a given task. Yet the results of NAS are fairly prosaic; they did not, for example, produce fundamentally new designs of the kind exemplified by the shift from convolutional structures to transformers. This is not least because the search spaces in NAS often are not diverse enough to include such transformations a priori. Instead, for NAS to offer greater potential for fundamental design shifts, we need a novel, expressive search space design built from more fundamental operations. To this end, we introduce einspace, a search space based on a parameterised probabilistic context-free grammar.
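A minimal sketch of the mechanism a grammar-based search space rests on: sampling derivation trees from a probabilistic context-free grammar. The grammar below is a toy over a handful of fundamental operations, not the actual einspace production rules.

```python
import random

GRAMMAR = {
    # nonterminal -> list of (production, probability)
    "NET": [(["NET", "NET"], 0.3), (["OP"], 0.7)],           # sequential composition
    "OP":  [(["branch", "NET", "NET"], 0.2),                 # branching structure
            (["linear"], 0.3), (["attention"], 0.2),
            (["conv3x3"], 0.2), (["norm"], 0.1)],
}

def sample(symbol="NET", depth=0, max_depth=6):
    if symbol not in GRAMMAR:                   # terminal: a fundamental op
        return symbol
    rules, probs = zip(*GRAMMAR[symbol])
    if depth >= max_depth:                      # force termination when deep
        rule = min(rules, key=len)
    else:
        rule = random.choices(rules, probs)[0]
    return [sample(s, depth + 1, max_depth) for s in rule]

print(sample())   # a random derivation tree over fundamental operations
```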
Review for NeurIPS paper: A Study on Encodings for Neural Architecture Search
This paper thoroughly studies both the empirical and theoretical aspects of which encodings to use for representing architectures when predicting their final downstream performance, a step fundamental to many NAS pipelines. This is a fundamental contribution to the NAS literature and should become the go-to paper for others designing their own NAS pipelines. The authors are encouraged to incorporate all reviewer comments to further improve the paper.
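For context, here is an illustrative sketch of two encodings commonly compared in this line of work: a flattened adjacency-matrix encoding and a path encoding with one indicator per possible input-to-output operation sequence. The cell format loosely follows NAS-Bench-101-style DAGs; details are simplified and not taken from the paper.

```python
from itertools import product

def adjacency_encoding(adj, ops, op_vocab):
    bits = [b for row in adj for b in row]                  # flatten adjacency matrix
    one_hot = [int(op == o) for op in ops for o in op_vocab]
    return bits + one_hot

def path_encoding(adj, ops, op_vocab, max_len=3):
    n = len(ops)
    paths = set()
    def walk(node, path):
        if node == n - 1:                                   # reached the output node
            paths.add(tuple(path))
            return
        for nxt in range(n):
            if adj[node][nxt]:
                walk(nxt, path + [ops[nxt]] if nxt != n - 1 else path)
    walk(0, [])
    # one indicator per possible op-sequence up to max_len
    universe = [p for L in range(max_len + 1) for p in product(op_vocab, repeat=L)]
    return [int(p in paths) for p in universe]

adj = [[0, 1, 1, 0], [0, 0, 0, 1], [0, 0, 0, 1], [0, 0, 0, 0]]  # input, 2 ops, output
ops = ["in", "conv3", "conv5", "out"]
print(path_encoding(adj, ops, op_vocab=["conv3", "conv5"]))
```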
Temporal Reasoning in AI Systems
Commonsense temporal reasoning at scale is a core problem for cognitive systems. Correctly inferring the duration for which fluents hold is required by many tasks, including natural language understanding and planning. Many AI systems have limited deductive closure because they cannot correctly extrapolate information about existing fluents and events. In this study, we discuss the knowledge representation and reasoning schemes required for robust temporal projection in the Cyc Knowledge Base. We discuss how events can start and end risk periods for fluents. We then use discrete survival functions, which represent knowledge about the persistence of facts, to extrapolate a given fluent. The extrapolated intervals can be truncated by temporal constraints and other types of commonsense knowledge. Finally, we present experimental results demonstrating that these methods yield significant improvements in question-answering (Q/A) performance.
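A worked sketch of fluent extrapolation with a discrete survival function in the spirit of the approach described above. The geometric survival form, half-life value, and constraint handling are illustrative assumptions, not taken from the Cyc Knowledge Base.

```python
def survival(t, half_life):
    """P(fluent still holds t time units after last observation)."""
    return 0.5 ** (t / half_life)

def extrapolate(last_seen, half_life, threshold=0.5, horizon=100,
                constraints=()):
    """Extend a fluent forward until survival drops below threshold or a
    temporal constraint (a known end time) truncates the interval."""
    end = last_seen
    for t in range(1, horizon + 1):
        if any(last_seen + t >= c for c in constraints):
            break                      # truncated by commonsense knowledge
        if survival(t, half_life) < threshold:
            break                      # persistence no longer likely
        end = last_seen + t
    return (last_seen, end)

# e.g. "the shop is open", observed at t=10, typically persists ~8 hours,
# but we also know the shop closes at t=16 -> interval truncated at t=15
print(extrapolate(last_seen=10, half_life=8, constraints=(16,)))
```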
Reviews: Efficient Neural Architecture Transformation Search in Channel-Level for Object Detection
The paper reads very well and presents both the challenges of NAS and the proposed idea in a very understandable form (although the English grammar and spelling could be improved). The paper's main idea is to constrain the NAS search space to the dilation factor of convolutions, so that the effective receptive field of units in the network can be varied while the network weights are kept fixed (or at least re-used and smoothly varied during optimization). This idea is very attractive from a computational point of view, since it allows the notoriously expensive NAS process to make faster progress by avoiding ImageNet pre-training after every architecture change. On the flip side, the proposed NATS method explores only part of the potential search space of neural architecture variations, so the longer-term impact will depend on how restrictive this choice of search space proves to be.
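A small PyTorch sketch of the property the review highlights: the same 3x3 convolution weights can be re-used under different dilation factors, so the receptive field changes without re-training from scratch. Variable names and shapes are illustrative, not NATS code.

```python
import torch
import torch.nn.functional as F

weight = torch.randn(16, 16, 3, 3)           # one trained 3x3 conv kernel
x = torch.randn(1, 16, 32, 32)

for d in (1, 2, 3):                           # candidate dilation factors
    # same weights, different dilation: padding=d keeps the spatial size fixed
    y = F.conv2d(x, weight, padding=d, dilation=d)
    rf = 2 * d + 1                            # effective receptive field of 3x3
    print(f"dilation={d}: output {tuple(y.shape)}, receptive field {rf}x{rf}")
```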
Reviews: Efficient Neural Architecture Transformation Search in Channel-Level for Object Detection
This paper proposes a neural architecture search method specifically for object detection tasks. Although the review scores were initially borderline, the author feedback and the subsequent discussion swayed the reviewers toward a jointly and consistently positive opinion of the paper. Although the concerns of R5 remain, even this reviewer agrees that they are not sufficient to call the work as a whole into question. I thus recommend accepting this paper.