 Problem-Independent Architectures


Stochastic Variational Deep Kernel Learning

Neural Information Processing Systems

Deep kernel learning combines the non-parametric flexibility of kernel methods with the inductive biases of deep learning architectures. We propose a novel deep kernel learning model and stochastic variational inference procedure which generalizes deep kernel learning approaches to enable classification, multi-task learning, additive covariance structures, and stochastic gradient training. Specifically, we apply additive base kernels to subsets of output features from deep neural architectures, and jointly learn the parameters of the base kernels and deep network through a Gaussian process marginal likelihood objective. Within this framework, we derive an efficient form of stochastic variational inference which leverages local kernel interpolation, inducing points, and structure-exploiting algebra. We show improved performance over stand-alone deep networks, SVMs, and state-of-the-art scalable Gaussian processes on several classification benchmarks, including an airline delay dataset containing 6 million training points, CIFAR, and ImageNet.
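
As a rough illustration of jointly learning a base kernel and a deep network through the Gaussian process marginal likelihood, the sketch below trains an RBF kernel on features produced by a small neural network in PyTorch. It is a minimal exact-GP sketch, not the authors' method: the additive kernels over feature subsets, local kernel interpolation, inducing points, and stochastic variational inference that make the approach scalable are all omitted, and every name and hyperparameter here is illustrative.

```python
import torch
import torch.nn as nn

class DeepKernelGP(nn.Module):
    """Illustrative deep kernel: an RBF base kernel applied to features produced
    by a neural network, trained with the exact GP marginal likelihood (sketch
    only; the scalable variational machinery of the paper is not shown)."""

    def __init__(self, in_dim, feat_dim=2, noise=0.1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 64), nn.ReLU(),
            nn.Linear(64, feat_dim),
        )
        self.log_lengthscale = nn.Parameter(torch.zeros(1))
        self.log_noise = nn.Parameter(torch.log(torch.tensor(noise)))

    def kernel(self, x1, x2):
        z1, z2 = self.net(x1), self.net(x2)          # deep features
        d = torch.cdist(z1, z2).pow(2)               # squared distances
        return torch.exp(-0.5 * d / self.log_lengthscale.exp().pow(2))

    def neg_marginal_log_likelihood(self, x, y):
        n = x.shape[0]
        K = self.kernel(x, x) + self.log_noise.exp() * torch.eye(n)
        L = torch.linalg.cholesky(K)
        alpha = torch.cholesky_solve(y.unsqueeze(-1), L)
        # 0.5 * y^T K^{-1} y + 0.5 * log|K|, dropping the additive constant
        return 0.5 * (y @ alpha.squeeze(-1)) + L.diagonal().log().sum()

# joint training of network weights and kernel hyperparameters on toy data
x, y = torch.randn(100, 5), torch.randn(100)
model = DeepKernelGP(in_dim=5)
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
for _ in range(200):
    opt.zero_grad()
    loss = model.neg_marginal_log_likelihood(x, y)
    loss.backward()
    opt.step()
```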


Unsupervised Graph Neural Architecture Search with Disentangled Self-supervision

Neural Information Processing Systems

The existing graph neural architecture search (GNAS) methods heavily rely on supervised labels during the search process, failing to handle ubiquitous scenarios where supervision is not available. In this paper, we study the problem of unsupervised graph neural architecture search, which remains unexplored in the literature. The key problem is to discover the latent graph factors that drive the formation of graph data as well as the underlying relations between the factors and the optimal neural architectures. Handling this problem is challenging given that the latent graph factors and the architectures are highly entangled, due to the nature of the graph and the complexity of the neural architecture search process. To address the challenge, we propose a novel Disentangled Self-supervised Graph Neural Architecture Search (DSGAS) model, which is able to discover the optimal architectures capturing various latent graph factors in a self-supervised fashion based on unlabeled graph data. Specifically, we first design a disentangled graph super-network capable of incorporating multiple architectures with factor-wise disentanglement, which are optimized simultaneously. Then, we estimate the performance of architectures under different factors by our proposed self-supervised training with joint architecture-graph disentanglement. Finally, we propose a contrastive search with architecture augmentations to discover architectures with factor-specific expertise. Extensive experiments on 11 real-world datasets demonstrate that the proposed DSGAS model is able to achieve state-of-the-art performance against several baseline methods in an unsupervised manner.
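
The core super-network idea can be illustrated with a small sketch: a DARTS-style mixed operation that keeps a separate set of architecture weights for each latent factor, so several factor-specific architectures are relaxed and optimized inside one network. This is only a hedged approximation of the disentangled super-network described above; the operation set, the number of factors, and all module names are invented for illustration, and the self-supervised objective and contrastive search are not shown.

```python
import torch
import torch.nn as nn

class FactorwiseMixedOp(nn.Module):
    """Illustrative factor-wise mixed operation: each latent factor keeps its
    own architecture weights over the candidate operations, so several
    architectures are optimized simultaneously inside one super-network."""

    def __init__(self, dim, num_factors=4):
        super().__init__()
        self.ops = nn.ModuleList([
            nn.Identity(),                                   # skip connection
            nn.Linear(dim, dim),                             # linear transform
            nn.Sequential(nn.Linear(dim, dim), nn.ReLU()),   # non-linear transform
        ])
        # one set of architecture parameters per latent factor
        self.alpha = nn.Parameter(torch.zeros(num_factors, len(self.ops)))

    def forward(self, h):
        # h: node features of shape (num_nodes, dim)
        weights = torch.softmax(self.alpha, dim=-1)           # (K, |ops|)
        op_outs = torch.stack([op(h) for op in self.ops])     # (|ops|, N, dim)
        # factor-specific mixtures of the candidate operations: (K, N, dim)
        return torch.einsum('ko,ond->knd', weights, op_outs)

h = torch.randn(10, 16)                  # 10 nodes, 16-dim features
factor_reps = FactorwiseMixedOp(16)(h)   # one representation per latent factor
```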


Neural Architecture Dilation for Adversarial Robustness (Supplementary Material) Yanxi Li

Neural Information Processing Systems

For the dilation architecture, we use a DAG with 4 nodes as the supernetwork. There are 8 operation candidates for each edge, including four convolutional operations (3×3 separable convolution, 5×5 separable convolution, 3×3 dilated separable convolution, and 5×5 dilated separable convolution), two pooling operations (3×3 average pooling and 3×3 max pooling), and two special operations (an identity operation representing a skip connection and a zero operation indicating that two nodes are not connected). During dilating, we stack 3 cells for each of the 3 blocks in the WRN34-10. During retraining, the number is increased to 6. The dilated architectures designed by NADAR are shown in Figure 1.
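
A minimal sketch of such a per-edge candidate set is shown below; the module definitions are simplified stand-ins (for example, each separable convolution is a single depthwise-pointwise pair rather than the exact blocks used in the paper), and only the set of eight candidates follows the description above.

```python
import torch.nn as nn

def sep_conv(c, k, dilation=1):
    """Depthwise-separable convolution (depthwise + pointwise); padding is
    chosen so that the spatial size is preserved."""
    pad = dilation * (k - 1) // 2
    return nn.Sequential(
        nn.Conv2d(c, c, k, padding=pad, dilation=dilation, groups=c, bias=False),
        nn.Conv2d(c, c, 1, bias=False),
        nn.BatchNorm2d(c), nn.ReLU(),
    )

class Zero(nn.Module):
    """Zero operation: the two nodes are effectively not connected."""
    def forward(self, x):
        return x * 0.0

def candidate_ops(c):
    # the eight per-edge candidates described above (illustrative definitions)
    return {
        'sep_conv_3x3': sep_conv(c, 3),
        'sep_conv_5x5': sep_conv(c, 5),
        'dil_sep_conv_3x3': sep_conv(c, 3, dilation=2),
        'dil_sep_conv_5x5': sep_conv(c, 5, dilation=2),
        'avg_pool_3x3': nn.AvgPool2d(3, stride=1, padding=1),
        'max_pool_3x3': nn.MaxPool2d(3, stride=1, padding=1),
        'identity': nn.Identity(),
        'zero': Zero(),
    }
```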


TimeKAN: KAN-based Frequency Decomposition Learning Architecture for Long-term Time Series Forecasting

arXiv.org Artificial Intelligence

Real-world time series often have multiple frequency components that are intertwined with each other, making accurate time series forecasting challenging. Decomposing the mixed frequency components into multiple single-frequency components is a natural choice. However, the information density of patterns varies across different frequencies, and employing a uniform modeling approach for different frequency components can lead to inaccurate characterization. To address these challenges, inspired by the flexibility of the recent Kolmogorov-Arnold Network (KAN), we propose a KAN-based Frequency Decomposition Learning architecture (TimeKAN) to address the complex forecasting challenges caused by multiple frequency mixtures. Specifically, TimeKAN mainly consists of three components: Cascaded Frequency Decomposition (CFD) blocks, Multi-order KAN Representation Learning (M-KAN) blocks, and Frequency Mixing blocks. CFD blocks adopt a bottom-up cascading approach to obtain series representations for each frequency band. Benefiting from the high flexibility of KAN, we design a novel M-KAN block to learn and represent specific temporal patterns within each frequency band. Finally, Frequency Mixing blocks are used to recombine the frequency bands into the original format. Extensive experimental results across multiple real-world time series datasets demonstrate that TimeKAN achieves state-of-the-art performance as an extremely lightweight architecture. Time series forecasting (TSF) has garnered significant interest due to its wide range of applications, including finance (Huang et al., 2024), energy management (Yin et al., 2023), traffic flow planning (Jiang & Luo, 2022), and weather forecasting (Lam et al., 2023).
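
To make the decomposition idea concrete, the sketch below splits a batch of series into low-, mid-, and high-frequency components by masking bands of the real FFT spectrum, so the components sum back to the original series. This is only an illustrative stand-in for the CFD blocks: TimeKAN's actual cascaded, learned decomposition and its M-KAN and Frequency Mixing blocks are not reproduced here.

```python
import torch

def split_frequency_bands(x, num_bands=3):
    """Illustrative frequency decomposition of series x of shape (batch, length):
    split the rFFT spectrum into contiguous bands and reconstruct one
    single-band series per band (sketch only, not TimeKAN's CFD mechanism)."""
    spec = torch.fft.rfft(x, dim=-1)
    n_freq = spec.shape[-1]
    edges = torch.linspace(0, n_freq, num_bands + 1).long().tolist()
    bands = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        masked = torch.zeros_like(spec)
        masked[..., lo:hi] = spec[..., lo:hi]
        bands.append(torch.fft.irfft(masked, n=x.shape[-1], dim=-1))
    return bands  # low-to-high frequency components; they sum back to x

x = torch.randn(8, 96)                       # batch of 8 series, length 96
low, mid, high = split_frequency_bands(x)
assert torch.allclose(low + mid + high, x, atol=1e-4)
```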


Review for NeurIPS paper: Hierarchical Neural Architecture Search for Deep Stereo Matching

Neural Information Processing Systems

Weaknesses: - The paper is not particularly novel or exciting, since it takes algorithms already applied in the field of semantic segmentation and applies them to the stereo depth estimation problem. The idea of using AutoML for stereo is not particularly novel either, as stated by the authors themselves, even if the proposed algorithm outperforms the previous proposal. Unfortunately, the authors did not spend much time commenting on these aspects. For example, what might be the biggest takeaways from the found architecture? The main differences with respect to the previously published work are the search performed also at the network level and the use of two separate feature and matching networks.


Review for NeurIPS paper: Hierarchical Neural Architecture Search for Deep Stereo Matching

Neural Information Processing Systems

This paper initially received scores of 6, 5, 7, and 7. After the rebuttal, R4 revised up from a 5 to a 6. The consensus among the reviewers was that while the technical novelty of the paper is not extremely high, the results are important, as neural architecture search for dense correspondence problems is underexplored. Reviewers commented on the strong empirical performance of the same model across multiple datasets, which is an important selling point for the paper. The authors are strongly encouraged to update the final paper to clarify the questions raised in the rebuttal, specifically the responses to R2's questions and the additional comparisons to AANet.


Review for NeurIPS paper: A Study on Encodings for Neural Architecture Search

Neural Information Processing Systems

Summary and Contributions: Post rebuttal: I thank the authors for taking the time to address my review and for conducting more experiments. With the new experiments the paper has certainly become stronger. Also, apologies that I missed the additional experiments on NAS-Bench-201 in the appendix. I have increased my score (6 to 7) and recommend acceptance of the paper. The paper studies the impact of various types of adjacency matrix and path encodings for neural network architectures, both theoretically and practically, and their effect on common sub-tasks of neural architecture search methods: random sampling, perturbation, and training a predictor model.
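
For readers unfamiliar with the two encoding families compared in the paper, the toy sketch below encodes one hypothetical cell both ways: as a flattened adjacency matrix plus an operation list, and as a binary vector over input-to-output paths. The cell, the operation vocabulary, and the path universe are invented for illustration and do not correspond to the paper's exact search spaces.

```python
import numpy as np
from itertools import product

# Hypothetical cell: a small DAG whose nodes are operations and whose edges
# give the data flow (in the spirit of NAS-Bench-style cells).
ops = ['input', 'conv3x3', 'conv1x1', 'output']
adj = np.array([[0, 1, 1, 0],
                [0, 0, 0, 1],
                [0, 0, 0, 1],
                [0, 0, 0, 0]])

# Adjacency encoding: the flattened upper triangle (plus the operation list).
adjacency_encoding = adj[np.triu_indices(len(ops), k=1)]

def dag_paths(node=0):
    """Enumerate the operation sequences on every input->output path."""
    if ops[node] == 'output':
        return [()]
    suffixes = []
    for nxt in np.nonzero(adj[node])[0]:
        suffixes += dag_paths(int(nxt))
    label = () if ops[node] == 'input' else (ops[node],)
    return [label + s for s in suffixes]

# Path encoding: one bit per possible operation sequence (here, length <= 2).
all_paths = [p for length in range(1, 3)
             for p in product(['conv3x3', 'conv1x1'], repeat=length)]
path_encoding = np.array([int(p in set(dag_paths())) for p in all_paths])

print(adjacency_encoding)   # [1 1 0 0 1 1]
print(path_encoding)        # [1 1 0 0 0 0]
```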


Principles and Components of Federated Learning Architectures

arXiv.org Artificial Intelligence

Federated learning, also known as FL, is a machine learning framework in which a large number of clients (such as mobile devices or whole enterprises) collaborate to train a model while keeping the training data decentralized, all overseen by a central server (such as a service provider). This decentralized approach to model training offers advantages in terms of privacy, security, regulations, and economy. Despite its promise, FL is not impervious to the flaws that plague conventional machine learning models. This study offers a thorough analysis of the fundamental ideas and elements of federated learning architectures, emphasizing five important areas: communication architectures, machine learning models, data partitioning, privacy methods, and system heterogeneity. We additionally address the difficulties and potential paths for future study in the area. Furthermore, based on a comprehensive review of the literature, we present a collection of architectural patterns for federated learning systems. This analysis will help readers understand the basics of federated learning, the primary components of FL, and several architectural details.
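
As a concrete illustration of this setting, the sketch below implements a minimal FedAvg-style loop in which each client trains a local copy of the model on its own data and the server averages the resulting weights, so raw training data never leaves the clients. It is a toy example under stated assumptions (synthetic data, plain SGD, no client sampling, secure aggregation, or compression), not a reproduction of any architecture pattern from the survey.

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

def federated_averaging(global_model, clients, rounds=10, local_epochs=1, lr=0.1):
    """Minimal FedAvg-style loop: local training on each client, followed by a
    data-size-weighted average of the client weights on the server."""
    for _ in range(rounds):
        client_states, client_sizes = [], []
        for x, y in clients:                      # each client holds (x, y) locally
            local = copy.deepcopy(global_model)
            opt = torch.optim.SGD(local.parameters(), lr=lr)
            for _ in range(local_epochs):
                opt.zero_grad()
                F.mse_loss(local(x), y).backward()
                opt.step()
            client_states.append(local.state_dict())
            client_sizes.append(len(x))
        # server step: weighted average of the client weights
        total = sum(client_sizes)
        avg = {k: sum(s[k] * (n / total)
                      for s, n in zip(client_states, client_sizes))
               for k in client_states[0]}
        global_model.load_state_dict(avg)
    return global_model

# toy example: three clients, each with private regression data
clients = [(torch.randn(32, 4), torch.randn(32, 1)) for _ in range(3)]
model = federated_averaging(nn.Linear(4, 1), clients)
```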


Review for NeurIPS paper: Cream of the Crop: Distilling Prioritized Paths For One-Shot Neural Architecture Search

Neural Information Processing Systems

Weaknesses: The search space is not the same as in the Google publications but is similar to Once-for-All. The SE ratio is 0.25 in this paper's code, the expansion rates are {4, 6}, and the maximum depth is 5 in every stage, which is slightly different. Thus, please report #params in Tab. 1 (L120). In this paper, the authors use 2K images as the validation set (L212) and use the validation loss to train the meta-network M. The authors claim that this step is time-consuming (L159); how many iterations in total are used to update M in this paper? The Kendall rank correlation is important, and I would prefer to see more results.
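
For context on why #params can shift with these choices, the toy sketch below samples a sub-network from a hypothetical Once-for-All-style mobile space using the quoted settings (expansion rates {4, 6}, SE ratio 0.25, up to 5 blocks per stage) and counts its parameters; all block definitions are crude simplifications and not the paper's actual search space.

```python
import random
import torch.nn as nn

def sample_stage(in_ch, out_ch, max_depth=5, expansions=(4, 6), se_ratio=0.25):
    """Sample one stage of a simplified mobile inverted-bottleneck space."""
    depth = random.randint(1, max_depth)
    blocks = []
    for i in range(depth):
        c_in = in_ch if i == 0 else out_ch
        e = random.choice(expansions)             # expansion rate in {4, 6}
        mid = c_in * e
        se = max(1, int(mid * se_ratio))          # SE bottleneck width
        blocks.append(nn.Sequential(
            nn.Conv2d(c_in, mid, 1, bias=False),                        # expand
            nn.Conv2d(mid, mid, 3, padding=1, groups=mid, bias=False),  # depthwise
            nn.Conv2d(mid, se, 1), nn.Conv2d(se, mid, 1),               # SE (simplified)
            nn.Conv2d(mid, out_ch, 1, bias=False),                      # project
        ))
    return nn.Sequential(*blocks)

subnet = nn.Sequential(sample_stage(16, 24), sample_stage(24, 40))
num_params = sum(p.numel() for p in subnet.parameters())
print(f"#params of this sampled subnet: {num_params:,}")
```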