Problem-Independent Architectures
Stochastic Variational Deep Kernel Learning
Deep kernel learning combines the non-parametric flexibility of kernel methods with the inductive biases of deep learning architectures. We propose a novel deep kernel learning model and stochastic variational inference procedure which generalizes deep kernel learning approaches to enable classification, multi-task learning, additive covariance structures, and stochastic gradient training. Specifically, we apply additive base kernels to subsets of output features from deep neural architectures, and jointly learn the parameters of the base kernels and deep network through a Gaussian process marginal likelihood objective. Within this framework, we derive an efficient form of stochastic variational inference which leverages local kernel interpolation, inducing points, and structure-exploiting algebra. We show improved performance over standalone deep networks, SVMs, and state-of-the-art scalable Gaussian processes on several classification benchmarks, including an airline delay dataset containing 6 million training points, CIFAR, and ImageNet.
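As a rough illustration of the additive construction described in this abstract (not the authors' implementation), the sketch below sums RBF base kernels, each applied to a subset of a network's output features. The one-layer `network`, the feature subsets, and the lengthscales are hypothetical stand-ins, and nothing is trained here; the paper learns kernel and network parameters jointly.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=1.0):
    """RBF kernel matrix between the rows of a and the rows of b."""
    sq_dists = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-0.5 * np.maximum(sq_dists, 0.0) / lengthscale**2)

def additive_deep_kernel(feats_a, feats_b, subsets, lengthscales):
    """Sum of base RBF kernels, one per subset of network output features."""
    k = np.zeros((feats_a.shape[0], feats_b.shape[0]))
    for idx, ls in zip(subsets, lengthscales):
        k += rbf_kernel(feats_a[:, idx], feats_b[:, idx], ls)
    return k

# Hypothetical stand-in for a deep network: a fixed random one-layer tanh map.
rng = np.random.default_rng(0)
W = rng.normal(size=(5, 4))

def network(x):
    return np.tanh(x @ W)

x = rng.normal(size=(6, 5))
feats = network(x)
subsets = [[0, 1], [2, 3]]          # assumed split of the 4 output features
K = additive_deep_kernel(feats, feats, subsets, [1.0, 0.5])
```

Because each base kernel is a valid kernel on its own feature subset, the sum `K` is symmetric and positive semi-definite, so it can be used as a Gaussian process covariance.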
TimeKAN: KAN-based Frequency Decomposition Learning Architecture for Long-term Time Series Forecasting
Huang, Songtao, Zhao, Zhen, Li, Can, Bai, Lei
Real-world time series often have multiple frequency components that are intertwined with each other, making accurate time series forecasting challenging. Decomposing the mixed frequency components into multiple single frequency components is a natural choice. However, the information density of patterns varies across different frequencies, and employing a uniform modeling approach for different frequency components can lead to inaccurate characterization. To address these challenges, inspired by the flexibility of the recent Kolmogorov-Arnold Network (KAN), we propose a KAN-based Frequency Decomposition Learning architecture (TimeKAN) to address the complex forecasting challenges caused by multiple frequency mixtures. Specifically, TimeKAN mainly consists of three components: Cascaded Frequency Decomposition (CFD) blocks, Multi-order KAN Representation Learning (M-KAN) blocks, and Frequency Mixing blocks. CFD blocks adopt a bottom-up cascading approach to obtain series representations for each frequency band. Benefiting from the high flexibility of KAN, we design a novel M-KAN block to learn and represent specific temporal patterns within each frequency band. Finally, Frequency Mixing blocks are used to recombine the frequency bands into the original format. Extensive experimental results across multiple real-world time series datasets demonstrate that TimeKAN achieves state-of-the-art performance as an extremely lightweight architecture. Time series forecasting (TSF) has garnered significant interest due to its wide range of applications, including finance (Huang et al., 2024), energy management (Yin et al., 2023), traffic flow planning (Jiang & Luo, 2022), and weather forecasting (Lam et al., 2023).
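The general idea of splitting a series into frequency bands and recombining them can be sketched with a plain FFT mask. This is only an illustration of the decompose-then-mix principle under simple assumptions; it is not the learned CFD or Frequency Mixing blocks of TimeKAN, and the function name, band count, and test signal are hypothetical.

```python
import numpy as np

def split_into_bands(series, n_bands):
    """Split a real series into n_bands components with disjoint frequency support."""
    spectrum = np.fft.rfft(series)
    # Contiguous, equally sized ranges of frequency bins (an assumed choice).
    edges = np.linspace(0, len(spectrum), n_bands + 1).astype(int)
    bands = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        masked = np.zeros_like(spectrum)
        masked[lo:hi] = spectrum[lo:hi]
        bands.append(np.fft.irfft(masked, n=len(series)))
    return bands

t = np.arange(256)
slow = np.sin(2 * np.pi * t / 64)        # low-frequency component
fast = 0.5 * np.sin(2 * np.pi * t / 4)   # higher-frequency component
series = slow + fast

bands = split_into_bands(series, 3)
reconstructed = np.sum(bands, axis=0)    # "mixing" by summation
```

Since the masks partition the spectrum and the inverse FFT is linear, summing the bands recovers the original series; in this example the first band isolates the slow component.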
Review for NeurIPS paper: Hierarchical Neural Architecture Search for Deep Stereo Matching
Weaknesses: - The paper is not particularly novel or exciting since it takes algorithms already applied in the field of semantic segmentation and applies them to the stereo depth estimation problem. The idea of using AutoML for stereo is not particularly novel either, as stated by the authors themselves, even if the proposed algorithm outperforms the previous proposal. Unfortunately, the authors did not spend much time commenting on these aspects. For example, what might be the biggest takeaways from the found architecture? The main differences with respect to the previously published work are the search performed also on the network level and the use of two separate feature and matching networks.
Review for NeurIPS paper: Hierarchical Neural Architecture Search for Deep Stereo Matching
This paper initially received scores of 6, 5, 7, and 7. After the rebuttal, R4 revised up from a 5 to a 6. The consensus from the reviewers was that while the technical novelty of the paper is not extremely high, the results are important, as neural architecture search for dense correspondence problems is underexplored. Reviewers commented on the strong empirical performance for the same model across multiple datasets, which is an important selling point for the paper. The authors are strongly encouraged to update the final paper to clarify the questions raised in the rebuttal, specifically the responses to R2's questions and the additional comparisons to AANet.
Review for NeurIPS paper: A Study on Encodings for Neural Architecture Search
Summary and Contributions: Post Rebuttal: I thank the authors for taking the time to address my review and conducting more experiments. With the new experiments, the paper certainly became stronger. Also, apologies that I missed the additional experiments on NAS-Bench-201 in the appendix. I increase my score (6 to 7) and recommend acceptance of the paper. The paper studies the impact of various types of adjacency matrix and path encodings for neural network architectures, both theoretically and practically, and their effect on common sub-tasks of neural architecture search methods: random sampling, perturbation, and training a predictor model.
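To make the two encoding families concrete, the sketch below encodes a small cell (a DAG with labeled intermediate nodes) in both ways. The exact definitions are assumptions for illustration: the adjacency encoding is taken to be the flattened upper triangle plus one-hot operation labels, and the path encoding a binary indicator over every possible operation sequence from input to output. The operation set `OPS` and the example cell are hypothetical.

```python
from itertools import product
import numpy as np

OPS = ["conv3x3", "conv1x1", "maxpool"]  # hypothetical operation set

def adjacency_encoding(adj, ops):
    """Upper-triangular edge bits followed by one-hot labels of intermediate nodes."""
    n = adj.shape[0]
    edge_bits = adj[np.triu_indices(n, k=1)]
    op_bits = np.zeros((len(ops), len(OPS)), dtype=int)
    for i, op in enumerate(ops):
        op_bits[i, OPS.index(op)] = 1
    return np.concatenate([edge_bits, op_bits.ravel()])

def input_output_paths(adj, ops):
    """Operation sequences along every path from node 0 to the last node."""
    n = adj.shape[0]
    paths = set()

    def walk(node, seq):
        if node == n - 1:
            paths.add(tuple(seq))
            return
        for nxt in range(node + 1, n):
            if adj[node, nxt]:
                # Intermediate node k carries ops[k - 1]; the output node has no op.
                step = [ops[nxt - 1]] if nxt < n - 1 else []
                walk(nxt, seq + step)

    walk(0, [])
    return paths

def path_encoding(adj, ops):
    """Binary indicator over all operation sequences of length 0..n-2."""
    n = adj.shape[0]
    all_paths = [p for length in range(n - 1) for p in product(OPS, repeat=length)]
    present = input_output_paths(adj, ops)
    return np.array([int(p in present) for p in all_paths])

# Example cell: node 0 = input, node 1 = conv3x3, node 2 = maxpool, node 3 = output.
adj = np.array([[0, 1, 1, 0],
                [0, 0, 1, 1],
                [0, 0, 0, 1],
                [0, 0, 0, 0]])
ops = ["conv3x3", "maxpool"]

adj_enc = adjacency_encoding(adj, ops)
paths = input_output_paths(adj, ops)
path_enc = path_encoding(adj, ops)
```

The adjacency encoding grows with the number of node pairs, while this form of path encoding grows with the number of possible operation sequences, which is one reason the two behave differently in sampling, perturbation, and predictor training.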
Principles and Components of Federated Learning Architectures
Saif, Sarwar, Nasim, MD Abdullah Al, Biswas, Parag, Rashid, Abdur, Haque, MD Mahim Anjum, Jahangir, Md. Zihad Bin
Federated learning (FL) is a machine learning framework in which a large number of clients (such as mobile devices or whole enterprises) collaboratively train a model while keeping the training data decentralized, all overseen by a central server (such as a service provider). This decentralized approach to model training offers advantages in terms of privacy, security, regulation, and economy. Despite its promise, FL is not immune to the flaws that plague conventional machine learning models. This study offers a thorough analysis of the fundamental ideas and elements of federated learning architectures, emphasizing five important areas: communication architectures, machine learning models, data partitioning, privacy methods, and system heterogeneity. We additionally address the difficulties and potential paths for future study in the area. Furthermore, based on a comprehensive review of the literature, we present a collection of architectural patterns for federated learning systems. This analysis will help readers understand the basics of federated learning, the primary components of FL, and several architectural details.
Optimizing Wealth by a Game within Cellular Automata
Hoffmann, Rolf, Seredyński, Franciszek, Désérable, Dominique
The objective is to find a Cellular Automata (CA) rule that can evolve 2D patterns that are optimal with respect to a global fitness function. The global fitness is defined as the sum of locally computed utilities. A utility or value function computes a score depending on the states in the local neighborhood. First, the method followed to find such a CA rule is explained. Then this method is applied to find a rule that maximizes social wealth. Here wealth is defined as the sum of the payoffs that all players (agents, cells) receive in a prisoner's dilemma game, and then shared equally among them. The problem is solved in four steps: (0) Defining the utility function, (1) Finding optimal master patterns with a Genetic Algorithm, (2) Extracting templates (local neighborhood configurations), (3) Inserting the templates in a general CA rule. The constructed CA rule finds optimal and near-optimal patterns for even and odd grid sizes. Optimal patterns of odd size contain exactly one singularity, a 2 x 2 block of cooperators.
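The fitness definition above (a global score that is the sum of local utilities, then shared equally) can be sketched as follows. The payoff values, the von Neumann neighborhood, and the cyclic boundary are assumptions made for illustration; the abstract does not specify them, and this is not the authors' utility function.

```python
import numpy as np

# Hypothetical prisoner's dilemma payoffs, keyed by (my_move, other_move),
# with 1 = cooperate and 0 = defect. The source gives no numeric values.
PAYOFF = {(1, 1): 3.0, (1, 0): 0.0, (0, 1): 5.0, (0, 0): 1.0}

# Assumed von Neumann neighborhood (north, south, west, east).
NEIGHBORS = ((-1, 0), (1, 0), (0, -1), (0, 1))

def local_utility(grid, i, j):
    """Payoff that cell (i, j) collects from playing against its four neighbors."""
    n, m = grid.shape
    me = int(grid[i, j])
    total = 0.0
    for di, dj in NEIGHBORS:
        other = int(grid[(i + di) % n, (j + dj) % m])  # assumed cyclic boundary
        total += PAYOFF[(me, other)]
    return total

def global_fitness(grid):
    """Global fitness: the sum of all local utilities."""
    n, m = grid.shape
    return sum(local_utility(grid, i, j) for i in range(n) for j in range(m))

def wealth_per_player(grid):
    """Total payoff shared equally among all cells."""
    return global_fitness(grid) / grid.size

all_cooperate = np.ones((4, 4), dtype=int)
all_defect = np.zeros((4, 4), dtype=int)
checkerboard = np.indices((4, 4)).sum(axis=0) % 2
```

A search procedure such as the genetic algorithm in step (1) would then look for grid patterns that maximize `global_fitness`; which pattern is optimal depends on the payoff values actually used.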
Review for NeurIPS paper: Cream of the Crop: Distilling Prioritized Paths For One-Shot Neural Architecture Search
Weaknesses: The search space is not the same as in the Google publications but is similar to Once-for-All. The se-ratio is 0.25 in this paper's code, the expansion rates are {4,6} in this paper, and the maximum depth is 5 in every stage, which is slightly different. Thus, please report #params in Tab. 1. L120. In this paper, the authors use 2K images as the validation set (L212) and use the validation loss to train the meta-network M. The authors claim that this step is time-consuming (L159); how many iterations in total are used for updating M in this paper? The Kendall rank is important, and I would prefer to see more results.
Review for NeurIPS paper: CLEARER: Multi-Scale Neural Architecture Search for Image Restoration
Weaknesses: 1: Limited novelty: CLEARER uses a multi-scale search space that consists of three types of modules: parallel module, transition module, and fusion module. All of these modules were originally proposed in [2, 1]. The authors did not cite these works when mentioning the said modules throughout the paper. It seems inconvenient, as for every new task we would have a different architecture. However, they did not provide any analysis or insight into what makes it specific to image restoration. For instance, what makes it suitable for image denoising and image deraining, or why would it not work for other applications such as semantic segmentation?