
Appendix: Variational Continual Bayesian Meta-Learning

Neural Information Processing Systems

In variational continual learning, the posterior distribution of interest is frequently intractable and approximation is required. We summarize the meta-training process of our VC-BML in Algorithm 1. Moreover, we evaluate FTML on unseen tasks (i.e., tasks sampled from the meta-test set) instead of the training tasks that the original FTML used. It would be unfair to adopt the original initialization procedure in OSML. BOMVI [10]: In our experiments, we use variational inference to approximate the posterior of meta-parameters. E.3.2 Settings: As the latent variables in this paper are meta-parameters and task-specific parameters, the dimensionality of the latent space is determined by the number of parameters in the deep neural network. In particular, we define a CNN architecture and present its details in Table 1.
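Since the exact posterior is intractable, variational inference fits a tractable family by maximizing the ELBO. As a minimal illustration of the idea (a toy sketch, not the paper's VC-BML procedure), the snippet below fits a mean-field Gaussian to a conjugate scalar posterior using the reparameterization trick, so the result can be checked against the closed-form posterior:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setting: posterior over a scalar mean theta with a Gaussian
# likelihood and Gaussian prior (conjugate, so the exact posterior
# is available for comparison).
data = rng.normal(loc=2.0, scale=1.0, size=100)
prior_mu, prior_var, lik_var = 0.0, 10.0, 1.0

# Exact posterior (for checking only).
post_var = 1.0 / (1.0 / prior_var + len(data) / lik_var)
post_mu = post_var * (prior_mu / prior_var + data.sum() / lik_var)

# Mean-field Gaussian q(theta) = N(mu, exp(log_std)^2), fitted by
# stochastic gradient ascent on the ELBO via the reparameterization
# trick: theta = mu + std * eps, eps ~ N(0, 1).
mu, log_std, lr = 0.0, 0.0, 1e-3
for _ in range(5000):
    eps = rng.normal()
    std = np.exp(log_std)
    theta = mu + std * eps
    # d/dtheta of log p(data, theta): log-likelihood + log-prior terms.
    d_logjoint = (data - theta).sum() / lik_var - (theta - prior_mu) / prior_var
    # Pathwise ELBO gradients; the Gaussian entropy contributes +1
    # to the log_std gradient.
    mu += lr * d_logjoint
    log_std += lr * (d_logjoint * std * eps + 1.0)

print("fitted:", mu, np.exp(log_std))
print("exact :", post_mu, np.sqrt(post_var))
```

With enough steps the fitted mean and standard deviation approach the analytic posterior, which is the sanity check that makes this toy useful before scaling the same machinery to high-dimensional meta-parameters.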




75c58d36157505a600e0695ed0b3a22d-Supplemental.pdf

Neural Information Processing Systems

The current version of Predify assumes that there is no gap between the encoders. One can easily override the default setting by providing all the details for a PCoder. A.3 Execution Time: Since we used a variable number of GPUs for the different experiments, an exact execution time is hard to pinpoint. We expect that this could be further improved with a more extensive and systematic hyperparameter search. In other words, their training hyperparameters appear to have been optimised for their predictive coding network, but not, or not as much, for their feedforward baseline.
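To make the PCoder idea concrete, here is a hypothetical sketch of a predictive-coding unit: the class name, parameter names, and update rule below are illustrative assumptions in the spirit of such modules, not Predify's actual API. It shows the kind of details one would supply when overriding defaults, and how the unit degrades to a plain feedforward pass when no feedback (a "gap") is present:

```python
import numpy as np

class ToyPCoder:
    """Hypothetical predictive-coding unit (illustrative, not Predify's API)."""

    def __init__(self, beta=0.4, lamb=0.3, alpha=0.01):
        self.beta = beta    # feedforward mixing weight
        self.lamb = lamb    # feedback (prediction) mixing weight
        self.alpha = alpha  # error-correction rate
        self.state = None   # persistent representation across timesteps

    def step(self, feedforward, feedback=None):
        if feedback is None:
            # No higher layer provides a prediction: behave like a
            # plain feedforward pass (the "gap" case).
            feedback = feedforward
        if self.state is None:
            self.state = feedforward
        error = self.state - feedback  # prediction error
        self.state = (self.beta * feedforward
                      + self.lamb * feedback
                      + (1 - self.beta - self.lamb) * self.state
                      - self.alpha * error)
        return self.state

pc = ToyPCoder()
out = pc.step(np.ones(4), feedback=np.zeros(4))
```

The mixing coefficients are exactly the kind of per-PCoder hyperparameters whose tuning (or lack of it) the excerpt argues can favour one model family over another.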



ONNX-Net: Towards Universal Representations and Instant Performance Prediction for Neural Architectures

Qin, Shiwen, Auras, Alexander, Cohen, Shay B., Crowley, Elliot J., Moeller, Michael, Ericsson, Linus, Lukasik, Jovita

arXiv.org Artificial Intelligence

Neural architecture search (NAS) automates the design of high-performing architectures, but remains bottlenecked by expensive performance evaluation. Most existing studies that achieve faster evaluation are tied to cell-based search spaces and graph encodings tailored to those individual search spaces, limiting their flexibility and scalability when applied to more expressive search spaces. In this work, we aim to move beyond individual search-space restrictions and search-space-dependent network representations. We present ONNX-Bench, a benchmark consisting of a collection of neural networks in a unified format based on ONNX files. ONNX-Bench includes all open-source NAS-bench-based neural networks, resulting in a total size of more than 600k {architecture, accuracy} pairs. This benchmark allows creating a shared neural network representation, ONNX-Net, able to represent any neural architecture using natural language descriptions acting as an input to a performance predictor. This text-based encoding can accommodate arbitrary layer types, operation parameters, and heterogeneous topologies, enabling a single surrogate to generalise across all neural architectures rather than being confined to cell-based search spaces. Experiments show strong zero-shot performance across disparate search spaces using only a small amount of pretraining samples, enabling the unprecedented ability to evaluate any neural network architecture instantly.
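The core idea of a text-based architecture encoding can be sketched in a few lines: serialize a network's operator graph into a natural-language description that a text-based predictor can consume. The node schema and sentence template below are illustrative assumptions, not the actual ONNX-Net encoding:

```python
# Hypothetical sketch of turning an operator list into a textual
# encoding, in the spirit of ONNX-Net's natural-language descriptions.
def describe_architecture(nodes):
    """nodes: list of dicts with 'op', 'params', and 'inputs' keys
    (an assumed schema standing in for a parsed ONNX graph)."""
    lines = []
    for i, node in enumerate(nodes):
        params = ", ".join(
            f"{k}={v}" for k, v in sorted(node["params"].items())
        ) or "no parameters"
        inputs = ", ".join(node["inputs"]) or "the network input"
        lines.append(
            f"Node {i} is a {node['op']} ({params}) reading from {inputs}."
        )
    return " ".join(lines)

toy_net = [
    {"op": "Conv", "params": {"kernel": 3, "filters": 16}, "inputs": []},
    {"op": "Relu", "params": {}, "inputs": ["node0"]},
    {"op": "GlobalAveragePool", "params": {}, "inputs": ["node1"]},
]
print(describe_architecture(toy_net))
```

Because the output is plain text, arbitrary operator types and parameters fit the same representation, which is what lets a single surrogate cover heterogeneous search spaces.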




6e2713a6efee97bacb63e52c54f0ada0-Supplemental.pdf

Neural Information Processing Systems

We first derive Eq. (6). In this section, we provide implementation details for the proposed method. Every experiment we report can be trained and tested on a single GPU.


Robust Noise Attenuation via Adaptive Pooling of Transformer Outputs

Brothers, Greyson

arXiv.org Artificial Intelligence

We investigate the design of pooling methods used to summarize the outputs of transformer embedding models, primarily motivated by reinforcement learning and vision applications. This work considers problems where a subset of the input vectors contains requisite information for a downstream task (signal) while the rest are distractors (noise). By framing pooling as vector quantization with the goal of minimizing signal loss, we demonstrate that the standard methods used to aggregate transformer outputs, AvgPool, MaxPool, and ClsToken, are vulnerable to performance collapse as the signal-to-noise ratio (SNR) of inputs fluctuates. We then show that an attention-based adaptive pooling method can approximate the signal-optimal vector quantizer within derived error bounds for any SNR. Our theoretical results are first validated by supervised experiments on a synthetic dataset designed to isolate the SNR problem, then generalized to standard relational reasoning, multi-agent reinforcement learning, and vision benchmarks with noisy observations, where transformers with adaptive pooling display superior robustness across tasks.
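The SNR failure mode and the attention-based remedy can be demonstrated with a small numpy sketch (an illustrative construction, not the paper's benchmark): a few "signal" tokens lie along a target direction while many random "noise" tokens dilute a uniform average, whereas softmax attention weights concentrated on the signal keep the pooled vector aligned.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy low-SNR batch: 2 signal tokens aligned with a target direction,
# 62 random noise tokens (assumed setup for illustration).
d, n_signal, n_noise = 32, 2, 62
target = rng.normal(size=d)
target /= np.linalg.norm(target)
signal = 3.0 * target + 0.1 * rng.normal(size=(n_signal, d))
noise = rng.normal(size=(n_noise, d))
tokens = np.vstack([signal, noise])

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# AvgPool mixes all tokens uniformly, so its alignment with the signal
# direction degrades as noise tokens are added.
avg_pool = tokens.mean(axis=0)

# Attention pooling with a query along the target direction (standing
# in for a learned query): softmax weights concentrate on the signal
# tokens, so the pooled vector stays close to the signal.
scores = tokens @ (2.0 * target)        # scaled dot-product scores
weights = np.exp(scores - scores.max())
weights /= weights.sum()
attn_pool = weights @ tokens

print("avg  cosine to signal:", cosine(avg_pool, target))
print("attn cosine to signal:", cosine(attn_pool, target))
```

The attentive pool's cosine to the signal direction stays high while the uniform average's collapses toward zero as the noise fraction grows, which mirrors the robustness gap the abstract describes.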