Neural Information Processing Systems

Figure (b) above shows that the performance is robust to different GCN embedding sizes. On "EA... degree to help": Figure (a) shows an ablation study on NAS-Bench-201, which varies each component (surrogate...). The other experimental settings are the same as in Section 4.2. As can be seen, more accurate architectures are close to each other. On "BO typically works better in low-dimensional...": We use... Here, in Figure (d) above, we use subnets that are sampled in the same search iteration. On "For example, it is common to see pooling...": Yes, we... Thus, the GCN propagation part is more important than how the global node is added.


Theoretical Analysis of the Inductive Biases in Deep Convolutional Networks

Neural Information Processing Systems

In this paper, we provide a theoretical analysis of the inductive biases in convolutional neural networks (CNNs). We start by examining the universality of CNNs, i.e., the ability to approximate any continuous function. We prove that a depth of $\mathcal{O}(\log d)$ suffices for deep CNNs to achieve this universality, where $d$ is the input dimension. Additionally, we establish that learning sparse functions with CNNs requires only $\widetilde{\mathcal{O}}(\log^2d)$ samples, indicating that deep CNNs can efficiently capture {\em long-range} sparse correlations. These results are made possible through a novel combination of multichanneling and downsampling when increasing the network depth.
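Stated compactly (a paraphrase of the abstract, not the paper's exact theorem statements), the two headline results are:

```latex
% Universality: for input dimension d, CNN depth
%   L = \mathcal{O}(\log d)
% suffices to approximate any continuous function.
%
% Sample complexity: learning a sparse target function needs only
%   n = \widetilde{\mathcal{O}}(\log^2 d)
% samples, so long-range sparse correlations are captured efficiently.
L = \mathcal{O}(\log d), \qquad n = \widetilde{\mathcal{O}}(\log^2 d)
```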





Privacy Preserving Charge Location Prediction for Electric Vehicles

Marlin, Robert, Jurdak, Raja, Abuadbba, Alsharif, Miller, Dimity

arXiv.org Artificial Intelligence

By 2050, electric vehicles (EVs) are projected to account for 70% of global vehicle sales. While EVs provide environmental benefits, they also pose challenges for energy generation, grid infrastructure, and data privacy. Current research on EV routing and charge management often overlooks privacy when predicting energy demands, leaving sensitive mobility data vulnerable. To address this, we developed a Federated Learning Transformer Network (FLTN) to predict EVs' next charge location with enhanced privacy measures. Each EV operates as a client, training an onboard FLTN model that shares only model weights, not raw data, with a community-based Distributed Energy Resource Management System (DERMS), which aggregates them into a community global model. To further enhance privacy, non-transitory EVs use peer-to-peer weight sharing and augmentation within their community, obfuscating individual contributions and improving model accuracy. Community DERMS global model weights are then redistributed to EVs for continuous training. Our FLTN approach achieved up to 92% accuracy while preserving data privacy, compared to our baseline centralised model, which achieved 98% accuracy with no data privacy. Simulations conducted across diverse charge levels confirm the FLTN's ability to forecast energy demands over extended periods. We present a privacy-focused solution for EV charge location prediction, effectively mitigating data leakage risks.
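The communication round described above (peer-to-peer mixing followed by DERMS aggregation) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names `peer_mix` and `derms_aggregate` are hypothetical, models are reduced to flat weight vectors, and the mixing rule is a simple pairwise average standing in for the paper's sharing-and-augmentation step.

```python
import numpy as np

def peer_mix(client_weights, rng):
    """Before uploading, each non-transitory EV averages its weights with a
    randomly chosen peer's, obfuscating its individual contribution.
    (Hypothetical simplification of the paper's peer-to-peer sharing step.)"""
    n = len(client_weights)
    mixed = []
    for i in range(n):
        j = rng.choice([k for k in range(n) if k != i])
        mixed.append(0.5 * (client_weights[i] + client_weights[j]))
    return mixed

def derms_aggregate(client_weights):
    """Community DERMS aggregation: federated averaging of the uploaded
    weight vectors into a community global model."""
    return np.mean(np.stack(client_weights), axis=0)

# One communication round with 5 simulated EVs and 4-parameter models.
rng = np.random.default_rng(0)
clients = [rng.normal(size=4) for _ in range(5)]
global_weights = derms_aggregate(peer_mix(clients, rng))
```

The global weights would then be redistributed to the EVs for the next round of onboard training.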



Head-wise Shareable Attention for Large Language Models

Cao, Zouying, Yang, Yifei, Zhao, Hai

arXiv.org Artificial Intelligence

Large Language Models (LLMs) suffer from a huge number of parameters, which restricts their deployment on edge devices. Weight sharing is one promising solution that encourages weight reuse, effectively reducing memory usage with little performance drop. However, current weight sharing techniques primarily focus on small-scale models like BERT and employ coarse-grained sharing rules, e.g., layer-wise. This is limiting given the prevalence of LLMs, and sharing an entire layer or block diminishes the flexibility of weight sharing. In this paper, we present a perspective on head-wise shareable attention for large language models. We further propose two memory-efficient methods that share parameters across attention heads, with a specific focus on LLMs. Both of them use the same dynamic strategy to select the shared weight matrices. The first method directly reuses the pre-trained weights without retraining, denoted as $\textbf{DirectShare}$. The second method first post-trains with a constraint on weight matrix similarity and then shares, denoted as $\textbf{PostShare}$. Experimental results reveal that our head-wise shared models still maintain satisfactory capabilities, demonstrating the feasibility of fine-grained weight sharing applied to LLMs.
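The DirectShare idea (select similar head weight matrices, then reuse one for the other without retraining) can be sketched as below. This is a hedged simplification, not the paper's method: the similarity-based greedy selection in `select_share_pairs` and the function names are assumptions, and real attention heads would involve separate Q/K/V/O matrices.

```python
import numpy as np

def select_share_pairs(head_weights, budget):
    """Pick the `budget` most similar head pairs, scoring similarity as the
    cosine between flattened weight matrices. (Hypothetical stand-in for the
    paper's dynamic selection strategy.)"""
    unit = [w.ravel() / np.linalg.norm(w) for w in head_weights]
    scored = []
    n = len(unit)
    for i in range(n):
        for j in range(i + 1, n):
            scored.append((float(unit[i] @ unit[j]), i, j))
    scored.sort(reverse=True)
    return [(i, j) for _, i, j in scored[:budget]]

def direct_share(head_weights, pairs):
    """DirectShare-style reuse: head j points at head i's pre-trained
    weights, with no retraining; only one copy needs to be stored."""
    shared = [w.copy() for w in head_weights]
    for i, j in pairs:
        shared[j] = shared[i]
    return shared
```

PostShare would differ only in first nudging the selected matrices toward each other with a similarity constraint during post-training, so the reuse loses less accuracy.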


Reviews: Assessing the Scalability of Biologically-Motivated Deep Learning Algorithms and Architectures

Neural Information Processing Systems

The authors provide a clear and succinct introduction to the problems and approaches of biologically plausible forms of backprop in the brain. They argue for behavioural realism as distinct from physiological realism, and undertake a detailed comparison of backprop versus difference target propagation and its variants (some of which they newly propose), as well as direct feedback alignment. In the end, though, they find that all proposed bio-plausible alternatives to backprop fall well short on complex image recognition tasks. Despite the negative results, I find such a comparison very timely for consolidating results and pushing the community to search for better and more diverse alternatives. Overall I find the work impressive. The authors claim that weight sharing is not plausible in the brain.