ffbd6cbb019a1413183c8d08f2929307-Supplemental.pdf
The numbers of the lower and upper bounds in the binarization layer are both in {5, 10, 50}. We utilize the Adam (Kingma and Ba, 2014) method for the training process with a mini-batch size of 32. On large data sets, RRL is trained for 100 epochs, and we decay the learning rate by a factor of 0.75 every 20 epochs. The inverse of regularization strength is in {1, 4, 16, 32}. Figure 7 shows the scatter plots of F1 score against log(#edges) for rule-based models trained on the other ten data sets.
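The training setup above can be sketched concretely. The decay factor (0.75 every 20 epochs) and the hyperparameter grids come from the supplement; the base learning rate and the joint-search assumption are illustrative placeholders, not stated in the text:

```python
from itertools import product

def learning_rate(epoch, base_lr):
    """Step decay: multiply the rate by 0.75 every 20 epochs."""
    return base_lr * 0.75 ** (epoch // 20)

# Grids from the supplement: binarization-layer bound counts and
# inverse regularization strengths (searched jointly here for illustration).
bound_counts = [5, 10, 50]
inv_reg_strengths = [1, 4, 16, 32]
grid = list(product(bound_counts, inv_reg_strengths))
```

For example, with a base rate of 0.01, epochs 0-19 train at 0.01, epochs 20-39 at 0.0075, and so on.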
SparCL: Sparse Continual Learning on the Edge
Existing work in continual learning (CL) focuses on mitigating catastrophic forgetting, i.e., model performance deterioration on past tasks when learning a new task. However, the training efficiency of a CL system is under-investigated, which limits the real-world application of CL systems under resource-limited scenarios. In this work, we propose a novel framework called Sparse Continual Learning (SparCL), which is the first study that leverages sparsity to enable cost-effective continual learning on edge devices. SparCL achieves both training acceleration and accuracy preservation through the synergy of three aspects: weight sparsity, data efficiency, and gradient sparsity. Specifically, we propose task-aware dynamic masking (TDM) to learn a sparse network throughout the entire CL process, dynamic data removal (DDR) to remove less informative training data, and dynamic gradient masking (DGM) to sparsify the gradient updates. Each of them not only improves efficiency, but also further mitigates catastrophic forgetting. SparCL consistently improves the training efficiency of existing state-of-the-art (SOTA) CL methods, reducing training FLOPs by up to 23x, and, surprisingly, further improves SOTA accuracy by up to 1.7%. SparCL also outperforms competitive baselines obtained from adapting SOTA sparse training methods to the CL setting in both efficiency and accuracy. We also evaluate the effectiveness of SparCL on a real mobile phone, further indicating the practical potential of our method.
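The gradient-sparsification idea behind DGM can be sketched independently of the paper's implementation. The helper below is hypothetical (the abstract does not give the masking rule): it simply keeps the largest-magnitude fraction of gradient entries, so only the corresponding weights receive updates:

```python
import numpy as np

def mask_gradient(grad, keep_ratio=0.1):
    """Zero out all but the top `keep_ratio` fraction of gradient
    entries by magnitude; only those weights will be updated."""
    flat = grad.ravel()
    k = max(1, int(keep_ratio * flat.size))
    keep = np.argpartition(np.abs(flat), -k)[-k:]  # indices of k largest |g|
    mask = np.zeros_like(flat)
    mask[keep] = 1.0
    return (flat * mask).reshape(grad.shape)
```

Applied each step before the optimizer update, this skips most of the weight-update work, which is where the FLOP savings in a sparse-training scheme come from.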
EDGE: Explaining Deep Reinforcement Learning Policies
With the rapid development of deep reinforcement learning (DRL) techniques, there is an increasing need to understand and interpret DRL policies. While recent research has developed explanation methods to interpret how an agent determines its moves, they cannot capture the importance of actions/states to a game's final result. In this work, we propose a novel self-explainable model that augments a Gaussian process with a customized kernel function and an interpretable predictor. Together with the proposed model, we also develop a parameter learning procedure that leverages inducing points and variational inference to improve learning efficiency. Using our proposed model, we can predict an agent's final rewards from its game episodes and extract time step importance within episodes as strategy-level explanations for that agent. Through experiments on Atari and MuJoCo games, we verify the explanation fidelity of our method and demonstrate how to employ interpretation to understand agent behavior, discover policy vulnerabilities, remediate policy errors, and even defend against adversarial attacks.
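The core idea of predicting an episode's final reward with a Gaussian process can be illustrated with plain exact GP regression under a standard RBF kernel; the paper's customized kernel, interpretable predictor, and inducing-point variational approximation are all omitted in this minimal sketch:

```python
import numpy as np

def rbf_kernel(X, Y, length_scale=1.0):
    """Squared-exponential kernel between the rows of X and Y."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * length_scale ** 2))

def gp_predict(X_train, y_train, X_test, noise=1e-2):
    """Posterior mean of a zero-mean GP at X_test."""
    K = rbf_kernel(X_train, X_train) + noise * np.eye(len(X_train))
    K_s = rbf_kernel(X_test, X_train)
    return K_s @ np.linalg.solve(K, y_train)
```

In the paper's setting, X would hold episode features and y the final rewards; the learned kernel is then the object from which time-step importance is extracted.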
- North America > United States > North Carolina (0.63)
- Africa > Nigeria (0.05)
- North America > United States > Washington (0.05)
- (2 more...)
- Media (1.00)
- Leisure & Entertainment > Sports (1.00)
- Law > Criminal Law (1.00)
- (3 more...)
Grokking of Implicit Reasoning in Transformers: A Mechanistic Journey to the Edge of Generalization
We study whether transformers can learn to implicitly reason over parametric knowledge, a skill that even the most capable language models struggle with. Focusing on two representative reasoning types, composition and comparison, we consistently find that transformers can learn implicit reasoning, but only through grokking, i.e., extended training far beyond overfitting. The levels of generalization also vary across reasoning types: when faced with out-of-distribution examples, transformers fail to systematically generalize for composition but succeed for comparison. We delve into the model's internals throughout training, conducting analytical experiments that reveal: 1) the mechanism behind grokking, such as the formation of the generalizing circuit and its relation to the relative efficiency of generalizing and memorizing circuits, and 2) the connection between systematicity and the configuration of the generalizing circuit. Our findings guide data and training setup to better induce implicit reasoning and suggest potential improvements to the transformer architecture, such as encouraging cross-layer knowledge sharing.
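The phenomenon described above, generalization arriving long after the training set is fit, can be quantified with a toy diagnostic. The functions and the 0.99 threshold below are our own illustration, not from the paper:

```python
def first_crossing(curve, threshold=0.99):
    """First epoch at which an accuracy curve reaches `threshold`."""
    for epoch, acc in enumerate(curve):
        if acc >= threshold:
            return epoch
    return None

def grokking_delay(train_acc, test_acc, threshold=0.99):
    """Epochs between fitting the training set and generalizing;
    a large positive value is the signature of grokking."""
    t_fit = first_crossing(train_acc, threshold)
    t_gen = first_crossing(test_acc, threshold)
    if t_fit is None or t_gen is None:
        return None
    return t_gen - t_fit
```

"Extended training far beyond overfitting" corresponds to a large delay; a conventional training run has a delay near zero.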
Review for NeurIPS paper: Graph Random Neural Networks for Semi-Supervised Learning on Graphs
Weaknesses: The proposed methods are not that novel. More specifically: (1) It seems that the consistency regularization is a general framework that can be combined with other data augmentation methods, such as DropEdge, and with sampling algorithms. It would be better if the authors could also try these combinations instead of only adopting their proposed DropNode augmentation. (2) It would also be helpful to provide a curve showing the performance of the proposed framework against other baselines under different training-data percentages, and to combine these methods with a more advanced base GNN.
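The consistency-regularization framework the review refers to can be sketched generically: average the model's predictions over several augmented views, sharpen the average into a low-entropy target, and penalize each view's deviation from it. The temperature and squared-error distance are illustrative choices and not necessarily the paper's exact loss:

```python
import numpy as np

def sharpen(p, T=0.5):
    """Lower the entropy of a class distribution with temperature T."""
    p = p ** (1.0 / T)
    return p / p.sum(axis=-1, keepdims=True)

def consistency_loss(preds, T=0.5):
    """preds: list of (n, classes) prediction arrays, one per
    augmented view (e.g., DropNode or DropEdge) of the same nodes."""
    target = sharpen(np.mean(preds, axis=0), T)
    return float(np.mean([(p - target) ** 2 for p in preds]))
```

Nothing here depends on which augmentation produced the views, which is exactly the reviewer's point: the same loss should work with DropEdge or sampling-based augmentations.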
- Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (0.43)
- Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.43)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.40)
Reviews: Mean Field Residual Networks: On the Edge of Chaos
This paper analytically investigates the properties of ResNets with random weights using a mean field approximation. The approach is an extension of previous analyses of feedforward neural networks. The authors show that, in contrast to feedforward networks, ResNets exhibit sub-exponential (polynomial or logarithmic) behavior when inputs are propagated forward or gradients are propagated backward through the layers. The results are very interesting because they give an analytic justification and intuition for why ResNets with a large number of layers can be trained reliably. The paper also extends the mean field technique for studying neural network properties, which is of value for analyzing other architectures.
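The forward-propagation contrast the review highlights can be demonstrated empirically: with random tanh layers at unit weight variance, the squared activation length of a plain feedforward stack drifts toward its fixed point, while a residual stack accumulates variance roughly linearly with depth. The width, depth, and seed below are arbitrary illustration choices:

```python
import numpy as np

rng = np.random.default_rng(0)
n, depth = 512, 50

x_ff = rng.standard_normal(n)
x_res = x_ff.copy()
for _ in range(depth):
    W_ff = rng.standard_normal((n, n)) / np.sqrt(n)
    W_res = rng.standard_normal((n, n)) / np.sqrt(n)
    x_ff = np.tanh(W_ff @ x_ff)             # plain feedforward layer
    x_res = x_res + np.tanh(W_res @ x_res)  # residual layer

q_ff = np.mean(x_ff ** 2)    # decays toward the fixed point
q_res = np.mean(x_res ** 2)  # grows roughly linearly with depth
```

Each residual branch adds a bounded amount of variance on top of the identity path, which is the mechanism behind the polynomial (rather than exponential) depth dependence the paper derives.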
Microsoft rolls out its first business browser, Edge for Business
If you have a work PC, chances are you'll soon see the latest twist on Microsoft's browser: Microsoft Edge for Business, complete with (of course) AI. At its Microsoft Build developer conference this week, Microsoft will announce several tweaks to Edge, including its first dedicated business browser. Microsoft calls the new Edge the "standard browser experience for organizations," with its own separate logo and icon on your taskbar. You'll see more in the Edge sidebar, too: the inclusion of Microsoft 365 Copilot, web apps, and a new "Workspaces" collection of tabs that can be simultaneously shared and browsed with coworkers. Microsoft Edge already allows you to browse and open windows in both a personal and business account.
Microsoft's rolling out Edge's AI image generator to everyone - The Verge
In a Thursday blog post, Microsoft pitches the feature as a way for users to create "very specific" visuals when working on social media posts, slideshows, or documents. While this has been possible in a variety of ways before -- you could use OpenAI's DALL-E, Microsoft's Bing Image Creator site, the built-in image generator in Bing Chat, or one of the many other image generators -- putting it right in Edge's sidebar makes it much easier to ask an AI to make you some pictures while you're doing something else on the web.
📝 📺 Edge#234: Inside Meta AI's Make-A-Video
On Thursdays, we dive deep into one of the freshest research papers or technology frameworks that is worth your attention. Our goal is to keep you up to date with new developments in AI to complement the concepts we debate in other editions of our newsletter. Text-to-Video (T2V) is considered the next frontier for generative artificial intelligence (AI) models. While the text-to-image (T2I) space is experiencing a revolution with models like DALL-E, Stable Diffusion, and Midjourney, T2V remains a monumental challenge. Recently, researchers from Meta AI unveiled Make-A-Video, a T2V model able to create realistic short video clips from textual inputs.