
SparCL: Sparse Continual Learning on the Edge

Neural Information Processing Systems

Existing work in continual learning (CL) focuses on mitigating catastrophic forgetting, i.e., the deterioration of model performance on past tasks when learning a new task. However, the training efficiency of CL systems is under-investigated, which limits their real-world application in resource-limited scenarios. In this work, we propose a novel framework called Sparse Continual Learning (SparCL), the first study to leverage sparsity for cost-effective continual learning on edge devices. SparCL achieves both training acceleration and accuracy preservation through the synergy of three aspects: weight sparsity, data efficiency, and gradient sparsity. Specifically, we propose task-aware dynamic masking (TDM) to learn a sparse network throughout the entire CL process, dynamic data removal (DDR) to remove less informative training data, and dynamic gradient masking (DGM) to sparsify the gradient updates. Each of them not only improves efficiency, but also further mitigates catastrophic forgetting. SparCL consistently improves the training efficiency of existing state-of-the-art (SOTA) CL methods, requiring up to 23× fewer training FLOPs, and, surprisingly, further improves SOTA accuracy by up to 1.7%. SparCL also outperforms competitive baselines obtained by adapting SOTA sparse training methods to the CL setting, in both efficiency and accuracy. We also evaluate the effectiveness of SparCL on a real mobile phone, further indicating the practical potential of our method.
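The gradient-sparsity idea behind DGM can be illustrated with a minimal sketch: keep only the largest-magnitude gradient entries each step and zero out the rest, so an update touches a small fraction of the weights. This is an assumption-laden simplification (the paper's masking is task-aware and interacts with the weight mask); the function name and magnitude-based selection here are illustrative only.

```python
import numpy as np

def sparse_gradient_update(weights, grads, keep_ratio=0.2, lr=0.01):
    """Hedged sketch in the spirit of SparCL's dynamic gradient masking:
    update only the top-|keep_ratio| fraction of weights by gradient
    magnitude, zeroing the remaining gradient entries."""
    flat = np.abs(grads).ravel()
    k = max(1, int(keep_ratio * flat.size))
    # threshold = k-th largest gradient magnitude
    thresh = np.partition(flat, -k)[-k]
    mask = (np.abs(grads) >= thresh).astype(grads.dtype)
    return weights - lr * grads * mask, mask

# toy example: 4x4 weight matrix, keep the top 25% of gradient entries
w = np.zeros((4, 4))
g = np.arange(16, dtype=float).reshape(4, 4)
w_new, mask = sparse_gradient_update(w, g, keep_ratio=0.25)
```

Only the four largest-gradient entries are updated; the rest of the weight matrix is untouched, which is what makes each step cheap on a memory- and compute-constrained device.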


EDGE: Explaining Deep Reinforcement Learning Policies

Neural Information Processing Systems

With the rapid development of deep reinforcement learning (DRL) techniques, there is an increasing need to understand and interpret DRL policies. While recent research has developed explanation methods to interpret how an agent determines its moves, they cannot capture the importance of actions/states to a game's final result. In this work, we propose a novel self-explainable model that augments a Gaussian process with a customized kernel function and an interpretable predictor. Together with the proposed model, we also develop a parameter learning procedure that leverages inducing points and variational inference to improve learning efficiency. Using our proposed model, we can predict an agent's final rewards from its game episodes and extract time step importance within episodes as strategy-level explanations for that agent. Through experiments on Atari and MuJoCo games, we verify the explanation fidelity of our method and demonstrate how to employ interpretation to understand agent behavior, discover policy vulnerabilities, remediate policy errors, and even defend against adversarial attacks.
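The core predictive component — regressing an episode's final reward with a Gaussian process — can be sketched as plain exact GP regression. This is a simplification under stated assumptions: EDGE uses a customized kernel, an interpretable predictor, and a sparse variational approximation with inducing points, none of which appear here; the toy "episode features" below are invented for illustration.

```python
import numpy as np

def rbf_kernel(X1, X2, lengthscale=1.0):
    # squared-exponential kernel between rows of X1 and X2
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale ** 2)

def gp_predict_mean(X_train, y_train, X_test, noise=1e-4):
    # exact GP regression posterior mean; EDGE replaces this with a
    # variational, inducing-point approximation and a custom kernel
    K = rbf_kernel(X_train, X_train) + noise * np.eye(len(X_train))
    K_s = rbf_kernel(X_test, X_train)
    return K_s @ np.linalg.solve(K, y_train)

# toy "episodes": each summarized by a 2-D feature vector, labeled by final reward
rng = np.random.default_rng(0)
X = rng.standard_normal((20, 2))
y = X[:, 0] - 0.5 * X[:, 1]  # stand-in for observed final rewards
pred = gp_predict_mean(X, y, X[:2])
```

With small observation noise, the posterior mean nearly interpolates the training episodes; the inducing-point machinery in the paper exists precisely to avoid the O(n³) solve above on many episodes.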


North Carolina waterfront ambush 'highly premeditated,' suspect tied to anti-LGBTQ conspiracies: docs

FOX News

Nigel Max Edge allegedly opened fire from a boat at American Fish Company in Southport, North Carolina, killing three people and injuring five in what police call a premeditated attack.


Grokking of Implicit Reasoning in Transformers: A Mechanistic Journey to the Edge of Generalization

Neural Information Processing Systems

We study whether transformers can learn to implicitly reason over parametric knowledge, a skill that even the most capable language models struggle with. Focusing on two representative reasoning types, composition and comparison, we consistently find that transformers can learn implicit reasoning, but only through grokking, i.e., extended training far beyond overfitting. The levels of generalization also vary across reasoning types: when faced with out-of-distribution examples, transformers fail to systematically generalize for composition but succeed for comparison. We delve into the model's internals throughout training, conducting analytical experiments that reveal: 1) the mechanism behind grokking, such as the formation of the generalizing circuit and its relation to the relative efficiency of generalizing and memorizing circuits, and 2) the connection between systematicity and the configuration of the generalizing circuit. Our findings guide data and training setup to better induce implicit reasoning and suggest potential improvements to the transformer architecture, such as encouraging cross-layer knowledge sharing.


PerturboLLaVA: Reducing Multimodal Hallucinations with Perturbative Visual Training

Chen, Cong, Liu, Mingyu, Jing, Chenchen, Zhou, Yizhou, Rao, Fengyun, Chen, Hao, Zhang, Bo, Shen, Chunhua

arXiv.org Artificial Intelligence

This paper addresses the challenge of hallucinations in Multimodal Large Language Models (MLLMs), particularly for dense image captioning tasks. To tackle this challenge, we first identify the lack of a metric that finely measures caption quality at the concept level. We therefore introduce HalFscore, a novel metric built upon a language graph and designed to evaluate both the accuracy and completeness of dense captions at a granular level. Additionally, we identify the root cause of hallucination as the model's over-reliance on its language prior. To address this, we propose PerturboLLaVA, which reduces the model's reliance on the language prior by incorporating adversarially perturbed text during training. This method enhances the model's focus on visual inputs, effectively reducing hallucinations and producing accurate, image-grounded descriptions without incurring additional computational overhead. PerturboLLaVA significantly improves the fidelity of generated captions, outperforming existing approaches in handling multimodal hallucinations and achieving improved performance across general multimodal benchmarks.
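The intuition behind a concept-level caption metric can be shown with a stripped-down F-score over concept sets. This is only a sketch of the idea: the actual HalFscore is built on a language graph over the caption, not bare sets, so the function below is a hypothetical simplification.

```python
def concept_fscore(pred_concepts, ref_concepts):
    """Illustrative concept-level F-score in the spirit of HalFscore:
    precision penalizes hallucinated concepts, recall penalizes
    omissions, so both accuracy and completeness are scored."""
    pred, ref = set(pred_concepts), set(ref_concepts)
    tp = len(pred & ref)
    if tp == 0:
        return 0.0
    precision = tp / len(pred)  # low precision = hallucinated concepts
    recall = tp / len(ref)      # low recall = missing concepts
    return 2 * precision * recall / (precision + recall)

# hallucinated concept 'hat' lowers precision; missing 'grass' lowers recall
score = concept_fscore({"dog", "ball", "hat"}, {"dog", "ball", "grass"})
```

A single hallucinated concept and a single omission each cost the caption, which is the granularity that sentence-level metrics miss.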


Review for NeurIPS paper: Graph Random Neural Networks for Semi-Supervised Learning on Graphs

Neural Information Processing Systems

Weaknesses: The proposed methods are not especially novel. More specifically: (1) The consistency regularization appears to be a general framework that can be combined with other data augmentation methods, such as DropEdge, and with sampling algorithms. It would be better if the authors also tried these combinations, instead of only adopting their proposed DropNode augmentation. (2) It would also be helpful if the authors provided a curve showing the performance of the proposed framework against other baselines under different training-data percentages, and combined these methods with a more advanced base GNN.
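The DropNode augmentation the review refers to can be sketched in a few lines: unlike DropEdge, which removes individual edges, DropNode zeroes out entire node feature rows and rescales the survivors so the feature matrix is unchanged in expectation. The implementation details below (argument names, rescaling convention) are an assumption-based sketch, not the paper's code.

```python
import numpy as np

def drop_node(X, drop_prob=0.5, training=True, rng=None):
    """DropNode-style augmentation: randomly zero whole node feature
    rows, then rescale kept rows by 1/(1 - drop_prob) so that
    E[output] == X."""
    if not training:
        return X
    rng = rng or np.random.default_rng()
    keep = (rng.random(X.shape[0]) >= drop_prob).astype(X.dtype)
    return X * keep[:, None] / (1.0 - drop_prob)

# 100 nodes with constant features: rows come out either all-zero or rescaled
X = np.ones((100, 4))
X_aug = drop_node(X, drop_prob=0.5, rng=np.random.default_rng(0))
```

Consistency regularization then compares predictions across several such random augmentations of the same graph, which is why the reviewer notes the framework is agnostic to which augmentation is plugged in.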


Reviews: Mean Field Residual Networks: On the Edge of Chaos

Neural Information Processing Systems

This paper analytically investigates the properties of ResNets with random weights using a mean-field approximation. The approach is an extension of previous analyses of feedforward neural networks. The authors show that, in contrast to feedforward networks, ResNets exhibit sub-exponential behavior (polynomial or logarithmic) when inputs are propagated forward or gradients are propagated backward through the layers. The results are very interesting because they give an analytic justification and intuition for why ResNets with a large number of layers can be trained reliably. The paper also extends the mean-field technique for studying neural network properties, which is of value for analyzing other architectures.
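The forward-propagation half of this result can be seen in a small Monte-Carlo experiment: track the per-unit squared norm of a signal through random tanh layers, with and without identity (residual) connections. This is an illustrative simulation under simple assumptions (unit-variance Gaussian weights, tanh nonlinearity), not the paper's analytic derivation.

```python
import numpy as np

def norm_growth(depth, width=512, residual=True, seed=0):
    """Track q_l = ||x_l||^2 / width through random tanh layers.
    With residual connections the norm grows sub-exponentially
    (roughly polynomially); without them it decays toward a fixed point."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(width)
    qs = []
    for _ in range(depth):
        W = rng.standard_normal((width, width)) / np.sqrt(width)
        h = np.tanh(W @ x)
        x = x + h if residual else h  # residual: identity skip connection
        qs.append(float(x @ x) / width)
    return qs

res = norm_growth(30, residual=True)
ff = norm_growth(30, residual=False)
```

The residual trace grows steadily but slowly with depth, while the feedforward trace collapses, matching the paper's claim that skip connections keep very deep networks in a trainable regime.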


Microsoft rolls out its first business browser, Edge for Business

PCWorld

If you have a work PC, chances are you'll soon see the latest twist on Microsoft's browser: Microsoft Edge for Business, complete with (of course) AI. At its Microsoft Build developer conference this week, Microsoft will announce several tweaks to Edge, including its first dedicated business browser. Microsoft calls the new Edge the "standard browser experience for organizations," with its own separate logo and icon on your taskbar. You'll see more in the Edge sidebar, too: the inclusion of Microsoft 365 Copilot, web apps, and a new "Workspaces" collection of tabs that can be simultaneously shared and browsed with coworkers. Microsoft Edge already allows you to browse and open windows in both a personal and business account.


Microsoft's rolling out Edge's AI image generator to everyone - The Verge

#artificialintelligence

In a Thursday blog post, Microsoft pitches the feature as a way to create "very specific" visuals when they're working on social media posts or slideshows and documents. While this has been possible in a variety of ways before -- you could use OpenAI's DALL-E, Microsoft's Bing image creator site, the built-in image generator in Bing Chat, or one of the many other image generators -- putting it right in Edge's sidebar makes it much easier to ask an AI to make you some pictures while you're doing something else on the web.


With Roadblocks Ahead, Will China Get an Edge in the Generative AI Race?

#artificialintelligence

As everyone knows, the US and China are the main rivals in the AI race. The majority of the world's largest and best-financed AI start-ups are located in the US and China, and the pace of investment, business expansion, and adoption does not appear to be declining any time soon. The study outlines a worst-case scenario in which China surpasses the US in technological advancement: China gains an edge in the generative AI race and generates revenue from cutting-edge technologies that it later employs as a tool of international political influence. ChatGPT's ability to facilitate intelligent dialogues has displaced the image tools from Stability AI and OpenAI as the new object of desire across businesses. Companies, scholars, and entrepreneurs are exploring ways to enter the generative AI market in China, where the country's IT industry has historically followed the West's latest advances closely.