edge
- North America > United States > New Mexico > Los Alamos County > Los Alamos (0.04)
- North America > United States > California > San Diego County > San Diego (0.04)
- Europe > Denmark > Capital Region > Copenhagen (0.04)
Exploring the Edges of Latent State Clusters for Goal-Conditioned Reinforcement Learning
Exploring unknown environments efficiently is a fundamental challenge in unsupervised goal-conditioned reinforcement learning. While selecting exploratory goals at the frontier of previously explored states is an effective strategy, the policy under training may still have limited capability of reaching rare goals on the frontier, resulting in reduced exploratory behavior. We propose "Cluster Edge Exploration" (CE2), a new goal-directed exploration algorithm that, when choosing goals in sparsely explored areas of the state space, gives priority to goal states that remain accessible to the agent. The key idea is to cluster states in a latent space, grouping together states that are easily reachable from one another by the current policy under training, and to traverse to states with significant exploration potential on the boundary of these clusters before performing exploratory behavior. In challenging robotics environments including navigating a maze with a multi-legged ant robot, manipulating objects with a robot arm on a cluttered tabletop, and rotating objects in the palm of an anthropomorphic robotic hand, CE2 demonstrates superior exploration efficiency compared to baseline methods and ablations.
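The cluster-edge idea above can be sketched in a few lines of numpy. This is a minimal illustration, assuming plain k-means as the clustering step and a simple distance-ratio score for "boundary" states; the paper's actual clustering objective (grouping states by reachability under the current policy) and goal-selection criterion are more involved.

```python
import numpy as np

def kmeans(z, k, init_idx, iters=20):
    # Plain k-means over latent states z of shape (n, d).
    centers = z[init_idx].copy()
    for _ in range(iters):
        dist = np.linalg.norm(z[:, None] - centers[None], axis=-1)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = z[labels == j].mean(axis=0)
    return centers, labels

def edge_scores(z, centers, labels):
    # A state lies near a cluster edge when its distance to its own
    # centroid approaches its distance to the nearest other centroid.
    dist = np.linalg.norm(z[:, None] - centers[None], axis=-1)
    own = dist[np.arange(len(z)), labels]
    dist[np.arange(len(z)), labels] = np.inf
    return own / (dist.min(axis=1) + 1e-8)  # higher -> closer to the boundary

# Toy latent space: two well-separated groups of "mutually reachable" states.
rng = np.random.default_rng(1)
z = np.concatenate([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])
centers, labels = kmeans(z, k=2, init_idx=[0, 50])
scores = edge_scores(z, centers, labels)
goal = z[scores.argmax()]  # boundary state chosen as the exploration goal
```

Selecting the argmax of the boundary score stands in for the paper's prioritization of accessible frontier goals: the chosen state is still inside a reachable cluster, but as close as possible to its edge.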
Stepping on the Edge: Curvature Aware Learning Rate Tuners
Curvature information -- particularly, the largest eigenvalue of the loss Hessian, known as the sharpness -- often forms the basis for learning rate tuners. However, recent work has shown that the curvature information undergoes complex dynamics during training, going from a phase of increasing sharpness to eventual stabilization. We analyze the closed-loop feedback effect between learning rate tuning and curvature. We find that classical learning rate tuners may yield greater one-step loss reduction, yet they ultimately underperform in the long term when compared to constant learning rates in the full batch regime. These models break the stabilization of the sharpness, which we explain using a simplified model of the joint dynamics of the learning rate and the curvature. To further investigate these effects, we introduce a new learning rate tuning method, Curvature Dynamics Aware Tuning (CDAT), which prioritizes long term curvature stabilization over instantaneous progress on the objective. In the full batch regime, CDAT shows behavior akin to prefixed warm-up schedules on deep learning objectives, outperforming tuned constant learning rates.
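The quantities involved are easy to make concrete. Below is a small numpy sketch of estimating the sharpness via power iteration on Hessian-vector products, then setting the step size so that eta * sharpness sits at the classical stability threshold of 2. The `eta = 2 / sharpness` rule is an illustrative stand-in for a curvature-stabilizing tuner, not CDAT's actual update rule.

```python
import numpy as np

def sharpness(hess_vec, dim, iters=50, seed=0):
    # Power iteration on Hessian-vector products to estimate the
    # largest Hessian eigenvalue (the "sharpness").
    rng = np.random.default_rng(seed)
    v = rng.normal(size=dim)
    v /= np.linalg.norm(v)
    for _ in range(iters):
        hv = hess_vec(v)
        v = hv / np.linalg.norm(hv)
    return float(v @ hess_vec(v))

# Quadratic toy loss f(w) = 0.5 * w^T A w, whose Hessian is exactly A.
A = np.diag([10.0, 1.0, 0.1])
hvp = lambda v: A @ v

lam = sharpness(hvp, dim=3)
# A curvature-aware rule pins eta * sharpness near the stability
# threshold 2, rather than greedily minimizing the one-step loss.
eta = 2.0 / lam
```

On this quadratic, the estimate converges to the top eigenvalue 10, so the tuner would hold the learning rate near 0.2 as long as the curvature stays stable.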
PerturboLLaVA: Reducing Multimodal Hallucinations with Perturbative Visual Training
Chen, Cong, Liu, Mingyu, Jing, Chenchen, Zhou, Yizhou, Rao, Fengyun, Chen, Hao, Zhang, Bo, Shen, Chunhua
This paper aims to address the challenge of hallucinations in Multimodal Large Language Models (MLLMs), particularly for dense image captioning tasks. To tackle the challenge, we identify the current lack of a metric that finely measures caption quality at the concept level. We hereby introduce HalFscore, a novel metric built upon a language graph and designed to evaluate both the accuracy and completeness of dense captions at a granular level. Additionally, we identify the root cause of hallucination as the model's over-reliance on its language prior. To address this, we propose PerturboLLaVA, which reduces the model's reliance on the language prior by incorporating adversarially perturbed text during training. This method enhances the model's focus on visual inputs, effectively reducing hallucinations and producing accurate, image-grounded descriptions without incurring additional computational overhead. PerturboLLaVA significantly improves the fidelity of generated captions, outperforming existing approaches in handling multimodal hallucinations and achieving improved performance across general multimodal benchmarks.
- Asia > China (0.14)
- North America > United States > Texas (0.14)
- Leisure & Entertainment > Sports > Soccer (1.00)
- Leisure & Entertainment > Sports > Tennis (0.93)
- Transportation (0.70)
Review for NeurIPS paper: Graph Random Neural Networks for Semi-Supervised Learning on Graphs
Weaknesses: The proposed methods are not that novel. More specifically: (1) The consistency regularization appears to be a general framework that can be combined with other data augmentation methods, such as DropEdge, and with sampling algorithms. It would be better if the authors could also try these combinations, instead of only adopting their proposed DropNode augmentation. It would also be better if the authors could provide a curve showing the performance of the proposed framework against other baselines under different training data percentages. Also, it would be better to combine these methods with a more advanced base GNN.
- Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (0.43)
- Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.43)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.40)
Position-aware Automatic Circuit Discovery
Haklay, Tal, Orgad, Hadas, Bau, David, Mueller, Aaron, Belinkov, Yonatan
A widely used strategy to discover and understand language model mechanisms is circuit analysis. A circuit is a minimal subgraph of a model's computation graph that executes a specific task. We identify a gap in existing circuit discovery methods: they assume circuits are position-invariant, treating model components as equally relevant across input positions. This limits their ability to capture cross-positional interactions or mechanisms that vary across positions. To address this gap, we propose two improvements to incorporate positionality into circuits, even on tasks containing variable-length examples. First, we extend edge attribution patching, a gradient-based method for circuit discovery, to differentiate between token positions. Second, we introduce the concept of a dataset schema, which defines token spans with similar semantics across examples, enabling position-aware circuit discovery in datasets with variable length examples. We additionally develop an automated pipeline for schema generation and application using large language models. Our approach enables fully automated discovery of position-sensitive circuits, yielding better trade-offs between circuit size and faithfulness compared to prior work.
- Asia > Middle East > Israel (0.04)
- Europe > Monaco (0.04)
- Asia > Middle East > Saudi Arabia > Asir Province > Abha (0.04)
- Asia > Middle East > Jordan (0.04)
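The position-aware scoring described in the abstract can be illustrated with a first-order sketch. Attribution patching approximates an edge's effect on the loss by the product of the activation difference (corrupted minus clean) and the gradient; the extension here is simply to keep a separate score per token position instead of summing positions away. The arrays below are random stand-ins, not real model activations.

```python
import numpy as np

def edge_attribution(clean_act, corrupt_act, grad, position_aware=True):
    # First-order attribution score per (edge, position):
    #   (corrupted activation - clean activation) * dLoss/dActivation.
    # Arrays have shape (n_edges, n_positions).
    contrib = (corrupt_act - clean_act) * grad
    # Position-invariant methods collapse the position axis; the
    # position-aware variant keeps it, exposing per-token mechanisms.
    return contrib if position_aware else contrib.sum(axis=1)

rng = np.random.default_rng(0)
clean, corrupt, grad = rng.normal(size=(3, 5, 4))  # 5 edges, 4 positions
pos_scores = edge_attribution(clean, corrupt, grad)         # shape (5, 4)
agg_scores = edge_attribution(clean, corrupt, grad, False)  # shape (5,)
```

Summing `pos_scores` over positions recovers the position-invariant score exactly, which is why position-aware discovery strictly refines, rather than contradicts, the standard method.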
Accelerating Linear Recurrent Neural Networks for the Edge with Unstructured Sparsity
Pierro, Alessandro, Abreu, Steven, Timcheck, Jonathan, Stratmann, Philipp, Wild, Andreas, Shrestha, Sumit Bam
Linear recurrent neural networks enable powerful long-range sequence modeling with constant memory usage and time-per-token during inference. These architectures hold promise for streaming applications at the edge, but deployment in resource-constrained environments requires hardware-aware optimizations to minimize latency and energy consumption. Unstructured sparsity offers a compelling solution, enabling substantial reductions in compute and memory requirements--when accelerated by compatible hardware platforms. In this paper, we conduct a scaling study to investigate the Pareto front of performance and efficiency across inference compute budgets. We find that highly sparse linear RNNs consistently achieve better efficiency-performance trade-offs than dense baselines, with 2x less compute and 36% less memory at iso-accuracy. Our models achieve state-of-the-art results on a real-time streaming task for audio denoising. By quantizing our sparse models to fixed-point arithmetic and deploying them on the Intel Loihi 2 neuromorphic chip for real-time processing, we translate model compression into tangible gains of 42x lower latency and 149x lower energy consumption compared to a dense model on an edge GPU. Our findings showcase the transformative potential of unstructured sparsity, paving the way for highly efficient recurrent neural networks in real-world, resource-constrained environments.
- Europe > Austria > Vienna (0.14)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- Oceania > Australia > New South Wales > Sydney (0.04)
- (8 more...)
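The core computation above -- a linear recurrence with unstructured (magnitude-pruned) recurrent weights -- can be sketched directly. This is a toy numpy illustration of the constant-memory inference loop and a 90% magnitude-pruning step; the paper's models, training procedure, and Loihi 2 deployment are of course far more elaborate.

```python
import numpy as np

def linear_rnn(x, W_h, W_x):
    # Linear recurrence h_t = W_h @ h_{t-1} + W_x @ x_t. Inference keeps
    # only the current hidden state, so memory is constant in sequence length.
    h = np.zeros(W_h.shape[0])
    outs = []
    for x_t in x:
        h = W_h @ h + W_x @ x_t
        outs.append(h.copy())
    return np.stack(outs)

rng = np.random.default_rng(0)
d_h, d_x, T = 16, 4, 8
W_h = rng.normal(0, 0.1, (d_h, d_h))
W_x = rng.normal(0, 0.1, (d_h, d_x))

# Unstructured sparsity: magnitude-prune 90% of the recurrent weights.
# Any weight may be zeroed, with no block or row structure imposed.
thresh = np.quantile(np.abs(W_h), 0.9)
W_h_sparse = np.where(np.abs(W_h) >= thresh, W_h, 0.0)

y = linear_rnn(rng.normal(size=(T, d_x)), W_h_sparse, W_x)
density = (W_h_sparse != 0).mean()  # ~0.1 of the original recurrent MACs
```

The compute saving only materializes on hardware that skips zero operands, which is the paper's point about pairing unstructured sparsity with compatible platforms.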
Few Edges Are Enough: Few-Shot Network Attack Detection with Graph Neural Networks
Bilot, Tristan, Madhoun, Nour El, Agha, Khaldoun Al, Zouaoui, Anis
Detecting cyberattacks using Graph Neural Networks (GNNs) has seen promising results recently. Most of the state-of-the-art models that leverage these techniques require labeled examples, hard to obtain in many real-world scenarios. To address this issue, unsupervised learning and Self-Supervised Learning (SSL) have emerged as interesting approaches to reduce the dependency on labeled data. Nonetheless, these methods tend to yield more anomalous detection algorithms rather than effective attack detection systems. This paper introduces Few Edges Are Enough (FEAE), a GNN-based architecture trained with SSL and Few-Shot Learning (FSL) to better distinguish between false positive anomalies and actual attacks. To maximize the potential of few-shot examples, our model employs a hybrid self-supervised objective that combines the advantages of contrastive-based and reconstruction-based SSL. By leveraging only a minimal number of labeled attack events, represented as attack edges, FEAE achieves competitive performance on two well-known network datasets compared to both supervised and unsupervised methods. Remarkably, our experimental results unveil that employing only 1 malicious event for each attack type in the dataset is sufficient to achieve substantial improvements. FEAE not only outperforms self-supervised GNN baselines but also surpasses some supervised approaches on one of the datasets.
- Information Technology > Security & Privacy (1.00)
- Government > Military (1.00)
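The hybrid self-supervised objective mentioned in the abstract -- a contrastive term plus a reconstruction term -- can be sketched as follows. This is a generic numpy illustration (InfoNCE-style contrastive loss over edge embeddings plus a mean-squared reconstruction error); the weighting `alpha` and temperature `tau` are illustrative hyperparameters, not FEAE's actual configuration.

```python
import numpy as np

def hybrid_ssl_loss(z, z_aug, x, x_rec, alpha=0.5, tau=0.5):
    # Contrastive term: each edge embedding should match its own augmented
    # view against all other edges in the batch (InfoNCE over cosine sims).
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    z_aug = z_aug / np.linalg.norm(z_aug, axis=1, keepdims=True)
    sim = z @ z_aug.T / tau
    sim = sim - sim.max(axis=1, keepdims=True)  # numerical stability
    log_p = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    contrastive = -np.mean(np.diag(log_p))
    # Reconstruction term: the decoder should rebuild raw edge features.
    reconstruction = np.mean((x - x_rec) ** 2)
    return alpha * contrastive + (1 - alpha) * reconstruction

rng = np.random.default_rng(0)
z = rng.normal(size=(8, 16))   # edge embeddings
x = rng.normal(size=(8, 32))   # raw edge features
good = hybrid_ssl_loss(z, z, x, x)             # aligned views, perfect decode
bad = hybrid_ssl_loss(z, z[::-1], x, x + 1.0)  # shuffled views, biased decode
```

The loss drops when augmented views align with their own embeddings and the reconstruction is faithful, which is the behavior the few-shot attack edges then sharpen into an attack detector.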
ShadowGenes: Leveraging Recurring Patterns within Computational Graphs for Model Genealogy
Schulz, Kasimir, Evans, Kieran
Machine learning model genealogy enables practitioners to determine which architectural family a neural network belongs to. In this paper, we introduce ShadowGenes, a novel, signature-based method for identifying a given model's architecture, type, and family. Our method involves building a computational graph of the model that is agnostic of its serialization format, then analyzing its internal operations to identify unique patterns, and finally building and refining signatures based on these. We highlight important workings of the underlying engine and demonstrate the technique used to construct a signature and scan a given model. This approach to model genealogy can be applied to model files without the need for additional external information. We test ShadowGenes on a labeled dataset of over 1,400 models and achieve a mean true positive rate of 97.49% and a precision score of 99.51%; which validates the technique as a practical method for model genealogy. This enables practitioners to understand the use cases of a given model, the internal computational process, and identify possible security risks, such as the potential for model backdooring.
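The signature-scanning idea can be illustrated with a toy pattern matcher over an operator sequence. This is a hypothetical sketch: the op names, the pattern, and the "fires when the pattern repeats at least N times" rule are illustrative assumptions, not ShadowGenes' actual signature format or engine.

```python
# Represent a model's computational graph (serialization-agnostic) as a
# flat list of operator names, and scan it for recurring op patterns.
def count_pattern(ops, pattern):
    # Count non-overlapping occurrences of a contiguous op pattern.
    n, m, i, hits = len(ops), len(pattern), 0, 0
    while i <= n - m:
        if ops[i:i + m] == pattern:
            hits += 1
            i += m
        else:
            i += 1
    return hits

def match_signatures(ops, signatures):
    # A signature fires when its pattern repeats at least min_hits times,
    # e.g. one attention-like block per repeated layer.
    return [name for name, (pattern, min_hits) in signatures.items()
            if count_pattern(ops, pattern) >= min_hits]

# Toy graph: two "attention-like" blocks (names are illustrative).
ops = ["MatMul", "Softmax", "MatMul", "Add",
       "MatMul", "Softmax", "MatMul", "Add", "Gemm"]
signatures = {"transformer-like": (["MatMul", "Softmax", "MatMul"], 2)}
families = match_signatures(ops, signatures)
```

Because the scan needs only the graph's internal operations, nothing external to the model file is required -- which is the property the abstract highlights.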
Microsoft confirms 365 Co-Pilot AI will be 'natively integrated' into Edge
There are vanishingly few places in Microsoft's business ecosystem that remain untouched by January's OpenAI deal, with GPT-4 backed chatbot and generative capabilities coming to Office products like Word and Excel, Bing Search, and integrated directly into the Edge browser. During the Microsoft Build 2023 conference on Tuesday, company executives clarified and confirmed that its 365 Copilot AI -- the same one going into Office -- will be "natively integrated" into the Edge browser. Microsoft 365 Copilot essentially takes all of your Graph information -- data from your Calendar, Word docs, emails and chat logs -- and smashes them together, using the informatic slurry in training an array of large language models, to provide AI-backed assistance personalized to your business. "You can type natural language requests like 'Tell my team how we updated the product strategy today,'" Lindsay Kubasik, Group Product Manager, Edge Enterprise wrote in a Tuesday blog post. "Microsoft 365 Copilot will generate a status update based on the morning's meetings, emails and chat threads."