Goto

Collaborating Authors

 mem


Collapsing Taylor Mode Automatic Differentiation

Neural Information Processing Systems

Computing partial differential equation (PDE) operators via nested backpropagation is expensive, yet popular, and severely restricts their utility for scientific machine learning. Recent advances, like the forward Laplacian and randomizing Taylor mode automatic differentiation (AD), propose forward schemes to address this. We introduce an optimization technique for Taylor mode that "collapses" derivatives by rewriting the computational graph, and demonstrate how to apply it to general linear PDE operators, and randomized Taylor mode. The modifications simply require propagating a sum up the computational graph, which could--or should-- be done by a machine learning compiler, without exposing complexity to users. We implement our collapsing procedure and evaluate it on popular PDE operators, confirming it accelerates Taylor mode and outperforms nested backpropagation.


Approximate Domain Unlearning for Vision-Language Models

Neural Information Processing Systems

Pre-trained Vision-Language Models (VLMs) exhibit strong generalization capabilities, enabling them to recognize a wide range of objects across diverse domains without additional training. However, they often retain irrelevant information beyond the requirements of specific target downstream tasks, raising concerns about computational efficiency and potential information leakage. This has motivated growing interest in approximate unlearning, which aims to selectively remove unnecessary knowledge while preserving overall model performance. Existing approaches to approximate unlearning have primarily focused on class unlearning, where a VLM is retrained to fail to recognize specified object classes while maintaining accuracy for others. However, merely forgetting object classes is often insufficient in practical applications.



Activation Map Compression through Tensor Decomposition for Deep Learning

Neural Information Processing Systems

The application of low-order decomposition results in considerable memory savings while preserving the features essential for learning, and also offers theoretical guarantees to convergence.



Structured Cognitive Loop for Behavioral Intelligence in Large Language Model Agents

arXiv.org Artificial Intelligence

Large language models have advanced natural language understanding and generation, but their use as autonomous agents introduces architectural challenges for multi-step tasks. Existing frameworks often mix cognition, memory, and control in a single prompt, reducing coherence and predictability. The Structured Cognitive Loop (SCL) is proposed as an alternative architecture that separates these functions. In SCL, the language model handles cognition, memory is stored externally, and execution is guided by a lightweight controller within a goal-directed loop. This design allows intermediate results to be recorded and verified before actions are taken, improving traceability and evaluation. SCL is evaluated against prompt-based baselines such as ReAct and LangChain agents across three tasks: travel planning, conditional email drafting, and constraint-guided image generation. Under matched settings, SCL achieves an average task success rate of 86.3 percent, compared with 70.5 to 76.8 percent for baselines. It also shows higher goal fidelity, fewer redundant calls, and reduced unsupported assertions. These results indicate that separating cognition, memory, and control can enhance reliability and interpretability without relying on larger models or heavier prompts. The findings should be regarded as preliminary evidence, with broader tests across model families and task domains planned for future work.


Activation Map Compression through Tensor Decomposition for Deep Learning

Neural Information Processing Systems

The application of low-order decomposition results in considerable memory savings while preserving the features essential for learning, and also offers theoretical guarantees to convergence.


LLMs on a Budget? Say HOLA

arXiv.org Artificial Intelligence

Running Large Language Models (LLMs) on edge devices is constrained by high compute and memory demands posing a barrier for real-time applications in sectors like healthcare, education, and embedded systems. Current solutions such as quantization, pruning, and retrieval-augmented generation (RAG) offer only partial optimizations and often compromise on speed or accuracy. We introduce HOLA, an end-to-end optimization framework for efficient LLM deployment. Internally, it leverages Hierarchical Speculative Decoding (HSD) for faster inference without quality loss. Externally, AdaComp-RAG adjusts retrieval complexity based on context needs. Together with LoBi, which blends structured pruning (LoRA) and quantization, HOLA delivers significant gains: 17.6% EMA on GSM8K, 10.5% MCA on ARC, and reduced latency and memory on edge devices like Jetson Nano--proving both scalable and production-ready.


Supplemental Material for What Neural Networks Memorize and Why A Proof of Lemma 2.1

Neural Information Processing Systems

We now compute the expected squared error of each of the terms of this estimator. In both cases the squared error is at most 1 / 4 . We implement our algorithms with Tensorflow [1]. Our implementation achieves 73% top-1 accuracy when trained on the full training set. For DenseNet, we halved the batch size and learning rate due to higher memory load of the architecture.


Memorization Sinks: Isolating Memorization during LLM Training

arXiv.org Artificial Intelligence

Large language models are susceptible to memorizing repeated sequences, posing privacy and copyright concerns. A popular mitigation strategy is to remove memorized information from specific neurons post-hoc. However, such approaches have shown limited success so far. In a controlled setting, we show that the memorization of natural sequences (those that resemble linguistically plausible text) become mechanistically entangled with general language abilities, thereby becoming challenging to remove post-hoc. In this work, we put forward a new paradigm of MemSinks that promotes isolation of memorization by design. We leverage a sequence identifier that activates a unique set of memorization neurons for each sequence across repetitions. By analyzing the dynamics of learning and forgetting, we argue that MemSinks facilitates isolation of memorized content, making it easier to remove without compromising general language capabilities. We implement MemSinks at the billion-parameter and billion-token scale, and observe both effective isolation and strong generalization. To our knowledge, this is the first proof-of-concept on real data demonstrating that simultaneous generalization and isolation is achievable. We open-source our code at http://github.com/grghosal/MemSinks.