AITopics

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.77)

Neural Information Processing SystemsFeb-18-2026, 14:19:10 GMT

Activation Map Compression through Tensor Decomposition for Deep Learning

The application of low-order decomposition results in considerable memory savings while preserving the features essential for learning, and also offers theoretical guarantees to convergence.

artificial intelligence, hosvd, machine learning, (16 more...)

Country:

Europe > France (0.04)
Africa > Senegal > Kolda Region > Kolda (0.04)

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Neural Information Processing SystemsFeb-7-2026, 17:34:53 GMT

Supplemental Materialfor " What Neural Networks Memorizeand Why " AProofof Lemma 2.1 Proof.Observethat, bylinearityofexpectation, wehave: inflm(A,S, i,z) = E

artificial intelligence, machine learning, mem, (14 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.41)

arXiv.org Artificial IntelligenceDec-1-2025

Structured Cognitive Loop for Behavioral Intelligence in Large Language Model Agents

Kim, Myung Ho

Large language models have advanced natural language understanding and generation, but their use as autonomous agents introduces architectural challenges for multi-step tasks. Existing frameworks often mix cognition, memory, and control in a single prompt, reducing coherence and predictability. The Structured Cognitive Loop (SCL) is proposed as an alternative architecture that separates these functions. In SCL, the language model handles cognition, memory is stored externally, and execution is guided by a lightweight controller within a goal-directed loop. This design allows intermediate results to be recorded and verified before actions are taken, improving traceability and evaluation. SCL is evaluated against prompt-based baselines such as ReAct and LangChain agents across three tasks: travel planning, conditional email drafting, and constraint-guided image generation. Under matched settings, SCL achieves an average task success rate of 86.3 percent, compared with 70.5 to 76.8 percent for baselines. It also shows higher goal fidelity, fewer redundant calls, and reduced unsupported assertions. These results indicate that separating cognition, memory, and control can enhance reliability and interpretability without relying on larger models or heavier prompts. The findings should be regarded as preliminary evidence, with broader tests across model families and task domains planned for future work.

artificial intelligence, large language model, natural language, (16 more...)

2510.05107

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Neurology (0.46)
Consumer Products & Services > Travel (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Neural Information Processing SystemsOct-10-2025, 20:28:03 GMT

Activation Map Compression through Tensor Decomposition for Deep Learning

The application of low-order decomposition results in considerable memory savings while preserving the features essential for learning, and also offers theoretical guarantees to convergence.

decomposition, hosvd, neural network, (14 more...)

Country:

Europe > France (0.04)
Africa > Senegal > Kolda Region > Kolda (0.04)

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Siddiqui, Zohaib Hasan, Gao, Jiechao, Shabbir, Ebad, Azeez, Mohammad Anas, Ali, Rafiq, Kashyap, Gautam Siddharth, Naseem, Usman

LLMs on a Budget? Say HOLA

arXiv.org Artificial IntelligenceOct-10-2025

Running Large Language Models (LLMs) on edge devices is constrained by high compute and memory demands posing a barrier for real-time applications in sectors like healthcare, education, and embedded systems. Current solutions such as quantization, pruning, and retrieval-augmented generation (RAG) offer only partial optimizations and often compromise on speed or accuracy. We introduce HOLA, an end-to-end optimization framework for efficient LLM deployment. Internally, it leverages Hierarchical Speculative Decoding (HSD) for faster inference without quality loss. Externally, AdaComp-RAG adjusts retrieval complexity based on context needs. Together with LoBi, which blends structured pruning (LoRA) and quantization, HOLA delivers significant gains: 17.6% EMA on GSM8K, 10.5% MCA on ARC, and reduced latency and memory on edge devices like Jetson Nano--proving both scalable and production-ready.

large language model, machine learning, natural language, (21 more...)

2506.18952

Country: Asia > India > NCT (0.14)

Genre: Research Report (0.82)

Industry: Education (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Neural Information Processing SystemsOct-2-2025, 09:25:37 GMT

Supplemental Material for What Neural Networks Memorize and Why A Proof of Lemma 2.1

We now compute the expected squared error of each of the terms of this estimator. In both cases the squared error is at most 1 / 4 . We implement our algorithms with Tensorflow [1]. Our implementation achieves 73% top-1 accuracy when trained on the full training set. For DenseNet, we halved the batch size and learning rate due to higher memory load of the architecture.

architecture, artificial intelligence, machine learning, (19 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.65)

Ghosal, Gaurav R., Maini, Pratyush, Raghunathan, Aditi

Memorization Sinks: Isolating Memorization during LLM Training

arXiv.org Artificial IntelligenceSep-17-2025

Large language models are susceptible to memorizing repeated sequences, posing privacy and copyright concerns. A popular mitigation strategy is to remove memorized information from specific neurons post-hoc. However, such approaches have shown limited success so far. In a controlled setting, we show that the memorization of natural sequences (those that resemble linguistically plausible text) become mechanistically entangled with general language abilities, thereby becoming challenging to remove post-hoc. In this work, we put forward a new paradigm of MemSinks that promotes isolation of memorization by design. We leverage a sequence identifier that activates a unique set of memorization neurons for each sequence across repetitions. By analyzing the dynamics of learning and forgetting, we argue that MemSinks facilitates isolation of memorized content, making it easier to remove without compromising general language capabilities. We implement MemSinks at the billion-parameter and billion-token scale, and observe both effective isolation and strong generalization. To our knowledge, this is the first proof-of-concept on real data demonstrating that simultaneous generalization and isolation is achievable. We open-source our code at http://github.com/grghosal/MemSinks.

artificial intelligence, machine learning, natural language, (18 more...)

2507.09937

Genre: Research Report > New Finding (0.68)

Industry: Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (1.00)

arXiv.org Artificial IntelligenceMay-28-2025

PLANETALIGN: A Comprehensive Python Library for Benchmarking Network Alignment

Yu, Qi, Zeng, Zhichen, Yan, Yuchen, Liu, Zhining, Jing, Baoyu, Qiu, Ruizhong, Azad, Ariful, Tong, Hanghang

Network alignment (NA) aims to identify node correspondence across different networks and serves as a critical cornerstone behind various downstream multi-network learning tasks. Despite growing research in NA, there lacks a comprehensive library that facilitates the systematic development and benchmarking of NA methods. In this work, we introduce PLANETALIGN, a comprehensive Python library for network alignment that features a rich collection of built-in datasets, methods, and evaluation pipelines with easy-to-use APIs. Specifically, PLANETALIGN integrates 18 datasets and 14 NA methods with extensible APIs for easy use and development of NA methods. Our standardized evaluation pipeline encompasses a wide range of metrics, enabling a systematic assessment of the effectiveness, scalability, and robustness of NA methods. Through extensive comparative studies, we reveal practical insights into the strengths and limitations of existing NA methods. We hope that PLANETALIGN can foster a deeper understanding of the NA problem and facilitate the development and benchmarking of more effective, scalable, and robust methods in the future. The source code of PLANETALIGN is available at https://github.com/yq-leo/PlanetAlign.

data mining, machine learning, programming language, (20 more...)

2505.21366

Country: North America > United States (0.45)

Genre: Research Report > Experimental Study (0.93)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.67)

Technology:

Information Technology > Software > Programming Languages (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(3 more...)

Islam, Md Arman, Jonnala, Devi Varaprasad, Rekhi, Ritika, Pokharel, Pratik, Cilamkoti, Siddharth, Imran, Asif, Kosar, Tevfik, Turkkan, Bekir

Evaluating the Energy-Efficiency of the Code Generated by LLMs

arXiv.org Artificial IntelligenceMay-28-2025

As the quality of code generated by Large Language Models (LLMs) improves, their adoption in the software industry for automated code generation continues to grow. Researchers primarily focus on enhancing the functional correctness of the generated code while commonly overlooking its energy efficiency and environmental impact. This paper investigates the energy efficiency of the code generated by 20 popular LLMs for 878 programming problems of varying difficulty levels and diverse algorithmic categories selected from the LeetCode platform by comparing them against canonical human-written solutions. Although LLMs can produce functionally correct results in most cases, our findings show that the performance and energy efficiency of LLM-produced solutions are often far below those of human-written solutions. Among the studied LLMs, DeepSeek-v3 and GPT-4o generate the most energy-efficient code, whereas Grok-2 and Gemini-1.5-Pro are among the least energy-efficient models. On average, human-generated canonical solutions are approximately 1.17 times more energy efficient than DeepSeek-v3, 1.21 times more energy efficient than GPT-4o, and over 2 times more energy efficient than Grok-2 and Gemini-1.5-Pro. For specific algorithmic groups such as dynamic programming, backtracking, and bit manipulation, LLM-generated code can consume up to 450 times more energy than human-generated canonical solutions.

large language model, llama 3, machine learning, (19 more...)

2505.20324

Country: North America > United States (0.28)

Genre: Research Report > New Finding (1.00)

Industry:

Energy (0.95)
Information Technology > Software (0.48)
Law > Environmental Law (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)