

On Multiplicative Integration with Recurrent Neural Networks

Neural Information Processing Systems

We introduce a simple, general structural design called "Multiplicative Integration" (MI) to improve recurrent neural networks (RNNs). MI changes how information flows are integrated in the computational building block of an RNN, while introducing almost no extra parameters. The new structure can be easily embedded into many popular RNN models, including LSTMs and GRUs. We empirically analyze its learning behaviour and conduct evaluations on several tasks using different RNN models. Our experimental results demonstrate that Multiplicative Integration can provide a substantial performance boost over many existing RNN models.
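The change MI makes to the RNN update can be sketched in NumPy. This is an illustrative toy following the general MI form; the dimensions, initializations, and gating vectors `alpha`, `beta1`, `beta2` are assumptions for the sketch, not code from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_h = 4, 8
W = rng.standard_normal((d_h, d_in))   # input-to-hidden weights
U = rng.standard_normal((d_h, d_h))    # hidden-to-hidden weights
b = np.zeros(d_h)
# MI's gating vectors add only O(d_h) extra parameters.
alpha, beta1, beta2 = np.ones(d_h), np.ones(d_h), np.ones(d_h)

def vanilla_step(x, h):
    # Standard additive integration: phi(Wx + Uh + b)
    return np.tanh(W @ x + U @ h + b)

def mi_step(x, h):
    # General MI form: phi(alpha * Wx * Uh + beta1 * Wx + beta2 * Uh + b),
    # where * is elementwise multiplication.
    wx, uh = W @ x, U @ h
    return np.tanh(alpha * wx * uh + beta1 * wx + beta2 * uh + b)

x, h = rng.standard_normal(d_in), rng.standard_normal(d_h)
out = mi_step(x, h)
```

Both steps reuse the same weight matrices; MI only changes how the two pre-activation terms `Wx` and `Uh` are combined, which is why the parameter overhead is negligible.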


HitNet: Hybrid Ternary Recurrent Neural Network

Neural Information Processing Systems

Quantization is a promising technique to reduce the model size, memory footprint, and heavy computation of recurrent neural networks (RNNs) on embedded devices with limited resources. Although extreme low-bit quantization has achieved impressive success on convolutional neural networks, it still suffers from severe accuracy degradation on RNNs at the same low-bit precision. In this paper, we first investigate the accuracy degradation of RNN models under different quantization schemes, as well as the distribution of tensor values in the full-precision model. Our observations reveal that, due to the difference between the distributions of weights and activations, different quantization methods suit different parts of the model. Based on these observations, we propose HitNet, a hybrid ternary recurrent neural network that bridges the accuracy gap between the full-precision model and the quantized model. In HitNet, we develop a hybrid quantization method to quantize weights and activations. Moreover, we introduce a sloping factor, motivated by prior work on Boltzmann machines, into the activation functions, further closing the accuracy gap between the full-precision model and the quantized model.
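As a minimal sketch of the kind of extreme low-bit scheme the abstract discusses, here is a generic threshold-based weight ternarizer (in the style of ternary weight networks). HitNet's actual hybrid quantizer differs in detail, and the threshold factor `t` is an assumption for illustration:

```python
import numpy as np

def ternarize(w, t=0.7):
    # Threshold-based ternarization: entries near zero map to 0, the rest
    # to sign(w), rescaled by the mean magnitude of the kept entries.
    delta = t * np.mean(np.abs(w))
    q = np.where(np.abs(w) > delta, np.sign(w), 0.0)
    alpha = np.abs(w[q != 0]).mean() if np.any(q != 0) else 0.0
    return alpha * q

w = np.array([0.9, -0.05, 0.4, -1.2])
q = ternarize(w)  # -> [1.05, 0., 0., -1.05]
```

Each quantized weight takes one of three values {-a, 0, +a}, so it needs only 2 bits of storage plus one shared scale per tensor, which is where the model-size and compute savings come from.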


The Mamba in the Llama: Distilling and Accelerating Hybrid Models

Neural Information Processing Systems

Linear RNN architectures, like Mamba, can be competitive with Transformer models in language modeling while having advantageous deployment characteristics. Given the focus on training large-scale Transformer models, we consider the challenge of converting these pretrained models for deployment.



other comments in the paper if accepted

Neural Information Processing Systems

We appreciate the reviewers' valuable comments and will answer their questions from three aspects. In response to Reviewer 5: this paper's major novelty is a new STL-based learning framework. Our method creates a practical way to ensure satisfaction of the logic rules in an end-to-end manner, and our approach achieves promising results on real city datasets. We have carefully compared our work with all the related papers pointed out by the reviewers. Using STL to specify CPS properties is not our novelty; following that established practice, we also choose STL to express the model properties.


Learning to Execute Programs with Instruction Pointer Attention Graph Neural Networks

Neural Information Processing Systems

Graph neural networks (GNNs) have emerged as a powerful tool for learning software engineering tasks, including code completion, bug finding, and program repair. The Instruction Pointer Attention Graph Neural Network (IPA-GNN) can be seen either as a continuous relaxation of the RNN model or as a GNN variant more tailored to execution.


LightRNN: Memory and Computation-Efficient Recurrent Neural Networks

Xiang Li, Tao Qin, Jian Yang, Tie-Yan Liu

Neural Information Processing Systems

While RNNs are becoming increasingly popular, they have a known limitation: when applied to textual corpora with large vocabularies, the model becomes very large. For instance, when using RNNs for language modeling, a word is first mapped from a one-hot vector (whose dimension equals the vocabulary size) to an embedding vector by an input-embedding matrix.
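The scale problem can be made concrete with a quick back-of-the-envelope sketch; the vocabulary and embedding sizes below are illustrative assumptions, not figures from the paper:

```python
import numpy as np

# Hypothetical sizes: a 1M-word vocabulary with 512-dim embeddings.
vocab_size, embed_dim = 1_000_000, 512
n_params = vocab_size * embed_dim   # 512,000,000 parameters for the input embedding alone
gb_fp32 = n_params * 4 / 1e9        # ~2.0 GB at float32

# Toy demonstration that the one-hot multiply is just a row lookup.
V, D = 10, 3
E = np.arange(V * D, dtype=np.float32).reshape(V, D)  # embedding matrix
one_hot = np.eye(V, dtype=np.float32)[7]              # one-hot vector for word id 7
lookup = one_hot @ E                                  # equals E[7]
```

LightRNN's remedy, a shared two-component embedding table, is described in the paper; the sketch above only illustrates the baseline cost it addresses.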

