NVIDIA Nemotron Nano 2: An Accurate and Efficient Hybrid Mamba-Transformer Reasoning Model

NVIDIA: Basant, Aarti, Khairnar, Abhijit, Paithankar, Abhijit, Khattar, Abhinav, Renduchintala, Adithya, Malte, Aditya, Bercovich, Akhiad, Hazare, Akshay, Rico, Alejandra, Ficek, Aleksander, Kondratenko, Alex, Shaposhnikov, Alex, Bukharin, Alexander, Taghibakhshi, Ali, Barton, Amelia, Mahabaleshwarkar, Ameya Sunil, Shen, Amy, Tao, Andrew, Guan, Ann, Shors, Anna, Mandarwal, Anubhav, Mehta, Arham, Venkatesan, Arun, Sharabiani, Ashton, Aithal, Ashwath, Poojary, Ashwin, Dattagupta, Ayush, Buddharaju, Balaram, Zhu, Banghua, Simkin, Barnaby, Kartal, Bilal, Rouhani, Bita Darvish, Chen, Bobby, Ginsburg, Boris, Norick, Brandon, Yu, Brian, Catanzaro, Bryan, Wang, Charles, Truong, Charlie, Mungekar, Chetan, Patel, Chintan, Alexiuk, Chris, Munley, Christian, Parisien, Christopher, Su, Dan, Afrimi, Daniel, Korzekwa, Daniel, Rohrer, Daniel, Gitman, Daria, Mosallanezhad, David, Narayanan, Deepak, Rekesh, Dima, Yared, Dina, Pykhtar, Dmytro, Ahn, Dong, Riach, Duncan, Long, Eileen, Ning, Elliott, Chung, Eric, Galinkin, Erick, Bakhturina, Evelina, Prasad, Gargi, Shen, Gerald, Qian, Haifeng, Elisha, Haim, Sharma, Harsh, Ross, Hayley, Ngo, Helen, Sahota, Herman, Wang, Hexin, Shin, Hoo Chang, Huang, Hua, Cunningham, Iain, Gitman, Igor, Moshkov, Ivan, Jung, Jaehun, Kautz, Jan, Scowcroft, Jane Polak, Casper, Jared, Zhang, Jian, Zeng, Jiaqi, Zhang, Jimmy, Xue, Jinze, Huang, Jocelyn, Conway, Joey, Kamalu, John, Cohen, Jonathan, Jennings, Joseph, Vialard, Julien Veron, Yi, Junkeun, Parmar, Jupinder, Briski, Kari, Cheung, Katherine, Luna, Katherine, Wyss, Keith, Santhanam, Keshav, Kong, Kezhi, Pawelec, Krzysztof, Anik, Kumar, Li, Kunlun, Ahmadian, Kushan, McAfee, Lawrence, Sleiman, Laya, Derczynski, Leon, Vega, Luis, de Melo, Maer Rodrigues, Sreedhar, Makesh Narsimhan, Chochowski, Marcin, Cai, Mark, Kliegl, Markus, Stepniewska-Dziubinska, Marta, Novikov, Matvei, Samadi, Mehrzad, Price, Meredith, Boubdir, Meriem, Boone, Michael, Evans, Michael, Bien, Michal, Zawalski, Michal,
Martinez, Miguel, Chrzanowski, Mike, Shoeybi, Mohammad, Patwary, Mostofa, Dhameja, Namit, Assaf, Nave, Habibi, Negar, Bhatia, Nidhi, Pope, Nikki, Tajbakhsh, Nima, Juluru, Nirmal Kumar, Rybakov, Oleg, Hrinchuk, Oleksii, Kuchaiev, Oleksii, Olabiyi, Oluwatobi, Ribalta, Pablo, Subramanian, Padmavathy, Chadha, Parth, Molchanov, Pavlo, Dykas, Peter, Jin, Peter, Bialecki, Piotr, Januszewski, Piotr, Thalasta, Pradeep, Gaikwad, Prashant, Varshney, Prasoon, Gundecha, Pritam, Tredak, Przemek, Mahabadi, Rabeeh Karimi, Patel, Rajen, El-Yaniv, Ran, Rajan, Ranjit, Cheruvu, Ria, Shahbazyan, Rima, Borkar, Ritika, Gala, Ritu, Waleffe, Roger, Zhang, Ruoxi, Hewett, Russell J., Prenger, Ryan, Jain, Sahil, Kriman, Samuel, Satheesh, Sanjeev, Kaji, Saori, Yurick, Sarah, Muralidharan, Saurav, Narenthiran, Sean, Bak, Seonmyeong, Sameni, Sepehr, Han, Seungju, Ramasamy, Shanmugam, Ghosh, Shaona, Sreenivas, Sharath Turuvekere, Thomas, Shelby, Diao, Shizhe, Gopal, Shreya, Prabhumoye, Shrimai, Toshniwal, Shubham, Ding, Shuoyang, Singh, Siddharth, Jain, Siddhartha, Majumdar, Somshubra, Singhal, Soumye, Alborghetti, Stefania, Akter, Syeda Nahida, Kong, Terry, Moon, Tim, Hliwiak, Tomasz, Asida, Tomer, Wang, Tony, Konuk, Tugrul, Vashishth, Twinkle, Poon, Tyler, Karpas, Udi, Noroozi, Vahid, Srinivasan, Venkat, Korthikanti, Vijay, Fugro, Vikram, Kalluru, Vineeth, Kurin, Vitaly, Lavrukhin, Vitaly, Ahmad, Wasi Uddin, Du, Wei, Byeon, Wonmin, Lu, Ximing, Dong, Xin, Karnati, Yashaswi, Choi, Yejin, Zhang, Yian, Lin, Ying, Fu, Yonggan, Suhara, Yoshi, Dong, Zhen, Li, Zhiyu, Zhu, Zhongbo, Chen, Zijia

arXiv.org Artificial Intelligence 

We introduce Nemotron-Nano-9B-v2, a hybrid Mamba-Transformer language model designed to increase throughput for reasoning workloads while achieving state-of-the-art accuracy compared to similarly sized models. Nemotron-Nano-9B-v2 builds on the Nemotron-H architecture, in which the majority of the self-attention layers in the common Transformer architecture are replaced with Mamba-2 layers, to achieve improved inference speed when generating the long thinking traces needed for reasoning. We create Nemotron-Nano-9B-v2 by first pre-training a 12-billion-parameter model (Nemotron-Nano-12B-v2-Base) on 20 trillion tokens using an FP8 training recipe. After aligning Nemotron-Nano-12B-v2-Base, we employ the Minitron strategy to compress and distill the model with the goal of enabling inference on up to 128k tokens on a single NVIDIA A10G GPU (22 GiB of memory, bfloat16 precision). Compared to existing similarly sized models (e.g., Qwen3-8B), we show that Nemotron-Nano-9B-v2 achieves on-par or better accuracy on reasoning benchmarks while achieving up to 6x higher inference throughput in reasoning settings such as 8k input and 16k output tokens. We are releasing the Nemotron-Nano-9B-v2, Nemotron-Nano-12B-v2-Base, and Nemotron-Nano-9B-v2-Base checkpoints, along with the majority of our pre- and post-training datasets, on Hugging Face.
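
As a rough illustration of why compression from 12B to 9B parameters matters for the 22 GiB budget, the back-of-the-envelope sketch below estimates the memory taken by the weights alone in bfloat16 (2 bytes per parameter). It assumes the nominal parameter counts implied by the model names (12e9 and 9e9), which may differ from the exact counts, and it ignores activations, the attention KV cache, Mamba state, and framework overhead. The helper `weight_gib` is a hypothetical name introduced only for this example.

```python
GIB = 2**30  # bytes per GiB


def weight_gib(num_params: float, bytes_per_param: int = 2) -> float:
    """Memory for the weights alone, in GiB (bfloat16 uses 2 bytes per parameter)."""
    return num_params * bytes_per_param / GIB


BUDGET_GIB = 22.0  # memory budget quoted in the abstract for one A10G GPU

# Nominal parameter counts taken from the model names (an assumption).
for name, params in [("12B", 12e9), ("9B", 9e9)]:
    w = weight_gib(params)
    print(f"{name}: weights ~{w:.2f} GiB, headroom ~{BUDGET_GIB - w:.2f} GiB")
```

Under these assumptions the 12B weights alone come to roughly 22.35 GiB, which already exceeds the budget, while the 9B weights come to roughly 16.76 GiB, leaving about 5 GiB for everything else needed to process a 128k-token context.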