Nvidia's 70 projects at ICLR show how raw chip power is central to AI's acceleration
One of the most important annual events in the field of artificial intelligence kicks off this week in Singapore: the International Conference on Learning Representations (ICLR). As usual, chip giant Nvidia has a major presence at the conference, presenting over 70 research papers from its team. The papers cover topics ranging from music generation and realistic 3D video to robot-training tasks and the ability to generate multiple large language models at the push of a button. "People often think of Nvidia as a chip company that makes awesome chips, and of course, we're really proud of that," said Bryan Catanzaro, Nvidia's head of applied deep learning research, in an interview with ZDNET. "But the story that I think matters the most is that in order for us to make those awesome chips, we have to do research like this, because this teaches us how to make all of those systems."
Reviewer #3: nonlinear SVMs, which are outside the class of fast, intricate algorithms considered in the paper (see lines 30-51), like TRON

We thank all the reviewers for their time. In what follows, reviewer comments are italicized and followed by our response in blue. We thank the reviewer for the helpful references. Importantly, we note that the SVM GPU-speedup paper by Catanzaro et al. is for […]

Does that mean there is a trade-off between memory/computation and communication? It is probably not appropriate to just report the speedup, given that the comparison is based on different platforms.
Why supercomputers are the unsung heroes of PC gaming
It's funny how things in reality can be so far removed from what we imagined. A classic example of this is how I imagined there to be a horde of scientists at Nvidia HQ hunched over their PCs, all working to train the next generation of Nvidia DLSS algorithms -- between enjoying bouts of Call of Duty with colleagues, of course. But as it turns out, that's only part of the story… Yes, there are scientists at Nvidia working on these projects, but a large portion of the work of training and developing new DLSS technology for us PC gamers to enjoy is also done by an AI supercomputer, and it's been doing that non-stop, 24/7, for going on six years now. That nugget of information was delivered by Bryan Catanzaro, Nvidia's VP of applied deep learning research, at CES 2025 in Las Vegas. Catanzaro dropped that gem on stage casually as a throwaway comment while discussing details of DLSS 4. But as it turns out, that comment has been the catalyst for a ton of talk about the topic.
EpMAN: Episodic Memory AttentioN for Generalizing to Longer Contexts
Chaudhury, Subhajit, Das, Payel, Swaminathan, Sarathkrishna, Kollias, Georgios, Nelson, Elliot, Pahwa, Khushbu, Pedapati, Tejaswini, Melnyk, Igor, Riemer, Matthew
Recent advances in Large Language Models (LLMs) have yielded impressive successes on many language tasks. However, efficient processing of long contexts using LLMs remains a significant challenge. We introduce \textbf{EpMAN} -- a method for processing long contexts in an \textit{episodic memory} module while \textit{holistically attending to} semantically relevant context chunks. The output of \textit{episodic attention} is then used to reweight the decoder's self-attention to the stored KV cache of the context during training and generation. When an LLM decoder is trained using \textbf{EpMAN}, its performance on multiple challenging single-hop long-context recall and question-answering benchmarks is found to be stronger and more robust across the range from 16k to 256k tokens than that of baseline decoders trained with self-attention and of popular retrieval-augmented generation frameworks.
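The core mechanism the abstract describes -- chunk-level episodic relevance rescaling the decoder's token-level attention over the cached context -- can be sketched in a few lines. The following is a minimal, illustrative NumPy sketch under stated assumptions (single query, single head, relevance scores multiplied into the softmax weights before renormalization); the function and variable names are hypothetical and this is not the paper's implementation.

```python
import numpy as np

def epman_style_attention(q, K, V, chunk_ids, chunk_relevance):
    """Single-query dot-product attention where a per-chunk episodic
    relevance score rescales token-level attention weights (sketch)."""
    d = q.shape[-1]
    scores = K @ q / np.sqrt(d)              # standard scaled dot-product scores
    weights = np.exp(scores - scores.max())  # softmax numerator (numerically stable)
    weights = weights * chunk_relevance[chunk_ids]  # reweight by chunk relevance
    weights = weights / weights.sum()        # renormalize to a distribution
    return weights @ V                       # attention output

# Toy example: 6 cached tokens grouped into 3 chunks; chunk 1 deemed most relevant.
rng = np.random.default_rng(0)
q = rng.normal(size=4)
K = rng.normal(size=(6, 4))
V = rng.normal(size=(6, 4))
chunk_ids = np.array([0, 0, 1, 1, 2, 2])
chunk_relevance = np.array([0.1, 1.0, 0.1])
out = epman_style_attention(q, K, V, chunk_ids, chunk_relevance)
```

Note that with uniform chunk relevance this reduces exactly to standard softmax attention, since a constant factor cancels in the renormalization.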
Scaling Combinatorial Optimization Neural Improvement Heuristics with Online Search and Adaptation
Verdù, Federico Julian Camerota, Castelli, Lorenzo, Bortolussi, Luca
… (Singh and Rizwanullah 2022) to circuit board design (Barahona et al. 1988) and phylogenetics (Catanzaro et al. 2012). Although general-purpose solvers exist and most CO problems are easy to formulate, in many applications of interest getting to the exact optimal solution is NP-hard, and said solvers are extremely inefficient or even impractical due to the computational time required to reach optimality (Toth 2000; Colorni et al. 1996). Specialized solvers and heuristics have been developed over the years for different … This approach eliminates the necessity for manually crafted components, thereby providing an ideal means to address problems without requiring specific domain knowledge (Lombardi and Milano 2018). However, improvement heuristics can be easier to apply when complex constraints need to be satisfied and may yield better performance than constructive alternatives when the problem structure is difficult to represent (Zhang et al. 2020) or when known improvement operators with good properties exist (Bordewich et al. 2008).
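The improvement heuristics discussed here generalize classic local-search moves. As a minimal, non-neural illustration of what an improvement heuristic does, the following sketch applies the classic 2-opt operator to a random Euclidean tour: it repeatedly reverses a segment whenever doing so shortens the tour. This is a textbook baseline, not the paper's neural method.

```python
import math
import random

def tour_length(tour, pts):
    """Total length of a closed tour over 2D points."""
    return sum(math.dist(pts[tour[i]], pts[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))

def two_opt(tour, pts):
    """2-opt improvement heuristic: reverse a segment whenever that
    shortens the tour, until no improving move remains (local optimum)."""
    improved = True
    while improved:
        improved = False
        for i in range(1, len(tour) - 1):
            for j in range(i + 1, len(tour)):
                cand = tour[:i] + tour[i:j][::-1] + tour[j:]
                if tour_length(cand, pts) < tour_length(tour, pts) - 1e-12:
                    tour, improved = cand, True
    return tour

random.seed(0)
pts = [(random.random(), random.random()) for _ in range(12)]
start = list(range(12))
best = two_opt(start, pts)
```

Neural improvement approaches replace the exhaustive move scan with a learned policy that proposes which move to apply next, which is what makes online search and adaptation at inference time attractive.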
DCAE-SR: Design of a Denoising Convolutional Autoencoder for reconstructing Electrocardiograms signals at Super Resolution
Lomoio, Ugo, Veltri, Pierangelo, Guzzi, Pietro Hiram, Lio', Pietro
Electrocardiogram (ECG) signals play a pivotal role in cardiovascular diagnostics, providing essential information on the electrical activity of the heart. However, the inherent noise and limited resolution in ECG recordings can hinder accurate interpretation and diagnosis. In this paper, we propose a novel model for ECG super resolution (SR) that uses a denoising convolutional autoencoder to enhance temporal and frequency information inside ECG signals. Our approach addresses the limitations of traditional ECG signal processing techniques. Our model takes as input 5-second ECG windows sampled at 50 Hz (very low resolution) and is able to reconstruct a denoised super-resolution signal with a 10x upsampling rate (sampled at 500 Hz). We trained the proposed DCAE-SR on publicly available myocardial infarction ECG signals. Our method demonstrates superior performance in reconstructing high-resolution ECG signals from very low-resolution signals with a sampling rate of 50 Hz. We compared our results with current deep-learning approaches for ECG super-resolution in the literature and with some reproducible non-deep-learning methods that can perform both super-resolution and denoising. We obtained state-of-the-art performance in super-resolution of very low-resolution ECG signals frequently corrupted by ECG artifacts: a signal-to-noise ratio of 12.20 dB (outperforming the previous 4.68 dB), a mean squared error of 0.0044 (outperforming the previous 0.0154), and a root mean squared error of 4.86% (outperforming the previous 12.40%). In conclusion, our DCAE-SR model offers a robust (to artefact presence), versatile and explainable solution to enhance the quality of ECG signals. This advancement holds promise for the field of cardiovascular diagnostics, paving the way for improved patient care and high-quality clinical decisions.
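The window arithmetic and the reported metrics are easy to make concrete. The sketch below, using common textbook definitions of SNR (in dB) and MSE -- the abstract does not spell out the paper's exact metric formulas, so these are assumptions -- shows the 10x relationship between the 50 Hz input and 500 Hz output windows and evaluates a toy reconstruction against a clean reference.

```python
import numpy as np

def snr_db(reference, estimate):
    """Signal-to-noise ratio in dB: signal power over residual power
    (common definition; assumed, not taken from the paper)."""
    noise = reference - estimate
    return 10 * np.log10(np.sum(reference ** 2) / np.sum(noise ** 2))

def mse(reference, estimate):
    """Mean squared error between reference and estimate."""
    return np.mean((reference - estimate) ** 2)

# A 5-second window: 50 Hz in, 500 Hz out -> 10x more samples.
fs_lo, fs_hi, seconds = 50, 500, 5
n_in, n_out = fs_lo * seconds, fs_hi * seconds   # 250 and 2500 samples

# Toy stand-in for a high-resolution trace and a slightly noisy reconstruction.
t = np.linspace(0, seconds, n_out, endpoint=False)
clean = np.sin(2 * np.pi * 1.2 * t)
recon = clean + 0.05 * np.random.default_rng(0).normal(size=n_out)
snr = snr_db(clean, recon)   # well above the paper's 12.20 dB for this mild noise
err = mse(clean, recon)
```

A perfect reconstruction drives the MSE to zero and the SNR to infinity; the paper's 12.20 dB versus the prior 4.68 dB corresponds to roughly a 5.7x reduction in residual noise power.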
Artificial Intelligence: a national network of high schools that want to include the subject in their programs is born
The idea of bringing artificial intelligence into school curricula begins in the far north-east of Italy, specifically at the "Buonarroti" secondary school in Monfalcone, whose principal, Vincenzo Kaiko, has also proposed creating a real network of schools that intend to offer their students courses on the subject. As Kaiko explains, data science and artificial intelligence are scientific disciplines closely related to each other and to other fields of knowledge such as mathematics, the natural sciences, the humanities, and economics, and together they represent the most interesting frontier of new information and communication technologies. Integrating the study of data science and artificial intelligence into the high-school track, the Monfalcone principal adds, can allow students to gain important basic knowledge in rapidly expanding fields of science and technology, both by broadening their cultural background and by orienting them toward university studies. "The study of these two disciplines also develops logical-mathematical skills, analytical and abstract thinking, problem-solving ability and creativity, in an interdisciplinary and mutually enriching relationship both with mathematics, physics and the natural sciences, and with the humanistic disciplines." There are currently four Italian schools that have independently added data science and artificial intelligence to their secondary curricula: besides the Buonarroti, these are the Maserati in Voghera, the Volta in Reggio Calabria, and the Galilei in Trento.
Amplify Partners' Sarah Catanzaro on the evolution of MLOps - RTInsights
Note: This interview was edited and condensed for clarity. As part of our media partnership with Tecton's apply(conf), RTInsights recently had the opportunity to speak with Sarah Catanzaro, General Partner at the venture firm Amplify Partners. The firm has invested in data startups including OctoML, Einblick, and Hex. Prior to venture capital, she was Head of Data at Mattermark. She started her career in counterterrorism.
News
NVIDIA opened the door for enterprises worldwide to develop and deploy large language models (LLMs) by enabling them to build their own domain-specific chatbots, personal assistants and other AI applications that understand language with unprecedented levels of subtlety and nuance. The company unveiled the NVIDIA NeMo Megatron framework for training language models with trillions of parameters, the Megatron 530B customizable LLM that can be trained for new domains and languages, and NVIDIA Triton Inference Server with multi-GPU, multi-node distributed inference functionality. Combined with NVIDIA DGX systems, these tools provide a production-ready, enterprise-grade solution to simplify the development and deployment of large language models. "Large language models have proven to be flexible and capable, able to answer deep domain questions, translate languages, comprehend and summarize documents, write stories and compute programs, all without specialized training or supervision," said Bryan Catanzaro, vice president of Applied Deep Learning Research at NVIDIA. "Building large language models for new languages and domains is likely the largest supercomputing application yet, and now these capabilities are within reach for the world's enterprises."