SusGen-GPT: A Data-Centric LLM for Financial NLP and Sustainability Report Generation
Wu, Qilong, Xiang, Xiaoneng, Huang, Hejia, Wang, Xuan, Jie, Yeo Wei, Satapathy, Ranjan, Filho, Ricardo Shirota, Veeravalli, Bharadwaj
The rapid growth of the financial sector and the rising focus on Environmental, Social, and Governance (ESG) considerations highlight the need for advanced NLP tools. However, open-source LLMs proficient in both the finance and ESG domains remain scarce. To address this gap, we introduce SusGen-30K, a category-balanced dataset comprising seven financial NLP tasks and ESG report generation, and propose TCFD-Bench, a benchmark for evaluating sustainability report generation. Leveraging this dataset, we develop SusGen-GPT, a suite of models achieving state-of-the-art performance across six adapted and two off-the-shelf tasks, trailing GPT-4 by only 2% despite using 7-8B parameters compared to GPT-4's 1,700B. Building on these models, we propose the SusGen system, which integrates Retrieval-Augmented Generation (RAG) to assist in sustainability report generation. This work demonstrates the efficiency of our approach, advancing research in finance and ESG.
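The SusGen system couples the fine-tuned models with Retrieval-Augmented Generation. The sketch below is a minimal, hypothetical illustration of such a retrieval step in Python: disclosure passages are embedded, the most similar ones are retrieved for a query, and the result is prepended to the generation prompt. The SimpleRAG class, the sentence-transformers embedder choice, and the prompt wording are illustrative assumptions, not the released SusGen pipeline.

# Minimal sketch (illustrative, not the SusGen implementation): a basic
# retrieve-then-prompt loop for sustainability report drafting.
from sentence_transformers import SentenceTransformer, util

class SimpleRAG:
    def __init__(self, passages, embedder="all-MiniLM-L6-v2"):
        # passages: list of ESG/company disclosure snippets (assumed corpus).
        self.passages = passages
        self.embedder = SentenceTransformer(embedder)
        self.index = self.embedder.encode(passages, convert_to_tensor=True)

    def retrieve(self, query, k=3):
        # Rank stored passages by semantic similarity to the query.
        query_emb = self.embedder.encode(query, convert_to_tensor=True)
        scores = util.cos_sim(query_emb, self.index)[0]
        top = scores.topk(min(k, len(self.passages))).indices.tolist()
        return [self.passages[i] for i in top]

    def build_prompt(self, query, k=3):
        # Retrieved evidence is prepended so the LLM grounds the report section.
        context = "\n".join(self.retrieve(query, k))
        return f"Context:\n{context}\n\nDraft the TCFD disclosure for: {query}"

The prompt produced by build_prompt would then be passed to a fine-tuned model such as SusGen-GPT for generation; the exact prompt template used by the authors is not specified here.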
LaVida Drive: Vision-Text Interaction VLM for Autonomous Driving with Token Selection, Recovery and Enhancement
Jiao, Siwen, Fang, Yangyi, Peng, Baoyun, Chen, Wangqun, Veeravalli, Bharadwaj
Recent advancements in Visual Language Models (VLMs) have made them crucial for visual question answering (VQA) in autonomous driving, enabling natural human-vehicle interactions. However, existing methods often struggle in dynamic driving environments, as they usually focus on static images or videos and rely on downsampling to manage computational costs. This results in the loss of critical details and makes it difficult to effectively integrate spatial and temporal information, undermining the fine-grained perception and temporal coherence essential for effective decision-making. To tackle these challenges, we introduce LaVida Drive, a novel and efficient VQA framework for autonomous driving. LaVida Drive seamlessly integrates temporal data while maintaining high-resolution inputs for detailed visual perception. It optimizes spatial processing by retaining high-resolution data for intricate details and using lower-resolution inputs for temporal analysis to focus on motion-related features, thereby boosting computational efficiency. The core of LaVida Drive consists of two modules: the Query-aware Token Selection module and the Spatial-Temporal Token Recovery and Enhancement module. The former dynamically selects the most relevant visual tokens based on semantic alignment with the input query, reducing the token count from the high-resolution spatial input. The latter ensures smooth and coherent interactions between spatial and temporal information, preserving contextual continuity across frames. Extensive experiments on various autonomous driving question-answering benchmarks show that LaVida Drive significantly reduces visual tokens, enhances efficiency, and improves overall performance.
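As a concrete illustration of the query-aware token selection idea described above, the following sketch scores visual patch tokens against a pooled text-query embedding and keeps only the top-scoring ones. It assumes the visual tokens and the query already live in a shared embedding space (e.g. via learned projections); the function name, shapes, and top_k value are hypothetical and do not reproduce LaVida Drive's actual module.

# Minimal sketch (assumptions noted above, not the authors' code):
# query-aware selection of high-resolution visual tokens.
import torch
import torch.nn.functional as F

def select_query_aware_tokens(visual_tokens: torch.Tensor,
                              query_embedding: torch.Tensor,
                              top_k: int = 256) -> torch.Tensor:
    """Keep the top_k visual tokens most semantically aligned with the query.

    visual_tokens:   (num_tokens, dim) high-resolution patch embeddings
    query_embedding: (dim,) pooled text-query embedding
    """
    # Cosine similarity between each visual token and the query.
    scores = F.cosine_similarity(visual_tokens,
                                 query_embedding.unsqueeze(0), dim=-1)
    # Indices of the most query-relevant tokens.
    top_idx = scores.topk(min(top_k, visual_tokens.size(0))).indices
    # Reduced token set handed on to the language model.
    return visual_tokens[top_idx]

# Example: 1,024 patch tokens of dimension 768 reduced to 256.
tokens = torch.randn(1024, 768)
query = torch.randn(768)
selected = select_query_aware_tokens(tokens, query, top_k=256)
print(selected.shape)  # torch.Size([256, 768])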
Connection Pruning for Deep Spiking Neural Networks with On-Chip Learning
Nguyen, Thao N. N., Veeravalli, Bharadwaj, Fong, Xuanyao
Long training time hinders deep Spiking Neural Networks (SNNs) with online learning capability from being realized on embedded systems hardware. Our work proposes a novel connection pruning approach that can be applied during online Spike Timing Dependent Plasticity (STDP)-based learning to optimize both the learning time and the network connectivity of the SNN. Evaluated on a deep SNN with Time-To-First-Spike (TTFS) coding, our connection pruning approach achieved a 2.1x speed-up in online learning and reduced network connectivity by 92.83%, while cutting the energy consumed during online learning by 64%. Moreover, the connectivity reduction yields a 2.83x speed-up and 78.24% energy savings in inference, while classification accuracy on the Caltech 101 dataset remains the same as our non-pruning baseline. In addition, we developed an event-driven hardware architecture on the Field Programmable Gate Array (FPGA) platform that efficiently incorporates the proposed connection pruning approach while incurring as little as 0.56% power overhead. We also compared our work with existing connection pruning approaches for SNNs to highlight the key features of each. To the best of our knowledge, ours is the first connection pruning algorithm that can be applied during online STDP-based learning for a deep SNN with TTFS coding.
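To make the idea concrete, the sketch below shows one way connection pruning could be interleaved with an online STDP update under TTFS coding: only unpruned synapses receive weight updates, and synapses whose weights fall below a threshold are removed from subsequent learning and inference. The update rule, threshold, and array layout are simplifying assumptions for illustration and do not reproduce the paper's algorithm or its FPGA implementation.

# Minimal sketch (an assumption-laden illustration, not the paper's design):
# pruning weak connections during an online STDP update with TTFS coding.
import numpy as np

def stdp_with_pruning(weights, mask, pre_spike_times, post_spike_times,
                      lr=0.01, tau=20.0, prune_threshold=0.05):
    """One online STDP update with connection pruning.

    weights:  (n_pre, n_post) synaptic weights in [0, 1]
    mask:     (n_pre, n_post) boolean, False = pruned connection
    *_spike_times: first-spike times (TTFS coding), np.inf if no spike
    """
    # Timing difference: post fires after pre -> potentiate, else depress.
    dt = post_spike_times[np.newaxis, :] - pre_spike_times[:, np.newaxis]
    dw = np.where(dt > 0, lr * np.exp(-dt / tau), -lr * np.exp(dt / tau))
    dw = np.nan_to_num(dw, nan=0.0, posinf=0.0, neginf=0.0)

    # Only unpruned connections are updated, saving learning-time work.
    weights = np.clip(weights + dw * mask, 0.0, 1.0)

    # Prune connections whose weights decayed below the threshold.
    mask &= weights >= prune_threshold
    weights *= mask
    return weights, mask

# Toy usage: 4 pre-synaptic and 3 post-synaptic neurons.
rng = np.random.default_rng(0)
w = rng.uniform(0.0, 1.0, size=(4, 3))
m = np.ones_like(w, dtype=bool)
pre_t = rng.uniform(0.0, 50.0, size=4)
post_t = rng.uniform(0.0, 50.0, size=3)
w, m = stdp_with_pruning(w, m, pre_t, post_t)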