Graph Convolutional Support Vector Regression for Robust Spatiotemporal Forecasting of Urban Air Pollution

arXiv.org Machine Learning

Urban air quality forecasting is challenging because pollutant concentrations are nonlinear, nonstationary, spatiotemporally dependent, and often affected by anomalous observations caused by traffic congestion, industrial emissions, and seasonal meteorological variability. This study proposes a Graph Convolutional Support Vector Regression (GCSVR) framework for robust spatiotemporal forecasting of urban air pollution. The model combines graph convolutional learning to capture inter-station spatial dependence with support vector regression to model nonlinear temporal dynamics while reducing sensitivity to outlier observations. The proposed framework is evaluated using air quality records from 37 monitoring stations in Delhi and 18 stations in Mumbai, representing inland and coastal metropolitan environments in India. Forecasting performance is assessed across multiple horizons and compared with established temporal and spatiotemporal benchmarks. The results show that GCSVR consistently improves predictive accuracy and maintains stable performance across seasons and outlier-prone pollution episodes. Statistical testing further confirms the reliability of the proposed approach across the two cities. Finally, conformal prediction is integrated with GCSVR to generate calibrated prediction intervals, enhancing its practical value for uncertainty-aware air quality monitoring and public health decision-making.
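The abstract does not say which conformal procedure is paired with GCSVR. As an illustration only, the sketch below shows split conformal prediction, one common variant, applied to the point forecasts of any regressor. The function names, the absolute-residual score, and the `alpha` parameter are assumptions made for this example and are not taken from the paper.

```python
import math


def conformal_quantile(cal_true, cal_pred, alpha=0.1):
    """Split-conformal quantile of absolute residuals on a calibration set.

    Hypothetical helper: returns the k-th smallest absolute residual with
    k = ceil((n + 1) * (1 - alpha)), or infinity when the calibration set
    is too small to support the requested coverage.
    """
    scores = sorted(abs(y - p) for y, p in zip(cal_true, cal_pred))
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf
    return scores[k - 1]


def prediction_interval(point_pred, q):
    """Symmetric interval around a point forecast using the quantile q."""
    return point_pred - q, point_pred + q
```

In use, a model would be fitted on one part of the data, `conformal_quantile` would be computed on held-out calibration forecasts, and `prediction_interval` would then wrap each new forecast. Under the usual exchangeability assumption, the interval covers the true value with probability at least `1 - alpha`.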


Robust Visual Reasoning via Language Guided Neural Module Networks

Neural Information Processing Systems

Neural module networks (NMN) are a popular approach for solving multi-modal tasks such as visual question answering (VQA) and visual referring expression recognition (REF). A key limitation in prior implementations of NMN is that the neural modules do not effectively capture the association between the visual input and the relevant neighbourhood context of the textual input.






e150e6d0a1e5214740c39c6e4503ba7a-Supplemental-Conference.pdf

Neural Information Processing Systems

Appendix
A Additional Experiments
A.1 Experiments on the ETT datasets
In the main body, we present a comparison of the benchmark methods on the ETTm2 dataset. In this section, we extend our analysis to the remaining three ETT datasets, namely ETTh1, ETTh2, and ETTm1, as summarized in Table 7. Our experimental results reveal that Basisformer outperforms all other methods in terms of MSE and MAE. In all experiments, lower MSE values indicate better model performance, and we present the best results in boldface.
Experimental results with longer length input setting
Throughout our research, we maintain consistency in our experimental settings by fixing the input length to be 96 (with a reduced input length of 36 for the illness dataset), instead of using a longer length.
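For reference, the two metrics named in this excerpt, MSE and MAE, can be computed as in the minimal sketch below. The function names are illustrative and do not come from the paper's code; the sketch assumes two equal-length, non-empty sequences of forecasts and targets.

```python
def mse(y_true, y_pred):
    """Mean squared error: average of squared forecast errors."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean absolute error: average of absolute forecast errors."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)
```

Both metrics are lower-is-better. MSE penalizes large errors more heavily because each error is squared before averaging.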


Enhancing Motion Deblurring in High-Speed Scenes with Spike Streams

Neural Information Processing Systems

Traditional cameras produce desirable vision results but struggle with motion blur in high-speed scenes due to long exposure windows. Existing frame-based deblurring algorithms face challenges in extracting useful motion cues from severely blurred images. Recently, an emerging bio-inspired vision sensor known as the spike camera has achieved an extremely high frame rate while preserving rich spatial details, owing to its novel sampling mechanism. However, typical binary spike streams are relatively low-resolution, degraded image signals devoid of color information, making them unfriendly to human vision. In this paper, we propose a novel approach that integrates the two modalities from two branches, leveraging spike streams as auxiliary visual cues for guiding deblurring in high-speed motion scenes. We propose the first spike-based motion deblurring model with bidirectional information complementarity. We introduce a content-aware motion magnitude attention module that uses a learnable mask to extract relevant information from blurry images effectively, and we incorporate a transposed cross-attention fusion module to efficiently combine features from both spike data and blurry RGB images. Furthermore, we build two extensive synthesized datasets for training and validation purposes, encompassing high-temporal-resolution spikes, blurry images, and corresponding sharp images. The experimental results demonstrate that our method effectively recovers clear RGB images from highly blurry scenes and outperforms state-of-the-art deblurring algorithms in multiple settings.