weight map
SplatSuRe: Selective Super-Resolution for Multi-view Consistent 3D Gaussian Splatting
Asthana, Pranav, Hanson, Alex, Tu, Allen, Goldstein, Tom, Zwicker, Matthias, Varshney, Amitabh
3D Gaussian Splatting (3DGS) enables high-quality novel view synthesis, motivating interest in generating higher-resolution renders than those available during training. A natural strategy is to apply super-resolution (SR) to low-resolution (LR) input views, but independently enhancing each image introduces multi-view inconsistencies, leading to blurry renders. Prior methods attempt to mitigate these inconsistencies through learned neural components, temporally consistent video priors, or joint optimization on LR and SR views, but all uniformly apply SR across every image. In contrast, our key insight is that close-up LR views may contain high-frequency information for regions also captured in more distant views, and that we can use the camera pose relative to scene geometry to inform where to add SR content. Building from this insight, we propose SplatSuRe, a method that selectively applies SR content only in undersampled regions lacking high-frequency supervision, yielding sharper and more consistent results. Across Tanks & Temples, Deep Blending and Mip-NeRF 360, our approach surpasses baselines in both fidelity and perceptual quality. Notably, our gains are most significant in localized foreground regions where higher detail is desired.
HDR Image Reconstruction using an Unsupervised Fusion Model
High Dynamic Range (HDR) imaging aims to reproduce the wide range of brightness levels present in natural scenes, which the human visual system can perceive but conventional digital cameras often fail to capture due to their limited dynamic range. To address this limitation, we propose a deep learning-based multi-exposure fusion approach for HDR image generation. The method takes a set of differently exposed Low Dynamic Range (LDR) images, typically an underexposed and an overexposed image, and learns to fuse their complementary information using a convolutional neural network (CNN). The underexposed image preserves details in bright regions, while the overexposed image retains information in dark regions; the network effectively combines these to reconstruct a high-quality HDR output. The model is trained in an unsupervised manner, without relying on ground-truth HDR images, making it practical for real-world applications where such data is unavailable. We evaluate our results using the Multi-Exposure Fusion Structural Similarity Index Measure (MEF-SSIM) and demonstrate that our approach achieves superior visual quality compared to existing fusion methods. A customized loss function is further introduced to improve reconstruction fidelity and optimize model performance.
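To make the idea of fusing complementary exposures concrete, here is a minimal sketch of a classical, hand-crafted per-pixel fusion. It is not the learned CNN described in the abstract. It assumes pixel intensities in [0, 1] and weights each exposure by how close it is to mid-gray. The function names and the `target` and `sigma` parameters are hypothetical choices made for illustration.

```python
import math

def well_exposedness(value, target=0.5, sigma=0.2):
    """Gaussian weight: highest for pixels near mid-gray (assumed target of 0.5)."""
    return math.exp(-((value - target) ** 2) / (2 * sigma ** 2))

def fuse(under, over):
    """Per-pixel weighted average of an underexposed and an overexposed image.

    Both inputs are flat lists of intensities in [0, 1]; this is a hand-crafted
    stand-in for the learned fusion network, not the method in the abstract.
    """
    fused = []
    for u, o in zip(under, over):
        wu, wo = well_exposedness(u), well_exposedness(o)
        fused.append((wu * u + wo * o) / (wu + wo))
    return fused

# Pixel 0: the overexposed image is saturated, so the underexposed value dominates.
# Pixel 1: both exposures are equally far from mid-gray, so they are averaged.
result = fuse([0.5, 0.3], [1.0, 0.7])
```

A learned model replaces the fixed Gaussian weighting with weights inferred from the image content, which is what allows training against a fusion quality measure such as MEF-SSIM without ground-truth HDR images.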
Precipitation Prediction Using an Ensemble of Lightweight Learners
Li, Xinzhe, Rui, Sun, Niu, Yiming, Liu, Yao
Precipitation prediction plays a crucial role in modern agriculture and industry. However, it poses significant challenges due to the diverse patterns and dynamics in time and space, as well as the scarcity of high precipitation events. To address this challenge, we propose an ensemble learning framework that leverages multiple learners to capture the diverse patterns of precipitation distribution. Specifically, the framework consists of a precipitation predictor with multiple lightweight heads (learners) and a controller that combines the outputs from these heads. The learners and the controller are separately optimized with a proposed 3-stage training scheme. By utilizing provided satellite images, the proposed approach can effectively model the intricate rainfall patterns, especially for high precipitation events. It achieved 1st place on the core test as well as the nowcasting leaderboards of the Weather4Cast 2023 competition.
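The abstract does not say how the controller combines the head outputs. The sketch below assumes one common choice: the controller emits one score per head, and the final prediction is the softmax-weighted sum of the head predictions. The names `softmax` and `combine` and the scalar-per-head setup are illustrative assumptions, not the authors' implementation.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def combine(head_outputs, controller_scores):
    """Weighted sum of head predictions, with weights from the controller scores.

    Assumption: the controller produces one unnormalized score per head.
    """
    weights = softmax(controller_scores)
    return sum(w * y for w, y in zip(weights, head_outputs))

# Equal scores average the heads; a dominant score selects one head.
balanced = combine([1.0, 3.0], [0.0, 0.0])
selective = combine([1.0, 3.0], [0.0, 50.0])
```

Under this assumption, a head that specializes in rare high-precipitation events can be given most of the weight only when the controller judges such an event likely.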
From Compass and Ruler to Convolution and Nonlinearity: On the Surprising Difficulty of Understanding a Simple CNN Solving a Simple Geometric Estimation Task
Dagès, Thomas, Lindenbaum, Michael, Bruckstein, Alfred M.
Neural networks are omnipresent, but remain poorly understood. Their increasing complexity and use in critical systems raise the important challenge of full interpretability. We propose to address a simple well-posed learning problem: estimating the radius of a centred pulse in a one-dimensional signal or of a centred disk in two-dimensional images using a simple convolutional neural network. Surprisingly, understanding what trained networks have learned is difficult and, to some extent, counter-intuitive. However, an in-depth theoretical analysis in the one-dimensional case allows us to comprehend constraints due to the chosen architecture, the role of each filter and of the nonlinear activation function, and every single value taken by the weights of the model. Two fundamental concepts of neural networks arise: the importance of invariance and of the shape of the nonlinear activation functions.
Spatial-Temporal Convolutional Attention for Mapping Functional Brain Networks
Liu, Yiheng, Ge, Enjie, Qiang, Ning, Liu, Tianming, Ge, Bao
Using functional magnetic resonance imaging (fMRI) and deep learning to explore functional brain networks (FBNs) has attracted many researchers. However, most of these studies are still based on the temporal correlation between the sources and voxel signals, and research on the dynamics of brain function is lacking. Due to the widespread local correlations in the volumes, FBNs can be generated directly in the spatial domain in a self-supervised manner by using spatial-wise attention (SA), and the resulting FBNs have a higher spatial similarity with templates compared to the classical method. Therefore, we proposed a novel Spatial-Temporal Convolutional Attention (STCA) model to discover
Recently, to overcome the shallow nature of the linear models, various deep learning based methods have been proposed to discover the FBNs. Most of these methods are based on autoencoders: they use different autoencoders to extract the sources in a self-supervised manner, and then use a generative linear model, such as LASSO, to generate the FBNs [6, 7]. In general, these deep learning based methods can indeed extract better encoder representations as the sources than the classical methods, such as ICA and SDL, but still generate FBNs in a linear and independent manner, with the source extraction and the FBN generation as 2 separate steps. Generating the FBNs in this way is time-consuming, does not fully utilize the advantages of deep learning, and cannot directly generate the FBNs with deep learning.
HyperGuider: Virtual Reality Framework for Interactive Path Planning of Quadruped Robot in Cluttered and Multi-Terrain Environments
Babataev, Ildar, Fedoseev, Aleksey, Weerakkodi, Nipun, Nazarova, Elena, Tsetserukou, Dzmitry
Quadruped platforms have become an active topic of research due to their high mobility and traversability in rough terrain. However, it is highly challenging to determine whether the robot can pass through a cluttered environment and how exactly its path should be calculated. Moreover, the calculated path may pass through areas with dynamic objects or environments that are dangerous for the robot or people around it. Therefore, we propose a novel conceptual approach to teaching quadruped robots navigation through user-guided path planning in virtual reality (VR). Our system contains both global and local path planners, allowing the robot to generate a path through iterations of learning. The VR interface allows the user to interact with the environment and to assist the quadruped robot in challenging scenarios. The results of comparison experiments show that cooperation between the human and the path planning algorithms can increase the computational speed of the algorithm by 35.58% on average, with a non-critical increase in path length (6.66% on average) in the test scenario. Additionally, users described the VR interface as not physically demanding (2.3 out of 10) and rated their own performance highly (7.1 out of 10). The ability to find a less optimal but safer path remains in demand for the task of navigating in a cluttered and unstructured environment.
Deep Compression
In their current form, Deep Neural Networks require enormous memory to support their massive over-parameterization. Classic Neural Networks such as AlexNet and VGG-16 require around 240 and 552 MB, respectively. Many efforts have been made to reduce the file size of Neural Networks, generally relying on techniques such as Weight Pruning, Quantization, or SVD decompositions of Weight Matrices. This paper, Deep Compression, combines Pruning, Quantization, and Huffman encoding into a three-stage pipeline that reduces the size of AlexNet by a factor of 35x and VGG-16 by 49x. This results in AlexNet being reduced from 240 to 6.9 MB and VGG-16 from 552 to 11.3 MB.
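The three stages named above can be sketched on a toy weight vector. This is a simplified illustration under stated assumptions, not the paper's implementation: the pruning threshold and the two centroids are fixed by hand, there is no retraining between stages, and the cost of storing the codebook and sparse indices is ignored. All function names are hypothetical.

```python
import heapq
from collections import Counter

def prune(weights, threshold):
    """Stage 1, magnitude pruning: zero out weights below the threshold."""
    return [w if abs(w) >= threshold else 0.0 for w in weights]

def quantize(weights, centroids):
    """Stage 2, weight sharing: replace each nonzero weight by a centroid index.

    Code 0 is reserved for pruned weights; code i + 1 means centroids[i].
    """
    codes = []
    for w in weights:
        if w == 0.0:
            codes.append(0)
        else:
            nearest = min(range(len(centroids)), key=lambda i: abs(centroids[i] - w))
            codes.append(nearest + 1)
    return codes

def huffman_code_lengths(symbols):
    """Stage 3: return {symbol: code length in bits} for a Huffman code."""
    counts = Counter(symbols)
    if len(counts) == 1:
        return {next(iter(counts)): 1}
    # Heap entries are (frequency, unique tie-breaker, {symbol: depth so far}).
    heap = [(n, i, {s: 0}) for i, (s, n) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        n1, _, d1 = heapq.heappop(heap)
        n2, _, d2 = heapq.heappop(heap)
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (n1 + n2, tie, merged))
        tie += 1
    return heap[0][2]

def compressed_bits(codes):
    """Total bits needed to store the code sequence with its Huffman code."""
    lengths = huffman_code_lengths(codes)
    return sum(lengths[c] for c in codes)

weights = [0.01, -0.02, 0.9, 1.1, -1.0, 0.03, 0.95, -0.01]
codes = quantize(prune(weights, threshold=0.1), centroids=[-1.0, 1.0])
original_bits = 32 * len(weights)   # 32-bit floats
packed_bits = compressed_bits(codes)
```

On this toy vector the most frequent symbol (the pruned weight) receives the shortest code, which is why Huffman coding pays off after pruning has made the distribution of codes highly skewed.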
Bottleneck Supervised U-Net for Pixel-wise Liver and Tumor Segmentation
Li, Song, Tso, Geoffrey Kwok Fai
Convolutional neural networks (CNNs) have been widely used for image processing tasks. In this paper we design a bottleneck supervised U-Net model and apply it to liver and tumor segmentation. Taking an image as input, the model outputs a segmented image of the same size, each pixel of which takes a value from 1 to K, where K is the number of classes to be segmented. The innovations of this paper are two-fold: first, we design a novel U-Net structure that includes dense blocks and inception blocks as the base U-Net; second, we design a double U-Net architecture, based on the base U-Net, that includes an encoding U-Net and a segmentation U-Net. The encoding U-Net is first trained to encode the labels, then the encodings are used to supervise the bottleneck of the segmentation U-Net. While training the segmentation U-Net, a weighted average of dice loss (for the final output) and MSE loss (for the bottleneck) is used as the overall loss function. This approach can help retain the hidden features of input images. The model is applied to a liver tumor 3D CT scan dataset to conduct liver and tumor segmentation sequentially. Experimental results indicate that the bottleneck supervised U-Net can accomplish segmentation tasks effectively, with better performance in controlling shape distortion and reducing false positives and false negatives, as well as accelerating convergence. In addition, this model has good generalization, leaving room for further improvement.
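The combined loss described above can be sketched as follows for a single binary class on flattened inputs. The weight `alpha` and the smoothing term `eps` are assumptions, since the abstract gives no values, and the function names are illustrative.

```python
def dice_loss(pred, target, eps=1e-6):
    """Soft dice loss between predicted probabilities and a binary mask."""
    intersection = sum(p * t for p, t in zip(pred, target))
    denominator = sum(pred) + sum(target)
    return 1.0 - (2.0 * intersection + eps) / (denominator + eps)

def mse_loss(a, b):
    """Mean squared error between two equally sized vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def total_loss(pred, target, bottleneck, label_encoding, alpha=0.5):
    """Weighted average of dice loss (output) and MSE loss (bottleneck).

    alpha is a hypothetical weighting; label_encoding stands for the
    bottleneck features produced by the pre-trained encoding U-Net.
    """
    return (alpha * dice_loss(pred, target)
            + (1.0 - alpha) * mse_loss(bottleneck, label_encoding))

# A perfect prediction with a perfectly matched bottleneck gives zero loss.
loss = total_loss([1.0, 0.0, 1.0], [1.0, 0.0, 1.0], [0.5, 0.5], [0.5, 0.5])
```

The MSE term pulls the segmentation network's bottleneck toward the label encoding, which is the supervision signal that gives the model its name.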
Interpreting weight maps in terms of cognitive or clinical neuroscience: nonsense?
Schrouff, Jessica, Mourao-Miranda, Janaina
Linear machine learning models can be seen as providing two outputs: predictions and weight maps. The latter shows the relative contribution of the individual features to the model and has been heavily used in the neuroimaging community to infer conclusions about brain structure/function. There has however been a recent debate on whether weight maps can provide information about the neural signals leading to a significant classification/regression model [1]-[3]. The authors of [1] indeed suggest that weight maps provide a poor recovery of the input neural signal and lead to false positives. They further demonstrate that the amplitude of the weight does not reflect the amplitude of the signal difference in a feature. However, their examples are specific cases with low signal-to-noise ratio (SNR). Here, we investigate the recovery of two widespread techniques, namely SVM [4] and sparse MKL [5] when varying the SNR, as well as the distribution of simulated neural signals.
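The kind of false positive at issue can be reproduced with a small deterministic toy example. It uses ordinary least squares rather than the SVM or sparse MKL studied here, and the data are constructed by hand for illustration: feature `x1` carries the signal plus a distractor, while feature `x2` carries only the distractor. All names are hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def least_squares_2d(x1, x2, y):
    """Solve the 2x2 normal equations for y ~ w1 * x1 + w2 * x2."""
    a, b, d = dot(x1, x1), dot(x1, x2), dot(x2, x2)
    p, q = dot(x1, y), dot(x2, y)
    det = a * d - b * b
    return (d * p - b * q) / det, (a * q - b * p) / det

signal = [1.0, -1.0, 1.0, -1.0]        # the quantity to predict
distractor = [0.5, 0.5, -0.5, -0.5]    # unrelated to the signal (orthogonal)

x1 = [s + n for s, n in zip(signal, distractor)]  # signal + distractor
x2 = distractor                                    # distractor only

w1, w2 = least_squares_2d(x1, x2, signal)
# x2 contains no signal, yet it receives a weight as large in magnitude as
# x1, because the model uses it to cancel the distractor present in x1.
```

Reading the weight on `x2` as evidence that the feature carries signal would be a false positive, which is the concern raised in [1].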