Joint Sub-bands Learning with Clique Structures for Wavelet Domain Super-Resolution
Convolutional neural networks (CNNs) have recently achieved great success in single-image super-resolution (SISR). However, these methods tend to produce over-smoothed outputs and miss some textural details. To address these problems, we propose the Super-Resolution CliqueNet (SRCliqueNet), which reconstructs the high-resolution (HR) image with better textural details in the wavelet domain. The proposed SRCliqueNet first extracts a set of feature maps from the low-resolution (LR) image using the clique blocks group. These feature maps are then sent to the clique up-sampling module to reconstruct the HR image. The clique up-sampling module consists of four sub-nets that predict the high-resolution wavelet coefficients of the four sub-bands. Since the four sub-bands share edge-feature properties, the sub-nets are connected to one another so that they can learn the coefficients of the four sub-bands jointly. Finally, we apply the inverse discrete wavelet transform (IDWT) to the outputs of the four sub-nets at the end of the clique up-sampling module to increase the resolution and reconstruct the HR image. Extensive quantitative and qualitative experiments on benchmark datasets show that our method achieves superior performance over state-of-the-art methods.
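The abstract's final step, combining four predicted sub-bands via the IDWT to double the spatial resolution, can be sketched in NumPy. The abstract does not specify which wavelet is used, so this is a minimal illustration assuming the Haar wavelet; the function names `haar_dwt2` and `haar_idwt2` are for illustration only.

```python
import numpy as np

def haar_dwt2(x):
    """Forward 2D Haar DWT: split an image into LL, LH, HL, HH sub-bands,
    each at half the spatial resolution (orthonormal Haar basis)."""
    a, b = x[0::2, 0::2], x[0::2, 1::2]
    c, d = x[1::2, 0::2], x[1::2, 1::2]
    LL = (a + b + c + d) / 2.0   # approximation (low-low)
    LH = (a + b - c - d) / 2.0   # vertical detail
    HL = (a - b + c - d) / 2.0   # horizontal detail
    HH = (a - b - c + d) / 2.0   # diagonal detail
    return LL, LH, HL, HH

def haar_idwt2(LL, LH, HL, HH):
    """Inverse 2D Haar DWT: merge the four sub-bands back into an image
    with twice the spatial resolution of each sub-band."""
    h, w = LL.shape
    out = np.zeros((2 * h, 2 * w))
    out[0::2, 0::2] = (LL + LH + HL + HH) / 2.0
    out[0::2, 1::2] = (LL + LH - HL - HH) / 2.0
    out[1::2, 0::2] = (LL - LH + HL - HH) / 2.0
    out[1::2, 1::2] = (LL - LH - HL + HH) / 2.0
    return out
```

In the paper's setting, the four sub-bands fed to `haar_idwt2` would come from the four sub-nets rather than from a forward transform; the orthonormal Haar basis makes the round trip exact.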
Supplementary Material
To study the accuracy of the rotation angles predicted by TARGET-VAE, we calculate the mean standard deviation of the predicted rotations, introduced in [1]. This metric essentially measures the mean squared error between the rotation of the object in the input image and the predicted rotation for that object. We find that the model correctly identifies and reconstructs the objects (Figure 3). Each shape is rotated by one of 40 values linearly spaced in [0, 2π], translated across both the x and y dimensions, and scaled by one of six linearly spaced values in [0.5, 1]. We observed that, as expected, eliminating inference over the discretized rotation dimension has a significant negative effect on identifying transformation-invariant representations, and the clustering accuracy on MNIST(U) is only 33.8% (Table 2).
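The error between a true and a predicted rotation must respect the 2π periodicity of angles (an error of 2π − 0.2 is really an error of 0.2). The exact metric definition lives in [1]; the sketch below shows one plausible way to compute a mean squared angular error, with both function names chosen for illustration.

```python
import math

def angular_error(theta_true, theta_pred):
    """Smallest absolute difference between two angles in radians,
    accounting for 2*pi periodicity; the result lies in [0, pi]."""
    d = (theta_pred - theta_true) % (2 * math.pi)
    return min(d, 2 * math.pi - d)

def mean_squared_rotation_error(true_angles, pred_angles):
    """Mean squared angular error over a set of (true, predicted) pairs."""
    errs = [angular_error(t, p) ** 2 for t, p in zip(true_angles, pred_angles)]
    return sum(errs) / len(errs)
```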
SCube: Instant Large-Scale Scene Reconstruction using VoxSplats
We present SCube, a novel method for reconstructing large-scale 3D scenes (geometry, appearance, and semantics) from a sparse set of posed images. Our method encodes reconstructed scenes using a novel representation, VoxSplat, which is a set of 3D Gaussians supported on a high-resolution sparse-voxel scaffold. To reconstruct a VoxSplat from images, we employ a hierarchical voxel latent diffusion model conditioned on the input images, followed by a feedforward appearance prediction model. The diffusion model generates high-resolution grids progressively in a coarse-to-fine manner, and the appearance network predicts a set of Gaussians within each voxel. From as few as 3 non-overlapping input images, SCube can generate millions of Gaussians with a 1024³ voxel grid spanning hundreds of meters in 20 seconds. Past works tackling scene reconstruction from images either rely on per-scene optimization and fail to reconstruct the scene away from input views (thus requiring dense view coverage as input) or leverage geometric priors based on low-resolution models, which produce blurry results. In contrast, SCube leverages high-resolution sparse networks and produces sharp outputs from few views. We show the superiority of SCube compared to prior art using the Waymo self-driving dataset on 3D reconstruction and demonstrate its applications, such as LiDAR simulation and text-to-scene generation.
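The core idea of the VoxSplat representation, 3D Gaussians anchored to a sparse voxel scaffold, can be illustrated with a toy data structure. The abstract does not describe any concrete layout or parameterization, so everything below (the `Gaussian` fields, the dict-based scaffold, the class names) is an assumption made purely for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Gaussian:
    """One 3D Gaussian primitive (illustrative parameterization)."""
    mean: tuple      # (x, y, z) center in world coordinates
    scale: tuple     # per-axis standard deviations
    color: tuple     # RGB in [0, 1]
    opacity: float

@dataclass
class SparseVoxelScaffold:
    """Toy sparse voxel grid: only occupied voxels are stored,
    each holding the Gaussians whose centers fall inside it."""
    voxel_size: float
    voxels: dict = field(default_factory=dict)  # (i, j, k) -> list[Gaussian]

    def voxel_index(self, point):
        """Integer voxel coordinates containing a world-space point."""
        return tuple(int(c // self.voxel_size) for c in point)

    def add(self, g):
        """Anchor a Gaussian to the voxel containing its center."""
        self.voxels.setdefault(self.voxel_index(g.mean), []).append(g)

    def num_occupied(self):
        return len(self.voxels)
```

The sparsity is what makes a 1024³ grid over hundreds of meters tractable: memory scales with the number of occupied voxels, not with the full grid volume.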