D2Styler: Advancing Arbitrary Style Transfer with Discrete Diffusion Methods
Susladkar, Onkar, Deshmukh, Gayatri, Mittal, Sparsh, Shastri, Parth
In image processing, one of the most challenging tasks is to render an image's semantic meaning using a variety of artistic approaches. Existing techniques for arbitrary style transfer (AST) frequently suffer from mode collapse, over-stylization, or under-stylization due to a disparity between the style and content images. We propose a novel framework called D$^2$Styler (Discrete Diffusion Styler) that leverages the discrete representational capability of VQ-GANs and the advantages of discrete diffusion, including stable training and avoidance of mode collapse. Our method uses Adaptive Instance Normalization (AdaIN) features as a context guide for the reverse diffusion process, enabling features from the style image to be transferred to the content image without bias. The proposed method substantially enhances the visual quality of style-transferred images, combining content and style in a visually appealing manner. We take style images from the WikiArt dataset and content images from the COCO dataset. Experimental results demonstrate that D$^2$Styler produces high-quality style-transferred images and outperforms twelve existing methods on nearly all metrics. The qualitative results and ablation studies provide further insights into the efficacy of our technique. The code is available at https://github.com/Onkarsus13/D2Styler.
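For context, AdaIN (the feature guide named above) aligns the channel-wise mean and standard deviation of the content features to those of the style features. A minimal PyTorch sketch of the operation follows; the function name and tensor shapes are illustrative and not taken from the paper's code:

```python
import torch

def adain(content: torch.Tensor, style: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    """Adaptive Instance Normalization for feature maps of shape [N, C, H, W]:
    normalize the content features per channel, then re-scale and re-shift
    them with the style features' channel-wise statistics."""
    c_mean = content.mean(dim=(2, 3), keepdim=True)
    c_std = content.std(dim=(2, 3), keepdim=True) + eps  # avoid divide-by-zero
    s_mean = style.mean(dim=(2, 3), keepdim=True)
    s_std = style.std(dim=(2, 3), keepdim=True)
    return s_std * (content - c_mean) / c_std + s_mean
```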
Robot navigation through non-uniform environments requires reliable motion plan generation, and the choice of planning-model fidelity can significantly impact performance. Prior research has shown that reducing model fidelity saves planning time but sacrifices execution reliability. Building on promising adaptive hierarchical motion planning techniques, we present a framework that leverages a richer set of robot motion models at plan time. The framework chooses when to switch models and which model is most applicable within a single trajectory.
STYLER: Style Modeling with Rapidity and Robustness via Speech Decomposition for Expressive and Controllable Neural Text to Speech
Lee, Keon, Park, Kyumin, Kim, Daeyoung
Previous works on expressive text-to-speech (TTS) have limited robustness and speed in both training and inference. These drawbacks stem mostly from autoregressive decoding, which makes each step vulnerable to errors in the preceding steps. To overcome this weakness, we propose STYLER, a novel expressive text-to-speech model with a parallelized architecture. Discarding autoregressive decoding and introducing speech decomposition for encoding makes speech synthesis more robust while preserving high style transfer performance. Moreover, our novel approach to modeling noise from audio, using domain adversarial training and Residual Decoding, enables style transfer without transferring noise. Our experiments demonstrate the naturalness and expressiveness of our model in comparison with other parallel TTS models. We also investigate our model's robustness and speed by comparing it with an expressive TTS model that uses autoregressive decoding.
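The noise modeling mentioned above relies on domain adversarial training, which is commonly implemented with a gradient reversal layer between the encoder and the adversarial (noise) classifier. The sketch below shows that standard building block in PyTorch; it is an assumption about a typical implementation, not the paper's actual code:

```python
import torch

class GradReverse(torch.autograd.Function):
    """Identity map in the forward pass; scales gradients by -lam in the
    backward pass, so the encoder is pushed toward features that fool the
    noise classifier -- the core trick of domain adversarial training."""

    @staticmethod
    def forward(ctx, x, lam: float = 1.0):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Negate (and scale) the gradient flowing back into the encoder;
        # the second return value is the (non-existent) gradient for lam.
        return -ctx.lam * grad_output, None

def grad_reverse(x: torch.Tensor, lam: float = 1.0) -> torch.Tensor:
    """Convenience wrapper: insert between the encoder output and the
    adversarial noise classifier (a hypothetical placement here)."""
    return GradReverse.apply(x, lam)
```

In use, the noise classifier is trained normally on the reversed features, while the inverted gradient drives the encoder toward noise-invariant representations.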