AITopics

2502.02682

Country: North America > United States > Utah (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Artificial IntelligenceApr-21-2024

Rethink Arbitrary Style Transfer with Transformer and Contrastive Learning

Zhang, Zhanjie, Sun, Jiakai, Li, Guangyuan, Zhao, Lei, Zhang, Quanwei, Lan, Zehua, Yin, Haolin, Xing, Wei, Lin, Huaizhong, Zuo, Zhiwen

Arbitrary style transfer holds widespread attention in research and boasts numerous practical applications. The existing methods, which either employ cross-attention to incorporate deep style attributes into content attributes or use adaptive normalization to adjust content features, fail to generate high-quality stylized images. In this paper, we introduce an innovative technique to improve the quality of stylized images. Firstly, we propose Style Consistency Instance Normalization (SCIN), a method to refine the alignment between content and style features. In addition, we have developed an Instance-based Contrastive Learning (ICL) approach designed to understand the relationships among various styles, thereby enhancing the quality of the resulting stylized images. Recognizing that VGG networks are more adept at extracting classification features and need to be better suited for capturing style features, we have also introduced the Perception Encoder (PE) to capture style features. Extensive experiments demonstrate that our proposed method generates high-quality stylized images and effectively prevents artifacts compared with the existing state-of-the-art methods.

artificial intelligence, machine learning, style transfer, (18 more...)

2404.13584

Country: Asia > China > Zhejiang Province (0.14)

Genre: Research Report > Promising Solution (0.54)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision > Image Understanding (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

arXiv.org Artificial IntelligenceOct-7-2023

Multi-Resolution Active Learning of Fourier Neural Operators

Li, Shibo, Yu, Xin, Xing, Wei, Kirby, Mike, Narayan, Akil, Zhe, Shandian

Fourier Neural Operator (FNO) is a popular operator learning framework, which not only achieves the state-of-the-art performance in many tasks, but also is highly efficient in training and prediction. However, collecting training data for the FNO is a costly bottleneck in practice, because it often demands expensive physical simulations. To overcome this problem, we propose Multi-Resolution Active learning of FNO (MRA-FNO), which can dynamically select the input functions and resolutions to lower the data cost as much as possible while optimizing the learning efficiency. Specifically, we propose a probabilistic multi-resolution FNO and use ensemble Monte-Carlo to develop an effective posterior inference algorithm. To conduct active learning, we maximize a utility-cost ratio as the acquisition function to acquire new examples and resolutions at each step. We use moment matching and the matrix determinant lemma to enable tractable, efficient utility computation. Furthermore, we develop a cost annealing framework to avoid over-penalizing high-resolution queries at the early stage. The over-penalization is severe when the cost difference is significant between the resolutions, which renders active learning often stuck at low-resolution queries and inferior performance. Our method overcomes this problem and applies to general multi-fidelity active learning and optimization problems. We have shown the advantage of our method in several benchmark operator learning tasks.

artificial intelligence, machine learning, resolution, (14 more...)

2309.16971

Country:

North America > United States (0.15)
Asia (0.14)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

arXiv.org Artificial IntelligenceJun-2-2023

VGOS: Voxel Grid Optimization for View Synthesis from Sparse Inputs

Sun, Jiakai, Zhang, Zhanjie, Chen, Jiafu, Li, Guangyuan, Ji, Boyan, Zhao, Lei, Xing, Wei, Lin, Huaizhong

Neural Radiance Fields (NeRF) has shown great success in novel view synthesis due to its state-of-the-art quality and flexibility. However, NeRF requires dense input views (tens to hundreds) and a long training time (hours to days) for a single scene to generate high-fidelity images. Although using the voxel grids to represent the radiance field can significantly accelerate the optimization process, we observe that for sparse inputs, the voxel grids are more prone to overfitting to the training views and will have holes and floaters, which leads to artifacts. In this paper, we propose VGOS, an approach for fast (3-5 minutes) radiance field reconstruction from sparse inputs (3-10 views) to address these issues. To improve the performance of voxel-based radiance field in sparse input scenarios, we propose two methods: (a) We introduce an incremental voxel training strategy, which prevents overfitting by suppressing the optimization of peripheral voxels in the early stage of reconstruction. (b) We use several regularization techniques to smooth the voxels, which avoids degenerate solutions. Experiments demonstrate that VGOS achieves state-of-the-art performance for sparse inputs with super-fast convergence. Code will be available at https://github.com/SJoJoK/VGOS.

artificial intelligence, machine learning, radiance field, (14 more...)

2304.13386

Country:

North America > United States (0.14)
Asia > China (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

arXiv.org Artificial IntelligenceNov-28-2022

MicroAST: Towards Super-Fast Ultra-Resolution Arbitrary Style Transfer

Wang, Zhizhong, Zhao, Lei, Zuo, Zhiwen, Li, Ailin, Chen, Haibo, Xing, Wei, Lu, Dongming

Arbitrary style transfer (AST) transfers arbitrary artistic styles onto content images. Despite the recent rapid progress, existing AST methods are either incapable or too slow to run at ultra-resolutions (e.g., 4K) with limited resources, which heavily hinders their further applications. In this paper, we tackle this dilemma by learning a straightforward and lightweight model, dubbed MicroAST. The key insight is to completely abandon the use of cumbersome pre-trained Deep Convolutional Neural Networks (e.g., VGG) at inference. Instead, we design two micro encoders (content and style encoders) and one micro decoder for style transfer. The content encoder aims at extracting the main structure of the content image. The style encoder, coupled with a modulator, encodes the style image into learnable dual-modulation signals that modulate both intermediate features and convolutional filters of the decoder, thus injecting more sophisticated and flexible style signals to guide the stylizations. In addition, to boost the ability of the style encoder to extract more distinct and representative style signals, we also introduce a new style signal contrastive loss in our model. Compared to the state of the art, our MicroAST not only produces visually superior results but also is 5-73 times smaller and 6-18 times faster, for the first time enabling super-fast (about 0.5 seconds) AST at 4K ultra-resolutions. Code is available at https://github.com/EndyWon/MicroAST.

artificial intelligence, machine learning, style transfer, (16 more...)

2211.15313

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)

arXiv.org Artificial IntelligenceDec-6-2021

Texture Reformer: Towards Fast and Universal Interactive Texture Transfer

Wang, Zhizhong, Zhao, Lei, Chen, Haibo, Li, Ailin, Zuo, Zhiwen, Xing, Wei, Lu, Dongming

In this paper, we present the texture reformer, a fast and universal neural-based framework for interactive texture transfer with user-specified guidance. The challenges lie in three aspects: 1) the diversity of tasks, 2) the simplicity of guidance maps, and 3) the execution efficiency. To address these challenges, our key idea is to use a novel feed-forward multi-view and multi-stage synthesis procedure consisting of I) a global view structure alignment stage, II) a local view texture refinement stage, and III) a holistic effect enhancement stage to synthesize high-quality results with coherent structures and fine texture details in a coarse-to-fine fashion. In addition, we also introduce a novel learning-free view-specific texture reformation (VSTR) operation with a new semantic map guidance strategy to achieve more accurate semantic-guided and structure-preserved texture transfer. The experimental results on a variety of application scenarios demonstrate the effectiveness and superiority of our framework. And compared with the state-of-the-art interactive texture transfer algorithms, it not only achieves higher quality results but, more remarkably, also is 2-5 orders of magnitude faster. Code is available at https://github.com/EndyWon/Texture-Reformer.

artificial intelligence, computer vision, machine learning, (15 more...)

2112.02788

Genre: Research Report (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Vision (0.97)

arXiv.org Machine LearningOct-21-2020

Multi-Fidelity Bayesian Optimization via Deep Neural Networks

Li, Shibo, Xing, Wei, Kirby, Mike, Zhe, Shandian

Bayesian optimization (BO) is a popular framework to optimize black-box functions. In many applications, the objective function can be evaluated at multiple fidelities to enable a trade-off between the cost and accuracy. To reduce the optimization cost, many multi-fidelity BO methods have been proposed. Despite their success, these methods either ignore or over-simplify the strong, complex correlations across the fidelities, and hence can be inefficient in estimating the objective function. To address this issue, we propose Deep Neural Network Multi-Fidelity Bayesian Optimization (DNN-MFBO) that can flexibly capture all kinds of complicated relationships between the fidelities to improve the objective function estimation and hence the optimization performance. We use sequential, fidelity-wise Gauss-Hermite quadrature and moment-matching to fulfill a mutual information-based acquisition function, which is computationally tractable and efficient. We show the advantages of our method in both synthetic benchmark datasets and real-world applications in engineering design.

air transportation, deep learning, fidelity, (18 more...)

2007.03117

Country: North America > United States > Utah (0.15)

Genre: Research Report (0.82)

Industry: Transportation > Air (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.61)

arXiv.org Machine LearningSep-6-2020

Multimodal Image-to-Image Translation via Mutual Information Estimation and Maximization

Zuo, Zhiwen, Xu, Qijiang, Zhang, Huiming, Wang, Zhizhong, Chen, Haibo, Li, Ailin, Zhao, Lei, Xing, Wei, Lu, Dongming

In this paper, we present a novel framework that can achieve multimodal image-to-image translation by simply encouraging the statistical dependence between the latent code and the output image in conditional generative adversarial networks. In addition, by incorporating a U-net generator into our framework, our method only needs to learn a one-sided translation model from the source image domain to the target image domain for both supervised and unsupervised multimodal image-to-image translation. Furthermore, our method also achieves disentanglement between the source domain content and the target domain style for free. We conduct experiments under supervised and unsupervised settings on various benchmark image-to-image translation datasets compared with the state-of-the-art methods, showing the effectiveness and simplicity of our method to achieve multimodal and high-quality results.

artificial intelligence, dataset, neural network, (16 more...)

2008.03529

Genre: Research Report > Promising Solution (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

arXiv.org Machine LearningJun-8-2020

Physics Regularized Gaussian Processes

Wang, Zheng, Xing, Wei, Kirby, Robert, Zhe, Shandian

We consider incorporating incomplete physics knowledge, expressed as differential equations with latent functions, into Gaussian processes (GPs) to improve their performance, especially for limited data and extrapolation. While existing works have successfully encoded such knowledge via kernel convolution, they only apply to linear equations with analytical Green's functions. The convolution can further restrict us from fusing physics with highly expressive kernels, e.g., deep kernels. To overcome these limitations, we propose Physics Regularized Gaussian Process (PRGP) that can incorporate both linear and nonlinear equations, does not rely on Green's functions, and is free to use arbitrary kernels. Specifically, we integrate the standard GP with a generative model to encode the differential equation in a principled Bayesian hybrid framework. For efficient and effective inference, we marginalize out the latent variables and derive a simplified model evidence lower bound (ELBO), based on which we develop a stochastic collapsed inference algorithm. Our ELBO can be viewed as a posterior regularization objective. We show the advantage of our approach in both simulation and real-world applications.

artificial intelligence, kernel, machine learning, (17 more...)

2006.04976

Country: North America > United States > Utah (0.15)

Genre: Research Report (0.82)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Natural Language (0.66)

arXiv.org Machine LearningJun-8-2020

Multi-Fidelity High-Order Gaussian Processes for Physical Simulation

Wang, Zheng, Xing, Wei, Kirby, Robert, Zhe, Shandian

The key task of physical simulation is to solve partial differential equations (PDEs) on discretized domains, which is known to be costly. In particular, high-fidelity solutions are much more expensive than low-fidelity ones. To reduce the cost, we consider novel Gaussian process (GP) models that leverage simulation examples of different fidelities to predict high-dimensional PDE solution outputs. Existing GP methods are either not scalable to high-dimensional outputs or lack effective strategies to integrate multi-fidelity examples. To address these issues, we propose Multi-Fidelity High-Order Gaussian Process (MFHoGP) that can capture complex correlations both between the outputs and between the fidelities to enhance solution estimation, and scale to large numbers of outputs. Based on a novel nonlinear coregionalization model, MFHoGP propagates bases throughout fidelities to fuse information, and places a deep matrix GP prior over the basis weights to capture the (nonlinear) relationships across the fidelities. To improve inference efficiency and quality, we use bases decomposition to largely reduce the model parameters, and layer-wise matrix Gaussian posteriors to capture the posterior dependency and to simplify the computation. Our stochastic variational learning algorithm successfully handles millions of outputs without extra sparse approximations. We show the advantages of our method in several typical applications.

artificial intelligence, fidelity, machine learning, (17 more...)

2006.04972

Country: North America > United States > Utah (0.14)

Genre: Research Report (0.82)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)