Yu, Qian
SVGDreamer++: Advancing Editability and Diversity in Text-Guided SVG Generation
Xing, Ximing, Yu, Qian, Wang, Chuang, Zhou, Haitao, Zhang, Jing, Xu, Dong
Recently, text-guided scalable vector graphics (SVG) synthesis has demonstrated significant potential in domains such as iconography and sketching. However, SVGs generated from existing Text-to-SVG methods often lack editability and exhibit deficiencies in visual quality and diversity. In this paper, we propose a novel text-guided vector graphics synthesis method to address these limitations. To enhance the editability of output SVGs, we introduce a Hierarchical Image VEctorization (HIVE) framework that operates at the semantic object level and supervises the optimization of components within the vector object. This approach facilitates the decoupling of vector graphics into distinct objects and component levels. Our proposed HIVE algorithm, informed by image segmentation priors, not only ensures a more precise representation of vector graphics but also enables fine-grained editing capabilities within vector objects. To improve the diversity of output SVGs, we present a Vectorized Particle-based Score Distillation (VPSD) approach. VPSD addresses over-saturation issues in existing methods and enhances sample diversity. A pre-trained reward model is incorporated to re-weight vector particles, improving aesthetic appeal and enabling faster convergence. Additionally, we design a novel adaptive vector primitives control strategy, which allows for the dynamic adjustment of the number of primitives, thereby enhancing the presentation of graphic details. Extensive experiments validate the effectiveness of the proposed method, demonstrating its superiority over baseline methods in terms of editability, visual quality, and diversity. We also show that our new method supports up to six distinct vector styles, capable of generating high-quality vector assets suitable for stylized vector design and poster design. Code and demo will be released at: http://ximinng.github.io/SVGDreamerV2Project/
SVGFusion: Scalable Text-to-SVG Generation via Vector Space Diffusion
Xing, Ximing, Hu, Juncheng, Zhang, Jing, Xu, Dong, Yu, Qian
The generation of Scalable Vector Graphics (SVG) assets from textual data remains a significant challenge, largely due to the scarcity of high-quality vector datasets and the limitations in scalable vector representations required for modeling intricate graphic distributions. This work introduces SVGFusion, a Text-to-SVG model capable of scaling to real-world SVG data without reliance on a text-based discrete language model or prolonged SDS optimization. The essence of SVGFusion is to learn a continuous latent space for vector graphics with a popular Text-to-Image framework. Specifically, SVGFusion consists of two modules: a Vector-Pixel Fusion Variational Autoencoder (VP-VAE) and a Vector Space Diffusion Transformer (VS-DiT). VP-VAE takes both the SVGs and corresponding rasterizations as inputs and learns a continuous latent space, whereas VS-DiT learns to generate a latent code within this space based on the text prompt. Based on VP-VAE, a novel rendering sequence modeling strategy is proposed to enable the latent space to embed the knowledge of construction logics in SVGs. This empowers the model to achieve human-like design capabilities in vector graphics, while systematically preventing occlusion in complex graphic compositions. Moreover, our SVGFusion's ability can be continuously improved by leveraging the scalability of the VS-DiT by adding more VS-DiT blocks. A large-scale SVG dataset is collected to evaluate the effectiveness of our proposed method. Extensive experimentation has confirmed the superiority of our SVGFusion over existing SVG generation methods, achieving enhanced quality and generalizability, thereby establishing a novel framework for SVG content creation. Code, model, and data will be released at: \href{https://ximinng.github.io/SVGFusionProject/}{https://ximinng.github.io/SVGFusionProject/}
Deep Learning for Cross-Border Transaction Anomaly Detection in Anti-Money Laundering Systems
In the context of globalization and the rapid expansion of the digital economy, anti-money laundering (AML) has become a crucial aspect of financial oversight, particularly in cross-border transactions. The rising complexity and scale of international financial flows necessitate more intelligent and adaptive AML systems to combat increasingly sophisticated money laundering techniques. This paper explores the application of unsupervised learning models in cross-border AML systems, focusing on rule optimization through contrastive learning techniques. Five deep learning models, ranging from basic convolutional neural networks (CNNs) to hybrid CNNGRU architectures, were designed and tested to assess their performance in detecting abnormal transactions. The results demonstrate that as model complexity increases, so does the system's detection accuracy and responsiveness. In particular, the self-developed hybrid Convolutional-Recurrent Neural Integration Model (CRNIM) model showed superior performance in terms of accuracy and area under the receiver operating characteristic curve (AUROC). These findings highlight the potential of unsupervised learning models to significantly improve the intelligence, flexibility, and real-time capabilities of AML systems. By optimizing detection rules and enhancing adaptability to emerging money laundering schemes, this research provides both theoretical and practical contributions to the advancement of AML technologies, which are essential for safeguarding the global financial system against illicit activities.
Contrastive Learning for Knowledge-Based Question Generation in Large Language Models
Zhang, Zhenhong, Chen, Jiajing, Shi, Weiyan, Yi, Lingjie, Wang, Chihang, Yu, Qian
With the rapid development of artificial intelligence technology, especially the increasingly widespread application of question-and-answer systems, high-quality question generation has become a key component in supporting the development of these systems. This article focuses on knowledge-based question generation technology, which aims to enable computers to simulate the human questioning process based on understanding specific texts or knowledge bases. In light of the issues of hallucination and knowledge gaps present in large-scale language models when applied to knowledge-intensive tasks, this paper proposes an enhanced question generation method that incorporates contrastive learning. This method utilizes multiple models to jointly mine domain knowledge and uses contrastive learning to guide the model in reducing noise and hallucinations in generation. Experimental results show that by designing prompts containing contrasting examples, the model's performance in question generation improves considerably, particularly when contrasting instructions and examples are used simultaneously, leading to the highest quality of generated questions and improved accuracy. These results demonstrate that the method proposed in this study, which combines contrasting context and chain-of-thought prompts, can effectively improve both the quality and the practicality of question generation.
Graph Neural Network Framework for Sentiment Analysis Using Syntactic Feature
Wu, Linxiao, Luo, Yuanshuai, Zhu, Binrong, Liu, Guiran, Wang, Rui, Yu, Qian
Amidst the swift evolution of social media platforms and e-commerce ecosystems, the domain of opinion mining has surged as a pivotal area of exploration within natural language processing. A specialized segment within this field focuses on extracting nuanced evaluations tied to particular elements within textual contexts. This research advances a composite framework that amalgamates the positional cues of topical descriptors. The proposed system converts syntactic structures into a matrix format, leveraging convolutions and attention mechanisms within a graph to distill salient characteristics. Incorporating the positional relevance of descriptors relative to lexical items enhances the sequential integrity of the input. Trials have substantiated that this integrated graph-centric scheme markedly elevates the efficacy of evaluative categorization, showcasing preeminence.
Stochastic Zeroth-Order Optimization under Strongly Convexity and Lipschitz Hessian: Minimax Sample Complexity
Yu, Qian, Wang, Yining, Huang, Baihe, Lei, Qi, Lee, Jason D.
Optimization of convex functions under stochastic zeroth-order feedback has been a major and challenging question in online learning. In this work, we consider the problem of optimizing second-order smooth and strongly convex functions where the algorithm is only accessible to noisy evaluations of the objective function it queries. We provide the first tight characterization for the rate of the minimax simple regret by developing matching upper and lower bounds. We propose an algorithm that features a combination of a bootstrapping stage and a mirror-descent stage. Our main technical innovation consists of a sharp characterization for the spherical-sampling gradient estimator under higher-order smoothness conditions, which allows the algorithm to optimally balance the bias-variance tradeoff, and a new iterative method for the bootstrapping stage, which maintains the performance for unbounded Hessian.
Constructing and Exploring Intermediate Domains in Mixed Domain Semi-supervised Medical Image Segmentation
Ma, Qinghe, Zhang, Jian, Qi, Lei, Yu, Qian, Shi, Yinghuan, Gao, Yang
Both limited annotation and domain shift are prevalent challenges in medical image segmentation. Traditional semi-supervised segmentation and unsupervised domain adaptation methods address one of these issues separately. However, the coexistence of limited annotation and domain shift is quite common, which motivates us to introduce a novel and challenging scenario: Mixed Domain Semi-supervised medical image Segmentation (MiDSS). In this scenario, we handle data from multiple medical centers, with limited annotations available for a single domain and a large amount of unlabeled data from multiple domains. We found that the key to solving the problem lies in how to generate reliable pseudo labels for the unlabeled data in the presence of domain shift with labeled data. To tackle this issue, we employ Unified Copy-Paste (UCP) between images to construct intermediate domains, facilitating the knowledge transfer from the domain of labeled data to the domains of unlabeled data. To fully utilize the information within the intermediate domain, we propose a symmetric Guidance training strategy (SymGD), which additionally offers direct guidance to unlabeled data by merging pseudo labels from intermediate samples. Subsequently, we introduce a Training Process aware Random Amplitude MixUp (TP-RAM) to progressively incorporate style-transition components into intermediate samples. Compared with existing state-of-the-art approaches, our method achieves a notable 13.57% improvement in Dice score on Prostate dataset, as demonstrated on three public datasets. Our code is available at https://github.com/MQinghe/MiDSS .
DiffSketcher: Text Guided Vector Sketch Synthesis through Latent Diffusion Models
Xing, Ximing, Wang, Chuang, Zhou, Haitao, Zhang, Jing, Yu, Qian, Xu, Dong
Even though trained mainly on images, we discover that pretrained diffusion models show impressive power in guiding sketch synthesis. In this paper, we present DiffSketcher, an innovative algorithm that creates \textit{vectorized} free-hand sketches using natural language input. DiffSketcher is developed based on a pre-trained text-to-image diffusion model. It performs the task by directly optimizing a set of B\'ezier curves with an extended version of the score distillation sampling (SDS) loss, which allows us to use a raster-level diffusion model as a prior for optimizing a parametric vectorized sketch generator. Furthermore, we explore attention maps embedded in the diffusion model for effective stroke initialization to speed up the generation process. The generated sketches demonstrate multiple levels of abstraction while maintaining recognizability, underlying structure, and essential visual details of the subject drawn. Our experiments show that DiffSketcher achieves greater quality than prior work. The code and demo of DiffSketcher can be found at https://ximinng.github.io/DiffSketcher-project/.
SVGDreamer: Text Guided SVG Generation with Diffusion Model
Xing, Ximing, Zhou, Haitao, Wang, Chuang, Zhang, Jing, Xu, Dong, Yu, Qian
Recently, text-guided scalable vector graphics (SVGs) synthesis has shown promise in domains such as iconography and sketch. However, existing text-to-SVG generation methods lack editability and struggle with visual quality and result diversity. To address these limitations, we propose a novel text-guided vector graphics synthesis method called SVGDreamer. SVGDreamer incorporates a semantic-driven image vectorization (SIVE) process that enables the decomposition of synthesis into foreground objects and background, thereby enhancing editability. Specifically, the SIVE process introduce attention-based primitive control and an attention-mask loss function for effective control and manipulation of individual elements. Additionally, we propose a Vectorized Particle-based Score Distillation (VPSD) approach to tackle the challenges of color over-saturation, vector primitives over-smoothing, and limited result diversity in existing text-to-SVG generation methods. Furthermore, on the basis of VPSD, we introduce Reward Feedback Learning (ReFL) to accelerate VPSD convergence and improve aesthetic appeal. Extensive experiments have been conducted to validate the effectiveness of SVGDreamer, demonstrating its superiority over baseline methods in terms of editability, visual quality, and diversity. The code and demo of SVGDreamer can be found at \href{https://ximinng.github.io/SVGDreamer-project/}{https://ximinng.github.io/SVGDreamer-project/}.
Data Contamination Issues in Brain-to-Text Decoding
Yin, Congchi, Yu, Qian, Fang, Zhiwei, He, Jie, Peng, Changping, Lin, Zhangang, Shao, Jingping, Li, Piji
Decoding non-invasive cognitive signals to natural language has long been the goal of building practical brain-computer interfaces (BCIs). Recent major milestones have successfully decoded cognitive signals like functional Magnetic Resonance Imaging (fMRI) and electroencephalogram (EEG) into text under open vocabulary setting. However, how to split the datasets for training, validating, and testing in cognitive signal decoding task still remains controversial. In this paper, we conduct systematic analysis on current dataset splitting methods and find the existence of data contamination largely exaggerates model performance. Specifically, first we find the leakage of test subjects' cognitive signals corrupts the training of a robust encoder. Second, we prove the leakage of text stimuli causes the auto-regressive decoder to memorize information in test set. The decoder generates highly accurate text not because it truly understands cognitive signals. To eliminate the influence of data contamination and fairly evaluate different models' generalization ability, we propose a new splitting method for different types of cognitive datasets (e.g. fMRI, EEG). We also test the performance of SOTA Brain-to-Text decoding models under the proposed dataset splitting paradigm as baselines for further research.