AITopics | Wang, Yaxing

Collaborating Authors

Wang, Yaxing

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

One-Prompt-One-Story: Free-Lunch Consistent Text-to-Image Generation Using a Single Prompt

Liu, Tao, Wang, Kai, Li, Senmao, van de Weijer, Joost, Khan, Fahad Shahbaz, Yang, Shiqi, Wang, Yaxing, Yang, Jian, Cheng, Ming-Ming

arXiv.org Artificial IntelligenceFeb-5-2025

Text-to-image generation models can create high-quality images from input prompts. However, they struggle to support the consistent generation of identity-preserving requirements for storytelling. Existing approaches to this problem typically require extensive training in large datasets or additional modifications to the original model architectures. This limits their applicability across different domains and diverse diffusion model configurations. In this paper, we first observe the inherent capability of language models, coined context consistency, to comprehend identity through context with a single prompt. Drawing inspiration from the inherent context consistency, we propose a novel training-free method for consistent text-to-image (T2I) generation, termed "One-Prompt-One-Story" (1Prompt1Story). Our approach 1Prompt1Story concatenates all prompts into a single input for T2I diffusion models, initially preserving character identities. We then refine the generation process using two novel techniques: Singular-Value Reweighting and Identity-Preserving Cross-Attention, ensuring better alignment with the input description for each frame. In our experiments, we compare our method against various existing consistent T2I generation approaches to demonstrate its effectiveness through quantitative metrics and qualitative assessments. Code is available at https://github.com/byliutao/1Prompt1Story.

consistency, machine learning, natural language, (14 more...)

arXiv.org Artificial Intelligence

2501.13554

Country:

Asia (0.14)
Europe (0.14)

Genre:

Research Report > Promising Solution (1.00)
Research Report > New Finding (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Are Traditional Deep Learning Model Approaches as Effective as a Retinal-Specific Foundation Model for Ocular and Systemic Disease Detection?

Yew, Samantha Min Er, Lei, Xiaofeng, Goh, Jocelyn Hui Lin, Chen, Yibing, Srinivasan, Sahana, Chee, Miao-li, Pushpanathan, Krithi, Zou, Ke, Hou, Qingshan, Da Soh, Zhi, Xue, Cancan, Yu, Marco Chak Yan, Sabanayagam, Charumathi, Tai, E Shyong, Sim, Xueling, Wang, Yaxing, Jonas, Jost B., Nangia, Vinay, Yang, Gabriel Dawei, Ran, Emma Anran, Cheung, Carol Yim-Lui, Feng, Yangqin, Zhou, Jun, Goh, Rick Siow Mong, Zhou, Yukun, Keane, Pearse A., Liu, Yong, Cheng, Ching-Yu, Tham, Yih-Chung

arXiv.org Artificial IntelligenceJan-21-2025

Background: RETFound, a self-supervised, retina-specific foundation model (FM), showed potential in downstream applications. However, its comparative performance with traditional deep learning (DL) models remains incompletely understood. This study aimed to evaluate RETFound against three ImageNet-pretrained supervised DL models (ResNet50, ViT-base, SwinV2) in detecting ocular and systemic diseases. Methods: We fine-tuned/trained RETFound and three DL models on full datasets, 50%, 20%, and fixed sample sizes (400, 200, 100 images, with half comprising disease cases; for each DR severity class, 100 and 50 cases were used. Fine-tuned models were tested internally using the SEED (53,090 images) and APTOS-2019 (3,672 images) datasets and externally validated on population-based (BES, CIEMS, SP2, UKBB) and open-source datasets (ODIR-5k, PAPILA, GAMMA, IDRiD, MESSIDOR-2). Model performance was compared using area under the receiver operating characteristic curve (AUC) and Z-tests with Bonferroni correction (P<0.05/3). Interpretation: Traditional DL models are mostly comparable to RETFound for ocular disease detection with large datasets. However, RETFound is superior in systemic disease detection with smaller datasets. These findings offer valuable insights into the respective merits and limitation of traditional models and FMs.

artificial intelligence, deep learning, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2501.12016

Country:

North America > United States (0.46)
Asia > China (0.31)
Europe > United Kingdom (0.29)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area > Ophthalmology/Optometry (1.00)
Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
Health & Medicine > Diagnostic Medicine (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Token Merging for Training-Free Semantic Binding in Text-to-Image Synthesis

Hu, Taihang, Li, Linxuan, van de Weijer, Joost, Gao, Hongcheng, Khan, Fahad Shahbaz, Yang, Jian, Cheng, Ming-Ming, Wang, Kai, Wang, Yaxing

arXiv.org Artificial IntelligenceNov-11-2024

Although text-to-image (T2I) models exhibit remarkable generation capabilities, they frequently fail to accurately bind semantically related objects or attributes in the input prompts; a challenge termed semantic binding. Previous approaches either involve intensive fine-tuning of the entire T2I model or require users or large language models to specify generation layouts, adding complexity. In this paper, we define semantic binding as the task of associating a given object with its attribute, termed attribute binding, or linking it to other related sub-objects, referred to as object binding. We introduce a novel method called Token Merging (ToMe), which enhances semantic binding by aggregating relevant tokens into a single composite token. This ensures that the object, its attributes and sub-objects all share the same cross-attention map. Additionally, to address potential confusion among main objects with complex textual prompts, we propose end token substitution as a complementary strategy. To further refine our approach in the initial stages of T2I generation, where layouts are determined, we incorporate two auxiliary losses, an entropy loss and a semantic binding loss, to iteratively update the composite token to improve the generation integrity. We conducted extensive experiments to validate the effectiveness of ToMe, comparing it against various existing methods on the T2I-CompBench and our proposed GPT-4o object binding benchmark. Our method is particularly effective in complex scenarios that involve multiple objects and attributes, which previous methods often fail to address. The code will be publicly available at \url{https://github.com/hutaihang/ToMe}.

diffusion model, large language model, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2411.07132

Country: Europe (0.28)

Genre:

Research Report > Promising Solution (1.00)
Research Report > Experimental Study (0.93)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)

Add feedback

AMSnet-KG: A Netlist Dataset for LLM-based AMS Circuit Auto-Design Using Knowledge Graph RAG

Shi, Yichen, Tao, Zhuofu, Gao, Yuhao, Zhou, Tianjia, Chang, Cheng, Wang, Yaxing, Chen, Bingyu, Zhang, Genhao, Liu, Alvin, Yu, Zhiping, Lin, Ting-Jung, He, Lei

arXiv.org Artificial IntelligenceNov-6-2024

High-performance analog and mixed-signal (AMS) circuits are mainly full-custom designed, which is time-consuming and labor-intensive. A significant portion of the effort is experience-driven, which makes the automation of AMS circuit design a formidable challenge. Large language models (LLMs) have emerged as powerful tools for Electronic Design Automation (EDA) applications, fostering advancements in the automatic design process for large-scale AMS circuits. However, the absence of high-quality datasets has led to issues such as model hallucination, which undermines the robustness of automatically generated circuit designs. To address this issue, this paper introduces AMSnet-KG, a dataset encompassing various AMS circuit schematics and netlists. We construct a knowledge graph with annotations on detailed functional and performance characteristics. Facilitated by AMSnet-KG, we propose an automated AMS circuit generation framework that utilizes the comprehensive knowledge embedded in LLMs. We first formulate a design strategy (e.g., circuit architecture using a number of circuit components) based on required specifications. Next, matched circuit components are retrieved and assembled into a complete topology, and transistor sizing is obtained through Bayesian optimization. Simulation results of the netlist are fed back to the LLM for further topology refinement, ensuring the circuit design specifications are met. We perform case studies of operational amplifier and comparator design to verify the automatic design flow from specifications to netlists with minimal human effort. The dataset used in this paper will be open-sourced upon publishing of this paper.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2411.1356

Country:

North America > United States (0.46)
Asia > China (0.46)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Provable Multi-instance Deep AUC Maximization with Stochastic Pooling

Zhu, Dixian, Wang, Bokun, Chen, Zhi, Wang, Yaxing, Sonka, Milan, Wu, Xiaodong, Yang, Tianbao

arXiv.org Artificial IntelligenceJun-6-2023

This paper considers a novel application of deep AUC maximization (DAM) for multi-instance learning (MIL), in which a single class label is assigned to a bag of instances (e.g., multiple 2D slices of a CT scan for a patient). We address a neglected yet non-negligible computational challenge of MIL in the context of DAM, i.e., bag size is too large to be loaded into {GPU} memory for backpropagation, which is required by the standard pooling methods of MIL. To tackle this challenge, we propose variance-reduced stochastic pooling methods in the spirit of stochastic optimization by formulating the loss function over the pooled prediction as a multi-level compositional function. By synthesizing techniques from stochastic compositional optimization and non-convex min-max optimization, we propose a unified and provable muli-instance DAM (MIDAM) algorithm with stochastic smoothed-max pooling or stochastic attention-based pooling, which only samples a few instances for each bag to compute a stochastic gradient estimator and to update the model parameter. We establish a similar convergence rate of the proposed MIDAM algorithm as the state-of-the-art DAM algorithms. Our extensive experiments on conventional MIL datasets and medical datasets demonstrate the superiority of our MIDAM algorithm.

artificial intelligence, machine learning, survey article, (10 more...)

arXiv.org Artificial Intelligence

2305.0804

Country:

North America > United States > California (0.28)
North America > United States > Texas (0.28)

Genre:

Research Report (0.50)
Overview (0.48)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.34)

Add feedback

Mix and match networks: encoder-decoder alignment for zero-pair image translation

Wang, Yaxing, van de Weijer, Joost, Herranz, Luis

arXiv.org Machine LearningApr-6-2018

We address the problem of image translation between domains or modalities for which no direct paired data is available (i.e. zero-pair translation). We propose mix and match networks, based on multiple encoders and decoders aligned in such a way that other encoder-decoder pairs can be composed at test time to perform unseen image translation tasks between domains or modalities for which explicit paired samples were not seen during training. We study the impact of autoencoders, side information and losses in improving the alignment and transferability of trained pairwise translation models to unseen translations. We show our approach is scalable and can perform colorization and style transfer between unseen combinations of domains. We evaluate our system in a challenging cross-modal setting where semantic segmentation is estimated from depth images, without explicit access to any depth-semantic segmentation training pairs. Our model outperforms baselines based on pix2pix and CycleGAN models.

machine translation, neural network, translation, (17 more...)

arXiv.org Machine Learning

1804.02199

Country: Europe > Spain (0.28)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback