 apa


Deceive D: Adaptive Pseudo Augmentation for GAN Training with Limited Data

Neural Information Processing Systems

Generative adversarial networks (GANs) typically require ample data for training in order to synthesize high-fidelity images. Recent studies have shown that training GANs with limited data remains formidable due to discriminator overfitting, the underlying cause that impedes the generator's convergence. This paper introduces a novel strategy called Adaptive Pseudo Augmentation (APA) to encourage healthy competition between the generator and the discriminator. As an alternative method to existing approaches that rely on standard data augmentations or model regularization, APA alleviates overfitting by employing the generator itself to augment the real data distribution with generated images, which deceives the discriminator adaptively. Extensive experiments demonstrate the effectiveness of APA in improving synthesis quality in the low-data regime. We provide a theoretical analysis to examine the convergence and rationality of our new training strategy. APA is simple and effective. It can be added seamlessly to powerful contemporary GANs, such as StyleGAN2, with negligible computational cost.
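
The abstract describes APA only at a high level. The following is a minimal sketch of the adaptive deception step. It assumes an ADA-style overfitting heuristic (the mean sign of the discriminator's raw outputs on real images) compared against an illustrative target of 0.6. `PseudoAugmenter`, `target`, and `step` are hypothetical names, and images are represented by opaque placeholder objects rather than tensors.

```python
import random
from typing import List, Sequence


def sign(x: float) -> int:
    """Return 1, 0, or -1 according to the sign of x."""
    return (x > 0) - (x < 0)


class PseudoAugmenter:
    """Hypothetical helper that decides when to show generated images to D as "real"."""

    def __init__(self, target: float = 0.6, step: float = 0.01) -> None:
        self.p = 0.0          # current probability of deceiving D
        self.target = target  # assumed threshold for the overfitting heuristic
        self.step = step      # assumed size of each adjustment to p

    def update(self, d_real_logits: Sequence[float]) -> float:
        # Overfitting heuristic: mean sign of D's raw outputs on real images.
        # Values near 1 mean D confidently labels almost every real image as real.
        lam = sum(sign(v) for v in d_real_logits) / len(d_real_logits)
        # Raise p when the heuristic exceeds the target and lower it when it falls below.
        self.p += self.step * sign(lam - self.target)
        self.p = min(max(self.p, 0.0), 1.0)  # keep p a valid probability
        return lam

    def mix(self, reals: List, fakes: List, rng: random.Random) -> List:
        # Replace each real sample with a generated one with probability p;
        # the whole returned batch is then fed to D with "real" labels.
        return [f if rng.random() < self.p else r for r, f in zip(reals, fakes)]
```

In a training loop, one would call `update` every few iterations with the discriminator's logits on real images, then feed the output of `mix` to the discriminator in place of the real batch. As a result, generated images are labeled as real with probability `p`. Moving `p` in small, bounded increments is a design choice that keeps the adjustment gradual rather than letting it oscillate.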




VTutor: An Open-Source SDK for Generative AI-Powered Animated Pedagogical Agents with Multi-Media Output

Chen, Eason, Lin, Chenyu, Tang, Xinyi, Xi, Aprille, Wang, Canwen, Lin, Jionghao, Koedinger, Kenneth R

arXiv.org Artificial Intelligence

The rapid evolution of large language models (LLMs) has transformed human-computer interaction (HCI), but interaction with LLMs is still largely text-based, and multimodal approaches remain under-explored. This paper introduces VTutor, an open-source Software Development Kit (SDK) that combines generative AI with advanced animation technologies to create engaging, adaptable, and realistic animated pedagogical agents (APAs) for multimedia human-AI interaction. VTutor leverages LLMs for real-time personalized feedback, advanced lip synchronization for natural speech alignment, and WebGL rendering for seamless web integration. Supporting various 2D and 3D character models, VTutor enables researchers and developers to design emotionally resonant, contextually adaptive learning agents. This toolkit enhances learner engagement, feedback receptivity, and human-AI interaction while promoting trustworthy AI principles in education. VTutor sets a new standard for next-generation APAs, offering an accessible, scalable solution for fostering meaningful and immersive human-AI interaction experiences. The VTutor project is open source and welcomes community-driven contributions and showcases.


Zero-Shot Text-to-Speech as Golden Speech Generator: A Systematic Framework and its Applicability in Automatic Pronunciation Assessment

Lo, Tien-Hong, Tsai, Meng-Ting, Chen, Berlin

arXiv.org Artificial Intelligence

Second language (L2) learners can improve their pronunciation by imitating golden speech, especially when that speech aligns with their own speech characteristics. This study explores the hypothesis that learner-specific golden speech generated with zero-shot text-to-speech (ZS-TTS) techniques can be harnessed as an effective metric for measuring the pronunciation proficiency of L2 learners. Building on this exploration, the contributions of this study are at least two-fold: 1) the design and development of a systematic framework for assessing the ability of a synthesis model to generate golden speech, and 2) in-depth investigations of the effectiveness of using golden speech in automatic pronunciation assessment (APA). Comprehensive experiments on the L2-ARCTIC and Speechocean762 benchmark datasets suggest that the proposed modeling can yield significant improvements across various assessment metrics compared with several prior methods. To our knowledge, this study is the first to explore the role of golden speech in both ZS-TTS and APA, offering a promising direction for computer-assisted pronunciation training (CAPT).
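
The abstract does not say how learner speech is scored against golden speech. Dynamic time warping (DTW) is one common way to compare two utterances of different lengths, and the following sketch uses it purely as an illustration of a golden-speech-based distance. It is a swapped-in technique, not the authors' method. It assumes both utterances are already converted into per-frame feature vectors (for example, MFCCs), and `frame_distance` and `dtw_distance` are hypothetical names.

```python
import math
from typing import List, Sequence

Frame = Sequence[float]


def frame_distance(a: Frame, b: Frame) -> float:
    # Euclidean distance between two feature frames (e.g., MFCC vectors).
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(learner: List[Frame], golden: List[Frame]) -> float:
    """Length-normalized DTW alignment cost between learner and golden speech."""
    if not learner or not golden:
        raise ValueError("both utterances need at least one frame")
    n, m = len(learner), len(golden)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of the first i learner frames
    # with the first j golden frames.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(learner[i - 1], golden[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # golden frame j matched again
                                 cost[i][j - 1],      # learner frame i matched again
                                 cost[i - 1][j - 1])  # both sequences advance
    # Normalize so longer utterances are not penalized just for their length.
    return cost[n][m] / (n + m)
```

Under this assumption, a lower distance to the learner-specific golden speech would indicate pronunciation closer to the target. Because DTW tolerates differences in speaking rate, a learner who speaks more slowly but otherwise matches the reference is not penalized.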


Decoupled Alignment for Robust Plug-and-Play Adaptation

Luo, Haozheng, Yu, Jiahao, Zhang, Wenxin, Li, Jialong, Hu, Jerry Yao-Chieh, Xing, Xinyu, Liu, Han

arXiv.org Artificial Intelligence

This innovation is of practical urgency and importance. LLMs have recently been widely adopted in various applications, demonstrating their ability to generate high-quality, human-like text [Team et al., 2024, Touvron et al., 2023, Ivison et al., 2023]. However, the security of these models has become a significant concern due to the potential risk of generating harmful content [Wu et al., 2024a, Yu et al., 2024, 2023a, Chao et al., 2023, Deng et al., 2023]. To align LLMs with ethical guidelines, researchers have developed various methods to enhance their safety. For example, the Llama-2-Chat [Touvron et al., 2023] and Gemma-it [Team et al., 2024] models have been extensively fine-tuned to improve their alignment. However, these methods often require extensive computational resources or manual red-teaming, which can be costly and time-consuming [Team et al., 2024, OpenAI, 2024, Bai et al., 2022, Ganguli et al., 2022]. As a result, most LLMs that third-party developers fine-tune from pre-trained models do not undergo this alignment process [Xu et al., 2024a, Chiang et al., 2023, Ivison et al., 2023], leaving them vulnerable to malicious users who try to elicit harmful content. To address these issues, we draw motivation from knowledge distillation [Xu et al., 2024b, Hahn and Choi, 2019], in which a teacher model's knowledge is transferred to a student model. Specifically, through numerical experiments (Figure 3 and Figure 4), we make two key observations: MLP Alignment.
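
The passage draws on knowledge distillation but does not describe how the paper transfers alignment from teacher to student. For reference only, the following sketch shows the textbook soft-label distillation objective: KL divergence from the teacher's temperature-softened output distribution to the student's. It is not the paper's decoupled alignment method. The function names and the default temperature of 2.0 are illustrative assumptions.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    # Temperature > 1 softens the distribution, exposing the teacher's
    # relative preferences among non-top classes.
    scaled = [x / temperature for x in logits]
    top = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - top) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits: Sequence[float],
                      student_logits: Sequence[float],
                      temperature: float = 2.0) -> float:
    """KL(teacher || student) on temperature-softened distributions, scaled by T^2."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    return temperature ** 2 * kl
```

The loss is zero when the student reproduces the teacher's distribution and grows as the student moves probability mass away from the classes the teacher prefers.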


AI-Tutoring in Software Engineering Education

Frankford, Eduard, Sauerwein, Clemens, Bassner, Patrick, Krusche, Stephan, Breu, Ruth

arXiv.org Artificial Intelligence

With the rapid advancement of artificial intelligence (AI) across various domains, the education sector is poised for transformation. AI-driven tools have considerable potential to enhance the learning experience, especially in programming. However, the scientific evaluation of Large Language Models (LLMs) used as AI-Tutors in Automated Programming Assessment Systems (APASs) remains largely unexplored. There is therefore a need to understand how students interact with such AI-Tutors and to analyze their experiences. In this paper, we conduct an exploratory case study by integrating the GPT-3.5-Turbo model as an AI-Tutor within the APAS Artemis. Through a combination of empirical data collection and an exploratory survey, we identified different user types based on their interaction patterns with the AI-Tutor. The findings also highlight advantages such as timely feedback and scalability. However, challenges were also evident, including generic responses and students' concerns that relying on the AI-Tutor could inhibit their learning progress. This research contributes to the discourse on AI's role in education.


Enhancing Protein Predictive Models via Proteins Data Augmentation: A Benchmark and New Directions

Sun, Rui, Wu, Lirong, Lin, Haitao, Huang, Yufei, Li, Stan Z.

arXiv.org Artificial Intelligence

Data augmentation is an effective way to make the most of the small amount of labeled protein data available. However, most existing work focuses on designing new architectures or pre-training tasks, and relatively little work has studied data augmentation for proteins. This paper extends data augmentation techniques previously used for images and text to proteins and then benchmarks these techniques on a variety of protein-related tasks, providing the first comprehensive evaluation of protein augmentation. Furthermore, we propose two novel semantic-level protein augmentation methods, Integrated Gradients Substitution and Back Translation Substitution, which enable semantics-aware protein augmentation through saliency detection and biological knowledge. Finally, we integrate the extended and proposed augmentations into an augmentation pool and propose a simple but effective framework, Automated Protein Augmentation (APA), which adaptively selects the most suitable augmentation combinations for different tasks. Extensive experiments show that APA improves performance on five protein-related tasks by an average of 10.55% across three architectures compared with vanilla implementations without augmentation, highlighting its potential impact on the field.
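
To make the idea of an augmentation pool concrete, the following is a minimal sketch. It assumes three simple sequence-level transforms (random substitution, random cropping, reversal) standing in for the paper's pool, and it uses an exhaustive search over small combinations as a stand-in for APA's adaptive selection, which the abstract does not detail. `POOL`, `select_combination`, and the `evaluate` callback (assumed to train a model and return a validation score) are hypothetical names. The semantic-level methods (Integrated Gradients Substitution, Back Translation Substitution) are not reproduced here.

```python
import itertools
import random
from typing import Callable, Dict, Sequence, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
Augment = Callable[[str, random.Random], str]


def random_substitution(seq: str, rng: random.Random, rate: float = 0.1) -> str:
    # Replace each residue with a random amino acid with probability `rate`.
    return "".join(rng.choice(AMINO_ACIDS) if rng.random() < rate else c for c in seq)


def random_crop(seq: str, rng: random.Random, keep: float = 0.9) -> str:
    # Keep a random contiguous window covering roughly `keep` of the sequence.
    if not seq:
        return seq
    length = max(1, int(len(seq) * keep))
    start = rng.randint(0, len(seq) - length)
    return seq[start:start + length]


def reverse(seq: str, rng: random.Random) -> str:
    # Sequence reversal, a "flip"-style transform borrowed from images and text.
    return seq[::-1]


# Hypothetical augmentation pool; names and transforms are illustrative only.
POOL: Dict[str, Augment] = {
    "crop": random_crop,
    "reverse": reverse,
    "substitute": random_substitution,
}


def apply_combination(seq: str, combo: Sequence[str], rng: random.Random) -> str:
    # Apply the chosen augmentations in order.
    for name in combo:
        seq = POOL[name](seq, rng)
    return seq


def select_combination(evaluate: Callable[[Tuple[str, ...]], float],
                       max_size: int = 2) -> Tuple[Tuple[str, ...], float]:
    """Exhaustively score small combinations and return the best one.

    `evaluate` is assumed to train/validate a model with the given combination
    and return a validation score (higher is better); () means no augmentation.
    """
    best_combo: Tuple[str, ...] = ()
    best_score = evaluate(best_combo)
    for k in range(1, max_size + 1):
        for combo in itertools.combinations(sorted(POOL), k):
            score = evaluate(combo)
            if score > best_score:
                best_combo, best_score = combo, score
    return best_combo, best_score
```

Including the empty combination as the baseline means the search never selects augmentations that score worse than training without them. This mirrors the abstract's comparison against vanilla implementations without augmentation.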


A RAG-based Question Answering System Proposal for Understanding Islam: MufassirQAS LLM

Alan, Ahmet Yusuf, Karaarslan, Enis, Aydin, Ömer

arXiv.org Artificial Intelligence

Learning and understanding religions poses challenges, such as the complexity and depth of religious doctrines and teachings. Chatbots acting as question-answering systems can help address these challenges. LLM chatbots use NLP techniques to establish connections between topics and accurately respond to complex questions. These capabilities make them well suited to answering questions about religion. However, LLMs also tend to generate false information, a phenomenon known as hallucination. In addition, chatbot responses can include content that insults personal religious beliefs or touches on interfaith conflicts and other controversial or sensitive topics. A chatbot must avoid such cases without promoting hate speech or offending particular groups of people or their beliefs. This study uses a vector database-based Retrieval Augmented Generation (RAG) approach to enhance the accuracy and transparency of LLMs. Our question-answering system is called "MufassirQAS". We created a database of several open-access books with Turkish-language content. These books contain Turkish translations and interpretations of Islam. This database is used to answer religion-related questions and to help ensure that the answers are trustworthy. The relevant part of the dataset, which the LLM also uses, is presented along with the answer. We carefully designed system prompts that instruct the model to avoid harmful, offensive, or disrespectful responses, in order to respect people's values and provide reliable results. Along with its answers, the system shares additional information, such as the page number in the respective book and the articles referenced to obtain the information. MufassirQAS and ChatGPT were also tested with sensitive questions, and our system performed better. The study and further enhancements are still in progress. Results and future work are presented.
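
The following is a minimal sketch of the retrieval side of such a RAG pipeline, assuming passages are stored with their book title and page number so the answer can cite its sources. A real vector database would use a neural text encoder. Here, a bag-of-words vector with cosine similarity stands in for the embedding, which is a simplifying assumption. `Passage`, `VectorStore`, `build_prompt`, and the sample data are hypothetical, and the call to the LLM itself is omitted.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Passage:
    book: str
    page: int
    text: str


def embed(text: str) -> Counter:
    # Stand-in embedding: lowercase bag-of-words counts (a real system
    # would use a neural sentence encoder here).
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class VectorStore:
    def __init__(self, passages: List[Passage]) -> None:
        self.passages = passages
        self.vectors = [embed(p.text) for p in passages]  # index once up front

    def search(self, query: str, k: int = 2) -> List[Tuple[float, Passage]]:
        # Rank all passages by similarity to the query and keep the top k.
        q = embed(query)
        scored = [(cosine(q, v), p) for v, p in zip(self.vectors, self.passages)]
        scored.sort(key=lambda sp: sp[0], reverse=True)
        return scored[:k]


def build_prompt(question: str, hits: List[Tuple[float, Passage]]) -> str:
    # Cite each retrieved passage with book and page so the answer can show its sources.
    context = "\n".join(f"[{p.book}, p. {p.page}] {p.text}" for _, p in hits)
    return (
        "Answer only from the sources below; if they do not cover the question, say so.\n"
        f"Sources:\n{context}\n\nQuestion: {question}"
    )
```

Restricting the model to the retrieved sources, and returning those sources with their page numbers, is what the abstract credits for improved accuracy and transparency. The exact system-prompt wording used by MufassirQAS is not given, so the instruction string above is only illustrative.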