 apa




Multi-agent active perception with prediction rewards

Neural Information Processing Systems

Active perception, collecting observations to reduce uncertainty about a hidden variable, is one of the fundamental capabilities of an intelligent agent [2]. In multi-agent active perception, a team of autonomous agents cooperatively gathers observations to infer the value of a hidden variable.


Deceive D: Adaptive Pseudo Augmentation for GAN Training with Limited Data

Neural Information Processing Systems

Generative adversarial networks (GANs) typically require ample data for training in order to synthesize high-fidelity images. Recent studies have shown that training GANs with limited data remains formidable due to discriminator overfitting, the underlying cause that impedes the generator's convergence. This paper introduces a novel strategy called Adaptive Pseudo Augmentation (APA) to encourage healthy competition between the generator and the discriminator. As an alternative method to existing approaches that rely on standard data augmentations or model regularization, APA alleviates overfitting by employing the generator itself to augment the real data distribution with generated images, which deceives the discriminator adaptively. Extensive experiments demonstrate the effectiveness of APA in improving synthesis quality in the low-data regime. We provide a theoretical analysis to examine the convergence and rationality of our new training strategy. APA is simple and effective. It can be added seamlessly to powerful contemporary GANs, such as StyleGAN2, with negligible computational cost.
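As a rough illustration of the idea in this abstract, the sketch below shows one way generated images could be mixed into the batch the discriminator treats as real, with a mixing probability that adapts to a sign of overfitting. The abstract does not state the adaptation rule, so the function names, the overfitting indicator (mean sign of the discriminator's outputs on real images), the target value of 0.6, and the step size are all assumptions made for illustration, not the authors' implementation.

```python
import random


def pseudo_augment(real_batch, fake_batch, p, rng):
    """Build the batch shown to the discriminator as "real".

    Each real sample is swapped for a generated sample with probability p,
    so the discriminator is occasionally deceived by generator outputs.
    """
    return [fake if rng.random() < p else real
            for real, fake in zip(real_batch, fake_batch)]


def update_probability(p, real_logits, target=0.6, step=0.01):
    """Hypothetical adaptation rule (an assumption, not from the abstract).

    The indicator is the mean sign of the discriminator outputs on real
    images. A value near 1 suggests the discriminator is very confident on
    the training set, which is treated here as a sign of overfitting.
    """
    signs = [1.0 if x > 0 else -1.0 if x < 0 else 0.0 for x in real_logits]
    indicator = sum(signs) / len(signs)
    p = p + step if indicator > target else p - step
    return min(1.0, max(0.0, p))  # keep p a valid probability


if __name__ == "__main__":
    rng = random.Random(0)
    real = ["real_0", "real_1", "real_2", "real_3"]
    fake = ["fake_0", "fake_1", "fake_2", "fake_3"]
    p = update_probability(0.5, real_logits=[2.1, 0.7, 1.3, 0.2])
    print(p, pseudo_augment(real, fake, p, rng))
```

With `p = 0` the discriminator sees only real data, which is ordinary GAN training; raising `p` only when the indicator is high keeps the augmentation from leaking generated samples into the real distribution more than necessary.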




VTutor: An Open-Source SDK for Generative AI-Powered Animated Pedagogical Agents with Multi-Media Output

Chen, Eason, Lin, Chenyu, Tang, Xinyi, Xi, Aprille, Wang, Canwen, Lin, Jionghao, Koedinger, Kenneth R

arXiv.org Artificial Intelligence

The rapid evolution of large language models (LLMs) has transformed human-computer interaction (HCI), but interaction with LLMs is currently mostly text-based, while other multimodal approaches remain under-explored. This paper introduces VTutor, an open-source Software Development Kit (SDK) that combines generative AI with advanced animation technologies to create engaging, adaptable, and realistic animated pedagogical agents (APAs) for human-AI multi-media interactions. VTutor leverages LLMs for real-time personalized feedback, advanced lip synchronization for natural speech alignment, and WebGL rendering for seamless web integration. Supporting various 2D and 3D character models, VTutor enables researchers and developers to design emotionally resonant, contextually adaptive learning agents. This toolkit enhances learner engagement, feedback receptivity, and human-AI interaction while promoting trustworthy AI principles in education. VTutor sets a new standard for next-generation APAs, offering an accessible, scalable solution for fostering meaningful and immersive human-AI interaction experiences. The VTutor project is open-sourced and welcomes community-driven contributions and showcases.


Zero-Shot Text-to-Speech as Golden Speech Generator: A Systematic Framework and its Applicability in Automatic Pronunciation Assessment

Lo, Tien-Hong, Tsai, Meng-Ting, Chen, Berlin

arXiv.org Artificial Intelligence

Second language (L2) learners can improve their pronunciation by imitating golden speech, especially when that speech aligns with their own speech characteristics. This study explores the hypothesis that learner-specific golden speech generated with zero-shot text-to-speech (ZS-TTS) techniques can be harnessed as an effective metric for measuring the pronunciation proficiency of L2 learners. Building on this exploration, the contributions of this study are at least two-fold: 1) design and development of a systematic framework for assessing the ability of a synthesis model to generate golden speech, and 2) in-depth investigations of the effectiveness of using golden speech in automatic pronunciation assessment (APA). Comprehensive experiments conducted on the L2-ARCTIC and Speechocean762 benchmark datasets suggest that our proposed modeling can yield significant performance improvements on various assessment metrics relative to prior work. To our knowledge, this study is the first to explore the role of golden speech in both ZS-TTS and APA, offering a promising regime for computer-assisted pronunciation training (CAPT).


Decoupled Alignment for Robust Plug-and-Play Adaptation

Luo, Haozheng, Yu, Jiahao, Zhang, Wenxin, Li, Jialong, Hu, Jerry Yao-Chieh, Xing, Xinyu, Liu, Han

arXiv.org Artificial Intelligence

This innovation is practically urgent and important. LLMs have been widely adopted in various applications recently, demonstrating their ability to generate high-quality human-like text [Team et al., 2024, Touvron et al., 2023, Ivison et al., 2023]. However, the security of these models has become a significant concern due to the potential risks of generating harmful content [Wu et al., 2024a, Yu et al., 2024, 2023a, Chao et al., 2023, Deng et al., 2023]. To align LLMs with ethical guidelines, researchers have developed various methods to enhance their safety. For example, the Llama-2-Chat [Touvron et al., 2023] and Gemma-it [Team et al., 2024] models have been extensively fine-tuned to improve their alignment performance. However, these methods often require extensive computational resources or manual red-teaming, which can be costly and time-consuming [Team et al., 2024, OpenAI, 2024, Bai et al., 2022, Ganguli et al., 2022]. Thus, most LLMs fine-tuned from pre-trained models by third-party developers do not undergo the alignment process [Xu et al., 2024a, Chiang et al., 2023, Ivison et al., 2023], leaving them vulnerable to being used by malicious users to generate harmful content. To combat these issues, we draw inspiration from knowledge distillation techniques [Xu et al., 2024b, Hahn and Choi, 2019], where a teacher model's knowledge is transferred to a student model. Specifically, through the numerical experiments in Figure 3 and Figure 4, we make two key observations: MLP Alignment.