Wang, Ye
Pitch-Aware RNN-T for Mandarin Chinese Mispronunciation Detection and Diagnosis
Wang, Xintong, Shi, Mingqian, Wang, Ye
Subsequently, Zhang et al. [1] adopted Mispronunciation Detection and Diagnosis (MDD) systems, an autoregressive model, the Recurrent Neural Network Transducer leveraging Automatic Speech Recognition (ASR), face two (RNN-T) [9], for MDD. This approach aims to capture main challenges in Mandarin Chinese: 1) The two-stage models the temporal dependence of mispronunciation patterns, showing create an information gap between the phoneme or tone classification better performance than Connectionist Temporal Classification stage and the MDD stage.
ACEGEN: Reinforcement learning of generative chemical agents for drug discovery
Bou, Albert, Thomas, Morgan, Dittert, Sebastian, Ramรญrez, Carles Navarro, Majewski, Maciej, Wang, Ye, Patel, Shivam, Tresadern, Gary, Ahmad, Mazen, Moens, Vincent, Sherman, Woody, Sciabola, Simone, De Fabritiis, Gianni
In recent years, reinforcement learning (RL) has emerged as a valuable tool in drug design, offering the potential to propose and optimize molecules with desired properties. However, striking a balance between capabilities, flexibility, reliability, and efficiency remains challenging due to the complexity of advanced RL algorithms and the significant reliance on specialized code. In this work, we introduce ACEGEN, a comprehensive and streamlined toolkit tailored for generative drug design, built using TorchRL, a modern RL library that offers thoroughly tested reusable components. We validate ACEGEN by benchmarking against other published generative modeling algorithms and show comparable or improved performance. We also show examples of ACEGEN applied in multiple drug discovery case studies. ACEGEN is accessible at \url{https://github.com/acellera/acegen-open} and available for use under the MIT license.
A Unified Temporal Knowledge Graph Reasoning Model Towards Interpolation and Extrapolation
Chen, Kai, Wang, Ye, Li, Yitong, Li, Aiping, Yu, Han, Song, Xin
Temporal knowledge graph (TKG) reasoning has two settings: interpolation reasoning and extrapolation reasoning. Both of them draw plenty of research interest and have great significance. Methods of the former de-emphasize the temporal correlations among facts sequences, while methods of the latter require strict chronological order of knowledge and ignore inferring clues provided by missing facts of the past. These limit the practicability of TKG applications as almost all of the existing TKG reasoning methods are designed specifically to address either one setting. To this end, this paper proposes an original Temporal PAth-based Reasoning (TPAR) model for both the interpolation and extrapolation reasoning. TPAR performs a neural-driven symbolic reasoning fashion that is robust to ambiguous and noisy temporal data and with fine interpretability as well. Comprehensive experiments show that TPAR outperforms SOTA methods on the link prediction task for both the interpolation and the extrapolation settings. A novel pipeline experimental setting is designed to evaluate the performances of SOTA combinations and the proposed TPAR towards interpolation and extrapolation reasoning. More diverse experiments are conducted to show the robustness and interpretability of TPAR.
End-to-End Real-World Polyphonic Piano Audio-to-Score Transcription with Hierarchical Decoding
Piano audio-to-score transcription (A2S) is an important yet underexplored task with extensive applications for music composition, practice, and analysis. However, existing end-to-end piano A2S systems faced difficulties in retrieving bar-level information such as key and time signatures, and have been trained and evaluated with only synthetic data. To address these limitations, we propose a sequence-to-sequence (Seq2Seq) model with a hierarchical decoder that aligns with the hierarchical structure of musical scores, enabling the transcription of score information at both the bar and note levels by multi-task learning. To bridge the gap between synthetic data and recordings of human performance, we propose a two-stage training scheme, which involves pre-training the model using an expressive performance rendering (EPR) system on synthetic audio, followed by fine-tuning the model using recordings of human performance. To preserve the voicing structure for score reconstruction, we propose a pre-processing method for **Kern scores in scenarios with an unconstrained number of voices. Experimental results support the effectiveness of our proposed approaches, in terms of both transcription performance on synthetic audio data in comparison to the current state-of-the-art, and the first experiment on human recordings.
Beyond Known Clusters: Probe New Prototypes for Efficient Generalized Class Discovery
Wang, Ye, Wang, Yaxiong, Wu, Yujiao, Zhao, Bingchen, Qian, Xueming
Generalized Class Discovery (GCD) aims to dynamically assign labels to unlabelled data partially based on knowledge learned from labelled data, where the unlabelled data may come from known or novel classes. The prevailing approach generally involves clustering across all data and learning conceptions by prototypical contrastive learning. However, existing methods largely hinge on the performance of clustering algorithms and are thus subject to their inherent limitations. Firstly, the estimated cluster number is often smaller than the ground truth, making the existing methods suffer from the lack of prototypes for comprehensive conception learning. To address this issue, we propose an adaptive probing mechanism that introduces learnable potential prototypes to expand cluster prototypes (centers). As there is no ground truth for the potential prototype, we develop a self-supervised prototype learning framework to optimize the potential prototype in an end-to-end fashion. Secondly, clustering is computationally intensive, and the conventional strategy of clustering both labelled and unlabelled instances exacerbates this issue. To counteract this inefficiency, we opt to cluster only the unlabelled instances and subsequently expand the cluster prototypes with our introduced potential prototypes to fast explore novel classes. Despite the simplicity of our proposed method, extensive empirical analysis on a wide range of datasets confirms that our method consistently delivers state-of-the-art results. Specifically, our method surpasses the nearest competitor by a significant margin of 9.7% within the Stanford Cars dataset and 12x clustering efficiency within the Herbarium 19 dataset. We will make the code and checkpoints publicly available at https://github.com/xjtuYW/PNP.git.
Prompt-tuning for Clickbait Detection via Text Summarization
Deng, Haoxiang, Zhu, Yi, Wang, Ye, Qiang, Jipeng, Yuan, Yunhao, Li, Yun, Zhang, Runmei
Clickbaits are surprising social posts or deceptive news headlines that attempt to lure users for more clicks, which have posted at unprecedented rates for more profit or commercial revenue. The spread of clickbait has significant negative impacts on the users, which brings users misleading or even click-jacking attacks. Different from fake news, the crucial problem in clickbait detection is determining whether the headline matches the corresponding content. Most existing methods compute the semantic similarity between the headlines and contents for detecting clickbait. However, due to significant differences in length and semantic features between headlines and contents, directly calculating semantic similarity is often difficult to summarize the relationship between them. To address this problem, we propose a prompt-tuning method for clickbait detection via text summarization in this paper, text summarization is introduced to summarize the contents, and clickbait detection is performed based on the similarity between the generated summary and the contents. Specifically, we first introduce a two-stage text summarization model to produce high-quality news summaries based on pre-trained language models, and then both the headlines and new generated summaries are incorporated as the inputs for prompt-tuning. Additionally, a variety of strategies are conducted to incorporate external knowledge for improving the performance of clickbait detection. The extensive experiments on well-known clickbait detection datasets demonstrate that our method achieved state-of-the-art performance.
Elicitron: An LLM Agent-Based Simulation Framework for Design Requirements Elicitation
Ataei, Mohammadmehdi, Cheong, Hyunmin, Grandi, Daniele, Wang, Ye, Morris, Nigel, Tessier, Alexander
Requirements elicitation, a critical, yet time-consuming and challenging step in product development, often fails to capture the full spectrum of user needs. This may lead to products that fall short of expectations. This paper introduces a novel framework that leverages Large Language Models (LLMs) to automate and enhance the requirements elicitation process. LLMs are used to generate a vast array of simulated users (LLM agents), enabling the exploration of a much broader range of user needs and unforeseen use cases. These agents engage in product experience scenarios, through explaining their actions, observations, and challenges. Subsequent agent interviews and analysis uncover valuable user needs, including latent ones. We validate our framework with three experiments. First, we explore different methodologies for diverse agent generation, discussing their advantages and shortcomings. We measure the diversity of identified user needs and demonstrate that context-aware agent generation leads to greater diversity. Second, we show how our framework effectively mimics empathic lead user interviews, identifying a greater number of latent needs than conventional human interviews. Third, we showcase that LLMs can be used to analyze interviews, capture needs, and classify them as latent or not. Our work highlights the potential of using LLM agents to accelerate early-stage product development, reduce costs, and increase innovation.
SuperLoRA: Parameter-Efficient Unified Adaptation of Multi-Layer Attention Modules
Chen, Xiangyu, Liu, Jing, Wang, Ye, Wang, Pu Perry, Brand, Matthew, Wang, Guanghui, Koike-Akino, Toshiaki
Low-rank adaptation (LoRA) and its variants are widely employed in fine-tuning large models, including large language models for natural language processing and diffusion models for computer vision. This paper proposes a generalized framework called SuperLoRA that unifies and extends different LoRA variants, which can be realized under different hyper-parameter settings. Introducing grouping, folding, shuffling, projecting, and tensor factoring, SuperLoRA offers high flexibility compared with other LoRA variants and demonstrates superior performance for transfer learning tasks especially in the extremely few-parameter regimes.
AutoHLS: Learning to Accelerate Design Space Exploration for HLS Designs
Ahmed, Md Rubel, Koike-Akino, Toshiaki, Parsons, Kieran, Wang, Ye
High-level synthesis (HLS) is a design flow that leverages modern language features and flexibility, such as complex data structures, inheritance, templates, etc., to prototype hardware designs rapidly. However, exploring various design space parameters can take much time and effort for hardware engineers to meet specific design specifications. This paper proposes a novel framework called AutoHLS, which integrates a deep neural network (DNN) with Bayesian optimization (BO) to accelerate HLS hardware design optimization. Our tool focuses on HLS pragma exploration and operation transformation. It utilizes integrated DNNs to predict synthesizability within a given FPGA resource budget. We also investigate the potential of emerging quantum neural networks (QNNs) instead of classical DNNs for the AutoHLS pipeline. Our experimental results demonstrate up to a 70-fold speedup in exploration time.
DA-PFL: Dynamic Affinity Aggregation for Personalized Federated Learning
Yang, Xu, Feng, Jiyuan, Guo, Songyue, Wang, Ye, Ding, Ye, Fang, Binxing, Liao, Qing
Personalized federated learning becomes a hot research topic that can learn a personalized learning model for each client. Existing personalized federated learning models prefer to aggregate similar clients with similar data distribution to improve the performance of learning models. However, similaritybased personalized federated learning methods may exacerbate the class imbalanced problem. In this paper, we propose a novel Dynamic Affinity-based Personalized Federated Learning model (DA-PFL) to alleviate the class imbalanced problem during federated learning. Specifically, we build an affinity metric from a complementary perspective to guide which clients should be aggregated. Then we design a dynamic aggregation strategy to dynamically aggregate clients based on the affinity metric in each round to reduce the class imbalanced risk. Extensive experiments show that the proposed DA-PFL model can significantly improve the accuracy of each client in three real-world datasets with state-of-the-art comparison methods.