Large Language Model
Zero-shot personalized lip-to-speech synthesis with face image based voice control
Sheng, Zheng-Yan, Ai, Yang, Ling, Zhen-Hua
Lip-to-Speech (Lip2Speech) synthesis, which predicts corresponding speech from talking face images, has witnessed significant progress with various models and training strategies in a series of independent studies. However, existing studies cannot achieve voice control under zero-shot conditions, because extra speaker embeddings need to be extracted from natural reference speech and are unavailable when only the silent video of an unseen speaker is given. In this paper, we propose a zero-shot personalized Lip2Speech synthesis method, in which face images control speaker identities. A variational autoencoder is adopted to disentangle the speaker identity and linguistic content representations, which enables speaker embeddings to control the voice characteristics of synthetic speech for unseen speakers. Furthermore, we propose associated cross-modal representation learning to improve the ability of face-based speaker embeddings (FSE) to control voice. Extensive experiments verify the effectiveness of the proposed method, whose synthetic utterances are more natural and better match the personality of the input video than those of the compared methods. To the best of our knowledge, this paper makes the first attempt at zero-shot personalized Lip2Speech synthesis with a face image rather than reference audio to control voice characteristics.
ChatGPT: Vision and Challenges
Gill, Sukhpal Singh, Kaur, Rupinder
The term "Generative AI" is used to describe a subset of AI models that can generate new information by discovering relevant trends and patterns in already collected information. These models may produce work in a wide range of media, from written to visual to audio [2]. To analyse, comprehend, and produce material that accurately imitates human-generated outcomes, Generative AI models depend on deep learning approaches and neural networks. OpenAI's ChatGPT is one such AI model that has quickly become a popular and versatile resource for a number of different industries. Its humanoid text generation is made possible by its foundation in the Generative [...] The design made it possible to make powerful language models like OpenAI's GPT series, which included GPT-2 and GPT-3, which were the versions that came before ChatGPT [6]. The GPT-3.5 architecture is the basis for ChatGPT; it is an improved version of OpenAI's GPT-3 model. Even though GPT-3.5 has fewer variables, it nevertheless produces excellent results in many areas of NLP, such as language understanding, text generation, and machine translation [6]. ChatGPT was trained on a massive body of text data and fine-tuned on the goal of creating conversational replies, allowing it to create text responses to user inquiries that are strangely similar to those of a person.
Coherent Wave Dynamics and Language Generation of a Generative Pre-trained Transformer
Large Language Models (LLMs), such as the Generative Pretrained Transformer (GPT), have achieved tremendous success in various language tasks, but their emergent abilities have also raised many questions, concerns, and challenges that need to be addressed. To gain a better understanding of the models' inner mechanisms, we analyze the hidden state and channel wave dynamics in a small GPT, focusing on the coherence of wave patterns in terms of cross-channel correlation and individual auto-correlation. Our findings suggest that wave dynamics offer consistent and repeatable intrinsic oscillation modes, along with context-aware plasticity and expressiveness in language generation. By analyzing wave patterns, coherence, and clustering, we provide a systematic way to identify and interpret the functionality of the hidden state channels, paving the way to understand and control higher-level language pattern formation. In addition, we investigate the Poisson statistics of spelling errors in text sequence generation across various levels of model training and observe a phase-transition-like process. As coherence builds up, there is a competition between the generation of correct and misspelled words. However, once the model is adequately trained and significant coherence has emerged, the coherent process becomes strong enough to effectively suppress spelling errors, preventing the cascade amplification of defects. The distribution of correct spellings transitions from Poissonian to Sub-Poissonian, while the distribution of misspellings shows the opposite trend. By leveraging concepts and techniques from quantum physics, we gain novel insights into the dynamics of the small GPT. This approach can be extended to larger language models that exhibit more complex coherent language patterns, opening up opportunities to interpret their emergent capabilities and develop more specialized models.
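The Poissonian versus sub-Poissonian distinction mentioned above is commonly quantified with the Fano factor, the ratio of the variance of a count to its mean: a value of 1 is Poissonian, below 1 is sub-Poissonian, and above 1 is super-Poissonian. The following is a minimal sketch of that check, not the authors' analysis code. The idea of counting words per fixed-length window of generated text, the function names, and the tolerance `tol` are all assumptions made for illustration.

```python
from statistics import mean, pvariance


def fano_factor(counts):
    """Return variance / mean for a sequence of non-negative counts."""
    m = mean(counts)
    if m == 0:
        raise ValueError("Fano factor is undefined when the mean count is zero")
    return pvariance(counts) / m


def classify(counts, tol=0.1):
    """Label count statistics; `tol` is an arbitrary illustrative threshold."""
    f = fano_factor(counts)
    if f < 1 - tol:
        return "sub-Poissonian"
    if f > 1 + tol:
        return "super-Poissonian"
    return "Poissonian"


# Hypothetical counts of words per window of generated text (made-up data).
regular = [5, 5, 6, 5, 5, 6, 5, 5]    # very evenly spread counts
bursty = [0, 0, 12, 0, 1, 15, 0, 0]   # counts arriving in clusters

print(classify(regular))
print(classify(bursty))
```

Evenly spread counts have a variance well below their mean, while clustered counts have a variance well above it, which is what separates the two regimes.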
Read, Diagnose and Chat: Towards Explainable and Interactive LLMs-Augmented Depression Detection in Social Media
Qin, Wei, Chen, Zetong, Wang, Lei, Lan, Yunshi, Ren, Weijieying, Hong, Richang
Depression detection based on social media content has received increasing attention, as it allows for early diagnosis before the user's psychological state deteriorates. Although traditional methods of depression detection can provide a classification of whether the user is depressed or not, they cannot provide human-like explanations [...] More than half of those who suffer from depression do not receive any treatment [22]. Williams et al. [41] used a depression diagnostic test instrument to discover that 8% of the population had symptoms and a diagnosis of depression, while 7.6% had symptoms but had not been diagnosed. Individuals affected by mental disorders often hesitate to seek professional help [11].
Differentially Private Attention Computation
Gao, Yeqi, Song, Zhao, Yang, Xin
Large language models (LLMs) have had a profound impact on numerous aspects of daily life, including natural language processing, content generation, and research methodologies. However, one crucial issue concerning the inference results of large language models is security and privacy. In many scenarios, the results generated by LLMs could leak confidential or copyrighted information. A recent breakthrough work [Vyas, Kakade and Barak 2023] focuses on this privacy issue of LLMs from a theoretical perspective. It is well known that computing the attention matrix is one of the major tasks in LLM computation. Thus, how to give a provable privacy guarantee for computing the attention matrix is an important research direction. Previous works [Alman and Song 2023, Brand, Song and Zhou 2023] have proposed provably tight results for fast computation of attention without considering privacy concerns. One natural mathematical formulation for quantifying privacy, found in graduate-level theoretical computer science textbooks, is differential privacy. Inspired by [Vyas, Kakade and Barak 2023], in this work we provide a provable result showing how to differentially privately approximate the attention matrix. From a technical perspective, our result relies on pioneering work in the area of differential privacy by [Alabi, Kothari, Tankala, Venkat and Zhang 2022].
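For orientation, the two objects named in this abstract can be written in their standard textbook forms. The notation below is an assumption chosen for illustration and may differ from the paper's own definitions: $M$ is a randomized algorithm, $X$ and $X'$ are neighboring inputs, and $Q, K \in \mathbb{R}^{n \times d}$ are query and key matrices.

```latex
% (epsilon, delta)-differential privacy: for all neighboring inputs X, X'
% and every measurable set S of outputs,
\Pr[M(X) \in S] \;\le\; e^{\varepsilon} \, \Pr[M(X') \in S] + \delta .

% Softmax attention matrix in its usual row-normalized form
% (exp is applied entrywise, and 1_n is the all-ones vector of length n):
A = \exp\!\left(Q K^{\top}\right), \qquad
D = \operatorname{diag}\!\left(A \mathbf{1}_n\right), \qquad
\text{attention matrix} = D^{-1} A .
```

Read this way, the stated goal is an algorithm that outputs an approximation of $D^{-1} A$ while satisfying the first inequality.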
Enhancing Knowledge Graph Construction Using Large Language Models
Trajanoska, Milena, Stojanov, Riste, Trajanov, Dimitar
The growing trend of Large Language Model (LLM) development has attracted significant attention, with models for various applications emerging consistently. However, the combined application of Large Language Models with semantic technologies for reasoning and inference is still a challenging task. This paper analyzes how the current advances in foundational LLMs, like ChatGPT, compare with specialized pretrained models, like REBEL, for joint entity and relation extraction. To evaluate this approach, we conducted several experiments using sustainability-related text as our use case. We created pipelines for the automatic creation of Knowledge Graphs from raw texts, and our findings indicate that using advanced LLMs can improve the accuracy of the process of creating these graphs from unstructured text. Furthermore, we explored the potential of automatic ontology creation using foundation LLMs, which resulted in even more relevant and accurate knowledge graphs.
Code Execution with Pre-trained Language Models
Liu, Chenxiao, Lu, Shuai, Chen, Weizhu, Jiang, Daxin, Svyatkovskiy, Alexey, Fu, Shengyu, Sundaresan, Neel, Duan, Nan
Code execution is a fundamental aspect of programming language semantics that reflects the exact behavior of the code. However, most pre-trained models for code intelligence ignore the execution trace and only rely on source code and syntactic structures. In this paper, we investigate how well pre-trained models can understand and perform code execution. We develop a mutation-based data augmentation technique to create a large-scale and realistic Python dataset and task for code execution, which challenges existing models such as Codex. We then present CodeExecutor, a Transformer model that leverages code execution pre-training and curriculum learning to enhance its semantic comprehension. We evaluate CodeExecutor on code execution and show its promising performance and limitations. We also demonstrate its potential benefits for code intelligence tasks such as zero-shot code-to-code search and text-to-code generation. Our analysis provides insights into the learning and generalization abilities of pre-trained models for code execution.
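To make the idea of mutation-based data augmentation concrete, the sketch below applies one simple mutation operator to a Python snippet and runs both versions to compare their final variable states. It is an illustration only, not the authors' pipeline: the operator `SwapAddSub` and the helpers `mutate` and `run` are hypothetical names, and real mutation tools use many more operators. It requires Python 3.9 or later for `ast.unparse`.

```python
import ast


class SwapAddSub(ast.NodeTransformer):
    """Hypothetical mutation operator: swap `+` and `-` in binary expressions."""

    def visit_BinOp(self, node):
        self.generic_visit(node)  # mutate nested expressions first
        if isinstance(node.op, ast.Add):
            node.op = ast.Sub()
        elif isinstance(node.op, ast.Sub):
            node.op = ast.Add()
        return node


def mutate(source: str) -> str:
    """Parse the source, apply the mutation, and return the new source text."""
    tree = SwapAddSub().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)


def run(source: str) -> dict:
    """Execute the source and return the final values of its variables."""
    env = {}
    exec(source, env)  # only suitable for trusted, illustrative snippets
    env.pop("__builtins__", None)
    return env


original = "x = 3\ny = x + 4\nz = y - 1"
mutant = mutate(original)

print(run(original))
print(mutant)
print(run(mutant))
```

Each mutant is syntactically valid code whose behavior differs from the original, so a model has to follow the execution rather than recall a memorized program.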
ComputeGPT: A computational chat model for numerical problems
Lewis, Ryan Hardesty, Jiao, Junfeng
Language models have made significant strides in recent years, becoming proficient at understanding and generating human-like text [26, 2]. However, despite their advances, traditional language models remain inaccurate in solving numerical problems, as their architecture relies on predicting the next word based on probability rather than executing calculations [3]. This paper introduces ComputeGPT, an innovative chat model capable of addressing computational problems by running on-demand code. ComputeGPT parses each question into relevant code, executes the code, and returns the computed answer as part of the chat. We combine this approach with a local browser-based Python interpreter, Pyodide, and fine-tuned prompts to achieve state-of-the-art efficiency in solving numerical problems while providing a suitable and safe environment for code execution.
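The question-to-code-to-answer loop described above can be sketched in a few lines. This is a toy illustration under stated assumptions, not ComputeGPT's implementation: `hypothetical_model` is a stand-in that returns canned code instead of calling a language model, the convention that generated code stores its result in a variable named `answer` is invented here, and plain `exec` provides no sandboxing (the paper runs code in a browser-based interpreter instead).

```python
def hypothetical_model(question: str) -> str:
    """Stand-in for a language model that translates a question into code."""
    canned = {
        "What is 17 factorial divided by 15 factorial?": (
            "import math\n"
            "answer = math.factorial(17) // math.factorial(15)"
        ),
    }
    return canned[question]


def answer_numerically(question: str, generate_code=hypothetical_model):
    """Generate code for the question, execute it, and return its `answer`."""
    code = generate_code(question)
    namespace = {}
    # WARNING: exec is NOT a sandbox; never run untrusted generated code this way.
    exec(code, namespace)
    return namespace["answer"]


result = answer_numerically("What is 17 factorial divided by 15 factorial?")
print(result)  # 272, since 17! / 15! = 17 * 16
```

The point of the design is that the number is computed by an interpreter rather than predicted token by token.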
Generating Phishing Attacks using ChatGPT
Roy, Sayak Saha, Naragam, Krishna Vamsi, Nilizadeh, Shirin
The ability of ChatGPT to generate human-like responses and understand context has made it a popular tool for conversational agents, content creation, data analysis, and research and innovation. However, its effectiveness and ease of accessibility make it a prime target for generating malicious content, such as phishing attacks, that can put users at risk. In this work, we identify several malicious prompts that can be provided to ChatGPT to generate functional phishing websites. Through an iterative approach, we find that these phishing websites can be made to imitate popular brands and emulate several evasive tactics that have been known to avoid detection by anti-phishing entities. These attacks can be generated using vanilla ChatGPT without the need for any prior adversarial exploits (jailbreaking).
Algebra Error Classification with Large Language Models
McNichols, Hunter, Zhang, Mengxue, Lan, Andrew
Automated feedback as students answer open-ended math questions has significant potential in improving learning outcomes at large scale. A key part of automated feedback systems is an error classification component, which identifies student errors and enables appropriate, predefined feedback to be deployed. Most existing approaches to error classification use a rule-based method, which has limited capacity to generalize. Existing data-driven methods avoid these limitations but specifically require mathematical expressions in student responses to be parsed into syntax trees. This requirement is itself a limitation, since student responses are not always syntactically valid and cannot be converted into trees. In this work, we introduce a flexible method for error classification using pre-trained large language models. We demonstrate that our method can outperform existing methods in algebra error classification, and is able to classify a larger set of student responses. Additionally, we analyze common classification errors made by our method and discuss limitations of automated error classification.