Diffusion Models in NLP: A Survey

arXiv.org Artificial Intelligence

Diffusion models have become a powerful family of deep generative models, with record-breaking performance in many applications. This paper first gives an overview and derivation of the basic theory of diffusion models. It then reviews research on diffusion models in natural language processing across four aspects, including text generation and text-driven image generation, and analyzes and summarizes the collected literature. Finally, it records the experience and reflections gained from this literature review.


The Life Cycle of Knowledge in Big Language Models: A Survey

arXiv.org Artificial Intelligence

Knowledge plays a critical role in artificial intelligence. Recently, the extensive success of pre-trained language models (PLMs) has drawn significant attention to how knowledge can be acquired, maintained, updated and used by language models. Despite the enormous number of related studies, there is still no unified view of how knowledge circulates within language models throughout the learning, tuning, and application processes, which may prevent us from further understanding the connections between current progress or realizing existing limitations. In this survey, we revisit PLMs as knowledge-based systems by dividing the life cycle of knowledge in PLMs into five critical periods, and investigating how knowledge circulates when it is built, maintained and used. To this end, we systematically review existing studies of each period of the knowledge life cycle, summarize the main challenges and current limitations, and discuss future directions.


Meta-learning approaches for few-shot learning: A survey of recent advances

arXiv.org Artificial Intelligence

Humans possess the extraordinary capability of learning a new concept even after minimal observation. To a greater extent, a child can distinguish a dog from a cat through a single picture [1]. This critical characteristic of human intelligence lies in humans' ability to apply knowledge obtained from prior experience to unforeseen circumstances with little observation. Unlike the human learning paradigm, traditional machine learning (ML) and deep learning (DL) models are trained on a specific task from scratch through: (a) the training phase, in which a model is initialized randomly and then updated, and (b) the test phase, in which the model is evaluated. While ML and DL have obtained remarkable success in a wide range of applications, they are notorious for requiring a huge number of samples to generalize. In many real-world problems, collecting more data is costly, time-consuming, and might not even be feasible due to physical system constraints [2]. Moreover, most ML and DL models presume that training and testing datasets have the same distribution [3]. Thus, their performance suffers under data distribution shifts [4].


Constrained Adversarial Learning and its applicability to Automated Software Testing: a systematic review

arXiv.org Artificial Intelligence

Every novel technology adds hidden vulnerabilities ready to be exploited by a growing number of cyber-attacks. Automated software testing can be a promising solution to quickly analyze thousands of lines of code by generating and slightly modifying function-specific testing data to encounter a multitude of vulnerabilities and attack vectors. This process draws similarities to the constrained adversarial examples generated by adversarial learning methods, so there could be significant benefits to the integration of these methods in automated testing tools. Therefore, this systematic review is focused on the current state-of-the-art of constrained data generation methods applied for adversarial learning and software testing, aiming to guide researchers and developers to enhance testing tools with adversarial learning methods and improve the resilience and robustness of their digital systems. The found constrained data generation applications for adversarial machine learning were systematized, and the advantages and limitations of approaches specific for software testing were thoroughly analyzed, identifying research gaps and opportunities to improve testing tools with adversarial attack methods.


Domain Generalization in Machine Learning Models for Wireless Communications: Concepts, State-of-the-Art, and Open Issues

arXiv.org Artificial Intelligence

Data-driven machine learning (ML) is promoted as one potential technology to be used in next-generation wireless systems. This has led to a large body of research work that applies ML techniques to solve problems in different layers of the wireless transmission link. However, most of these applications rely on supervised learning, which assumes that the source (training) and target (test) data are independent and identically distributed (i.i.d.). This assumption is often violated in the real world due to domain or distribution shifts between the source and the target data. Thus, it is important to ensure that these algorithms generalize to out-of-distribution (OOD) data. In this context, domain generalization (DG) tackles the OOD-related issues by learning models on different and distinct source domains/datasets with generalization capabilities to unseen new domains without additional finetuning. Motivated by the importance of DG requirements for wireless applications, we present a comprehensive overview of the recent developments in DG and the different sources of domain shift. We also summarize the existing DG methods and review their applications in selected wireless communication problems, and conclude with insights and open questions.


NASTyLinker: NIL-Aware Scalable Transformer-based Entity Linker

arXiv.org Artificial Intelligence

Entity Linking (EL) is the task of detecting mentions of entities in text and disambiguating them to a reference knowledge base. Most prevalent EL approaches assume that the reference knowledge base is complete. In practice, however, it is necessary to deal with the case of linking to an entity that is not contained in the knowledge base (NIL entity). Recent works have shown that, instead of focusing only on affinities between mentions and entities, considering inter-mention affinities can be used to represent NIL entities by producing clusters of mentions. At the same time, inter-mention affinities can help to substantially improve linking performance for known entities. With NASTyLinker, we introduce an EL approach that is aware of NIL entities and produces corresponding mention clusters while maintaining high linking performance for known entities. The approach clusters mentions and entities based on dense representations from Transformers and resolves conflicts (if more than one entity is assigned to a cluster) by computing transitive mention-entity affinities. We show the effectiveness and scalability of NASTyLinker on NILK, a dataset that is explicitly constructed to evaluate EL with respect to NIL entities. Further, we apply the presented approach to an actual EL task, namely to knowledge graph population by linking entities in Wikipedia listings, and provide an analysis of the outcome.


Attribution and Obfuscation of Neural Text Authorship: A Data Mining Perspective

arXiv.org Artificial Intelligence

Two interlocking research questions of growing interest and importance in privacy research are Authorship Attribution (AA) and Authorship Obfuscation (AO). Given an artifact, especially a text t in question, an AA solution aims to accurately attribute t to its true author out of many candidate authors while an AO solution aims to modify t to hide its true authorship. Traditionally, the notion of authorship and its accompanying privacy concern is only toward human authors. However, in recent years, due to the explosive advancements in Neural Text Generation (NTG) techniques in NLP, capable of synthesizing human-quality open-ended texts (so-called "neural texts"), one has to now consider authorships by humans, machines, or their combination. Due to the implications and potential threats of neural texts when used maliciously, it has become critical to understand the limitations of traditional AA/AO solutions and develop novel AA/AO solutions in dealing with neural texts. In this survey, therefore, we make a comprehensive review of recent literature on the attribution and obfuscation of neural text authorship from a Data

Figure 1: The figure illustrates the quadrant of research problems where (1) the GRAY quadrants are the focus of this survey, and (2) the BLACK box indicates the specialized binary AA problem to distinguish neural texts from human texts.

released (e.g., FAIR [16, 82], CTRL [59], PPLM [25], T5 [94], Wu-Dao


"Attention Is All You Need": USC Alumni Paved Path for ChatGPT - USC Viterbi

#artificialintelligence

Niki Parmar and Ashish Vaswani co-authored a seminal paper that set the groundwork for ChatGPT and other generative AI models. ChatGPT has taken the world by storm, but seeds of the groundbreaking technology were sown at the USC Viterbi School of Engineering. The seminal paper "Attention Is All You Need," which laid the foundation for ChatGPT and other generative AI systems, was co-authored by Ashish Vaswani, a PhD computer science graduate ('14) and Niki Parmar, a master's in computer science graduate ('15). The landmark paper was presented at the 2017 Conference on Neural Information Processing Systems (NeurIPS), one of the top conferences in AI and machine learning. In the paper, the researchers introduced the transformer architecture, a powerful type of neural network that has become widely used for natural language processing tasks, from text classification to language modeling.


Proactive Prioritization of App Issues via Contrastive Learning

arXiv.org Artificial Intelligence

Mobile app stores produce a tremendous amount of data in the form of user reviews, which is a huge source of user requirements and sentiments; such reviews allow app developers to proactively address issues in their apps. However, only a small number of reviews capture common issues and sentiments, which creates a need for automatically identifying prominent reviews. Unfortunately, most existing work in text ranking and popularity prediction focuses on social contexts where other signals are available, which renders such works ineffective in the context of app reviews. In this work, we propose a new framework, PPrior, that enables proactive prioritization of app issues through identifying prominent reviews (ones predicted to receive a large number of votes in a given time window). Predicting highly-voted reviews is challenging given that, unlike social posts, social network features of users are not available. Moreover, there is an issue of class imbalance, since a large number of user reviews receive few or no votes. PPrior employs a pre-trained T5 model and works in three phases. Phase one adapts the pre-trained T5 model to the user reviews data in a self-supervised fashion. In phase two, we leverage contrastive training to learn a generic and task-independent representation of user reviews. Phase three uses a radius neighbors classifier to make the final predictions. This phase also uses a FAISS index for scalability and efficient search. To conduct extensive experiments, we acquired a large dataset of over 2.1 million user reviews from Google Play. Our experimental results demonstrate the effectiveness of the proposed framework when compared against several state-of-the-art approaches. Moreover, the accuracy of PPrior in predicting prominent reviews is comparable to that of experienced app developers.
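
The radius neighbors step in phase three can be illustrated with a small sketch. This is not PPrior's implementation: the function name, the toy 2-D vectors standing in for review embeddings, the label convention (1 = prominent, 0 = not prominent), and the fallback label are all assumptions made for illustration, and a brute-force scan replaces the FAISS index mentioned in the abstract.

```python
import math
from collections import Counter


def radius_neighbors_predict(train_vecs, train_labels, query, radius, outlier_label=0):
    """Majority vote over every training point within `radius` of `query`.

    `outlier_label` is a hypothetical fallback used when no training point
    lies inside the radius.
    """
    votes = Counter(
        label
        for vec, label in zip(train_vecs, train_labels)
        if math.dist(vec, query) <= radius  # Euclidean distance
    )
    if not votes:
        return outlier_label
    return votes.most_common(1)[0][0]


# Toy stand-ins for learned review embeddings (assumed, not real data).
train_vecs = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (1.0, 1.0), (0.9, 1.0)]
train_labels = [0, 0, 0, 1, 1]

near_prominent = radius_neighbors_predict(train_vecs, train_labels, (0.95, 0.95), 0.3)
near_ordinary = radius_neighbors_predict(train_vecs, train_labels, (0.05, 0.05), 0.3)
isolated = radius_neighbors_predict(train_vecs, train_labels, (5.0, 5.0), 0.3)
```

Unlike a k-nearest-neighbors vote, the number of voters here varies with local density, so a query far from all training points gets no votes and falls back to the default label.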


Conceptual Modeling and Artificial Intelligence: A Systematic Mapping Study

arXiv.org Artificial Intelligence

In conceptual modeling (CM), humans apply abstraction to represent excerpts of reality for means of understanding and communication, and processing by machines. Artificial Intelligence (AI) is applied to vast amounts of data to automatically identify patterns or classify entities. While CM produces comprehensible and explicit knowledge representations, the outcome of AI algorithms often lacks these qualities while being able to extract knowledge from large and unstructured representations. Recently, a trend toward intertwining CM and AI emerged. This systematic mapping study shows how this interdisciplinary research field is structured, which mutual benefits are gained by the intertwining, and future research directions.