Overview
Efficient Detection of LLM-generated Texts with a Bayesian Surrogate Model
Deng, Zhijie, Gao, Hongcheng, Miao, Yibo, Zhang, Hao
The detection of machine-generated text, especially from large language models (LLMs), is crucial in preventing serious social problems resulting from their misuse. Some methods train dedicated detectors on specific datasets but fall short in generalizing to unseen test data, while other zero-shot ones often yield suboptimal performance. Although the recent DetectGPT has shown promising detection performance, it suffers from significant inefficiency issues, as detecting a single candidate requires scoring hundreds of its perturbations with the source LLM. This paper aims to bridge this gap. Technically, we propose to incorporate a Bayesian surrogate model, which allows us to select typical samples based on Bayesian uncertainty and interpolate scores from typical samples to other ones, to improve query efficiency. Our empirical results demonstrate that our method significantly outperforms existing approaches under a low query budget. Notably, our method achieves similar performance with up to 2 times fewer queries than DetectGPT and 3.7% higher AUROC at a query number of 5.
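The idea of selecting samples by Bayesian uncertainty and interpolating their scores to the rest can be illustrated with a small sketch. The abstract does not name the surrogate model or the features it operates on, so the following assumes a Gaussian process with an RBF kernel over hypothetical per-perturbation feature vectors. `score_fn` is a hypothetical stand-in for one query to the source LLM. This is an illustration of the general technique, not the authors' implementation.

```python
import numpy as np

def rbf_kernel(A, B, length_scale=1.0):
    """RBF kernel matrix between the rows of A and the rows of B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * d2 / length_scale**2)

def select_and_interpolate(features, score_fn, budget, noise=1e-6):
    """Greedily pick `budget` samples with the highest GP posterior variance,
    score only those, and interpolate scores for all samples.

    features : (n, d) array, one row per perturbation (assumed representation)
    score_fn : callable index -> float, stands in for one expensive LLM query
    """
    chosen = [0]  # arbitrary starting sample (an assumption of this sketch)
    while len(chosen) < budget:
        K = rbf_kernel(features[chosen], features[chosen])
        K += noise * np.eye(len(chosen))
        k_star = rbf_kernel(features, features[chosen])
        # Posterior variance k(x, x) - k_*^T K^{-1} k_*, with k(x, x) = 1 for RBF.
        solved = np.linalg.solve(K, k_star.T).T
        var = 1.0 - np.einsum("ij,ij->i", solved, k_star)
        var[chosen] = -np.inf  # never pick the same sample twice
        chosen.append(int(np.argmax(var)))

    # Query the expensive scorer only on the selected samples.
    scores = np.array([score_fn(i) for i in chosen])

    # GP posterior mean interpolates the scores to every sample.
    K = rbf_kernel(features[chosen], features[chosen])
    K += noise * np.eye(len(chosen))
    k_star = rbf_kernel(features, features[chosen])
    mean = k_star @ np.linalg.solve(K, scores)
    return chosen, mean
```

With a budget of `k`, `score_fn` is called exactly `k` times, which is where the query savings in such a scheme would come from.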
Towards Reasoning in Large Language Models: A Survey
Huang, Jie, Chang, Kevin Chen-Chuan
Reasoning is a fundamental aspect of human intelligence that plays a crucial role in activities such as problem solving, decision making, and critical thinking. In recent years, large language models (LLMs) have made significant progress in natural language processing, and it has been observed that these models may exhibit reasoning abilities when they are sufficiently large. However, it is not yet clear to what extent LLMs are capable of reasoning. This paper provides a comprehensive overview of the current state of knowledge on reasoning in LLMs, including techniques for improving and eliciting reasoning in these models, methods and benchmarks for evaluating reasoning abilities, findings and implications of previous research in this field, and suggestions on future directions. Our aim is to provide a detailed and up-to-date review of this topic and stimulate meaningful discussion and future work.
A New Aligned Simple German Corpus
Toborek, Vanessa, Busch, Moritz, Boßert, Malte, Bauckhage, Christian, Welke, Pascal
"Leichte Sprache", the German counterpart to Simple English, is a regulated language aiming to facilitate complex written language that would otherwise stay inaccessible to different groups of people. We present a new sentence-aligned monolingual corpus for Simple German -- German. It contains multiple document-aligned sources which we have aligned using automatic sentence-alignment methods. We evaluate our alignments based on a manually labelled subset of aligned documents. The quality of our sentence alignments, as measured by F1-score, surpasses previous work. We publish the dataset under CC BY-SA and the accompanying code under MIT license.
CyPhERS: A Cyber-Physical Event Reasoning System providing real-time situational awareness for attack and fault response
Müller, Nils, Bao, Kaibin, Matthes, Jörg, Heussen, Kai
Cyber-physical systems (CPSs) constitute the backbone of critical infrastructures such as power grids or water distribution networks. Operating failures in these systems can cause serious risks for society. To avoid or minimize downtime, operators require real-time awareness about critical incidents. However, online event identification in CPSs is challenged by the complex interdependency of numerous physical and digital components, which requires taking cyber attacks and physical failures equally into account. The online event identification problem is further complicated by the lack of historical observations of critical but rare events, and the continuous evolution of cyber attack strategies. This work introduces and demonstrates CyPhERS, a Cyber-Physical Event Reasoning System. CyPhERS provides real-time information pertaining to the occurrence, location, physical impact, and root cause of potentially critical events in CPSs, without the need for historical event observations. The key novelty of CyPhERS is the capability to generate informative and interpretable event signatures of known and unknown types of both cyber attacks and physical failures. The concept is evaluated and benchmarked on a demonstration case that comprises a multitude of attack and fault events targeting various components of a CPS. The results demonstrate that the event signatures provide relevant and inferable information on both known and unknown event types.
Pruning Distorted Images in MNIST Handwritten Digits
Recognizing handwritten digits is a challenging task primarily due to the diversity of writing styles and the presence of noisy images. The widely used MNIST dataset, which is commonly employed as a benchmark for this task, includes distorted digits with irregular shapes, incomplete strokes, and varying skew in both the training and testing datasets. Consequently, these factors contribute to reduced accuracy in digit recognition. To overcome this challenge, we propose a two-stage deep learning approach. In the first stage, we create a simple neural network to identify distorted digits within the training set. This model serves to detect and filter out such distorted and ambiguous images. In the second stage, we exclude these identified images from the training dataset and proceed to retrain the model using the filtered dataset. This process aims to improve the classification accuracy and confidence levels while mitigating issues of underfitting and overfitting. Our experimental results demonstrate the effectiveness of the proposed approach, achieving an accuracy rate of over 99.5% on the testing dataset. In our future work, we intend to explore the scalability of this approach and investigate techniques to further enhance accuracy by reducing the size of the training data. INTRODUCTION: Handwritten digit recognition is a complex task that finds applications in various fields, including computer vision and machine learning. It involves the identification and classification of digits written by hand, enabling tasks such as character recognition and digit analysis.
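The two-stage idea in the abstract above (train a simple model, drop the training samples it flags, retrain on the rest) can be sketched as follows. The abstract does not say how distorted digits are flagged, so this sketch assumes a confidence threshold on the true label, and it uses softmax regression as a stand-in for the "simple neural network". The function names and the `threshold` parameter are hypothetical.

```python
import numpy as np

def predict_proba(X, W, b):
    """Class probabilities from a linear softmax model."""
    logits = X @ W + b
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(logits)
    return e / e.sum(axis=1, keepdims=True)

def fit_softmax(X, y, n_classes, epochs=200, lr=0.1):
    """Softmax regression trained with full-batch gradient descent."""
    W = np.zeros((X.shape[1], n_classes))
    b = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]  # one-hot labels
    for _ in range(epochs):
        G = (predict_proba(X, W, b) - Y) / len(X)  # gradient w.r.t. logits
        W -= lr * X.T @ G
        b -= lr * G.sum(axis=0)
    return W, b

def prune_and_retrain(X, y, n_classes, threshold=0.5):
    """Stage 1: flag samples the first model is unsure about.
    Stage 2: retrain on the remaining samples."""
    W, b = fit_softmax(X, y, n_classes)
    # Probability the stage-1 model assigns to each sample's own label.
    conf_true = predict_proba(X, W, b)[np.arange(len(y)), y]
    keep = conf_true >= threshold  # assumed pruning criterion
    return fit_softmax(X[keep], y[keep], n_classes), keep
```

The returned `keep` mask makes it possible to inspect which samples were pruned, which is useful for checking that the filter removes ambiguous inputs and not merely hard but valid ones.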
GC-Flow: A Graph-Based Flow Network for Effective Clustering
Wang, Tianchun, Mirzazadeh, Farzaneh, Zhang, Xiang, Chen, Jie
Graph convolutional networks (GCNs) are \emph{discriminative models} that directly model the class posterior $p(y|\mathbf{x})$ for semi-supervised classification of graph data. While being effective, as a representation learning approach, the node representations extracted from a GCN often miss useful information for effective clustering, because the objectives are different. In this work, we design normalizing flows that replace GCN layers, leading to a \emph{generative model} that models both the class conditional likelihood $p(\mathbf{x}|y)$ and the class prior $p(y)$. The resulting neural network, GC-Flow, retains the graph convolution operations while being equipped with a Gaussian mixture representation space. It enjoys two benefits: it not only maintains the predictive power of GCN, but also produces well-separated clusters, due to the structuring of the representation space. We demonstrate these benefits on a variety of benchmark data sets. Moreover, we show that additional parameterization, such as that on the adjacency matrix used for graph convolutions, yields additional improvement in clustering.
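For reference, the discriminative and generative views mentioned in the abstract above are connected by Bayes' rule. The identity below is standard and is not specific to GC-Flow; the sum runs over all class labels $y'$.

```latex
p(y \mid \mathbf{x})
  = \frac{p(\mathbf{x} \mid y)\, p(y)}{\sum_{y'} p(\mathbf{x} \mid y')\, p(y')}
```

A model of $p(\mathbf{x} \mid y)$ and $p(y)$ therefore still yields a class posterior for prediction, which is consistent with the abstract's claim that predictive power is retained.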
Coping with low data availability for social media crisis message categorisation
During crisis situations, social media allows people to quickly share information, including messages requesting help. This can be valuable to emergency responders, who need to categorise and prioritise these messages based on the type of assistance being requested. However, the high volume of messages makes it difficult to filter and prioritise them without the use of computational techniques. Fully supervised filtering techniques for crisis message categorisation typically require a large amount of annotated training data, but this can be difficult to obtain during an ongoing crisis and is expensive in terms of time and labour to create. This thesis focuses on addressing the challenge of low data availability when categorising crisis messages for emergency response. It first presents domain adaptation as a solution for this problem, which involves learning a categorisation model from annotated data from past crisis events (source domain) and adapting it to categorise messages from an ongoing crisis event (target domain). In many-to-many adaptation, where the model is trained on multiple past events and adapted to multiple ongoing events, a multi-task learning approach is proposed using pre-trained language models. This approach outperforms baselines and an ensemble approach further improves performance...
DP-SGD Without Clipping: The Lipschitz Neural Network Way
Bethune, Louis, Massena, Thomas, Boissin, Thibaut, Prudent, Yannick, Friedrich, Corentin, Mamalet, Franck, Bellet, Aurelien, Serrurier, Mathieu, Vigouroux, David
State-of-the-art approaches for training Differentially Private (DP) Deep Neural Networks (DNN) face difficulties in estimating tight bounds on the sensitivity of the network's layers, and instead rely on a process of per-sample gradient clipping. This clipping process not only biases the direction of gradients but also proves costly both in memory consumption and in computation. To provide sensitivity bounds and bypass the drawbacks of the clipping process, our theoretical analysis of Lipschitz constrained networks reveals an unexplored link between the Lipschitz constant with respect to their input and the one with respect to their parameters. By bounding the Lipschitz constant of each layer with respect to its parameters we guarantee DP training of these networks. This analysis not only allows the computation of the aforementioned sensitivities at scale but also provides leads on how to maximize the gradient-to-noise ratio for fixed privacy guarantees. To facilitate the application of Lipschitz networks and foster robust and certifiable learning under privacy guarantees, we provide a Python package that implements building blocks allowing the construction and private training of such networks.
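The per-sample gradient clipping that the abstract above argues against can be sketched in a few lines. This is the conventional DP-SGD aggregation step, shown only to make the baseline concrete. It is not the Lipschitz-based method the paper proposes, and the function and parameter names are hypothetical.

```python
import numpy as np

def dp_sgd_step(per_sample_grads, clip_norm, noise_multiplier, rng):
    """Clip each sample's gradient to L2 norm `clip_norm`, sum, add Gaussian
    noise with standard deviation noise_multiplier * clip_norm, and average.

    per_sample_grads : (batch, n_params) array, one flattened gradient per sample
    """
    norms = np.linalg.norm(per_sample_grads, axis=1, keepdims=True)
    # Scale factor is 1 for gradients already inside the clipping ball.
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = per_sample_grads * scale
    noise = rng.normal(0.0, noise_multiplier * clip_norm,
                       size=per_sample_grads.shape[1])
    return (clipped.sum(axis=0) + noise) / len(per_sample_grads)

# Two gradients with norms 5 and 1, no noise, clip_norm = 1.
grads = np.array([[3.0, 4.0], [1.0, 0.0]])
step = dp_sgd_step(grads, clip_norm=1.0, noise_multiplier=0.0,
                   rng=np.random.default_rng(0))
# step is [0.8, 0.4], whereas the unclipped mean is [2.0, 2.0]
```

The example shows the direction bias the abstract refers to: the clipped average points along (0.8, 0.4) while the true average gradient points along (1, 1). It also shows the memory cost, since one gradient row per sample has to be materialized before clipping.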
Explainability Techniques for Chemical Language Models
Hödl, Stefan, Robinson, William, Bachrach, Yoram, Huck, Wilhelm, Kachman, Tal
Explainability techniques are crucial in gaining insights into the reasons behind the predictions of deep learning models, but they have not yet been applied to chemical language models. We propose an explainable AI technique that attributes the importance of individual atoms towards the predictions made by these models. Our method backpropagates the relevance information towards the chemical input string and visualizes the importance of individual atoms. We focus on self-attention Transformers operating on molecular string representations and leverage a pretrained encoder for finetuning. We showcase the method by predicting and visualizing solubility in water and organic solvents. We achieve competitive model performance while obtaining interpretable predictions, which we use to inspect the pretrained model.
AI Techniques in the Microservices Life-Cycle: A Survey
Moreschini, Sergio, Pour, Shahrzad, Lanese, Ivan, Balouek-Thomert, Daniel, Bogner, Justus, Li, Xiaozhou, Pecorelli, Fabiano, Soldani, Jacopo, Truyen, Eddy, Taibi, Davide
Microservices is a popular architectural style for the development of distributed software, with an emphasis on modularity, scalability, and flexibility. Indeed, in microservice systems, functionalities are provided by loosely coupled, small services, each focusing on a specific business capability. Building a system according to the microservices architectural style brings a number of challenges, mainly related to how the different microservices are deployed and coordinated and how they interact. In this paper, we provide a survey about how techniques in the area of Artificial Intelligence have been used to tackle these challenges.