Goto

Collaborating Authors

 Accuracy


Transferable Cross-Tokamak Disruption Prediction with Deep Hybrid Neural Network Feature Extractor

arXiv.org Artificial Intelligence

Predicting disruptions across different tokamaks is a great obstacle to overcome. Future tokamaks can hardly tolerate disruptions at high performance discharge. Few disruption discharges at high performance can hardly compose an abundant training set, which makes it difficult for current data-driven methods to obtain an acceptable result. A machine learning method capable of transferring a disruption prediction model trained on one tokamak to another is required to solve the problem. The key is a disruption prediction model containing a feature extractor that is able to extract common disruption precursor traces in tokamak diagnostic data, and a transferable disruption classifier. Based on the concerns above, the paper first presents a deep fusion feature extractor designed specifically for extracting disruption precursor features from common diagnostics on tokamaks according to currently known precursors of disruption, providing a promising foundation for transferable models. The fusion feature extractor is proved by comparing with manual feature extraction on J-TEXT. Based on the feature extractor trained on J-TEXT, the disruption prediction model was transferred to EAST data with mere 20 discharges from EAST experiment. The performance is comparable with a model trained with 1896 discharges from EAST. From the comparison among other model training scenarios, transfer learning showed its potential in predicting disruptions across different tokamaks.


Transferring Chemical and Energetic Knowledge Between Molecular Systems with Machine Learning

arXiv.org Artificial Intelligence

Predicting structural and energetic properties of a molecular system is one of the fundamental tasks in molecular simulations, and it has use cases in chemistry, biology, and medicine. In the past decade, the advent of machine learning algorithms has impacted on molecular simulations for various tasks, including property prediction of atomistic systems. In this paper, we propose a novel methodology for transferring knowledge obtained from simple molecular systems to a more complex one, possessing a significantly larger number of atoms and degrees of freedom. In particular, we focus on the classification of high and low free-energy states. Our approach relies on utilizing (i) a novel hypergraph representation of molecules, encoding all relevant information for characterizing the potential energy of a conformation, and (ii) novel message passing and pooling layers for processing and making predictions on such hypergraph-structured data. Despite the complexity of the problem, our results show a remarkable AUC of 0.92 for transfer learning from tri-alanine to the deca-alanine system. Moreover, we show that the very same transfer learning approach can be used to group, in an unsupervised way, various secondary structures of deca-alanine in clusters having similar free-energy values. Our study represents a proof of concept that reliable transfer learning models for molecular systems can be designed paving the way to unexplored routes in prediction of structural and energetic properties of biologically relevant systems.


Dispersed Pixel Perturbation-based Imperceptible Backdoor Trigger for Image Classifier Models

arXiv.org Artificial Intelligence

Typical deep neural network (DNN) backdoor attacks are based on triggers embedded in inputs. Existing imperceptible triggers are computationally expensive or low in attack success. In this paper, we propose a new backdoor trigger, which is easy to generate, imperceptible, and highly effective. The new trigger is a uniformly randomly generated three-dimensional (3D) binary pattern that can be horizontally and/or vertically repeated and mirrored and superposed onto three-channel images for training a backdoored DNN model. Dispersed throughout an image, the new trigger produces weak perturbation to individual pixels, but collectively holds a strong recognizable pattern to train and activate the backdoor of the DNN. We also analytically reveal that the trigger is increasingly effective with the improving resolution of the images. Experiments are conducted using the ResNet-18 and MLP models on the MNIST, CIFAR-10, and BTSR datasets. In terms of imperceptibility, the new trigger outperforms existing triggers, such as BadNets, Trojaned NN, and Hidden Backdoor, by over an order of magnitude. The new trigger achieves an almost 100% attack success rate, only reduces the classification accuracy by less than 0.7%-2.4%, and invalidates the state-of-the-art defense techniques.


OpenCoS: Contrastive Semi-supervised Learning for Handling Open-set Unlabeled Data

arXiv.org Artificial Intelligence

Semi-supervised learning (SSL) has been a powerful strategy to incorporate few labels in learning better representations. In this paper, we focus on a practical scenario that one aims to apply SSL when unlabeled data may contain out-of-class samples - those that cannot have one-hot encoded labels from a closed-set of classes in label data, i.e., the unlabeled data is an open-set. Specifically, we introduce OpenCoS, a simple framework for handling this realistic semi-supervised learning scenario based upon a recent framework of self-supervised visual representation learning. We first observe that the out-of-class samples in the open-set unlabeled dataset can be identified effectively via self-supervised contrastive learning. Then, OpenCoS utilizes this information to overcome the failure modes in the existing state-of-the-art semi-supervised methods, by utilizing one-hot pseudo-labels and soft-labels for the identified in- and out-of-class unlabeled data, respectively. Our extensive experimental results show the effectiveness of OpenCoS under the presence of out-of-class samples, fixing up the state-of-the-art semi-supervised methods to be suitable for diverse scenarios involving open-set unlabeled data.


WeShort: Out-of-distribution Detection With Weak Shortcut structure

arXiv.org Artificial Intelligence

Neural networks have achieved impressive performance for data in the distribution which is the same as the training set but can produce an overconfident incorrect result for the data these networks have never seen. Therefore, it is essential to detect whether inputs come from out-of-distribution(OOD) in order to guarantee the safety of neural networks deployed in the real world. In this paper, we propose a simple and effective post-hoc technique, WeShort, to reduce the overconfidence of neural networks on OOD data. Our method is inspired by the observation of the internal residual structure, which shows the separation of the OOD and in-distribution (ID) data in the shortcut layer. Our method is compatible with different OOD detection scores and can generalize well to different architectures of networks. We demonstrate our method on various OOD datasets to show its competitive performances and provide reasonable hypotheses to explain why our method works. On the ImageNet benchmark, Weshort achieves state-of-the-art performance on the false positive rate (FPR95) and the area under the receiver operating characteristic (AUROC) on the family of post-hoc methods.


Classification Performance Metric Elicitation and its Applications

arXiv.org Artificial Intelligence

Given a learning problem with real-world tradeoffs, which cost function should the model be trained to optimize? This is the metric selection problem in machine learning. Despite its practical interest, there is limited formal guidance on how to select metrics for machine learning applications. This thesis outlines metric elicitation as a principled framework for selecting the performance metric that best reflects implicit user preferences. Once specified, the evaluation metric can be used to compare and train models. In this manuscript, we formalize the problem of Metric Elicitation and devise novel strategies for eliciting classification performance metrics using pairwise preference feedback over classifiers. Specifically, we provide novel strategies for eliciting linear and linear-fractional metrics for binary and multiclass classification problems, which are then extended to a framework that elicits group-fair performance metrics in the presence of multiple sensitive groups. All the elicitation strategies that we discuss are robust to both finite sample and feedback noise, thus are useful in practice for real-world applications. Using the tools and the geometric characterizations of the feasible confusion statistics sets from the binary, multiclass, and multiclass-multigroup classification setups, we further provide strategies to elicit from a wider range of complex, modern multiclass metrics defined by quadratic functions of confusion statistics by exploiting their local linear structure. From application perspective, we also propose to use the metric elicitation framework in optimizing complex black box metrics that is amenable to deep network training. Lastly, to bring theory closer to practice, we conduct a preliminary real-user study that shows the efficacy of the metric elicitation framework in recovering the users' preferred performance metric in a binary classification setup.


Open Information Extraction from 2007 to 2022 -- A Survey

arXiv.org Artificial Intelligence

Open information extraction is an important NLP task that targets extracting structured information from unstructured text without limitations on the relation type or the domain of the text. This survey paper covers open information extraction technologies from 2007 to 2022 with a focus on new models not covered by previous surveys. We propose a new categorization method from the source of information perspective to accommodate the development of recent OIE technologies. In addition, we summarize three major approaches based on task settings as well as current popular datasets and model evaluation metrics. Given the comprehensive review, several future directions are shown from datasets, source of information, output form, method, and evaluation metric aspects.


In Silico Prediction of Blood-Brain Barrier Permeability of Chemical Compounds through Molecular Feature Modeling

arXiv.org Artificial Intelligence

The introduction of computational techniques to analyze chemical data has given rise to the analytical study of biological systems, known as "bioinformatics". One facet of bioinformatics is using machine learning (ML) technology to detect multivariable trends in various cases. Amongst the most pressing cases is predicting blood-brain barrier (BBB) permeability. The development of new drugs to treat central nervous system disorders presents unique challenges due to poor penetration efficacy across the blood-brain barrier. In this research, we aim to mitigate this problem through an ML model that analyzes chemical features. To do so: (i) An overview into the relevant biological systems and processes as well as the use case is given. (ii) Second, an in-depth literature review of existing computational techniques for detecting BBB permeability is undertaken. From there, an aspect unexplored across current techniques is identified and a solution is proposed. (iii) Lastly, a two-part in silico model to quantify likelihood of permeability of drugs with defined features across the BBB through passive diffusion is developed, tested, and reflected on. Testing and validation with the dataset determined the predictive logBB model's mean squared error to be around 0.112 units and the neuroinflammation model's mean squared error to be approximately 0.3 units, outperforming all relevant studies found.


Ensemble learning using individual neonatal data for seizure detection

arXiv.org Artificial Intelligence

Sharing medical data between institutions is difficult in practice due to data protection laws and official procedures within institutions. Therefore, most existing algorithms are trained on relatively small electroencephalogram (EEG) data sets which is likely to be detrimental to prediction accuracy. In this work, we simulate a case when the data can not be shared by splitting the publicly available data set into disjoint sets representing data in individual institutions. We propose to train a (local) detector in each institution and aggregate their individual predictions into one final prediction. Four aggregation schemes are compared, namely, the majority vote, the mean, the weighted mean and the Dawid-Skene method. The method was validated on an independent data set using only a subset of EEG channels. The ensemble reaches accuracy comparable to a single detector trained on all the data when sufficient amount of data is available in each institution. The weighted mean aggregation scheme showed best performance, it was only marginally outperformed by the Dawid--Skene method when local detectors approach performance of a single detector trained on all available data.


Semi-self-supervised Automated ICD Coding

arXiv.org Artificial Intelligence

Clinical Text Notes (CTNs) contain physicians' reasoning process, written in an unstructured free text format, as they examine and interview patients. In recent years, several studies have been published that provide evidence for the utility of machine learning for predicting doctors' diagnoses from CTNs, a task known as ICD coding. Data annotation is time consuming, particularly when a degree of specialization is needed, as is the case for medical data. This paper presents a method of augmenting a sparsely annotated dataset of Icelandic CTNs with a machine-learned imputation in a semi-self-supervised manner. We train a neural network on a small set of annotated CTNs and use it to extract clinical features from a set of un-annotated CTNs. These clinical features consist of answers to about a thousand potential questions that a physician might find the answers to during a consultation of a patient. The features are then used to train a classifier for the diagnosis of certain types of diseases. We report the results of an evaluation of this data augmentation method over three tiers of data availability to the physician. Our data augmentation method shows a significant positive effect which is diminished when clinical features from the examination of the patient and diagnostics are made available. We recommend our method for augmenting scarce datasets for systems that take decisions based on clinical features that do not include examinations or tests.