Accuracy
Development of a Deep Learning System for Intra-Operative Identification of Cancer Metastases
Schnelldorfer, Thomas, Castro, Janil, Goldar-Najafi, Atoussa, Liu, Liping
For several cancer patients, operative resection with curative intent can end up in early recurrence of the cancer. Current limitations in peri-operative cancer staging and especially intra-operative misidentification of visible metastases is likely the main reason leading to unnecessary operative interventions in the affected individuals. Here, we evaluate whether an artificial intelligence (AI) system can improve recognition of peritoneal surface metastases on routine staging laparoscopy images from patients with gastrointestinal malignancies. In a simulated setting evaluating biopsied peritoneal lesions, a prototype deep learning surgical guidance system outperformed oncologic surgeons in identifying peritoneal surface metastases. In this environment the developed AI model would have improved the identification of metastases by 5% while reducing the number of unnecessary biopsies by 28% compared to current standard practice. Evaluating non-biopsied peritoneal lesions, the findings support the possibility that the AI system could identify peritoneal surface metastases that were falsely deemed benign in clinical practice. Our findings demonstrate the technical feasibility of an AI system for intra-operative identification of peritoneal surface metastases, but require future assessment in a multi-institutional clinical setting.
FutureTOD: Teaching Future Knowledge to Pre-trained Language Model for Task-Oriented Dialogue
Zeng, Weihao, He, Keqing, Wang, Yejie, Zeng, Chen, Wang, Jingang, Xian, Yunsen, Xu, Weiran
Pre-trained language models based on general text enable huge success in the NLP scenario. But the intrinsical difference of linguistic patterns between general text and task-oriented dialogues makes existing pre-trained language models less useful in practice. Current dialogue pre-training methods rely on a contrastive framework and face the challenges of both selecting true positives and hard negatives. In this paper, we propose a novel dialogue pre-training model, FutureTOD, which distills future knowledge to the representation of the previous dialogue context using a self-training framework. Our intuition is that a good dialogue representation both learns local context information and predicts future information. Extensive experiments on diverse downstream dialogue tasks demonstrate the effectiveness of our model, especially the generalization, robustness, and learning discriminative dialogue representations capabilities.
Deep Intellectual Property Protection: A Survey
Sun, Yuchen, Liu, Tianpeng, Hu, Panhe, Liao, Qing, Fu, Shaojing, Yu, Nenghai, Guo, Deke, Liu, Yongxiang, Liu, Li
Deep Neural Networks (DNNs), from AlexNet to ResNet to ChatGPT, have made revolutionary progress in recent years, and are widely used in various fields. The high performance of DNNs requires a huge amount of high-quality data, expensive computing hardware, and excellent DNN architectures that are costly to obtain. Therefore, trained DNNs are becoming valuable assets and must be considered the Intellectual Property (IP) of the legitimate owner who created them, in order to protect trained DNN models from illegal reproduction, stealing, redistribution, or abuse. Although being a new emerging and interdisciplinary field, numerous DNN model IP protection methods have been proposed. Given this period of rapid evolution, the goal of this paper is to provide a comprehensive survey of two mainstream DNN IP protection methods: deep watermarking and deep fingerprinting, with a proposed taxonomy. More than 190 research contributions are included in this survey, covering many aspects of Deep IP Protection: problem definition, main threats and challenges, merits and demerits of deep watermarking and deep fingerprinting methods, evaluation metrics, and performance discussion. We finish the survey by identifying promising directions for future research.
What is the state of the art? Accounting for multiplicity in machine learning benchmark performance
Mรธllersen, Kajsa, Holsbรธ, Einar
Machine learning methods are commonly evaluated and compared by their performance on data sets from public repositories. This allows for multiple methods, oftentimes several thousands, to be evaluated under identical conditions and across time. The highest ranked performance on a problem is referred to as state-of-the-art (SOTA) performance, and is used, among other things, as a reference point for publication of new methods. Using the highest-ranked performance as an estimate for SOTA is a biased estimator, giving overly optimistic results. The mechanisms at play are those of multiplicity, a topic that is well-studied in the context of multiple comparisons and multiple testing, but has, as far as the authors are aware of, been nearly absent from the discussion regarding SOTA estimates. The optimistic state-of-the-art estimate is used as a standard for evaluating new methods, and methods with substantial inferior results are easily overlooked. In this article, we provide a probability distribution for the case of multiple classifiers so that known analyses methods can be engaged and a better SOTA estimate can be provided. We demonstrate the impact of multiplicity through a simulated example with independent classifiers. We show how classifier dependency impacts the variance, but also that the impact is limited when the accuracy is high. Finally, we discuss a real-world example; a Kaggle competition from 2020.
Detecting Shortcut Learning for Fair Medical AI using Shortcut Testing
Brown, Alexander, Tomasev, Nenad, Freyberg, Jan, Liu, Yuan, Karthikesalingam, Alan, Schrouff, Jessica
Machine learning (ML) holds great promise for improving healthcare, but it is critical to ensure that its use will not propagate or amplify health disparities. An important step is to characterize the (un)fairness of ML models - their tendency to perform differently across subgroups of the population - and to understand its underlying mechanisms. One potential driver of algorithmic unfairness, shortcut learning, arises when ML models base predictions on improper correlations in the training data. However, diagnosing this phenomenon is difficult, especially when sensitive attributes are causally linked with disease. Using multi-task learning, we propose the first method to assess and mitigate shortcut learning as a part of the fairness assessment of clinical ML systems, and demonstrate its application to clinical tasks in radiology and dermatology. Finally, our approach reveals instances when shortcutting is not responsible for unfairness, highlighting the need for a holistic approach to fairness mitigation in medical AI.
Class-Adaptive Self-Training for Relation Extraction with Incompletely Annotated Training Data
Tan, Qingyu, Xu, Lu, Bing, Lidong, Ng, Hwee Tou
Relation extraction (RE) aims to extract relations from sentences and documents. Existing relation extraction models typically rely on supervised machine learning. However, recent studies showed that many RE datasets are incompletely annotated. This is known as the false negative problem in which valid relations are falsely annotated as 'no_relation'. Models trained with such data inevitably make similar mistakes during the inference stage. Self-training has been proven effective in alleviating the false negative problem. However, traditional self-training is vulnerable to confirmation bias and exhibits poor performance in minority classes. To overcome this limitation, we proposed a novel class-adaptive re-sampling self-training framework. Specifically, we re-sampled the pseudo-labels for each class by precision and recall scores. Our re-sampling strategy favored the pseudo-labels of classes with high precision and low recall, which improved the overall recall without significantly compromising precision. We conducted experiments on document-level and biomedical relation extraction datasets, and the results showed that our proposed self-training framework consistently outperforms existing competitive methods on the Re-DocRED and ChemDisgene datasets when the training data are incompletely annotated. Our code is released at https://github.com/DAMO-NLP-SG/CAST.
Ensemble Framework for Cardiovascular Disease Prediction
Tiwari, Achyut, Chugh, Aryan, Sharma, Aman
Heart disease is the major cause of non-communicable and silent death worldwide. Heart diseases or cardiovascular diseases are classified into four types: coronary heart disease, heart failure, congenital heart disease, and cardiomyopathy. It is vital to diagnose heart disease early and accurately in order to avoid further injury and save patients' lives. As a result, we need a system that can predict cardiovascular disease before it becomes a critical situation. Machine learning has piqued the interest of researchers in the field of medical sciences. For heart disease prediction, researchers implement a variety of machine learning methods and approaches. In this work, to the best of our knowledge, we have used the dataset from IEEE Data Port which is one of the online available largest datasets for cardiovascular diseases individuals. The dataset isa combination of Hungarian, Cleveland, Long Beach VA, Switzerland & Statlog datasets with important features such as Maximum Heart Rate Achieved, Serum Cholesterol, Chest Pain Type, Fasting blood sugar, and so on. To assess the efficacy and strength of the developed model, several performance measures are used, such as ROC, AUC curve, specificity, F1-score, sensitivity, MCC, and accuracy. In this study, we have proposed a framework with a stacked ensemble classifier using several machine learning algorithms including ExtraTrees Classifier, Random Forest, XGBoost, and so on. Our proposed framework attained an accuracy of 92.34% which is higher than the existing literature.
Conformal Language Modeling
Quach, Victor, Fisch, Adam, Schuster, Tal, Yala, Adam, Sohn, Jae Ho, Jaakkola, Tommi S., Barzilay, Regina
In this paper, we propose a novel approach to conformal prediction for generative language models (LMs). Standard conformal prediction produces prediction sets--in place of single predictions--that have rigorous, statistical performance guarantees. LM responses are typically sampled from the model's predicted distribution over the large, combinatorial output space of natural language. Translating this process to conformal prediction, we calibrate a stopping rule for sampling different outputs from the LM that get added to a growing set of candidates until we are confident that the output set is sufficient. Since some samples may be lowquality, we also simultaneously calibrate and apply a rejection rule for removing candidates from the output set to reduce noise. Similar to conformal prediction, we prove that the sampled set returned by our procedure contains at least one acceptable answer with high probability, while still being empirically precise (i.e., small) on average. Furthermore, within this set of candidate responses, we show that we can also accurately identify subsets of individual components--such as phrases or sentences--that are each independently correct (e.g., that are not "hallucinations"), again with statistical guarantees. We demonstrate the promise of our approach on multiple tasks in open-domain question answering, text summarization, and radiology report generation using different LM variants.
A Metaheuristic-based Machine Learning Approach for Energy Prediction in Mobile App Development
Mousavirad, Seyed Jalaleddin, Alexandre, Luรญs A.
Energy consumption plays a vital role in mobile App development for developers and end-users, and it is considered one of the most crucial factors for purchasing a smartphone. In addition, in terms of sustainability, it is essential to find methods to reduce the energy consumption of mobile devices since the extensive use of billions of smartphones worldwide significantly impacts the environment. Despite the existence of several energy-efficient programming practices in Android, the leading mobile ecosystem, machine learning-based energy prediction algorithms for mobile App development have yet to be reported. Therefore, this paper proposes a histogram-based gradient boosting classification machine (HGBC), boosted by a metaheuristic approach, for energy prediction in mobile App development. Our metaheuristic approach is responsible for two issues. First, it finds redundant and irrelevant features without any noticeable change in performance. Second, it performs a hyper-parameter tuning for the HGBC algorithm. Since our proposed metaheuristic approach is algorithm-independent, we selected 12 algorithms for the search strategy to find the optimal search algorithm. Our finding shows that a success-history-based parameter adaption for differential evolution with linear population size (L-SHADE) offers the best performance. It can improve performance and decrease the number of features effectively. Our extensive set of experiments clearly shows that our proposed approach can provide significant results for energy consumption prediction.
Using Machine Learning Methods for Automation of Size Grid Building and Management
Yunus, Salim, Benoit, Dries, Peleja, Filipa
Fashion apparel companies require planning for the next season, a year in advance for supply chain management. This study focuses on size selection decision making for Levi Strauss. Currently, the region and planning group level size grids are built and managed manually. The company suffers from the workload it creates for sizing, merchant and planning teams. This research is aiming to answer two research questions: "Which sizes should be available to the planners under each size grid name for the next season(s)?" and "Which sizes should be adopted for each planning group for the next season(s)?". We approach to the problem with a classification model, which is one of the popular models used in machine learning. With this research, a more automated process was created by using machine learning techniques. A decrease in workload of the teams in the company is expected after it is put into practice. Unlike many studies in the state of art for fashion and apparel industry, this study focuses on sizes where the stock keeping unit represents a product with a certain size.