Accuracy
Tuplemax Loss for Language Identification
Wan, Li, Sridhar, Prashant, Yu, Yang, Wang, Quan, Moreno, Ignacio Lopez
In many scenarios of a language identification task, the user will specify a small set of languages which he/she can speak instead of a large set of all possible languages. We want to model such prior knowledge into the way we train our neural networks, by replacing the commonly used softmax loss function with a novel loss function named tuplemax loss. As a matter of fact, a typical language identification system launched in North America has about 95% users who could speak no more than two languages. Using the tuplemax loss, our system achieved a 2.33% error rate, which is a relative 39.4% improvement over the 3.85% error rate of standard softmax loss method.
Anomaly Detection with Isolation Forests using H2O - Open Source Leader in AI and ML
Anomaly detection is a common data science problem where the goal is to identify odd or suspicious observations, events, or items in our data that might be indicative of some issues in our data collection process (such as broken sensors, typos in collected forms, etc.) or unexpected events like security breaches, server failures, and so on. Anomaly detection can be performed in a supervised, semi-supervised, and unsupervised manner. For a supervised approach, we need to know whether each observation, event or item is anomalous or genuine, and we use this information during training. Obtaining labels for each observation might often be unrealistic. A semi-supervised approach uses the assumption that we only know which observations are genuine, non-anomalous, and we do not have any information on the anomalous observations.
General-to-Detailed GAN for Infrequent Class Medical Images
Koga, Tatsuki, Nonaka, Naoki, Sakuma, Jun, Seita, Jun
Deep learning has significant potential for medical imaging. However, since the incident rate of each disease varies widely, the frequency of classes in a medical image dataset is imbalanced, leading to poor accuracy for such infrequent classes. One possible solution is data augmentation of infrequent classes using synthesized images created by Generative Adversarial Networks (GANs), but conventional GANs also require certain amount of images to learn. To overcome this limitation, here we propose General-to-detailed GAN (GDGAN), serially connected two GANs, one for general labels and the other for detailed labels. GDGAN produced diverse medical images, and the network trained with an augmented dataset outperformed other networks using existing methods with respect to Area-Under-Curve (AUC) of Receiver Operating Characteristic (ROC) curve.
Deep Ensemble Tensor Factorization for Longitudinal Patient Trajectories Classification
De Brouwer, Edward, Simm, Jaak, Arany, Adam, Moreau, Yves
We present a generative approach to classify scarcely observed longitudinal patient trajectories. The available time series are represented as tensors and factorized using generative deep recurrent neural networks. The learned factors represent the patient data in a compact way and can then be used in a downstream classification task. For more robustness and accuracy in the predictions, we used an ensemble of those deep generative models to mimic Bayesian posterior sampling. We illustrate the performance of our architecture on an intensive-care case study of in-hospital mortality prediction with 96 longitudinal measurement types measured across the first 48-hour from admission. Our combination of generative and ensemble strategies achieves an AUC of over 0.85, and outperforms the SAPS-II mortality score and GRU baselines.
Beta Distribution Drift Detection for Adaptive Classifiers
Fleckenstein, Lukas, Kauschke, Sebastian, Fรผrnkranz, Johannes
With today's abundant streams of data, the only constant we can rely on is change. For stream classification algorithms, it is necessary to adapt to concept drift. This can be achieved by monitoring the model error, and triggering counter measures as changes occur. In this paper, we propose a drift detection mechanism that fits a beta distribution to the model error, and treats abnormal behavior as drift. It works with any given model, leverages prior knowledge about this model, and allows to set application-specific confidence thresholds. Experiments confirm that it performs well, in particular when drift occurs abruptly.
Improving Naive Bayes for Regression with Optimised Artificial Surrogate Data
The typical pipeline for a supervised machine learning project involves firstly the collection of a significant sample of labelled examples typically referred to as training data. Depending on whether the labels are continuous or categorical, the supervised learning task is known as regression or classification respectively. Next, once the training data is sufficiently clean and complete, it is used to directly build a predictive model using the machine learning algorithm of choice. The predictive model is then used to label new unlabelled examples, and if the labels of the new examples are known a priori by the user (but not used by the learning algorithm) then the predictive accuracy of the model can be evaluated. Different models can therefore be directly compared. In the usual case, the training data is "real", i.e. the model is learned directly from labelled examples that were collected specifically for that purpose. However, quite frequently, modifications are made to the training data after it is collected. For example, it is standard practice to remove outlier examples and normalise numeric values. Moreover, the machine learning algorithm itself may specify modifications to the training data.
How Can We Use Artificial Intelligence To Prevent Crime?
To many, the use of Artificial Intelligence to prevent crimes and aid in sentencing criminals may seem like a scene from a science fiction movie. However, police departments in the United Kingdom - including Durham, Kent, and South Wales - are already using facial recognition and behavioural software to prevent a crime before it occurs. Computer-driven evaluation frameworks are being used to inform custodial and sentencing decisions. The technology offers both huge promise and the prospect of dark dystopia in seemingly equal measure. This raises several ethical questions.
Accurate, Data-Efficient Learning from Noisy, Choice-Based Labels for Inherent Risk Scoring
Huang, W. Ronny, Perez, Miguel A.
Inherent risk scoring is an important function in anti-money laundering, used for determining the riskiness of an individual during onboarding $\textit{before}$ fraudulent transactions occur. It is, however, often fraught with two challenges: (1) inconsistent notions of what constitutes as high or low risk by experts and (2) the lack of labeled data. This paper explores a new paradigm of data labeling and data collection to tackle these issues. The data labeling is choice-based; the expert does not provide an absolute risk score but merely chooses the most/least risky example out of a small choice set, which reduces inconsistency because experts make only relative judgments of risk. The data collection is synthetic; examples are crafted using optimal experimental design methods, obviating the need for real data which is often difficult to obtain due to regulatory concerns. We present the methodology of an end-to-end inherent risk scoring algorithm that we built for a large financial institution. The system was trained on a small set of synthetic data (188 examples, 24 features) whose labels are obtained via the choice-based paradigm using an efficient number of expert labelers. The system achieves 89% accuracy on a test set of 52 examples, with an area under the ROC curve of 93%.
What Should I Learn First: Introducing LectureBank for NLP Education and Prerequisite Chain Learning
Li, Irene, Fabbri, Alexander R., Tung, Robert R., Radev, Dragomir R.
Recent years have witnessed the rising popularity of Natural Language Processing (NLP) and related fields such as Artificial Intelligence (AI) and Machine Learning (ML). Many online courses and resources are available even for those without a strong background in the field. Often the student is curious about a specific topic but does not quite know where to begin studying. To answer the question of "what should one learn first," we apply an embedding-based method to learn prerequisite relations for course concepts in the domain of NLP. We introduce LectureBank, a dataset containing 1,352 English lecture files collected from university courses which are each classified according to an existing taxonomy as well as 208 manually-labeled prerequisite relation topics, which is publicly available. The dataset will be useful for educational purposes such as lecture preparation and organization as well as applications such as reading list generation. Additionally, we experiment with neural graph-based networks and non-neural classifiers to learn these prerequisite relations from our dataset.
InstaNAS: Instance-aware Neural Architecture Search
Cheng, An-Chieh, Lin, Chieh Hubert, Juan, Da-Cheng, Wei, Wei, Sun, Min
Neural Architecture Search (NAS) aims at finding one "single" architecture that achieves the best accuracy for a given task such as image recognition.In this paper, we study the instance-level variation,and demonstrate that instance-awareness is an important yet currently missing component of NAS. Based on this observation, we propose InstaNAS for searching toward instance-level architectures;the controller is trained to search and form a "distribution of architectures" instead of a single final architecture. Then during the inference phase, the controller selects an architecture from the distribution, tailored for each unseen image to achieve both high accuracy and short latency. The experimental results show that InstaNAS reduces the inference latency without compromising classification accuracy. On average, InstaNAS achieves 48.9% latency reduction on CIFAR-10 and 40.2% latency reduction on CIFAR-100 with respect to MobileNetV2 architecture.