Performance Analysis
Learning to Weight for Text Classification
Fernández, Alejandro Moreo, Esuli, Andrea, Sebastiani, Fabrizio
In information retrieval (IR) and related tasks, term weighting approaches typically consider the frequency of the term in the document and in the collection in order to compute a score reflecting the importance of the term for the document. In tasks characterized by the presence of training data (such as text classification), it seems logical that the term weighting function should take into account the distribution (as estimated from training data) of the term across the classes of interest. Although "supervised term weighting" approaches that use this intuition have been described before, they have failed to show consistent improvements. In this article we analyse the possible reasons for this failure and call consolidated assumptions into question. Following this criticism, we propose a novel supervised term weighting approach that, instead of relying on any predefined formula, learns a term weighting function optimised on the training set of interest; we dub this approach Learning to Weight (LTW). Experiments run on several well-known benchmarks, using different learning methods, show that our method outperforms previous term weighting approaches in text classification.
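To make the contrast concrete, the sketch below compares unsupervised idf with one classic predefined supervised formula, relevance frequency (rf = log2(2 + a / max(1, c)), from Lan et al.), where a and c count the positive and negative training documents that contain the term. This is an example of the kind of fixed formula the abstract questions, not LTW itself, which learns the weighting function instead; the toy documents, labels and helper names are hypothetical.

```python
import math
from collections import Counter

def idf_weights(docs):
    """Unsupervised idf: log(N / df), ignoring class labels entirely."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    return {t: math.log(n / df[t]) for t in df}

def rf_weights(docs, labels):
    """Supervised relevance frequency for the positive class:
    log2(2 + a / max(1, c)), a (c) = positive (negative) docs containing t."""
    a = Counter(t for d, y in zip(docs, labels) if y for t in set(d))
    c = Counter(t for d, y in zip(docs, labels) if not y for t in set(d))
    return {t: math.log2(2 + a[t] / max(1, c[t])) for t in set(a) | set(c)}

def weight_doc(doc, global_w):
    """tf x global weight for one tokenized document."""
    tf = Counter(doc)
    return {t: tf[t] * global_w.get(t, 0.0) for t in tf}

# Hypothetical toy corpus: 1 = "sports" (positive class), 0 = "politics".
docs = [["goal", "match"], ["goal", "team"], ["vote", "party"], ["vote", "match"]]
labels = [1, 1, 0, 0]
idf = idf_weights(docs)
rf = rf_weights(docs, labels)
print(idf["goal"], idf["match"])  # equal: idf cannot see the classes
print(rf["goal"], rf["match"])    # "goal" occurs only in sports documents
```

Both terms occur in two of the four documents, so idf weights them identically, while rf rewards "goal" for being concentrated in the positive class.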
A Survey on Graph Kernels
Kriege, Nils M., Johansson, Fredrik D., Morris, Christopher
Graph kernels have become an established and widely used technique for solving classification tasks on graphs. This survey gives a comprehensive overview of techniques for kernel-based graph classification developed in the past 15 years. We describe and categorize graph kernels based on properties inherent to their design, such as the nature of their extracted graph features, their method of computation and their applicability to problems in practice. In an extensive experimental evaluation, we study the classification accuracy of a large suite of graph kernels on established benchmarks as well as new datasets. We compare the performance of popular kernels with several baseline methods and study the effect of applying a Gaussian RBF kernel to the metric induced by a graph kernel. In doing so, we find that simple baselines become competitive after this transformation on some datasets. Moreover, we study the extent to which existing graph kernels agree in their predictions (and prediction errors) and obtain a data-driven categorization of kernels as a result. Finally, based on our experimental results, we derive a practitioner's guide to kernel-based graph classification.
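As an illustration of both a graph kernel and the RBF transformation described above, here is a minimal sketch of a Weisfeiler-Lehman subtree kernel (one common kernel family) followed by a Gaussian RBF applied to the kernel-induced squared distance d^2(G, H) = k(G, G) + k(H, H) - 2k(G, H). The function names, the tuple-based relabeling (used instead of a compressed label dictionary), and gamma = 0.1 are assumptions for the example, not the survey's code.

```python
import math
from collections import Counter

def wl_features(adj, labels, h=2):
    """Weisfeiler-Lehman subtree features: counts of vertex labels over h
    refinement rounds. adj maps vertex -> list of neighbours."""
    feats = Counter(labels.values())
    cur = dict(labels)
    for it in range(h):
        # New label = (old label, sorted multiset of neighbour labels).
        cur = {v: (cur[v], tuple(sorted(cur[u] for u in adj[v]))) for v in adj}
        feats.update((it, lab) for lab in cur.values())
    return feats

def kernel(f, g):
    """Linear kernel (dot product) between two sparse feature counts."""
    return sum(f[k] * g[k] for k in f if k in g)

def rbf_from_kernel(k_gg, k_hh, k_gh, gamma=0.1):
    """Gaussian RBF on the metric induced by the base kernel."""
    d2 = k_gg + k_hh - 2 * k_gh  # squared distance in feature space
    return math.exp(-gamma * d2)

triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
path = {0: [1], 1: [0, 2], 2: [1]}
carbon = {0: "C", 1: "C", 2: "C"}
ft, fp = wl_features(triangle, carbon), wl_features(path, carbon)
k_tt, k_pp, k_tp = kernel(ft, ft), kernel(fp, fp), kernel(ft, fp)
print(k_tt, k_pp, k_tp, rbf_from_kernel(k_tt, k_pp, k_tp))
```

The RBF step turns an unbounded similarity into a value in (0, 1] that equals 1 only for graphs with identical feature vectors, which is one way to read why simple baselines can become competitive after the transformation.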
Real-time data-driven detection of the rock type alteration during a directional drilling
Romanenkova, Evgenya, Zaytsev, Alexey, Klyuchnikov, Nikita, Gruzdev, Arseniy, Antipova, Ksenia, Ismailova, Leyla, Burnaev, Evgeny, Semenikhin, Artyom, Koryabkin, Vitaliy, Simon, Igor, Koroteev, Dmitry
During directional drilling, a bit may sometimes go into a nonproductive rock layer due to the gap of about 20 m between the bit and the high-fidelity rock type sensors. The only way to detect lithotype changes in time is to use Measurements While Drilling (MWD) data. However, there are no mathematical modeling approaches that reconstruct the rock type from MWD data with high accuracy. In this article, we present a data-driven procedure that utilizes MWD data for quick detection of changes in rock type. We propose an approach that combines traditional machine learning, applied to the rock type classification problem, with change detection procedures rarely used before in the Oil & Gas industry. The data come from a newly developed oilfield in the north of Western Siberia. The results suggest that we can detect a significant portion of rock type changes, reducing the change detection delay from 20 to 2.6 m and the number of false positive alarms from 71 to 7 per well.
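Detecting the moment the rock type changes can be framed as change detection on a stream of classifier outputs. The sketch below uses the classic one-sided CUSUM statistic, S_t = max(0, S_{t-1} + x_t - mu0 - drift), raising an alarm when S_t exceeds a threshold; it is an illustrative choice, not necessarily the procedure the authors use. The input stream (a hypothetical per-depth-step probability that the bit has entered a new rock type), `mu0`, `drift` and `threshold` are all assumptions.

```python
def cusum_first_alarm(x, mu0, drift, threshold):
    """One-sided CUSUM for an upward mean shift in the stream x.
    Returns the index of the first alarm, or None if none is raised."""
    s = 0.0
    for t, xt in enumerate(x):
        # Accumulate evidence above the pre-change mean (minus a slack term).
        s = max(0.0, s + xt - mu0 - drift)
        if s > threshold:
            return t
    return None

# Hypothetical classifier scores: low before a lithotype change at index 10.
probs = [0.1, 0.2, 0.0, 0.15, 0.05] * 2 + [0.7, 0.9, 0.8, 0.6, 0.85] * 2
print(cusum_first_alarm(probs, mu0=0.1, drift=0.1, threshold=1.0))
```

Raising `threshold` reduces false alarms at the cost of a longer detection delay, the same trade-off that the reported delay and false-alarm figures summarize.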
Outlier-Robust Spatial Perception: Hardness, General-Purpose Algorithms, and Guarantees
Tzoumas, Vasileios, Antonante, Pasquale, Carlone, Luca
Spatial perception is the backbone of many robotics applications, and spans a broad range of research problems, including localization and mapping, point cloud alignment, and relative pose estimation from camera images. Robust spatial perception is jeopardized by the presence of incorrect data association, and in general, outliers. Although techniques to handle outliers do exist, they can fail in unpredictable ways (e.g., RANSAC, robust estimators) or can have exponential runtime (e.g., branch-and-bound). In this paper, we advance the state of the art in outlier rejection by making three contributions. First, we show that even a simple linear instance of outlier rejection is inapproximable: in the worst case, one cannot design a quasi-polynomial time algorithm that computes an approximate solution efficiently. Our second contribution is to provide the first per-instance sub-optimality bounds to assess the approximation quality of a given outlier rejection outcome. Our third contribution is to propose a simple general-purpose algorithm, named adaptive trimming, to remove outliers. Our algorithm leverages recently proposed global solvers that are able to solve outlier-free problems, and iteratively removes measurements with large errors. We demonstrate the proposed algorithm on three spatial perception problems: 3D registration, two-view geometry, and SLAM. The results show that our algorithm outperforms several state-of-the-art methods across applications while being a general-purpose method.
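A simplified reading of the adaptive-trimming idea on a linear least-squares problem: solve on the current inlier set, compute residuals for all measurements, and keep only those below a threshold that shrinks every round until it reaches a known inlier bound. Ordinary least squares stands in here for the global outlier-free solvers the paper uses, and `shrink`, `inlier_bound` and the stopping rule are hypothetical choices; this is a sketch, not the authors' algorithm.

```python
import numpy as np

def adaptive_trimming(A, b, inlier_bound, shrink=0.8, max_iter=50):
    """Iteratively fit on the kept measurements and trim large residuals.
    Returns the final estimate and a boolean mask of kept measurements."""
    keep = np.ones(len(b), dtype=bool)
    thr = np.inf
    for _ in range(max_iter):
        # Outlier-free solver stand-in: least squares on current inliers.
        x = np.linalg.lstsq(A[keep], b[keep], rcond=None)[0]
        r = np.abs(A @ x - b)  # residuals of *all* measurements
        # Tighten the threshold adaptively, never below the inlier bound.
        thr = max(shrink * min(thr, r.max()), inlier_bound)
        new_keep = r <= thr  # trimmed points may re-enter if the fit improves
        if np.array_equal(new_keep, keep) and thr == inlier_bound:
            break
        keep = new_keep
    x = np.linalg.lstsq(A[keep], b[keep], rcond=None)[0]
    return x, keep

# Line fit b = 2 + 3 t with three gross outliers.
t = np.linspace(0, 1, 20)
A = np.column_stack([np.ones_like(t), t])
b = 2 + 3 * t
b[[3, 11, 17]] += [5.0, -4.0, 6.0]
x, keep = adaptive_trimming(A, b, inlier_bound=0.1)
print(x, np.flatnonzero(~keep))
```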
Generative Tensor Network Classification Model for Supervised Machine Learning
Sun, Zheng-Zhi, Peng, Cheng, Liu, Ding, Ran, Shi-Ju, Su, Gang
Tensor networks (TNs) have recently attracted extensive interest for developing machine-learning models in quantum many-body Hilbert space. Here we propose a generative TN classification (GTNC) approach for supervised learning. The strategy is to train a generative TN for each class of samples and use these TNs to construct the classifier. Classification is implemented by comparing distances in the many-body Hilbert space. Numerical experiments with GTNC show impressive performance on the MNIST and Fashion-MNIST datasets. The test accuracy is competitive with that of state-of-the-art convolutional neural networks and higher than that of the naive Bayes classifier (a generative classifier) and support vector machines. Moreover, GTNC is more efficient than existing TN models, which are in general discriminative. By investigating the distances in the many-body Hilbert space, we find that (a) the samples naturally cluster in such a space; and (b) bounding the bond dimensions of the TNs to finite values corresponds to removing redundant information in image recognition. These two characteristics make GTNC an adaptive and universal model with excellent performance.
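A toy sketch of the decision rule, classification by Hilbert-space fidelity: each pixel value v in [0, 1] is mapped to the qubit state [cos(pi v/2), sin(pi v/2)] (a common feature map in the TN machine-learning literature, assumed here), an image becomes a product state, and a test image is assigned to the class whose state has the largest fidelity with it, equivalently the smallest distance -log(fidelity). GTNC trains a matrix product state with bounded bond dimension per class; here, as a hypothetical stand-in, each class state is simply the normalized superposition of that class's training states.

```python
import numpy as np

def overlap(x, y):
    """<Phi(x)|Phi(y)> for product states built from the local map
    [cos(pi*v/2), sin(pi*v/2)]; this equals prod_k cos(pi*(x_k - y_k)/2)."""
    return np.prod(np.cos(np.pi * (x - y) / 2))

def class_fidelity(x, samples):
    """|<Psi_c|Phi(x)>|, with Psi_c the normalized sum of the class's
    training product states (stand-in for a trained MPS)."""
    num = sum(overlap(s, x) for s in samples)
    norm = np.sqrt(sum(overlap(s, t) for s in samples for t in samples))
    return abs(num) / norm

def classify(x, classes):
    # Largest fidelity <=> smallest distance -log(fidelity).
    return max(classes, key=lambda c: class_fidelity(x, classes[c]))

# Hypothetical 4-pixel "images" in [0, 1].
classes = {
    "dark": np.array([[0.0, 0.1, 0.0, 0.2], [0.1, 0.0, 0.1, 0.0]]),
    "bright": np.array([[1.0, 0.9, 1.0, 0.8], [0.9, 1.0, 0.9, 1.0]]),
}
print(classify(np.array([0.1, 0.1, 0.0, 0.1]), classes))
```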
Efficient Incremental Learning for Mobile Object Detection
Li, Dawei, Tasci, Serafettin, Ghosh, Shalini, Zhu, Jingwen, Zhang, Junting, Heck, Larry
Object detection models shipped with camera-equipped mobile devices cannot cover the objects of interest for every user. Therefore, incremental learning is a critical capability for a robust and personalized mobile object detection system on which many applications would rely. In this paper, we present an efficient yet practical system, IMOD, to incrementally train an existing object detection model so that it can detect new object classes without losing its capability to detect old classes. The key component of IMOD is a novel incremental learning algorithm that trains one-stage deep object detection models end-to-end using only training data of the new object classes. Specifically, to avoid catastrophic forgetting, the algorithm distills three types of knowledge from the old model to mimic the old model's behavior on object classification, bounding box regression and feature extraction. In addition, since training data for the new classes may not be available, a real-time dataset construction pipeline is designed to collect training images on the fly and automatically label them with both category and bounding box annotations. We have implemented IMOD under both mobile-cloud and mobile-only setups. Experimental results show that the proposed system can learn to detect a new object class in just a few minutes, including both dataset construction and model training. In comparison, traditional fine-tuning-based methods may take a few hours for training and, in most cases, would also need a tedious and costly manual dataset labeling step.
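A minimal sketch of how the three distillation terms could be combined into one loss: a temperature-scaled KL divergence between the old and new models' class scores (restricted to the old classes), plus squared errors between their box regressions and between their feature maps. The softmax classification head, temperature `T`, equal weights, and the dict-based interface are assumptions for illustration; IMOD's exact losses may differ, and during training this term would be added to the ordinary detection loss on the new-class data.

```python
import numpy as np

def log_softmax(z, T):
    """Numerically stable temperature-scaled log-softmax over the last axis."""
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def distillation_loss(old, new, n_old, T=2.0, weights=(1.0, 1.0, 1.0)):
    """old/new: dicts with 'logits' (anchors x classes), 'boxes' (anchors x 4)
    and 'features' (any shape). The new model has extra class columns."""
    log_p_old = log_softmax(old["logits"], T)
    log_p_new = log_softmax(new["logits"][:, :n_old], T)
    # KL(old || new) over the old classes, averaged over anchors.
    l_cls = (np.exp(log_p_old) * (log_p_old - log_p_new)).sum(axis=-1).mean()
    l_box = ((new["boxes"] - old["boxes"]) ** 2).mean()  # mimic box regression
    l_feat = ((new["features"] - old["features"]) ** 2).mean()  # mimic features
    w_cls, w_box, w_feat = weights
    return w_cls * l_cls + w_box * l_box + w_feat * l_feat

rng = np.random.default_rng(0)
old = {"logits": rng.normal(size=(5, 3)), "boxes": rng.normal(size=(5, 4)),
       "features": rng.normal(size=(8, 8))}
new = {"logits": np.hstack([old["logits"], rng.normal(size=(5, 1))]),
       "boxes": old["boxes"] + 0.5, "features": old["features"].copy()}
print(distillation_loss(old, new, n_old=3))  # only the box term is non-zero here
```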
Machine learning approaches in Detecting the Depression from Resting-state Electroencephalogram (EEG): A Review Study
In this paper, we review several approaches currently pursued in the search for more accurate diagnosis and treatment management in mental healthcare. Our focus is on mood disorders, and in particular on major depressive disorder (MDD). We first review and discuss findings from neuroimaging studies (MRI and fMRI) to outline the body of knowledge about anatomical and functional differences in depression. We then focus on less expensive data-driven approaches applicable to everyday clinical practice, in particular those based on electroencephalographic (EEG) recordings. Among the studies utilizing EEG, we discuss a group of applications that detect depression from resting-state EEG (detection studies) as well as interventional studies (those using stimuli in their protocols or aiming to predict the outcome of therapy). We conclude with a discussion and review of guidelines for improving the reliability of the developed models, which could help improve the diagnosis of depression in psychiatry.
Cross-Modal Data Programming Enables Rapid Medical Machine Learning
Dunnmon, Jared, Ratner, Alexander, Khandwala, Nishith, Saab, Khaled, Markert, Matthew, Sagreiya, Hersh, Goldman, Roger, Lee-Messer, Christopher, Lungren, Matthew, Rubin, Daniel, Ré, Christopher
Department of Biomedical Data Science, Stanford University, Stanford, California, USA
Labeling training datasets has become a key barrier to building medical machine learning models. One strategy is to generate training labels programmatically, for example by applying natural language processing pipelines to text reports associated with imaging studies. We propose cross-modal data programming, which generalizes this intuitive strategy in a theoretically grounded way that enables simpler, clinician-driven input, reduces required labeling time, and improves with additional unlabeled data. In this approach, clinicians generate training labels for models defined over a target modality (e.g. images or time series) by writing rules over an auxiliary modality (e.g. text reports). The resulting technical challenge consists of estimating the accuracies and correlations of these rules; we extend a recent unsupervised generative modeling technique to handle this cross-modal setting in a provably consistent way. Across four applications in radiography, computed tomography, and electroencephalography, and using only several hours of clinician time, our approach matches or exceeds the efficacy of physician-months of hand-labeling with statistical significance, demonstrating a fundamentally faster and more flexible way of building machine learning models in medicine.
In addition to being extremely costly, hand-labeled training sets are inflexible: given a new classification schema, imaging system, patient population, or other change in the data distribution or modeling task, the training set generally needs to be relabeled from scratch. More broadly, the machine learning community has increasingly turned to weak supervision approaches, where training data is labeled in noisier, higher-level, often programmatic ways rather than manually by experts. We broadly characterize these methods as cross-modal weak supervision approaches, in which the strategy is to programmatically extract labels from an auxiliary modality (e.g. the unstructured text reports accompanying an imaging study), which are then used as training labels for a model defined over the target modality (e.g. the images themselves). These methods follow the intuition that programmatically extracting labels from the auxiliary modality can be far faster and easier than hand-labeling or deriving labels from the target modality directly.
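The core difficulty named above, estimating how accurate each rule is without ground-truth labels, can be illustrated with a much simpler model than the paper's. Assume (hypothetically) that each rule, applied to a report, votes +1 or -1, never abstains, errs independently of the other rules given the true label, has the same accuracy for both classes, and is better than random. Then E[λ_i λ_j] = a_i a_j with a_i = 2·acc_i - 1, so a_i^2 = M_ij M_ik / M_jk for any three rules. The sketch estimates accuracies this way from simulated votes and combines the rules by a log-odds weighted vote; it is not the authors' generative model, which also handles correlations between rules.

```python
import numpy as np

def estimate_accuracies(L):
    """L: (n, m) votes in {-1, +1} from m >= 3 rules. Returns estimated
    P(rule == true label) per rule under the triplet assumptions above."""
    n, m = L.shape
    M = (L.T @ L) / n  # empirical E[lambda_i * lambda_j]; labels not used
    acc = np.empty(m)
    for i in range(m):
        j, k = [x for x in range(m) if x != i][:2]  # any two other rules
        a_sq = abs(M[i, j] * M[i, k] / M[j, k])
        acc[i] = (1 + np.sqrt(min(a_sq, 1.0))) / 2  # a_i = 2 * acc_i - 1
    return acc

def weighted_vote(L, acc):
    acc = np.clip(acc, 0.501, 0.999)  # keep log-odds finite
    w = np.log(acc / (1 - acc))  # more accurate rules get larger weights
    return np.where(L @ w >= 0, 1, -1)

# Simulated rule outputs (no real reports): 4 rules with hidden accuracies.
rng = np.random.default_rng(0)
n = 20000
y = rng.choice([-1, 1], size=n)
true_acc = np.array([0.9, 0.8, 0.7, 0.65])
correct = rng.random((n, true_acc.size)) < true_acc
L = np.where(correct, y[:, None], -y[:, None])
acc_hat = estimate_accuracies(L)
y_hat = weighted_vote(L, acc_hat)
print(acc_hat.round(3), (y_hat == y).mean())
```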
Sparse Learning for Variable Selection with Structures and Nonlinearities
In this thesis we discuss machine learning methods that perform automated variable selection for learning sparse predictive models. There are multiple reasons for promoting sparsity in predictive models. By relying on a limited set of input variables, the models naturally counteract the overfitting problem ubiquitous in learning from finite sets of training points. Sparse models are also cheaper to use for prediction: they usually require fewer computational resources and, by relying on smaller sets of inputs, can reduce the costs of data collection and storage. Finally, sparse models can contribute to a better understanding of the investigated phenomena, as they are easier to interpret than full models.
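One standard way to obtain such sparse models is the lasso, which minimizes (1/(2n))||y - Xw||^2 + λ||w||_1; its soft-thresholding step sets weak coefficients exactly to zero, which is what performs the variable selection. The cyclic coordinate-descent sketch below is a generic textbook version chosen for illustration (the thesis may use other methods), and `lam` and the iteration count are arbitrary.

```python
import numpy as np

def soft_threshold(z, t):
    return np.sign(z) * max(abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Cyclic coordinate descent for (1/(2n))||y - Xw||^2 + lam * ||w||_1."""
    n, p = X.shape
    w = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = y.astype(float)  # residual y - Xw (w starts at zero)
    for _ in range(n_iter):
        for j in range(p):
            r += X[:, j] * w[j]  # add back feature j's contribution
            rho = X[:, j] @ r / n  # correlation of feature j with partial residual
            w[j] = soft_threshold(rho, lam) / col_sq[j]  # may be exactly zero
            r -= X[:, j] * w[j]
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
w_true = np.zeros(10)
w_true[:2] = [3.0, -2.0]  # only two relevant variables
y = X @ w_true + 0.1 * rng.normal(size=100)
w_hat = lasso_cd(X, y, lam=0.1)
print(np.flatnonzero(w_hat), w_hat[:2].round(2))
```

Increasing `lam` removes more variables and shrinks the surviving coefficients further, trading some bias for a smaller, cheaper model.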
What does AI see? Deep segmentation networks discover biomarkers for lung cancer survival
Baek, Stephen, He, Yusen, Allen, Bryan G., Buatti, John M., Smith, Brian J., Plichta, Kristin A., Seyedin, Steven N., Gannon, Maggie, Cabel, Katherine R., Kim, Yusung, Wu, Xiaodong
Non-small-cell lung cancer (NSCLC) represents approximately 80-85% of lung cancer diagnoses and is the leading cause of cancer-related death worldwide. Recent studies indicate that image-based radiomics features from positron emission tomography-computed tomography (PET/CT) images have predictive power for NSCLC outcomes. To this end, easily calculated functional features such as the maximum and the mean of the standardized uptake value (SUV) and total lesion glycolysis (TLG) are most commonly used for NSCLC prognostication, but their prognostic value remains controversial. Meanwhile, convolutional neural networks (CNNs) are rapidly emerging as a promising approach to cancer image analysis, with significantly greater predictive power than hand-crafted radiomics features. Here we show that CNNs trained to perform the tumor segmentation task, with no information other than physician contours, identify a rich set of survival-related image features with remarkable prognostic value. In a retrospective study of 96 NSCLC patients before stereotactic body radiotherapy (SBRT), we found that a CNN segmentation algorithm (U-Net) trained for tumor segmentation in PET/CT images contained features strongly correlated with 2- and 5-year overall and disease-specific survival. The U-Net was given no clinical information (e.g. survival, age, smoking history) other than the images and the corresponding tumor contours provided by physicians. Furthermore, through visualization of the U-Net, we also found convincing evidence that the regions of progression appear to match the regions where the U-Net features identified patterns that predicted a higher likelihood of death. We anticipate that our findings will be a starting point for more sophisticated, non-intrusive, patient-specific cancer prognosis.
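For reference, the conventional features mentioned above are simple statistics over the contoured tumor. A minimal sketch, assuming an SUV volume that is already normalized, a boolean tumor mask (e.g. rasterized from a physician contour), and a known voxel volume in mL; TLG is computed as SUVmean times the metabolic tumor volume (MTV), which is taken here to be the contour volume (an assumption; MTV is often defined by an SUV threshold instead).

```python
import numpy as np

def pet_features(suv, mask, voxel_ml):
    """Conventional PET features inside a boolean tumor mask."""
    vals = suv[mask]
    mtv = mask.sum() * voxel_ml  # metabolic tumor volume in mL (contour volume)
    suv_mean = vals.mean()
    return {"SUVmax": vals.max(), "SUVmean": suv_mean,
            "MTV": mtv, "TLG": suv_mean * mtv}  # TLG = SUVmean x MTV

# Hypothetical 4 x 4 x 4 volume with an 8-voxel tumor.
suv = np.ones((4, 4, 4))
suv[1:3, 1:3, 1:3] = 4.0
suv[1, 1, 1] = 8.0  # hottest voxel
mask = np.zeros(suv.shape, dtype=bool)
mask[1:3, 1:3, 1:3] = True
print(pet_features(suv, mask, voxel_ml=0.064))  # 4 mm isotropic voxels
```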