Support Vector Machines
Color-based template selection for detection of gastric abnormalities in video endoscopy
A color-based segmentation method is applied to select abnormal areas. Color and texture features are combined to discriminate abnormal regions and frames. We compare the segmentation results with current state-of-the-art methods. Computer-aided diagnosis of gastric diseases from endoscopy frames is an important task. Color is a basic visual feature of endoscopic images and provides clues about abnormal regions in endoscopy frames.
Histogram Transform Ensembles for Large-scale Regression
Hang, Hanyuan, Lin, Zhouchen, Liu, Xiaoyu, Wen, Hongwei
We propose a novel algorithm for large-scale regression problems named histogram transform ensembles (HTE), composed of random rotations, stretchings, and translations. First of all, we investigate the theoretical properties of HTE when the regression function lies in the Hölder space $C^{k,\alpha}$, $k \in \mathbb{N}_0$, $\alpha \in (0,1]$. In the case that $k=0, 1$, we adopt the constant regressors and develop the naïve histogram transforms (NHT). Within the space $C^{0,\alpha}$, although almost optimal convergence rates can be derived for both single and ensemble NHT, we fail to show the benefits of ensembles over single estimators theoretically. In contrast, in the subspace $C^{1,\alpha}$, we prove that if $d \geq 2(1+\alpha)/\alpha$, the lower bound of the convergence rates for single NHT turns out to be worse than the upper bound of the convergence rates for ensemble NHT. In the other case when $k \geq 2$, the NHT may no longer be appropriate in predicting smoother regression functions. Instead, we apply kernel histogram transforms (KHT) equipped with smoother regressors such as support vector machines (SVMs), and it turns out that both single and ensemble KHT enjoy almost optimal convergence rates. Then we validate the above theoretical results by numerical experiments. On the one hand, simulations are conducted to elucidate that ensemble NHT outperform single NHT. On the other hand, the effects of bin sizes on accuracy of both NHT and KHT also accord with theoretical analysis. Last but not least, in the real-data experiments, comparisons between the ensemble KHT, equipped with adaptive histogram transforms, and other state-of-the-art large-scale regression estimators verify the effectiveness and accuracy of our algorithm.
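The ensemble of naïve histogram transforms described in this abstract can be illustrated with a small sketch. The following is a simplified two-dimensional version and not the authors' implementation: the function names, the stretching range (`min_scale`, `max_scale`), the order of operations (stretch, then rotate, then translate), and the fallback to the global mean for empty bins are all assumptions made for illustration.

```python
import math
import random
from collections import defaultdict


def random_transform(rng, min_scale=2.0, max_scale=6.0):
    """Draw a random rotation angle, per-axis stretchings, and a translation.

    The scale range is an illustrative assumption; larger scales give smaller bins.
    """
    theta = rng.uniform(0.0, 2.0 * math.pi)
    scales = (rng.uniform(min_scale, max_scale), rng.uniform(min_scale, max_scale))
    shift = (rng.random(), rng.random())
    return theta, scales, shift


def bin_index(x, transform):
    """Stretch, rotate, and translate a 2-D point, then return its unit-grid cell."""
    theta, (s1, s2), (b1, b2) = transform
    z1, z2 = s1 * x[0], s2 * x[1]              # stretching
    c, s = math.cos(theta), math.sin(theta)
    r1, r2 = c * z1 - s * z2, s * z1 + c * z2  # rotation
    return (math.floor(r1 + b1), math.floor(r2 + b2))  # translation + binning


def fit_nht(X, y, transform):
    """Single naive histogram transform: a constant (mean) regressor per bin."""
    sums, counts = defaultdict(float), defaultdict(int)
    for x, target in zip(X, y):
        cell = bin_index(x, transform)
        sums[cell] += target
        counts[cell] += 1
    return {cell: sums[cell] / counts[cell] for cell in sums}


def fit_ensemble(X, y, n_estimators=50, seed=0):
    """Fit several single estimators, each with its own random transform."""
    rng = random.Random(seed)
    fallback = sum(y) / len(y)  # used when a query lands in an empty bin (assumption)
    models = []
    for _ in range(n_estimators):
        transform = random_transform(rng)
        models.append((transform, fit_nht(X, y, transform)))
    return models, fallback


def predict(ensemble, x):
    """Average the bin-wise constant predictions over all transforms."""
    models, fallback = ensemble
    preds = [means.get(bin_index(x, tr), fallback) for tr, means in models]
    return sum(preds) / len(preds)


# Toy usage: learn f(x) = x1 + x2 on the unit square.
data_rng = random.Random(1)
X_train = [(data_rng.random(), data_rng.random()) for _ in range(400)]
y_train = [a + b for a, b in X_train]
model = fit_ensemble(X_train, y_train)
print(predict(model, (0.5, 0.5)))  # should be close to 1.0
```

Averaging over many random partitions smooths the piecewise-constant predictions of the single estimators, which is the intuition behind the ensemble's advantage discussed in the abstract.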
Feature Engineering Combined with 1 D Convolutional Neural Network for Improved Mortality Prediction
Maheshwari, Saumil, Verma, Rohit, Shukla, Anupam, Tiwari, Ritu, Garg, Rishu
Intensive care units (ICUs) generate a wealth of useful data in the form of Electronic Health Records (EHRs). This data allows for the development of a prediction tool with perfect knowledge backing. We aimed to build a mortality prediction model on the 2012 Physionet Challenge mortality prediction database of 4000 patients admitted to the ICU. The challenges in the dataset, such as high dimensionality, imbalanced distribution, and missing values, were tackled with analytical methods and tools via feature engineering and new variable construction. The objective of the research is to utilize the relations among the clinical variables and construct new variables which would establish the effectiveness of a 1-Dimensional Convolutional Neural Network (1-D CNN) with constructed features. Its performance is compared, in terms of Area Under Curve (AUC), with that of traditional machine learning algorithms such as the XGBoost classifier, Support Vector Machine (SVM), K-Neighbours Classifier (K-NN), and Random Forest Classifier (RF). The investigation reveals a best AUC of 0.848 using the 1-D CNN model.
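Since the models in this abstract are compared by AUC, a brief sketch of how that metric can be computed may help. The function below uses the pairwise-ranking definition of the area under the ROC curve; the function name and the toy labels and scores are illustrative assumptions and are not data from the paper.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy usage: 1 = died in hospital, 0 = survived (hypothetical labels and scores).
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # → 0.75
```

Because AUC depends only on the ranking of scores, it is less sensitive to class imbalance than plain accuracy, which is relevant for mortality datasets where deaths are the minority class.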
MRI correlates of chronic symptoms in mild traumatic brain injury
Kerley, Cailey I., Schilling, Kurt G., Blaber, Justin, Miller, Beth, Newton, Allen, Anderson, Adam W., Landman, Bennett A., Rex, Tonia S.
Some veterans with a history of mild traumatic brain injury (mTBI) have reported experiencing auditory and visual dysfunction that persist beyond the acute phase of the incident. The etiology behind these symptoms is difficult to characterize, since mTBI is defined by negative imaging findings on current clinical imaging. There are several competing hypotheses that could explain functional deficits; one example is shear injury, which may manifest in diffusion-weighted magnetic resonance (MR) imaging (DWI). Herein, we explore this alternative hypothesis in a pilot study of multi-parametric MR imaging. Briefly, we consider a cohort of 8 mTBI patients relative to 22 control subjects using structural T1-weighted imaging (T1w) and connectivity with DWI.
Robust Deep Graph Based Learning for Binary Classification
Ye, Minxiang, Stankovic, Vladimir, Stankovic, Lina, Cheung, Gene
Convolutional neural network (CNN)-based feature learning has become state of the art, since given sufficient training data, CNN can significantly outperform traditional methods for various classification tasks. However, feature learning becomes more difficult if some training labels are noisy. With traditional regularization techniques, CNN often overfits to the noisy training labels, resulting in sub-par classification performance. In this paper, we propose a robust binary classifier, based on CNNs, to learn deep metric functions, which are then used to construct an optimal underlying graph structure used to clean noisy labels via graph Laplacian regularization (GLR). GLR is posed as a convex maximum a posteriori (MAP) problem solved via convex quadratic programming (QP). To penalize samples around the decision boundary, we propose two regularized loss functions for semi-supervised learning. The binary classification experiments on three datasets, varying in number and type of features, demonstrate that given a noisy training dataset, our proposed networks outperform several state-of-the-art classifiers, including label-noise robust support vector machine, CNNs with three different robust loss functions, model-based GLR, and dynamic graph CNN classifiers.
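To give a feel for how graph Laplacian regularization can clean noisy labels, here is a minimal sketch. It is a simplified, unconstrained variant (minimize ||x - y||^2 + mu * x^T L x, with L = D - W), solved by Gauss-Seidel iteration, and is not the MAP/QP formulation of the paper. The Gaussian edge weights on raw features, the parameter values, and all function names are assumptions; in the paper the graph is built from learned deep metric functions.

```python
import math


def gaussian_weights(features, sigma=1.0):
    """Dense similarity graph: w_ij = exp(-||f_i - f_j||^2 / (2 sigma^2)), w_ii = 0."""
    n = len(features)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d2 = sum((a - b) ** 2 for a, b in zip(features[i], features[j]))
            W[i][j] = W[j][i] = math.exp(-d2 / (2.0 * sigma ** 2))
    return W


def glr_denoise(W, y, mu=5.0, n_iter=200):
    """Solve (I + mu * L) x = y by Gauss-Seidel sweeps, where L = D - W.

    Row i of the system reads (1 + mu * d_i) x_i - mu * sum_j w_ij x_j = y_i,
    which is strictly diagonally dominant, so the iteration converges.
    """
    n = len(y)
    degree = [sum(row) for row in W]
    x = [float(v) for v in y]
    for _ in range(n_iter):
        for i in range(n):
            neighbour_sum = sum(W[i][j] * x[j] for j in range(n) if j != i)
            x[i] = (y[i] + mu * neighbour_sum) / (1.0 + mu * degree[i])
    return x


# Toy usage: two well-separated clusters; the fourth label (+1 cluster) is flipped.
features = [[0.0], [0.1], [0.2], [0.05], [5.0], [5.1], [5.2], [4.9]]
noisy_labels = [1, 1, 1, -1, -1, -1, -1, -1]
smoothed = glr_denoise(gaussian_weights(features), noisy_labels)
restored = [1 if v > 0 else -1 for v in smoothed]
print(restored)
```

The smoothness term pulls each label toward those of its strongly connected neighbours, so an isolated flipped label inside a consistent cluster is outvoted.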
Influenza Modeling Based on Massive Feature Engineering and International Flow Deconvolution
Liu, Ziming, Wang, Yixuan, Han, Zizhao, Wu, Dian
In this article, we focus on the analysis of the potential factors driving the spread of influenza, and possible policies to mitigate the adverse effects of the disease. To be precise, we first invoke discrete Fourier transform (DFT) to conclude a yearly periodic regional structure in the influenza activity, thus safely restricting ourselves to the analysis of the yearly influenza behavior. Then we collect a massive number of possible region-wise indicators contributing to the influenza mortality, such as consumption, immunization, sanitation, water quality, and other indicators from external data, with $1170$ dimensions in total. We extract significant features from the high dimensional indicators using a combination of data analysis techniques, including matrix completion, support vector machines (SVM), autoencoders, and principal component analysis (PCA). Furthermore, we model the international flow of migration and trade as a convolution on regional influenza activity, and solve the deconvolution problem as higher-order perturbations to the linear regression, thus separating regional and international factors related to the influenza mortality. Finally, both the original model and the perturbed model are tested on regional examples, as validations of our models. Pertaining to the policy, we make a proposal based on the connectivity data along with the previously extracted significant features to alleviate the impact of influenza, as well as efficiently propagate and carry out the policies. We conclude that environmental features and economic features are of significance to the influenza mortality. The model can be easily adapted to model other types of infectious diseases.
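The first step mentioned in this abstract, using a discrete Fourier transform to expose a yearly period, can be sketched as follows. This is an illustrative example on synthetic weekly data and not the authors' code: the function names, the 52-week year, and the synthetic signal are assumptions.

```python
import cmath
import math


def dft_magnitudes(signal):
    """Magnitudes of the DFT coefficients for frequencies 0..n/2 of the centered signal."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [v - mean for v in signal]  # remove the constant (zero-frequency) level
    magnitudes = []
    for k in range(n // 2 + 1):
        coeff = sum(v * cmath.exp(-2j * math.pi * k * t / n)
                    for t, v in enumerate(centered))
        magnitudes.append(abs(coeff))
    return magnitudes


def dominant_period(signal):
    """Period (in samples) of the strongest non-constant frequency component."""
    magnitudes = dft_magnitudes(signal)
    k = max(range(1, len(magnitudes)), key=lambda i: magnitudes[i])
    return len(signal) / k


# Toy usage: four years of synthetic weekly activity with a yearly cycle
# plus a weaker half-yearly harmonic.
weeks = 4 * 52
activity = [10.0
            + 5.0 * math.cos(2.0 * math.pi * t / 52.0)
            + 1.0 * math.sin(2.0 * math.pi * t / 26.0)
            for t in range(weeks)]
print(dominant_period(activity))  # → 52.0
```

A dominant period equal to one year in the spectrum is the kind of evidence that would justify restricting the analysis to yearly behavior.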
Exploring the Characterization and Classification of EEG Signals for a Computer-Aided Epilepsy Diagnosis System
Epilepsy occurs when the localized electrical activity of neurons suffers from an imbalance. One of the most adequate methods for diagnosing and monitoring it is the analysis of electroencephalographic (EEG) signals. Although there is a wide range of alternatives for characterizing and classifying EEG signals for epilepsy analysis, many key aspects related to accuracy and physiological interpretation are still considered open issues. This work performs an exploratory study to identify the most adequate frequently used methods for characterizing and classifying epileptic seizures. To this end, a comparative study is carried out on several subsets of features using four representative classifiers: Linear Discriminant Analysis (LDA), Quadratic Discriminant Analysis (QDA), K-Nearest Neighbor (KNN), and Support Vector Machine (SVM).
Modeling and Prediction of Iran's Steel Consumption Based on Economic Activity Using Support Vector Machines
Kamalzadeh, Hossein, Sobhan, Saeid Nassim, Boskabadi, Azam, Hatami, Mohsen, Gharehyakheh, Amin
The steel industry has great impacts on the economy and the environment of both developed and underdeveloped countries. The importance of this industry and these impacts have led many researchers to investigate the relationship between a country's steel consumption and its economic activity, resulting in the so-called intensity of use model. This paper investigates the validity of the intensity of use model for the case of Iran's steel consumption and extends this hypothesis by using indexes of economic activity to model the steel consumption. We use the proposed model to train support vector machines and predict future values of Iran's steel consumption. The paper provides detailed correlation tests for the factors used in the model to check their relationships with the steel consumption. The results indicate that Iran's steel consumption is strongly correlated with its economic activity and follows the same pattern the economy has followed over the last four decades.
Cross-Language Aphasia Detection using Optimal Transport Domain Adaptation
Balagopalan, Aparna, Novikova, Jekaterina, McDermott, Matthew B. A., Nestor, Bret, Naumann, Tristan, Ghassemi, Marzyeh
Multi-language speech datasets are scarce and often have small sample sizes in the medical domain. Robust transfer of linguistic features across languages could improve rates of early diagnosis and therapy for speakers of low-resource languages when detecting health conditions from speech. We utilize out-of-domain, unpaired, single-speaker, healthy speech data for training multiple Optimal Transport (OT) domain adaptation systems. We learn mappings from other languages to English and detect aphasia from linguistic characteristics of speech, and show that OT domain adaptation improves aphasia detection over unilingual baselines for French (6% increased F1) and Mandarin (5% increased F1). Further, we show that adding aphasic data to the domain adaptation system significantly increases performance for both French and Mandarin, increasing the F1 scores further (10% and 8% increase in F1 scores for French and Mandarin, respectively, over unilingual baselines).