 Support Vector Machines


Dual SVM Training on a Budget

arXiv.org Machine Learning

Support Vector Machines (SVMs) introduced by [5] are popular machine learning methods, in particular for binary classification. They are supported by learning-theoretical guarantees [14], and they exhibit excellent generalization performance in many applications in science and technology [1, 16, 29, 23, 22, 3, 18, 19, 10]. They belong to the family of kernel methods, applying a linear algorithm in a feature space defined implicitly by a kernel function. Training an SVM corresponds to solving a large-scale optimization problem, which can be cast into a quadratic program (QP). The primal problem can be solved directly with stochastic gradient descent (SGD) and accelerated variants [21, 8], while the dual QP is solved with subspace ascent, see [2] and references therein.
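As a rough illustration of the primal route mentioned above, the following minimal sketch trains a linear SVM by stochastic gradient descent on the regularized hinge loss. The Pegasos-style step size 1/(lam * t), the absence of a bias term, and the toy data are assumptions made for this example; they are not taken from the paper.

```python
import random


def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def train_linear_svm_sgd(X, y, lam=0.01, epochs=50, seed=0):
    """SGD on the primal objective lam/2 * ||w||^2 + mean(hinge loss).

    Assumptions: linear kernel, no bias term, step size 1 / (lam * t).
    """
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        order = list(range(len(X)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * dot(w, X[i])
            # Gradient step on the regularizer shrinks the weight vector.
            w = [(1.0 - eta * lam) * wj for wj in w]
            # Gradient step on the hinge loss, active only inside the margin.
            if margin < 1.0:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def predict(w, x):
    """Return the predicted label in {-1, +1}."""
    return 1 if dot(w, x) >= 0.0 else -1


# Hypothetical, linearly separable toy data.
X_toy = [(2.0, 1.0), (1.0, 2.0), (3.0, 3.0), (-2.0, -1.0), (-1.0, -2.0), (-3.0, -3.0)]
y_toy = [1, 1, 1, -1, -1, -1]
w_toy = train_linear_svm_sgd(X_toy, y_toy)
```

A kernelized variant would store coefficients on support vectors instead of an explicit weight vector, which is where the model size becomes a concern.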


Speeding Up Budgeted Stochastic Gradient Descent SVM Training with Precomputed Golden Section Search

arXiv.org Machine Learning

Limiting the model size of a kernel support vector machine to a pre-defined budget is a well-established technique that allows SVM learning and prediction to scale to large datasets. Its core addition to simple stochastic gradient training is budget maintenance through merging of support vectors. This requires solving an inner optimization problem with an iterative method many times per gradient step. In this paper we replace the iterative procedure with a fast lookup. We manage to reduce the merging time by up to 65% and the total training time by 44% without any loss of accuracy.
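The idea of replacing an iterative search with a lookup can be sketched as follows. The abstract does not state the inner objective, so this example assumes the formulation commonly used for merging two support vectors under a Gaussian kernel: with relative coefficient m and kernel value kappa between the two points, the merged point lies at a fraction h along the connecting line, and h maximizes m * kappa^((1-h)^2) + (1-m) * kappa^(h^2). The grid size, the range of kappa, and the nearest-neighbor lookup are hypothetical choices for illustration only.

```python
import math

INV_PHI = (math.sqrt(5.0) - 1.0) / 2.0  # 1 / golden ratio


def golden_section_max(f, lo, hi, tol=1e-6):
    """Locate a maximizer of a unimodal function f on [lo, hi]."""
    a, b = lo, hi
    c = b - INV_PHI * (b - a)
    d = a + INV_PHI * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc > fd:
            # Maximum lies in [a, d]; reuse c as the new upper probe.
            b, d, fd = d, c, fc
            c = b - INV_PHI * (b - a)
            fc = f(c)
        else:
            # Maximum lies in [c, b]; reuse d as the new lower probe.
            a, c, fc = c, d, fd
            d = a + INV_PHI * (b - a)
            fd = f(d)
    return 0.5 * (a + b)


def merge_objective(h, m, kappa):
    """Assumed merge objective for a Gaussian kernel (see lead-in)."""
    return m * kappa ** ((1.0 - h) ** 2) + (1.0 - m) * kappa ** (h ** 2)


def optimal_h(m, kappa):
    """Iterative solution: one golden section search per merge candidate."""
    return golden_section_max(lambda h: merge_objective(h, m, kappa), 0.0, 1.0)


def build_table(n=21, kappa_min=0.2):
    """Precompute optimal_h on a hypothetical n-by-n grid over (m, kappa)."""
    ms = [i / (n - 1) for i in range(n)]
    kappas = [kappa_min + (1.0 - kappa_min) * j / (n - 1) for j in range(n)]
    table = [[optimal_h(m, k) for k in kappas] for m in ms]
    return ms, kappas, table


def lookup_h(m, kappa, ms, kappas, table):
    """Fast path: return the precomputed value at the nearest grid point."""
    i = min(range(len(ms)), key=lambda idx: abs(ms[idx] - m))
    j = min(range(len(kappas)), key=lambda idx: abs(kappas[idx] - kappa))
    return table[i][j]


ms, kappas, table = build_table()
h_fast = lookup_h(0.5, 0.9, ms, kappas, table)
```

Golden section search only guarantees a local maximum when the objective is not unimodal, which can happen for small kappa, so the table above is restricted to moderately large kernel values.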


A Machine-learning framework for automatic reference-free quality assessment in MRI

arXiv.org Machine Learning

Magnetic resonance (MR) imaging offers a wide variety of imaging techniques. A large amount of data is created per examination, which needs to be checked for sufficient quality in order to derive a meaningful diagnosis. This is a manual process and therefore time- and cost-intensive. Any imaging artifacts originating from scanner hardware, signal processing or induced by the patient may reduce the image quality and complicate the diagnosis or any image post-processing. Therefore, the automated assessment or assurance of sufficient image quality is of high interest. Usually, a reference image is unavailable or difficult to define. Therefore, classical reference-based approaches are not applicable. Model observers mimicking the human observers (HO) can assist in this task. Thus, we propose a new machine-learning-based reference-free MR image quality assessment framework which is trained on HO-derived labels to assess MR image quality immediately after each acquisition. We include the concept of active learning and present an efficient blinded reading platform to reduce the effort in the HO labeling procedure. Derived image features and the applied classifiers (support vector machine, deep neural network) are investigated for a cohort of 250 patients. The MR image quality assessment framework can achieve a high test accuracy of 93.7% for estimating quality classes on a 5-point Likert scale. The proposed MR image quality assessment framework is able to provide an accurate and efficient quality estimation which can be used as a prospective quality assurance including automatic acquisition adaptation or guided MR scanner operation, and/or as a retrospective quality assessment including support of diagnostic decisions or quality control in cohort studies.


Predictive Maintenance for Industrial IoT of Vehicle Fleets using Hierarchical Modified Fuzzy Support Vector Machine

arXiv.org Artificial Intelligence

Connected vehicle fleets are deployed worldwide in several industrial IoT scenarios. With the gradual increase of machines being controlled and managed through networked smart devices, the predictive maintenance potential grows rapidly. Predictive maintenance has the potential of optimizing uptime as well as performance such that time and labor associated with inspections and preventive maintenance are reduced. In order to understand the trends of vehicle faults with respect to important vehicle attributes such as mileage, age, and vehicle type, this problem is addressed through a hierarchical modified fuzzy support vector machine (HMFSVM). The proposed method is compared with other commonly used approaches like logistic regression, random forests and support vector machines. This supports better use of telematics data to ensure preventive management as part of the desired solution. The superiority of the proposed method is highlighted through several experimental results.


Top Machine Learning Research Groups To Follow In India

#artificialintelligence

Indian Institute of Science's Machine Learning Special Interest Group: Touted as one of the best research groups in India, especially the one with a beautiful campus, IISc's MLSIG features several talented students and faculty members engaged in cutting-edge research on a variety of aspects of ML and related fields. These works range from theoretical foundations to new algorithms as well as other exciting applications. IISc MLSIG has a great roster of events that covers topics such as deep learning with GPUs, data mining (models, algorithms and applications), text analysis, knowledge representation and reasoning with DNN. MLSIG is also doing cutting-edge research that is published in top journals and conferences. Some of the research topics are ML in text mining, ML in computer vision, graphical models, clustering, support vector machines and kernel-based learning methods.


Forecasting Internally Displaced Population Migration Patterns in Syria and Yemen

arXiv.org Machine Learning

Armed conflict has led to an unprecedented number of internally displaced persons (IDPs) - individuals who are forced out of their homes but remain within their country. IDPs often urgently require shelter, food, and healthcare, yet prediction of when large fluxes of IDPs will cross into an area remains a major challenge for aid delivery organizations. Accurate forecasting of IDP migration would empower humanitarian aid groups to more effectively allocate resources during conflicts. We show that monthly flow of IDPs from province to province in both Syria and Yemen can be accurately forecasted one month in advance, using publicly available data. We model monthly IDP flow using data on food price, fuel price, wage, geospatial, and news data. We find that machine learning approaches can more accurately forecast migration trends than baseline persistence models. Our findings thus potentially enable proactive aid allocation for IDPs in anticipation of forecasted arrivals.
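The persistence baseline mentioned above can be sketched in a few lines: each month's flow is forecast as the previous month's observed value. The flow values and the choice of mean absolute error as the metric are hypothetical and serve only to make the baseline concrete; the abstract does not specify them.

```python
def persistence_forecast(series):
    """Forecast each month (from the second onward) as the previous month's value."""
    return series[:-1]


def mean_absolute_error(actual, predicted):
    """Average absolute difference between observed and forecast values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical monthly IDP flow between two provinces.
monthly_flow = [10, 12, 15, 15]
forecast = persistence_forecast(monthly_flow)
actual = monthly_flow[1:]
baseline_mae = mean_absolute_error(actual, forecast)
```

Any learned model would need to achieve a lower error than `baseline_mae` on the same months to count as an improvement over persistence.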


Improving Tourism Prediction Models Using Climate and Social Media Data: A Fine-Grained Approach

AAAI Conferences

Accurate predictions about future events are essential in many areas, one of them being the tourism industry. Usually, countries and cities invest a huge amount of money in planning and preparation in order to welcome (and profit from) tourists. An accurate prediction of the number of visits in the following days or months could help both the economy and tourists. Prior studies in this domain explore forecasting for a whole country rather than for fine-grained areas within a country (e.g., specific tourist attractions). In this work, we suggest that accessible data from online social networks and travel websites, in addition to climate data, can be used to support the inference of visitation counts for many tourist attractions. To test our hypothesis, we analyze visitation, climate and social media data for more than 70 National Parks in the U.S. during the last 3 years. The experimental results reveal a high correlation between social media data and tourism demand; in fact, in over 80% of the parks, social media reviews and visitation counts are correlated by more than 50%. Moreover, we assess the effectiveness of employing various prediction techniques, finding that even a simple linear regression model, when fed with social media and climate data as input features, can attain a prediction accuracy of over 80%, while a more robust algorithm, such as Support Vector Regression, reaches up to 94% accuracy.


Exploring the Semantic Content of Unsupervised Graph Embeddings: An Empirical Study

arXiv.org Machine Learning

Graph embeddings have become a key and widely used technique within the field of graph mining, proving to be successful across a broad range of domains including social, citation, transportation and biological. Graph embedding techniques aim to automatically create a low-dimensional representation of a given graph, which captures key structural elements in the resulting embedding space. However, to date, there has been little work exploring exactly which topological structures are being learned in the embedding process. In this paper, we investigate whether graph embeddings are approximating something analogous to traditional vertex-level graph features. If such a relationship can be found, it could be used to provide a theoretical insight into how graph embedding approaches function. We perform this investigation by predicting known topological features, using supervised and unsupervised methods, directly from the embedding space. If a mapping between the embeddings and topological features can be found, then we argue that the structural information encapsulated by the features is represented in the embedding space. To explore this, we present extensive experimental evaluation from five state-of-the-art unsupervised graph embedding techniques, across a range of empirical graph datasets, measuring a selection of topological features. We demonstrate that several topological features are indeed being approximated by the embedding space, allowing key insight into how graph embeddings create good representations.


A New COLD Feature based Handwriting Analysis for Ethnicity/Nationality Identification

arXiv.org Artificial Intelligence

Identifying crimes is challenging for forensic investigation teams when the crimes involve people of different nationalities. This paper proposes a new method for ethnicity (nationality) identification based on Cloud of Line Distribution (COLD) features of handwriting components. The proposed method first explores the tangent angle of the contour pixels in each row and the mean of the intensity values of each row in an image to segment text lines. For the segmented text lines, we use the tangent angle and the direction of base lines to remove rule lines in the image. We use polygonal approximation to find dominant points on the contours of edge components. Then the proposed method connects every dominant point to its nearest dominant points, which results in line segments of dominant point pairs. For each line segment, the proposed method estimates the angle and length, which gives a point in the polar domain. For all the line segments, the proposed method generates dense points in the polar domain, which results in the COLD distribution. As character component shapes change according to nationality, the shape of the distribution changes. This observation is captured using the distance from the pixels of the distribution to the principal axis of the distribution. Then the features are passed to an SVM classifier for identifying nationality. Experiments are conducted on a complex dataset, which show that the proposed method is effective and outperforms the existing method.
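The step from dominant points to points in the polar domain can be sketched as below. This is a simplified illustration, not the paper's implementation: it assumes the dominant points are already given, connects each one to a single nearest neighbor only, and uses plain (angle, length) pairs without any further scaling. The coordinates are hypothetical.

```python
import math


def nearest_neighbor_segments(points):
    """Pair every dominant point with its single nearest other point."""
    segments = []
    for i, p in enumerate(points):
        others = [q for j, q in enumerate(points) if j != i]
        nearest = min(others, key=lambda q: math.dist(p, q))
        segments.append((p, nearest))
    return segments


def to_polar(segments):
    """Map each line segment to an (angle, length) point in the polar domain."""
    polar = []
    for (x0, y0), (x1, y1) in segments:
        dx, dy = x1 - x0, y1 - y0
        polar.append((math.atan2(dy, dx), math.hypot(dx, dy)))
    return polar


# Hypothetical dominant points on a contour (needs at least two points).
dominant_points = [(0, 0), (1, 0), (5, 5)]
cold_points = to_polar(nearest_neighbor_segments(dominant_points))
```

Collecting these pairs over all segments of a handwriting sample yields the point cloud whose shape the abstract describes as the COLD distribution.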


PAC-Bayes bounds for stable algorithms with instance-dependent priors

arXiv.org Machine Learning

Csaba Szepesvari (DeepMind)

PAC-Bayes bounds have been proposed to get risk estimates based on a training sample. In this paper the PAC-Bayes approach is combined with stability of the hypothesis learned by a Hilbert space valued algorithm. The PAC-Bayes setting is used with a Gaussian prior centered at the expected output. Thus a novelty of our paper is using priors defined in terms of the data-generating distribution. Our main result estimates the risk of the randomized algorithm in terms of the hypothesis stability coefficients. We also provide a new bound for the SVM classifier, which is compared to other known bounds experimentally. Ours appears to be the first stability-based bound that evaluates to nontrivial values.