Support Vector Machines
Automatic Fact-Checking Using Context and Discourse Information
Atanasova, Pepa, Nakov, Preslav, Màrquez, Lluís, Barrón-Cedeño, Alberto, Karadzhov, Georgi, Mihaylova, Tsvetomila, Mohtarami, Mitra, Glass, James
We study the problem of automatic fact-checking, paying special attention to the impact of contextual and discourse information. We address two related tasks: (i) detecting check-worthy claims, and (ii) fact-checking claims. We develop supervised systems based on neural networks, kernel-based support vector machines, and combinations thereof, which make use of rich input representations in terms of discourse cues and contextual features. For the check-worthiness estimation task, we focus on political debates, and we model the target claim in the context of the full intervention of a participant and the previous and the following turns in the debate, taking into account contextual meta information. For the fact-checking task, we focus on answer verification in a community forum, and we model the veracity of the answer with respect to the entire question-answer thread in which it occurs as well as with respect to other related posts from the entire forum. We develop annotated datasets for both tasks and we run extensive experimental evaluation, confirming that both types of information, but especially contextual features, play an important role.
Inferring linear and nonlinear Interaction networks using neighborhood support vector machines
Jebreen, Kamel, Ghattas, Badih
In this paper, we consider modelling the interactions between a set of variables in the context of time series and high dimension. We suggest two approaches. The first is similar to the neighborhood lasso, with the lasso model replaced by a support vector machine (SVM). The second is a restricted Bayesian network adapted for time series. We show the efficiency of our approaches by simulations using linear data sets, nonlinear data sets, and a mixture of both.
Evaluating machine learning performance in predicting injury severity in agribusiness industries
Although machine learning methods have been used as an outcome prediction tool in many fields, their utilization in predicting incident outcome in occupational safety is relatively new. This study tests the performance of machine learning techniques in modeling and predicting occupational incidents severity with respect to accessible information of injured workers in agribusiness industries using workers’ compensation claims. More than 33,000 incidents within agribusiness industries in the Midwest of the United States for 2008–2016 were analyzed. The total cost of incidents was extracted and classified from workers’ compensation claims. Supervised machine learning algorithms for classification (support vector machines with linear, quadratic, and RBF kernels, Boosted Trees, and Naïve Bayes) were applied.
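As a rough illustration of the three kernel families named in the abstract above (linear, quadratic, and RBF), the following sketch defines each one in plain Python. It is not the study's code: the function names, the offset `c`, and the width `gamma` are assumptions chosen for the example.

```python
import math


def linear_kernel(x, y):
    """Linear kernel: the plain dot product <x, y>."""
    return sum(a * b for a, b in zip(x, y))


def quadratic_kernel(x, y, c=1.0):
    """Degree-2 polynomial kernel (<x, y> + c)^2; c is an assumed offset."""
    return (linear_kernel(x, y) + c) ** 2


def rbf_kernel(x, y, gamma=0.5):
    """Gaussian RBF kernel exp(-gamma * ||x - y||^2); gamma is an assumed width."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)
```

An SVM uses such a function in place of the dot product, so the choice of kernel determines whether the decision boundary is a hyperplane (linear), a quadric (quadratic), or a more flexible surface (RBF).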
Modeling Daily Pan Evaporation in Humid Climates Using Gaussian Process Regression
Shabani, Sevda, Samadianfard, Saeed, Sattari, Mohammad Taghi, Shamshirband, Shahab, Mosavi, Amir, Kmet, Tibor, Varkonyi-Koczy, Annamaria R.
Evaporation is one of the main processes in the hydrological cycle and one of the most critical factors in agricultural, hydrological, and meteorological studies. Because multiple climatic factors interact, evaporation is a complex and nonlinear phenomenon; data-based methods can therefore be used to estimate it precisely. In the present study, Gaussian Process Regression (GPR), Nearest-Neighbor, Random Forest and Support Vector Regression were used to estimate pan evaporation (PE) at the meteorological stations of Golestan Province, Iran. For this purpose, meteorological data including PE, temperature (T), relative humidity (RH), wind speed (W) and sunny hours (S) were collected from the Gonbad-e Kavus, Gorgan and Bandar Torkman stations from 2011 through 2017. The accuracy of the studied methods was determined using the statistical indices of Root Mean Squared Error, correlation coefficient and Mean Absolute Error. Furthermore, Taylor charts were used to evaluate the accuracy of the models. We report that GPR for the Gonbad-e Kavus station with input parameters T, W and S, and GPR for the Gorgan and Bandar Torkman stations with input parameters T, RH, W and S, had the most accurate performance and are proposed for precise estimation of PE. Given the high rate of evaporation in Iran and the lack of measurement instruments, the findings indicate that PE values may be estimated accurately from a few easily measured meteorological parameters.
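The three accuracy indices named in the abstract above (Root Mean Squared Error, Mean Absolute Error, and the correlation coefficient) can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the authors' evaluation code; the function names are assumptions, and the correlation coefficient is taken here to be Pearson's r.

```python
import math


def rmse(observed, predicted):
    """Root Mean Squared Error between two equal-length sequences."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def mae(observed, predicted):
    """Mean Absolute Error between two equal-length sequences."""
    n = len(observed)
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / n


def pearson_r(observed, predicted):
    """Pearson correlation coefficient (assumes neither sequence is constant)."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)
```

Note that a model whose predictions are offset from the observations by a constant still attains r = 1 while its RMSE and MAE are nonzero, which is one reason to report the indices together.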
FDive: Learning Relevance Models using Pattern-based Similarity Measures
Dennig, Frederik L., Polk, Tom, Lin, Zudi, Schreck, Tobias, Pfister, Hanspeter, Behrisch, Michael
The detection of interesting patterns in large high-dimensional datasets is difficult because of their dimensionality and pattern complexity. Therefore, analysts require automated support for the extraction of relevant patterns. In this paper, we present FDive, a visual active learning system that helps to create visually explorable relevance models, assisted by learning a pattern-based similarity. We use a small set of user-provided labels to rank similarity measures, consisting of feature descriptor and distance function combinations, by their ability to distinguish relevant from irrelevant data. Based on the best-ranked similarity measure, the system calculates an interactive Self-Organizing Map-based relevance model, which classifies data according to the cluster affiliation. It also automatically prompts further relevance feedback to improve its accuracy. Uncertain areas, especially near the decision boundaries, are highlighted and can be refined by the user. We evaluate our approach by comparison to state-of-the-art feature selection techniques and demonstrate the usefulness of our approach by a case study classifying electron microscopy images of brain cells. The results show that FDive enhances both the quality and understanding of relevance models and can thus lead to new insights for brain research.
Airbnb Price Prediction Using Machine Learning and Sentiment Analysis
Kalehbasti, Pouya Rezazadeh, Nikolenko, Liubov, Rezaei, Hoormazd
Pricing a rental property on Airbnb is a challenging task for the owner as it determines the number of customers for the place. On the other hand, customers have to evaluate an offered price with minimal knowledge of an optimal value for the property. This paper aims to develop a reliable price prediction model using machine learning, deep learning, and natural language processing techniques to aid both the property owners and the customers with price evaluation given minimal available information about the property. Features of the rentals, owner characteristics, and the customer reviews will comprise the predictors, and a range of methods from linear regression to tree-based models, support-vector regression (SVR), K-means Clustering (KMC), and neural networks (NNs) will be used for creating the prediction model.
Task Classification Model for Visual Fixation, Exploration, and Search
Kumar, Ayush, Tyagi, Anjul, Burch, Michael, Weiskopf, Daniel, Mueller, Klaus
Yarbus' claim that an observer's task can be decoded from eye movements has received mixed reactions. In this paper, we support the hypothesis that it is possible to decode the task. We conducted an exploratory analysis of the dataset by projecting features and data points into a scatter plot to visualize the nuanced properties of each task. Following this analysis, we eliminated highly correlated features before training an SVM and an Ada Boosting classifier to predict the tasks from the filtered eye movement data. We achieve an accuracy of 95.4% on this task classification problem and hence support the hypothesis that task classification is possible from a user's eye movement data.
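The step of eliminating highly correlated features before training, mentioned in the abstract above, might look like the following greedy filter. This is only a sketch under stated assumptions: the abstract does not give the procedure, so the greedy order, the 0.9 threshold, and the function names are all hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation; returns 0.0 if either column is constant."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def drop_correlated(features, threshold=0.9):
    """Greedily keep features whose |r| with every kept feature is <= threshold.

    `features` maps a feature name to its list of values; features are
    visited in insertion order, so earlier ones win ties.
    """
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept
```

The surviving feature names would then select the columns passed to the classifier.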
A Factored Generalized Additive Model for Clinical Decision Support in the Operating Room
Cui, Zhicheng, Fritz, Bradley A, King, Christopher R, Avidan, Michael S, Chen, Yixin
Logistic regression (LR) is widely used in clinical prediction because it is simple to deploy and easy to interpret. Nevertheless, being a linear model, LR has limited expressive capability and often has unsatisfactory performance. Generalized additive models (GAMs) extend the linear model with transformations of input features, though feature interaction is not allowed for all GAM variants. In this paper, we propose a factored generalized additive model (F-GAM) to preserve the model interpretability for targeted features while allowing a rich model for interaction with features fixed within the individual. We evaluate F-GAM on prediction of two targets, postoperative acute kidney injury and acute respiratory failure, from a single-center database. We find superior model performance of F-GAM in terms of AUPRC and AUROC compared to several other GAM implementations, random forests, a support vector machine, and a deep neural network. We find that the model is readily interpretable and that its results have high face validity.
Machine Learning on Apple Podcasts
This course provides a broad introduction to machine learning and statistical pattern recognition. Topics include: supervised learning (generative/discriminative learning, parametric/non-parametric learning, neural networks, support vector machines); unsupervised learning (clustering, dimensionality reduction, kernel methods); learning theory (bias/variance tradeoffs; VC theory; large margins); reinforcement learning and adaptive control.
Towards meta-learning for multi-target regression problems
Aguiar, Gabriel Jonas, Santana, Everton José, Mastelini, Saulo Martiello, Mantovani, Rafael Gomes, Barbon, Sylvio Jr
Several multi-target regression methods were developed in the last years aiming at improving predictive performance by exploring inter-target correlation within the problem. However, none of these methods outperforms the others for all problems. This motivates the development of automatic approaches to recommend the most suitable multi-target regression method. In this paper, we propose a meta-learning system to recommend the best predictive method for a given multi-target regression problem. We performed experiments with a meta-dataset generated by a total of 648 synthetic datasets. These datasets were created to explore distinct inter-targets characteristics toward recommending the most promising method. In experiments, we evaluated four different algorithms with different biases as meta-learners. Our meta-dataset is composed of 58 meta-features, based on: statistical information, correlation characteristics, linear landmarking, from the distribution and smoothness of the data, and has four different meta-labels. Results showed that induced meta-models were able to recommend the best method for different base level datasets with a balanced accuracy superior to 70% using a Random Forest meta-model, which statistically outperformed the meta-learning baselines.
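Balanced accuracy, the metric quoted in the abstract above, is commonly defined as the mean of the per-class recalls, so that each meta-label counts equally regardless of how often it occurs. The sketch below follows that common definition (an assumption, since the abstract does not spell it out); the function name and the labels in the usage note are hypothetical.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true."""
    recalls = []
    for label in sorted(set(y_true)):
        # Indices of all examples whose true class is `label`.
        idx = [i for i, t in enumerate(y_true) if t == label]
        correct = sum(1 for i in idx if y_pred[i] == label)
        recalls.append(correct / len(idx))
    return sum(recalls) / len(recalls)
```

For example, with true labels A, A, A, B and predictions A, A, B, B, the recall is 2/3 for A and 1 for B, so the balanced accuracy is 5/6, whereas plain accuracy is 3/4.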