Goto

Collaborating Authors

 Regression


Khatibi

AAAI Conferences

Accurate predictions about future events is essential in many areas, one of them being the Tourism Industry. Usually, countries and cities invest a huge amount of money in planning and preparation in order to welcome (and profit from) tourists. An accurate prediction of the number of visits in the following days or months could help both the economy and tourists. Prior studies in this domain explore forecasting for a whole country rather than for fine-grained areas within a country (e.g., specific touristic attractions). In this work, we suggest that accessible data from online social networks and travel websites, in addition to climate data, can be used to support the inference of visitation count for many touristic attractions. To test our hypothesis we analyze visitation, climate and social media data in more than 70 National Parks in U.S during the last 3 years. The experimental results reveal a high correlation between social media data and tourism demands; in fact, in over 80\% of the parks, social media reviews and visitation counts are correlated by more than 50\%. Moreover, we assess the effectiveness of employing various prediction techniques, finding that even a simple linear regression model, when fed with social media and climate data as input features, can attain a prediction accuracy of over 80\% while a more robust algorithm, such as Support Vector Regression, reaches up to 94\% accuracy.


Comparative Study Between Distance Measures On Supervised Optimum-Path Forest Classification

arXiv.org Artificial Intelligence

Machine Learning has attracted considerable attention throughout the past decade due to its potential to solve far-reaching tasks, such as image classification, object recognition, anomaly detection, and data forecasting. A standard approach to tackle such applications is based on supervised learning, which is assisted by large sets of labeled data and is conducted by the so-called classifiers, such as Logistic Regression, Decision Trees, Random Forests, and Support Vector Machines, among others. An alternative to traditional classifiers is the parameterless Optimum-Path Forest (OPF), which uses a graph-based methodology and a distance measure to create arcs between nodes and hence sets of trees, responsible for conquering the nodes, defining their labels, and shaping the forests. Nevertheless, its performance is strongly associated with an appropriate distance measure, which may vary according to the dataset's nature. Therefore, this work proposes a comparative study over a wide range of distance measures applied to the supervised Optimum-Path Forest classification. The experimental results are conducted using well-known literature datasets and compared across benchmarking classifiers, illustrating OPF's ability to adapt to distinct domains.


Fourier Representations for Black-Box Optimization over Categorical Variables

arXiv.org Artificial Intelligence

Optimization of real-world black-box functions defined over purely categorical variables is an active area of research. In particular, optimization and design of biological sequences with specific functional or structural properties have a profound impact in medicine, materials science, and biotechnology. Standalone search algorithms, such as simulated annealing (SA) and Monte Carlo tree search (MCTS), are typically used for such optimization problems. In order to improve the performance and sample efficiency of such algorithms, we propose to use existing methods in conjunction with a surrogate model for the black-box evaluations over purely categorical variables. To this end, we present two different representations, a group-theoretic Fourier expansion and an abridged one-hot encoded Boolean Fourier expansion. To learn such representations, we consider two different settings to update our surrogate model. First, we utilize an adversarial online regression setting where Fourier characters of each representation are considered as experts and their respective coefficients are updated via an exponential weight update rule each time the black box is evaluated. Second, we consider a Bayesian setting where queries are selected via Thompson sampling and the posterior is updated via a sparse Bayesian regression model (over our proposed representation) with a regularized horseshoe prior. Numerical experiments over synthetic benchmarks as well as real-world RNA sequence optimization and design problems demonstrate the representational power of the proposed methods, which achieve competitive or superior performance compared to state-of-the-art counterparts, while improving the computation cost and/or sample efficiency, substantially.


Distribution Regression with Sliced Wasserstein Kernels

arXiv.org Machine Learning

The problem of learning functions over spaces of probabilities - or distribution regression - is gaining significant interest in the machine learning community. A key challenge behind this problem is to identify a suitable representation capturing all relevant properties of the underlying functional mapping. A principled approach to distribution regression is provided by kernel mean embeddings, which lifts kernel-induced similarity on the input domain at the probability level. This strategy effectively tackles the two-stage sampling nature of the problem, enabling one to derive estimators with strong statistical guarantees, such as universal consistency and excess risk bounds. However, kernel mean embeddings implicitly hinge on the maximum mean discrepancy (MMD), a metric on probabilities, which may fail to capture key geometrical relations between distributions. In contrast, optimal transport (OT) metrics, are potentially more appealing, as documented by the recent literature on the topic. In this work, we propose the first OT-based estimator for distribution regression. We build on the Sliced Wasserstein distance to obtain an OT-based representation. We study the theoretical properties of a kernel ridge regression estimator based on such representation, for which we prove universal consistency and excess risk bounds. Preliminary experiments complement our theoretical findings by showing the effectiveness of the proposed approach and compare it with MMD-based estimators.


Optimal Transport of Binary Classifiers to Fairness

arXiv.org Machine Learning

Much of the past work on fairness in machine learning has focused on forcing the predictions of classifiers to have similar statistical properties for individuals of different demographics. Yet, such methods often simply perform a rescaling of the classifier scores and ignore whether individuals of different groups have similar features. Our proposed method, Optimal Transport to Fairness (OTF), applies Optimal Transport (OT) to take this similarity into account by quantifying unfairness as the smallest cost of OT between a classifier and any score function that satisfies fairness constraints. For a flexible class of linear fairness constraints, we show a practical way to compute OTF as an unfairness cost term that can be added to any standard classification setting. Experiments show that OTF can be used to achieve an effective trade-off between predictive power and fairness.


Mental Stress Detection using Data from Wearable and Non-wearable Sensors: A Review

arXiv.org Artificial Intelligence

This paper presents a comprehensive review of methods covering significant subjective and objective human stress detection techniques available in the literature. The methods for measuring human stress responses could include subjective questionnaires (developed by psychologists) and objective markers observed using data from wearable and non-wearable sensors. In particular, wearable sensor-based methods commonly use data from electroencephalography, electrocardiogram, galvanic skin response, electromyography, electrodermal activity, heart rate, heart rate variability, and photoplethysmography both individually and in multimodal fusion strategies. Whereas, methods based on non-wearable sensors include strategies such as analyzing pupil dilation and speech, smartphone data, eye movement, body posture, and thermal imaging. Whenever a stressful situation is encountered by an individual, physiological, physical, or behavioral changes are induced which help in coping with the challenge at hand. A wide range of studies has attempted to establish a relationship between these stressful situations and the response of human beings by using different kinds of psychological, physiological, physical, and behavioral measures. Inspired by the lack of availability of a definitive verdict about the relationship of human stress with these different kinds of markers, a detailed survey about human stress detection methods is conducted in this paper. In particular, we explore how stress detection methods can benefit from artificial intelligence utilizing relevant data from various sources. This review will prove to be a reference document that would provide guidelines for future research enabling effective detection of human stress conditions.


Optimal Ratio for Data Splitting

arXiv.org Machine Learning

It is common to split a dataset into training and testing sets before fitting a statistical or machine learning model. However, there is no clear guidance on how much data should be used for training and testing. In this article we show that the optimal splitting ratio is $\sqrt{p}:1$, where $p$ is the number of parameters in a linear regression model that explains the data well.


HARFE: Hard-Ridge Random Feature Expansion

arXiv.org Machine Learning

We propose a random feature model for approximating high-dimensional sparse additive functions called the hard-ridge random feature expansion method (HARFE). This method utilizes a hard-thresholding pursuit-based algorithm applied to the sparse ridge regression (SRR) problem to approximate the coefficients with respect to the random feature matrix. The SRR formulation balances between obtaining sparse models that use fewer terms in their representation and ridge-based smoothing that tend to be robust to noise and outliers. In addition, we use a random sparse connectivity pattern in the random feature matrix to match the additive function assumption. We prove that the HARFE method is guaranteed to converge with a given error bound depending on the noise and the parameters of the sparse ridge regression model. Based on numerical results on synthetic data as well as on real datasets, the HARFE approach obtains lower (or comparable) error than other state-of-the-art algorithms.


A new similarity measure for covariate shift with applications to nonparametric regression

arXiv.org Machine Learning

In the standard formulation of prediction or classification, future data (as represented by a test set) is assumed to be drawn from the same distribution as the training data. This assumption, while theoretically convenient, may fail to hold in many real-world scenarios. For instance, training data might be collected only from a sub-group within a broader population (such as in medical trials), or the environment might change over time as data are collected. Such scenarios result in a distribution mismatch between the training and test data. In this paper, we study an important case of such distribution mismatch--namely, the covariate shift problem (e.g., [21, 19]). Suppose that a statistician observes covariate-response pairs (X, Y), and wishes to build a prediction rule. In the problem of covariate shift, the distribution of the covariates X is allowed to change between the training and test data, while the posterior distribution of the responses (namely, Y X) remains fixed. Compared to the usual i.i.d.


Efficient Logistic Regression with Local Differential Privacy

arXiv.org Machine Learning

Internet of Things devices are expanding rapidly and generating huge amount of data. There is an increasing need to explore data collected from these devices. Collaborative learning provides a strategic solution for the Internet of Things settings but also raises public concern over data privacy. In recent years, large amount of privacy preserving techniques have been developed based on differential privacy and secure multi-party computation. A major challenge of collaborative learning is to balance disclosure risk and data utility while maintaining high computation efficiency. In this paper, we proposed privacy preserving logistic regression model using matrix encryption approach. The secure scheme achieves local differential privacy and can be implemented for both vertical and horizontal partitioning scenarios. Moreover, cross validation is investigated to generate robust model results without increasing the communication cost. Simulation illustrates the high efficiency of proposed scheme to analyze dataset with millions of records. Experimental evaluations further demonstrate high model accuracy while achieving privacy protection.