AITopics | Regression

Regression

Khatibi

AAAI Conferences, Feb-8-2022, 09:58:39 GMT

Accurate prediction of future events is essential in many areas, one of them being the tourism industry. Usually, countries and cities invest a huge amount of money in planning and preparation in order to welcome (and profit from) tourists. An accurate prediction of the number of visits in the following days or months could help both the economy and tourists. Prior studies in this domain explore forecasting for a whole country rather than for fine-grained areas within a country (e.g., specific touristic attractions). In this work, we suggest that accessible data from online social networks and travel websites, in addition to climate data, can be used to support the inference of visitation counts for many touristic attractions. To test our hypothesis, we analyze visitation, climate, and social media data for more than 70 National Parks in the U.S. over the last 3 years. The experimental results reveal a high correlation between social media data and tourism demand; in fact, in over 80% of the parks, social media reviews and visitation counts are correlated by more than 50%. Moreover, we assess the effectiveness of various prediction techniques, finding that even a simple linear regression model, when fed with social media and climate data as input features, can attain a prediction accuracy of over 80%, while a more robust algorithm, such as Support Vector Regression, reaches up to 94% accuracy.
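
The two quantities this abstract reports, a correlation between review counts and visits and a linear regression fit, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the data are synthetic, the names `reviews` and `visits` are hypothetical, and only a single input feature is used, whereas the abstract describes models fed with several social media and climate features.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x with one input feature."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    a = my - b * mx
    return a, b


# Synthetic, made-up data: 36 monthly review counts and visit counts for one park.
rng = random.Random(0)
reviews = [rng.randint(10, 200) for _ in range(36)]
visits = [500 + 40 * r + rng.gauss(0, 300) for r in reviews]

r = pearson(reviews, visits)          # strength of the linear relationship
a, b = fit_line(reviews, visits)      # intercept and slope of the fitted line
predicted = [a + b * x for x in reviews]
```

A Support Vector Regression model, as named in the abstract, would replace `fit_line` with a kernel-based regressor; it is not reproduced here.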

khatibi, social media data, touristic attraction, (4 more...)

AAAI Conferences

Industry: Consumer Products & Services > Travel (1.00)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.62)

Comparative Study Between Distance Measures On Supervised Optimum-Path Forest Classification

de Rosa, Gustavo Henrique, Roder, Mateus, Papa, João Paulo

arXiv.org Artificial Intelligence, Feb-8-2022

Machine Learning has attracted considerable attention throughout the past decade due to its potential to solve far-reaching tasks, such as image classification, object recognition, anomaly detection, and data forecasting. A standard approach to tackle such applications is based on supervised learning, which is assisted by large sets of labeled data and is conducted by the so-called classifiers, such as Logistic Regression, Decision Trees, Random Forests, and Support Vector Machines, among others. An alternative to traditional classifiers is the parameterless Optimum-Path Forest (OPF), which uses a graph-based methodology and a distance measure to create arcs between nodes and hence sets of trees, responsible for conquering the nodes, defining their labels, and shaping the forests. Nevertheless, its performance is strongly associated with an appropriate distance measure, which may vary according to the dataset's nature. Therefore, this work proposes a comparative study over a wide range of distance measures applied to supervised Optimum-Path Forest classification. The experiments are conducted on well-known datasets from the literature and compared against benchmark classifiers, illustrating OPF's ability to adapt to distinct domains.
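
To illustrate why the choice of distance measure can change a distance-based classifier's output, here is a minimal sketch. It is not an implementation of OPF: a plain 1-nearest-neighbour rule stands in for the classifier, and the three distance measures and the toy points are assumptions chosen for illustration, not the set compared in the paper.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def chebyshev(a, b):
    return max(abs(x - y) for x, y in zip(a, b))


def nearest_label(query, samples, labels, distance):
    """Label of the training sample closest to `query` under `distance`."""
    best = min(range(len(samples)), key=lambda i: distance(query, samples[i]))
    return labels[best]


# Toy data chosen so that the distance measure changes the answer.
samples = [(3.0, 3.0), (0.0, 4.5)]
labels = ["A", "B"]
query = (0.0, 0.0)

for dist in (euclidean, manhattan, chebyshev):
    print(dist.__name__, nearest_label(query, samples, labels, dist))
```

On these two points, the Euclidean and Chebyshev distances select sample A, while the Manhattan distance selects sample B.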

classifier, dataset, distance measure, (12 more...)

arXiv.org Artificial Intelligence

2202.03854

Country:

South America > Brazil > São Paulo (0.04)
Oceania > Australia > Australian Capital Territory > Canberra (0.04)
North America > United States > New Mexico > Bernalillo County > Albuquerque (0.04)

Genre:

Research Report > New Finding (0.66)
Research Report > Experimental Study (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.48)

Fourier Representations for Black-Box Optimization over Categorical Variables

Dadkhahi, Hamid, Rios, Jesus, Shanmugam, Karthikeyan, Das, Payel

arXiv.org Artificial Intelligence, Feb-8-2022

Optimization of real-world black-box functions defined over purely categorical variables is an active area of research. In particular, optimization and design of biological sequences with specific functional or structural properties have a profound impact in medicine, materials science, and biotechnology. Standalone search algorithms, such as simulated annealing (SA) and Monte Carlo tree search (MCTS), are typically used for such optimization problems. In order to improve the performance and sample efficiency of such algorithms, we propose to use existing methods in conjunction with a surrogate model for the black-box evaluations over purely categorical variables. To this end, we present two different representations, a group-theoretic Fourier expansion and an abridged one-hot encoded Boolean Fourier expansion. To learn such representations, we consider two different settings to update our surrogate model. First, we utilize an adversarial online regression setting where Fourier characters of each representation are considered as experts and their respective coefficients are updated via an exponential weight update rule each time the black box is evaluated. Second, we consider a Bayesian setting where queries are selected via Thompson sampling and the posterior is updated via a sparse Bayesian regression model (over our proposed representation) with a regularized horseshoe prior. Numerical experiments over synthetic benchmarks as well as real-world RNA sequence optimization and design problems demonstrate the representational power of the proposed methods, which achieve competitive or superior performance compared to state-of-the-art counterparts, while substantially improving the computation cost and/or sample efficiency.
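
The exponential weight update mentioned in the first setting can be sketched in its generic textbook form (often called the Hedge update). This is an illustration under stated assumptions, not the paper's surrogate-model update: the three experts, their losses, and the learning rate `eta` are all made up, and the mapping from Fourier characters to experts is not reproduced.

```python
import math


def exponential_weights_update(weights, losses, eta):
    """One multiplicative update: experts with larger loss lose weight."""
    scaled = [w * math.exp(-eta * loss) for w, loss in zip(weights, losses)]
    total = sum(scaled)
    return [s / total for s in scaled]  # renormalise so the weights sum to 1


# Three hypothetical experts, starting from uniform weights.
weights = [1 / 3, 1 / 3, 1 / 3]
# Made-up per-round losses; expert 0 is consistently the best.
rounds = [[0.1, 0.9, 0.5], [0.2, 0.8, 0.6], [0.0, 1.0, 0.4]]
for losses in rounds:
    weights = exponential_weights_update(weights, losses, eta=1.0)
```

After these rounds the weight concentrates on the expert with the smallest cumulative loss, which is the behaviour the update rule is designed to produce.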

algorithm, evaluation, representation, (16 more...)

arXiv.org Artificial Intelligence

2202.03712

Country:

North America > United States (0.14)
Europe > Austria > Vienna (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > New Finding (0.46)

Industry:

Transportation > Air (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.48)

Distribution Regression with Sliced Wasserstein Kernels

Meunier, Dimitri, Pontil, Massimiliano, Ciliberto, Carlo

arXiv.org Machine Learning, Feb-8-2022

The problem of learning functions over spaces of probabilities (distribution regression) is gaining significant interest in the machine learning community. A key challenge behind this problem is to identify a suitable representation capturing all relevant properties of the underlying functional mapping. A principled approach to distribution regression is provided by kernel mean embeddings, which lift kernel-induced similarity on the input domain to the probability level. This strategy effectively tackles the two-stage sampling nature of the problem, enabling one to derive estimators with strong statistical guarantees, such as universal consistency and excess risk bounds. However, kernel mean embeddings implicitly hinge on the maximum mean discrepancy (MMD), a metric on probabilities, which may fail to capture key geometrical relations between distributions. In contrast, optimal transport (OT) metrics are potentially more appealing, as documented by the recent literature on the topic. In this work, we propose the first OT-based estimator for distribution regression. We build on the Sliced Wasserstein distance to obtain an OT-based representation. We study the theoretical properties of a kernel ridge regression estimator based on such representation, for which we prove universal consistency and excess risk bounds. Preliminary experiments complement our theoretical findings by showing the effectiveness of the proposed approach and comparing it with MMD-based estimators.
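
The Sliced Wasserstein distance that the abstract builds on can be estimated by projecting two samples onto random directions and comparing the sorted projections. The sketch below is a generic Monte Carlo estimator under simplifying assumptions (equal sample sizes, uniform weights, hypothetical function and parameter names); it shows only the distance, not the kernel or the kernel ridge regression estimator studied in the paper.

```python
import math
import random


def random_direction(dim, rng):
    """Draw a direction uniformly from the unit sphere via a normalised Gaussian."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def sliced_wasserstein(xs, ys, n_projections=200, p=2, seed=0):
    """Monte Carlo estimate of the sliced Wasserstein-p distance.

    Assumes xs and ys are equal-size lists of points in the same dimension,
    each treated as a uniform empirical distribution.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    rng = random.Random(seed)
    dim = len(xs[0])
    total = 0.0
    for _ in range(n_projections):
        theta = random_direction(dim, rng)
        # Project both samples onto the line spanned by theta and sort:
        # in one dimension, optimal transport matches sorted points in order.
        px = sorted(sum(a * t for a, t in zip(x, theta)) for x in xs)
        py = sorted(sum(a * t for a, t in zip(y, theta)) for y in ys)
        total += sum(abs(a - b) ** p for a, b in zip(px, py)) / len(px)
    return (total / n_projections) ** (1.0 / p)


rng = random.Random(1)
sample_a = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(100)]
sample_b = [(x + 3.0, y) for x, y in sample_a]  # same cloud shifted by 3 along x

same = sliced_wasserstein(sample_a, sample_a)      # identical samples
shifted = sliced_wasserstein(sample_a, sample_b)   # translated samples
```

Identical samples give a distance of zero, while the translated sample gives a strictly positive value that grows with the size of the shift.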

distribution regression, kernel, sw 1, (14 more...)

arXiv.org Machine Learning

2202.03926

Country:

Europe > United Kingdom > England > Greater London > London (0.04)
Europe > Italy (0.04)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)

Optimal Transport of Binary Classifiers to Fairness

Buyl, Maarten, De Bie, Tijl

arXiv.org Machine Learning, Feb-8-2022

Much of the past work on fairness in machine learning has focused on forcing the predictions of classifiers to have similar statistical properties for individuals of different demographics. Yet, such methods often simply perform a rescaling of the classifier scores and ignore whether individuals of different groups have similar features. Our proposed method, Optimal Transport to Fairness (OTF), applies Optimal Transport (OT) to take this similarity into account by quantifying unfairness as the smallest cost of OT between a classifier and any score function that satisfies fairness constraints. For a flexible class of linear fairness constraints, we show a practical way to compute OTF as an unfairness cost term that can be added to any standard classification setting. Experiments show that OTF can be used to achieve an effective trade-off between predictive power and fairness.

constraint, fairness notion, optimal transport, (13 more...)

arXiv.org Machine Learning

2202.03814

Country:

North America > United States > West Virginia (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > Belgium (0.04)

Genre: Research Report (1.00)

Industry: Government > Regional Government (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)

Mental Stress Detection using Data from Wearable and Non-wearable Sensors: A Review

Arsalan, Aamir, Anwar, Syed Muhammad, Majid, Muhammad

arXiv.org Artificial Intelligence, Feb-7-2022

This paper presents a comprehensive review of methods covering significant subjective and objective human stress detection techniques available in the literature. The methods for measuring human stress responses could include subjective questionnaires (developed by psychologists) and objective markers observed using data from wearable and non-wearable sensors. In particular, wearable sensor-based methods commonly use data from electroencephalography, electrocardiogram, galvanic skin response, electromyography, electrodermal activity, heart rate, heart rate variability, and photoplethysmography, both individually and in multimodal fusion strategies. Methods based on non-wearable sensors, in contrast, include strategies such as analyzing pupil dilation and speech, smartphone data, eye movement, body posture, and thermal imaging. Whenever an individual encounters a stressful situation, physiological, physical, or behavioral changes are induced that help in coping with the challenge at hand. A wide range of studies has attempted to establish a relationship between these stressful situations and the response of human beings by using different kinds of psychological, physiological, physical, and behavioral measures. Motivated by the lack of a definitive verdict about the relationship of human stress with these different kinds of markers, this paper conducts a detailed survey of human stress detection methods. In particular, we explore how stress detection methods can benefit from artificial intelligence utilizing relevant data from various sources. This review aims to serve as a reference document providing guidelines for future research enabling effective detection of human stress conditions.

affective computing and intelligent interaction, out-of-lab environment, physiological measure, (15 more...)

arXiv.org Artificial Intelligence

2202.03033

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.13)
North America > Canada > Quebec > Montreal (0.04)
Europe > Latvia > Riga Municipality > Riga (0.04)
(15 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)
Overview (1.00)

Industry:

Health & Medicine > Therapeutic Area > Psychiatry/Psychology > Mental Health (1.00)
Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Human Computer Interaction > Interfaces (1.00)
Information Technology > Hardware (1.00)
Information Technology > Communications > Networks > Sensor Networks (1.00)
(11 more...)

Optimal Ratio for Data Splitting

Joseph, V. Roshan

arXiv.org Machine Learning, Feb-7-2022

It is common to split a dataset into training and testing sets before fitting a statistical or machine learning model. However, there is no clear guidance on how much data should be used for training and testing. In this article we show that the optimal splitting ratio is $\sqrt{p}:1$, where $p$ is the number of parameters in a linear regression model that explains the data well.
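
As a worked illustration of the stated ratio, the sketch below converts $\sqrt{p}:1$ into concrete set sizes. It assumes the ratio is read as training : testing, and the function name `split_sizes` and its arguments are hypothetical; how $p$ should be determined for a given dataset is not covered by the abstract and is not attempted here.

```python
import math


def split_sizes(n_samples, n_params):
    """Training/testing sizes under a sqrt(p):1 training-to-testing ratio."""
    ratio = math.sqrt(n_params)
    n_train = round(n_samples * ratio / (ratio + 1.0))
    return n_train, n_samples - n_train


# With p = 9 the ratio is 3:1, i.e. 75% training and 25% testing.
n_train, n_test = split_sizes(n_samples=1000, n_params=9)
```

Under this reading, a model with a single parameter gives an even split, and the training share grows as the number of parameters increases.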

data splitting, optimal ratio

arXiv.org Machine Learning

2202.03326

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.53)

HARFE: Hard-Ridge Random Feature Expansion

Saha, Esha, Schaeffer, Hayden, Tran, Giang

arXiv.org Machine Learning, Feb-6-2022

We propose a random feature model for approximating high-dimensional sparse additive functions called the hard-ridge random feature expansion method (HARFE). This method utilizes a hard-thresholding pursuit-based algorithm applied to the sparse ridge regression (SRR) problem to approximate the coefficients with respect to the random feature matrix. The SRR formulation balances between obtaining sparse models that use fewer terms in their representation and ridge-based smoothing that tends to be robust to noise and outliers. In addition, we use a random sparse connectivity pattern in the random feature matrix to match the additive function assumption. We prove that the HARFE method is guaranteed to converge with a given error bound depending on the noise and the parameters of the sparse ridge regression model. Based on numerical results on synthetic data as well as on real datasets, the HARFE approach obtains lower (or comparable) error than other state-of-the-art algorithms.
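
Hard-thresholding methods rest on one simple operation: keep the largest-magnitude coefficients and set the rest to zero. The sketch below shows only that operator, with a hypothetical function name and made-up coefficients; the full HARFE iteration, including the ridge step and the random feature matrix, is not reproduced.

```python
def hard_threshold(coeffs, s):
    """Keep the s largest-magnitude entries of coeffs and zero out the rest."""
    if s <= 0:
        return [0.0] * len(coeffs)
    order = sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]), reverse=True)
    keep = set(order[:s])
    return [c if i in keep else 0.0 for i, c in enumerate(coeffs)]


# Made-up coefficient vector; only the two largest magnitudes survive.
sparse = hard_threshold([0.1, -2.0, 0.5, 3.0], s=2)
```

The sparsity level `s` controls how many terms remain in the model, which is the sparsity side of the trade-off the abstract describes.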

algorithm, dataset, harfe, (13 more...)

arXiv.org Machine Learning

2202.02877

Country: Europe > Switzerland > Basel-City > Basel (0.04)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.48)

A new similarity measure for covariate shift with applications to nonparametric regression

Pathak, Reese, Ma, Cong, Wainwright, Martin J.

arXiv.org Machine Learning, Feb-6-2022

In the standard formulation of prediction or classification, future data (as represented by a test set) is assumed to be drawn from the same distribution as the training data. This assumption, while theoretically convenient, may fail to hold in many real-world scenarios. For instance, training data might be collected only from a sub-group within a broader population (such as in medical trials), or the environment might change over time as data are collected. Such scenarios result in a distribution mismatch between the training and test data. In this paper, we study an important case of such distribution mismatch, namely the covariate shift problem (e.g., [21, 19]). Suppose that a statistician observes covariate-response pairs (X, Y), and wishes to build a prediction rule. In the problem of covariate shift, the distribution of the covariates X is allowed to change between the training and test data, while the posterior distribution of the responses (namely, Y | X) remains fixed. Compared to the usual i.i.d.

covariate shift, similarity measure, transfer exponent, (16 more...)

arXiv.org Machine Learning

2202.02837

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > United States > New York (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
Europe > Czechia > Prague (0.04)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.47)

Efficient Logistic Regression with Local Differential Privacy

Miao, Guanhong

arXiv.org Machine Learning, Feb-5-2022

Internet of Things devices are expanding rapidly and generating huge amounts of data. There is an increasing need to explore data collected from these devices. Collaborative learning provides a strategic solution for Internet of Things settings but also raises public concern over data privacy. In recent years, a large number of privacy-preserving techniques have been developed based on differential privacy and secure multi-party computation. A major challenge of collaborative learning is to balance disclosure risk and data utility while maintaining high computation efficiency. In this paper, we propose a privacy-preserving logistic regression model using a matrix encryption approach. The secure scheme achieves local differential privacy and can be implemented for both vertical and horizontal partitioning scenarios. Moreover, cross validation is investigated to generate robust model results without increasing the communication cost. Simulations illustrate the high efficiency of the proposed scheme in analyzing datasets with millions of records. Experimental evaluations further demonstrate high model accuracy while achieving privacy protection.
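
For readers unfamiliar with local differential privacy, the guarantee can be illustrated with binary randomized response, a classic mechanism in which each device perturbs its own bit before sharing it. This is a swapped-in technique for illustration only: the paper's scheme uses matrix encryption, which is not shown, and the privacy parameter `epsilon`, the function names, and the data below are assumptions.

```python
import math
import random


def randomized_response(bit, epsilon, rng):
    """Report the true bit with probability e^eps / (1 + e^eps), else flip it."""
    p_truth = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return bit if rng.random() < p_truth else 1 - bit


def estimate_mean(reports, epsilon):
    """Unbiased estimate of the true fraction of ones from noisy reports."""
    p_truth = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    observed = sum(reports) / len(reports)
    return (observed - (1.0 - p_truth)) / (2.0 * p_truth - 1.0)


# Made-up population: 30% of 10,000 devices hold the bit 1.
rng = random.Random(0)
true_bits = [1] * 3000 + [0] * 7000
reports = [randomized_response(b, epsilon=1.0, rng=rng) for b in true_bits]
estimate = estimate_mean(reports, epsilon=1.0)
```

No single report reveals a device's true bit with certainty, yet the aggregate estimate stays close to the true fraction, which is the disclosure-risk versus utility balance the abstract refers to.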

differential privacy, matrix, privacy, (16 more...)

arXiv.org Machine Learning

2202.0265

Country:

North America > United States > Florida > Alachua County > Gainesville (0.14)
North America > United States > New York > New York County > New York City (0.05)
Europe > Germany > Berlin (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > Experimental Study (0.89)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)
