AITopics | Regression

Regression

Characterizing, Detecting, and Predicting Online Ban Evasion

Niverthi, Manoj, Verma, Gaurav, Kumar, Srijan

arXiv.org Artificial Intelligence, Feb-10-2022

Moderators and automated methods enforce bans on malicious users who engage in disruptive behavior. However, malicious users can easily create a new account to evade such bans. Previous research has focused on other forms of online deception, like the simultaneous operation of multiple accounts by the same entities (sockpuppetry), impersonation of other individuals, and studying the effects of de-platforming individuals and communities. Here we conduct the first data-driven study of ban evasion, i.e., the act of circumventing bans on an online platform, leading to temporally disjoint operation of accounts by the same user. We curate a novel dataset of 8,551 ban evasion pairs (parent, child) identified on Wikipedia and contrast their behavior with benign users and non-evading malicious users. We find that evasion child accounts demonstrate similarities with respect to their banned parent accounts on several behavioral axes - from similarity in usernames and edited pages to similarity in content added to the platform and its psycholinguistic attributes. We reveal key behavioral attributes of accounts that are likely to evade bans. Based on the insights from the analyses, we train logistic regression classifiers to detect and predict ban evasion at three different points in the ban evasion lifecycle. Results demonstrate the effectiveness of our methods in predicting future evaders (AUC = 0.78), early detection of ban evasion (AUC = 0.85), and matching child accounts with parent accounts (MRR = 0.97). Our work can aid moderators by reducing their workload and identifying evasion pairs faster and more efficiently than current manual and heuristic-based approaches. The dataset is available at https://github.com/srijankr/ban_evasion.
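
The abstract above reports AUC for the prediction and detection tasks and MRR for matching child accounts to parent accounts. As a reminder of what these two metrics measure, here is a minimal Python sketch. The function names, account IDs, and scores are hypothetical illustrations and are not taken from the paper or its dataset.

```python
def auc(pos_scores, neg_scores):
    """Probability that a random positive outscores a random negative.

    Ties count as half a win. This is the pairwise definition of AUC.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def mean_reciprocal_rank(ranked_candidates, true_parent):
    """Average of 1/rank of the true parent in each child's candidate list.

    A child whose true parent is missing from its list contributes 0.
    """
    total = 0.0
    for child, candidates in ranked_candidates.items():
        parent = true_parent[child]
        if parent in candidates:
            total += 1.0 / (candidates.index(parent) + 1)
    return total / len(ranked_candidates)


# Made-up classifier scores for evading and non-evading accounts.
evader_scores = [0.9, 0.4]
benign_scores = [0.5, 0.1]

# Made-up ranked parent candidates for three child accounts.
rankings = {
    "child_a": ["parent_1", "parent_2"],
    "child_b": ["parent_9", "parent_2", "parent_3"],
    "child_c": ["parent_3"],
}
truth = {"child_a": "parent_1", "child_b": "parent_2", "child_c": "parent_3"}

auc_value = auc(evader_scores, benign_scores)
mrr_value = mean_reciprocal_rank(rankings, truth)
print(auc_value, mrr_value)
```

An MRR close to 1, as reported in the abstract, means the true parent is usually ranked first among the candidates.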

child account, evasion pair, parent account

doi: 10.1145/3485447.3512133

2202.05257

Country:

Europe > France > Auvergne-Rhône-Alpes > Lyon > Lyon (0.05)
North America > United States > Georgia > Fulton County > Atlanta (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > District of Columbia > Washington (0.04)

Genre:

Research Report > New Finding (0.86)
Research Report > Experimental Study (0.66)

Industry:

Media (0.94)
Law Enforcement & Public Safety (0.93)
Information Technology > Services (0.68)
Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.89)

Posterior Consistency for Bayesian Relevance Vector Machines

Fang, Xiao, Ghosh, Malay

arXiv.org Machine Learning, Feb-10-2022

Statistical modeling and inference are challenging when the sample size is substantially smaller than the number of available covariates. Chakraborty et al. (2012) carried out a full hierarchical Bayesian analysis of nonlinear regression in such settings using relevance vector machines based on a reproducing kernel Hilbert space (RKHS), but they did not establish any theoretical properties of their procedure. The present paper revisits their problem, introduces a new class of global-local priors different from theirs, and provides results on posterior consistency as well as posterior contraction rates.

kernel, posterior consistency, relevance vector machine

2202.05422

Country:

North America > United States > Florida > Alachua County > Gainesville (0.14)
North America > Canada > Quebec > Montreal (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.68)

Image-to-Image Regression with Distribution-Free Uncertainty Quantification and Applications in Imaging

Angelopoulos, Anastasios N, Kohli, Amit P, Bates, Stephen, Jordan, Michael I, Malik, Jitendra, Alshaabi, Thayer, Upadhyayula, Srigokul, Romano, Yaniv

arXiv.org Machine Learning, Feb-10-2022

Image-to-image regression is an important learning task, used frequently in biological imaging. Current algorithms, however, do not generally offer statistical guarantees that protect against a model's mistakes and hallucinations. To address this, we develop uncertainty quantification techniques with rigorous statistical guarantees for image-to-image regression problems. In particular, we show how to derive uncertainty intervals around each pixel that are guaranteed to contain the true value with a user-specified confidence probability. Our methods work in conjunction with any base machine learning model, such as a neural network, and endow it with formal mathematical guarantees -- regardless of the true unknown data distribution or choice of model. Furthermore, they are simple to implement and computationally inexpensive. We evaluate our procedure on three image-to-image regression tasks: quantitative phase microscopy, accelerated magnetic resonance imaging, and super-resolution transmission electron microscopy of a Drosophila melanogaster brain.
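
To illustrate how a distribution-free interval can be wrapped around any base model's pixel predictions, the sketch below uses split conformal prediction. This is a related and simpler technique chosen for illustration; it is not necessarily the calibration procedure the paper uses. It assumes a held-out calibration set that is exchangeable with the test data, and all function names and toy values are hypothetical.

```python
import math


def conformal_halfwidth(calib_preds, calib_truth, alpha):
    """Half-width q such that pred +/- q covers a new value with prob >= 1 - alpha.

    Uses the ceil((n + 1) * (1 - alpha))-th smallest absolute residual on the
    calibration set, which is the standard split conformal quantile.
    """
    scores = sorted(abs(p - t) for p, t in zip(calib_preds, calib_truth))
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this alpha: only a trivial interval is valid.
        return math.inf
    return scores[k - 1]


def pixel_intervals(pixel_preds, halfwidth):
    """Attach the same symmetric interval to every predicted pixel value."""
    return [(p - halfwidth, p + halfwidth) for p in pixel_preds]


# Toy calibration data: the (hypothetical) model predicts 0 everywhere.
calib_preds = [0, 0, 0, 0, 0, 0, 0]
calib_truth = [1, 2, 3, 4, 5, 6, 7]

q = conformal_halfwidth(calib_preds, calib_truth, alpha=0.25)
intervals = pixel_intervals([10, 20], q)
print(q, intervals)
```

The guarantee here is marginal, averaged over calibration and test draws, and the interval width is the same for every pixel. Adaptive per-pixel widths need a model that also outputs a heuristic uncertainty to rescale.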

prediction, quantile regression, regression

2202.05265

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
Asia > Middle East > Jordan (0.05)
North America > United States > California > Alameda County > Berkeley (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)

Adaptive and Robust Multi-task Learning

Duan, Yaqi, Wang, Kaizheng

arXiv.org Machine Learning, Feb-10-2022

Multi-task learning (MTL) solves a number of learning tasks simultaneously. It has become increasingly popular in modern applications with data generated by multiple sources. When the tasks share certain common structures, a properly chosen MTL algorithm can leverage that to improve the performance. However, task relatedness is usually unknown and hard to quantify in practice; heterogeneity can even make multi-task approaches perform worse than single-task learning, which trains models separately on their individual datasets. In this paper, we study MTL from a statistical perspective and develop a family of reliable approaches that adapt to the unknown task relatedness and are robust against outlier tasks with possibly contaminated data.

armul, learning, relatedness

2202.0525

Country:

North America > United States > New York (0.14)
Asia > Middle East > Jordan (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.67)

Transfer-Learning Across Datasets with Different Input Dimensions: An Algorithm and Analysis for the Linear Regression Case

Silvestrin, Luis Pedro, van Zanten, Harry, Hoogendoorn, Mark, Koole, Ger

arXiv.org Machine Learning, Feb-10-2022

With the development of new sensors and monitoring devices, more sources of data become available to be used as inputs for machine learning models. On the one hand, these can help to improve the accuracy of a model. On the other hand, combining these new inputs with historical data remains a challenge that has not yet been studied in enough detail. In this work, we propose a transfer-learning algorithm that combines the new and the historical data and that is especially beneficial when the new data is scarce. We focus the approach on the linear regression case, which allows us to conduct a rigorous theoretical study of the benefits of the approach. We show that our approach is robust against negative transfer-learning, and we confirm this result empirically with real and simulated data.

dataset, estimator, transfer gain

2202.05069

Country:

Europe > Netherlands > North Holland > Amsterdam (0.04)
North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report > New Finding (0.46)

Industry:

Transportation (0.69)
Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.85)
Information Technology > Artificial Intelligence > Machine Learning > Transfer Learning (0.84)

Random Forests Weighted Local Fréchet Regression with Theoretical Guarantee

Qiu, Rui, Yu, Zhou, Zhu, Ruoqing

arXiv.org Machine Learning, Feb-10-2022

Statistical analysis is increasingly confronted with complex data from general metric spaces, such as symmetric positive definite matrix-valued data and probability distribution functions. [47] and [17] establish a general paradigm of Fréchet regression with complex metric space valued responses and Euclidean predictors. However, their proposed local Fréchet regression approach involves nonparametric kernel smoothing and suffers from the curse of dimensionality. To address this issue, we propose in this paper a novel random forests weighted local Fréchet regression paradigm. The main mechanism of our approach relies on the adaptive kernels generated by random forests. Our first method utilizes these weights as the local average to solve the Fréchet mean, while the second method performs local linear Fréchet regression, making both methods locally adaptive. Our proposals significantly improve existing Fréchet regression methods. Based on the theory of infinite order U-processes and infinite order M_{m_n}-estimators, we establish the consistency, rate of convergence, and asymptotic normality for our proposed random forests weighted Fréchet regression estimator, which covers the current large sample theory of random forests with Euclidean responses as a special case. Numerical studies show the superiority of our two proposed methods for Fréchet regression with several commonly encountered types of responses such as probability distribution functions, symmetric positive definite matrices, and sphere data. The practical merits of our proposals are also demonstrated through an application to human mortality distribution data.
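
For intuition about the weighted Fréchet mean mentioned in the abstract, consider one-dimensional probability distributions under the 2-Wasserstein metric. In that special case the weighted Fréchet mean has a closed form: the weighted average of the quantile functions. The sketch below assumes each response distribution is stored as quantiles on a shared probability grid, and it uses hand-picked weights as a stand-in for weights that a random forest would produce. The function name and data are hypothetical.

```python
def weighted_frechet_mean_quantiles(quantile_rows, weights):
    """Weighted 2-Wasserstein Fréchet mean of 1-D distributions.

    quantile_rows: list of quantile vectors, one per training response,
                   all evaluated on the same probability grid.
    weights:       non-negative weights, one per training response
                   (assumed here; in a forest they would depend on the query point).
    Returns the quantile vector of the weighted Fréchet mean.
    """
    total = float(sum(weights))
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    grid_size = len(quantile_rows[0])
    mean = [0.0] * grid_size
    for row, w in zip(quantile_rows, weights):
        for j in range(grid_size):
            mean[j] += (w / total) * row[j]
    return mean


# Two toy response distributions, each given by three quantiles.
responses = [
    [0.0, 1.0, 2.0],
    [2.0, 3.0, 4.0],
]

equal_mean = weighted_frechet_mean_quantiles(responses, [1, 1])
skewed_mean = weighted_frechet_mean_quantiles(responses, [3, 1])
print(equal_mean, skewed_mean)
```

A weighted average of non-decreasing quantile vectors is itself non-decreasing, so the result is a valid quantile function. For other metric spaces, such as symmetric positive definite matrices, the minimization generally has no such simple form.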

Fréchet regression, estimator, random forest

2202.04912

Country:

North America > United States > Illinois (0.04)
Asia > China (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)

Genre: Research Report (0.50)

Industry:

Health & Medicine > Therapeutic Area (0.67)
Banking & Finance > Economy (0.45)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Zhang

AAAI Conferences, Feb-8-2022, 11:01:10 GMT

In this paper we present a novel agent-based modeling methodology to predict rooftop solar adoptions in the residential energy market. We first applied several linear regression models to estimate missing variables for non-adopters, so that attributes of non-adopters and adopters could be used to train a logistic regression model. Then, we integrated the logistic regression model along with other predictive models into a multi-agent simulation platform and validated our models by comparing the forecast of aggregate adoptions in a typical zip code area with its ground truth. This result shows that the agent-based model can reliably predict future adoptions. Finally, based on the validated agent-based model, we compared the outcome of a hypothesized seeding policy with the original incentive plan, and investigated other alternative seeding policies which could lead to more adopters.

adoption, logistic regression model, regression model

Genre: Research Report > New Finding (0.95)

Industry: Energy (0.95)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)

Liu

AAAI Conferences, Feb-8-2022, 10:17:07 GMT

Privacy policies are commonly used to inform users about the data collection and use practices of websites, mobile apps, and other products and services. However, the average Internet user struggles to understand the contents of these documents and generally does not read them. Natural language and machine learning techniques offer the promise of automatically extracting relevant statements from privacy policies to help generate succinct summaries, but current techniques require large amounts of annotated data. The highest quality annotations require law experts, but their efforts do not scale efficiently. In this paper, we present results on bridging the gap between privacy practice categories defined by law experts with topics learned from Non-negative Matrix Factorization (NMF). To do this, we investigate the intersections between vocabulary sets identified as most significant for each category, using a logistic regression model, and vocabulary sets identified by topic modeling. The intersections exhibit strong matches between some categories and topics, although other categories have weaker affinities with topics. Our results show a path forward for applying unsupervised methods to the determination of data practice categories in privacy policy text.

category, law expert, practice category

Genre: Research Report > New Finding (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.64)

Vosoughi

AAAI Conferences, Feb-8-2022, 10:09:24 GMT

Speech acts are a way to conceptualize speech as action. This holds true for communication on any platform, including social media platforms such as Twitter.

machine learning, social media, vosoughi

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.51)

Dave

AAAI Conferences, Feb-8-2022, 10:02:09 GMT

In recent years, reciprocal link prediction has received some attention from data mining and social network analysis researchers, who solved this problem as a binary classification task. However, it is also important to predict the interval time for the creation of a reciprocal link. This is a challenging problem for two reasons. First, effective features are lacking: well-known link prediction features are designed for undirected networks and for the binary classification task, so they do not work well for interval time prediction. Second, the presence of censored data instances makes traditional supervised regression methods unsuitable for solving this problem. In this paper, we propose a solution for the reciprocal link interval time prediction task. We map this problem into a survival analysis framework and show through extensive experiments on real-world datasets that survival analysis methods perform better than traditional regression, neural network based models, and support vector regression (SVR).
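
To show why censored instances call for survival analysis rather than ordinary regression, the sketch below implements the Kaplan-Meier estimator, a basic survival-analysis tool. It is used here only as an illustration and is not necessarily one of the models evaluated in the paper. A censored instance is a link that had not yet been reciprocated when observation stopped. The function name and toy data are hypothetical.

```python
def kaplan_meier(durations, observed):
    """Kaplan-Meier survival curve.

    durations: time until reciprocation, or until observation ended.
    observed:  True if the reciprocal link was actually created,
               False if the instance is censored.
    Returns a list of (time, estimated probability of no reciprocal link yet).
    """
    data = sorted(zip(durations, observed))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        events = 0
        leaving = 0
        # Group all instances that share the same time.
        while i < len(data) and data[i][0] == t:
            events += int(data[i][1])
            leaving += 1
            i += 1
        if events:
            survival *= 1.0 - events / at_risk
            curve.append((t, survival))
        # Censored instances leave the risk set without counting as events.
        at_risk -= leaving
    return curve


# Toy data: the third link is censored at time 3.
durations = [2, 3, 3, 5]
observed = [True, True, False, True]

curve = kaplan_meier(durations, observed)
print(curve)
```

The censored link still counts in the risk set up to time 3, so its partial information is used instead of being discarded or treated as an observed interval.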

binary classification task, prediction, regression

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.66)
