influence measure
In-Group Love, Out-Group Hate: A Framework to Measure Affective Polarization via Contentious Online Discussions
Nettasinghe, Buddhika, Rao, Ashwin, Jiang, Bohan, Percus, Allon, Lerman, Kristina
Affective polarization, the emotional divide between ideological groups marked by in-group love and out-group hate, has intensified in the United States, driving contentious issues like masking and lockdowns during the COVID-19 pandemic. Despite its societal impact, existing models of opinion change fail to account for emotional dynamics nor offer methods to quantify affective polarization robustly and in real-time. In this paper, we introduce a discrete choice model that captures decision-making within affectively polarized social networks and propose a statistical inference method estimate key parameters -- in-group love and out-group hate -- from social media data. Through empirical validation from online discussions about the COVID-19 pandemic, we demonstrate that our approach accurately captures real-world polarization dynamics and explains the rapid emergence of a partisan gap in attitudes towards masking and lockdowns. This framework allows for tracking affective polarization across contentious issues has broad implications for fostering constructive online dialogues in digital spaces.
- Europe > France (0.14)
- North America > United States > New York (0.04)
- North America > United States > Iowa > Johnson County > Iowa City (0.04)
- (2 more...)
- Information Technology > Communications > Social Media (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)
On the influence of dependent features in classification problems: a game-theoretic perspective
Davila-Pena, Laura, Saavedra-Nieves, Alejandro, Casas-Méndez, Balbina
Within this framework, we consider a sample of individuals characterized by specific features, each feature encompassing a finite range of values, and classified based on a binary response variable. This measure turns out to be an influence measure explored in existing literature and related to cooperative game theory. We provide an axiomatic characterization of our proposed influence measure by tailoring properties from the cooperative game theory to our specific context. Furthermore, we demonstrate that our influence measure becomes a general characterization of the well-known Banzhaf-Owen value for games with a priori unions, from the perspective of classification problems. The definitions and results presented herein are illustrated through numerical examples and various applications, offering practical insights into our methodologies.
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- Europe > Spain > Galicia > A Coruña Province > Santiago de Compostela (0.04)
- North America > United States (0.04)
- (2 more...)
- Media > Music (1.00)
- Leisure & Entertainment (1.00)
- Health & Medicine > Therapeutic Area (1.00)
Contrastive Error Attribution for Finetuned Language Models
Ladhak, Faisal, Durmus, Esin, Hashimoto, Tatsunori
Recent work has identified noisy and misannotated data as a core cause of hallucinations and unfaithful outputs in Natural Language Generation (NLG) tasks. Consequently, identifying and removing these examples is a key open challenge in creating reliable NLG systems. In this work, we introduce a framework to identify and remove low-quality training instances that lead to undesirable outputs, such as faithfulness errors in text summarization. We show that existing approaches for error tracing, such as gradient-based influence measures, do not perform reliably for detecting faithfulness errors in NLG datasets. We overcome the drawbacks of existing error tracing methods through a new, contrast-based estimate that compares undesired generations to human-corrected outputs. Our proposed method can achieve a mean average precision of 0.93 at detecting known data errors across synthetic tasks with known ground truth, substantially outperforming existing approaches. Using this approach and re-training models on cleaned data leads to a 70% reduction in entity hallucinations on the NYT dataset and a 55% reduction in semantic errors on the E2E dataset.
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
- Asia > China (0.05)
- Europe > United Kingdom > Wales (0.05)
- (11 more...)
An algorithm-based multiple detection influence measure for high dimensional regression using expectile
Barry, Amadou, Bhagwat, Nikhil, Misic, Bratislav, Poline, Jean-Baptiste, Greenwood, Celia M. T.
The identification of influential observations is an important part of data analysis that can prevent erroneous conclusions drawn from biased estimators. However, in high dimensional data, this identification is challenging. Classical and recently-developed methods often perform poorly when there are multiple influential observations in the same dataset. In particular, current methods can fail when there is masking several influential observations with similar characteristics, or swamping when the influential observations are near the boundary of the space spanned by well-behaved observations. Therefore, we propose an algorithm-based, multi-step, multiple detection procedure to identify influential observations that addresses current limitations. Our three-step algorithm to identify and capture undesirable variability in the data, $\asymMIP,$ is based on two complementary statistics, inspired by asymmetric correlations, and built on expectiles. Simulations demonstrate higher detection power than competing methods. Use of the resulting asymptotic distribution leads to detection of influential observations without the need for computationally demanding procedures such as the bootstrap. The application of our method to the Autism Brain Imaging Data Exchange neuroimaging dataset resulted in a more balanced and accurate prediction of brain maturity based on cortical thickness. See our GitHub for a free R package that implements our algorithm: \texttt{asymMIP} (\url{github.com/AmBarry/hidetify}).
- North America > Canada > Quebec (0.29)
- North America > United States > California (0.28)
- Health & Medicine > Health Care Technology (1.00)
- Health & Medicine > Diagnostic Medicine > Imaging (1.00)
- Health & Medicine > Therapeutic Area > Neurology > Autism (0.34)
- Information Technology > Sensing and Signal Processing > Image Processing (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
- Information Technology > Data Science > Data Mining (0.69)
Assessment of the influence of features on a classification problem: an application to COVID-19 patients
Davila-Pena, L., García-Jurado, Ignacio, Casas-Méndez, B.
This paper deals with an important subject in classification problems addressed by machine learning techniques: the evaluation of the influence of each of the features on the classification of individuals. Specifically, a measure of that influence is introduced using the Shapley value of cooperative games. In addition, an axiomatic characterisation of the proposed measure is provided based on properties of efficiency and balanced contributions. Furthermore, some experiments have been designed in order to validate the appropriate performance of such measure. Finally, the methodology introduced is applied to a sample of COVID-19 patients to study the influence of certain demographic or risk factors on various events of interest related to the evolution of the disease.
Evidential positive opinion influence measures for viral marketing
Jendoubi, Siwar, Martin, Arnaud
The Viral Marketing is a relatively new form of marketing that exploits social networks to promote a brand, a product, etc. The idea behind it is to find a set of influencers on the network that can trigger a large cascade of propagation and adoptions. In this paper, we will introduce an evidential opinion-based influence maximization model for viral marketing. Besides, our approach tackles three opinions based scenarios for viral marketing in the real world. The first scenario concerns influencers who have a positive opinion about the product. The second scenario deals with influencers who have a positive opinion about the product and produce effects on users who also have a positive opinion. The third scenario involves influence users who have a positive opinion about the product and produce effects on the negative opinion of other users concerning the product in question. Next, we proposed six influence measures, two for each scenario. We also use an influence maximization model that the set of detected influencers for each scenario. Finally, we show the performance of the proposed model with each influence measure through some experiments conducted on a generated dataset and a real world dataset collected from Twitter.
- Africa > Middle East > Tunisia > Tunis Governorate > Tunis (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Europe > France (0.04)
Sensitivity Analysis of Deep Neural Networks
Deep neural networks (DNNs) have achieved superior performance in various prediction tasks, but can be very vulnerable to adversarial examples or perturbations. Therefore, it is crucial to measure the sensitivity of DNNs to various forms of perturbations in real applications. We introduce a novel perturbation manifold and its associated influence measure to quantify the effects of various perturbations on DNN classifiers. Such perturbations include various external and internal perturbations to input samples and network parameters. The proposed measure is motivated by information geometry and provides desirable invariance properties. We demonstrate that our influence measure is useful for four model building tasks: detecting potential 'outliers', analyzing the sensitivity of model architectures, comparing network sensitivity between training and test sets, and locating vulnerable areas. Experiments show reasonably good performance of the proposed measure for the popular DNN models ResNet50 and DenseNet121 on CIFAR10 and MNIST datasets.
- North America > United States > Texas > Harris County > Houston (0.04)
- North America > United States > Rhode Island > Providence County > Providence (0.04)
- North America > United States > New York (0.04)
- Asia > China > Beijing > Beijing (0.04)
Axiomatic Characterization of Data-Driven Influence Measures for Classification
Sliwinski, Jakub, Strobel, Martin, Zick, Yair
We study the following problem: given a labeled dataset and a specific datapoint x, how did the i-th feature influence the classification for x? We identify a family of numerical influence measures - functions that, given a datapoint x, assign a numeric value phi_i(x) to every feature i, corresponding to how altering i's value would influence the outcome for x. This family, which we term monotone influence measures (MIM), is uniquely derived from a set of desirable properties, or axioms. The MIM family constitutes a provably sound methodology for measuring feature influence in classification domains; the values generated by MIM are based on the dataset alone, and do not make any queries to the classifier. While this requirement naturally limits the scope of our framework, we demonstrate its effectiveness on data.
- Europe > Switzerland > Zürich > Zürich (0.14)
- North America > United States > Illinois > Cook County > Chicago (0.04)
- Europe > Italy > Marche > Ancona Province > Ancona (0.04)
- (4 more...)
- Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
- Law (1.00)
Hunting for Discriminatory Proxies in Linear Regression Models
Yeom, Samuel, Datta, Anupam, Fredrikson, Matt
A machine learning model may exhibit discrimination when used to make decisions involving people. One potential cause for such outcomes is that the model uses a statistical proxy for a protected demographic attribute. In this paper we formulate a definition of proxy use for the setting of linear regression and present algorithms for detecting proxies. Our definition follows recent work on proxies in classification models, and characterizes a model's constituent behavior that: 1) correlates closely with a protected random variable, and 2) is causally influential in the overall behavior of the model. We show that proxies in linear regression models can be efficiently identified by solving a second-order cone program, and further extend this result to account for situations where the use of a certain input variable is justified as a "business necessity". Finally, we present empirical results on two law enforcement datasets that exhibit varying degrees of racial disparity in prediction outcomes, demonstrating that proxies shed useful light on the causes of discriminatory behavior in models.
- North America > United States > Illinois > Cook County > Chicago (0.04)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- North America > United States > Connecticut (0.04)
- (2 more...)
Supervising Feature Influence
Sen, Shayak, Mardziel, Piotr, Datta, Anupam, Fredrikson, Matthew
Causal influence measures for machine learnt classifiers shed light on the reasons behind classification, and aid in identifying influential input features and revealing their biases. However, such analyses involve evaluating the classifier using datapoints that may be atypical of its training distribution. Standard methods for training classifiers that minimize empirical risk do not constrain the behavior of the classifier on such datapoints. As a result, training to minimize empirical risk does not distinguish among classifiers that agree on predictions in the training distribution but have wildly different causal influences. We term this problem covariate shift in causal testing and formally characterize conditions under which it arises. As a solution to this problem, we propose a novel active learning algorithm that constrains the influence measures of the trained model. We prove that any two predictors whose errors are close on both the original training distribution and the distribution of atypical points are guaranteed to have causal influences that are also close. Further, we empirically demonstrate with synthetic labelers that our algorithm trains models that (i) have similar causal influences as the labeler's model, and (ii) generalize better to out-of-distribution points while (iii) retaining their accuracy on in-distribution points.
- North America > United States > California > San Francisco County > San Francisco (0.14)
- North America > United States > New York > New York County > New York City (0.04)
- Law (0.68)
- Information Technology > Security & Privacy (0.46)
- Government > Regional Government > North America Government > United States Government (0.46)