
Collaborating Authors: Feyisetan, Oluwaseyi


Removing Spurious Correlation from Neural Network Interpretations

arXiv.org Artificial Intelligence

Existing algorithms for identifying the neurons responsible for undesired and harmful behaviors do not consider the effects of confounders such as the topic of the conversation. In this work, we show that confounders can create spurious correlations and propose a new causal mediation approach that controls for the impact of the topic. In experiments with two large language models, we study the localization hypothesis and show that, after adjusting for the effect of conversation topic, toxicity becomes less localized.


A Differentially Private Text Perturbation Method Using a Regularized Mahalanobis Metric

arXiv.org Machine Learning

Balancing the privacy-utility tradeoff is a crucial requirement of many practical machine learning systems that deal with sensitive customer data. A popular approach for privacy-preserving text analysis is noise injection, in which text data is first mapped into a continuous embedding space, perturbed by sampling spherical noise from an appropriate distribution, and then projected back to the discrete vocabulary space. While this allows the perturbation to satisfy the required metric differential privacy, the utility of downstream tasks modeled on this perturbed data is often low because the spherical noise does not account for the variability in the density around different words in the embedding space. In particular, words in a sparse region are likely to remain unchanged even when the noise scale is large. Using the global sensitivity of the mechanism can potentially add too much noise to the words in the dense regions of the embedding space, causing a high utility loss, whereas using local sensitivity can leak information through the scale of the noise added. In this paper, we propose a text perturbation mechanism based on a carefully designed regularized variant of the Mahalanobis metric to overcome this problem. For any given noise scale, this metric adds elliptical noise to account for the covariance structure in the embedding space. This heterogeneity in the noise scale along different directions helps ensure that words in sparse regions have a sufficient likelihood of replacement without sacrificing overall utility. We provide a text-perturbation algorithm based on this metric and formally prove its privacy guarantees. Additionally, we empirically show that our mechanism achieves better privacy statistics at the same level of utility compared to the state-of-the-art Laplace mechanism.
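
As a rough illustration of the elliptical-noise idea, the sketch below (Python/NumPy) shapes spherical metric-DP noise with a square root of a regularized covariance Sigma_reg = lam * Sigma + (1 - lam) * I before adding it to a word vector and projecting back to the nearest word. The helper names and the lam knob are illustrative assumptions, not the paper's exact construction or calibration.

    import numpy as np

    def regularized_mahalanobis_noise(embeddings, epsilon, lam=0.5, rng=None):
        # Hedged sketch: elliptical noise shaped by a regularized covariance
        # Sigma_reg = lam * Sigma + (1 - lam) * I (lam is a hypothetical knob).
        rng = rng or np.random.default_rng()
        d = embeddings.shape[1]
        sigma_reg = lam * np.cov(embeddings, rowvar=False) + (1.0 - lam) * np.eye(d)
        root = np.linalg.cholesky(sigma_reg)
        # Spherical metric-DP noise: uniform direction, Gamma(d, 1/epsilon) radius.
        direction = rng.normal(size=d)
        direction /= np.linalg.norm(direction)
        radius = rng.gamma(shape=d, scale=1.0 / epsilon)
        return root @ (radius * direction)

    def perturb_word(word, vocab, vecs, epsilon):
        # Add elliptical noise to the word vector, then project back to the
        # nearest word in the embedding space.
        noisy = vecs[vocab.index(word)] + regularized_mahalanobis_noise(vecs, epsilon)
        return vocab[int(np.argmin(np.linalg.norm(vecs - noisy, axis=1)))]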


Differentially Private Adversarial Robustness Through Randomized Perturbations

arXiv.org Machine Learning

Deep Neural Networks, despite their great success in diverse domains, are provably sensitive to small perturbations of correctly classified examples, which lead to erroneous predictions. Recently, it was proposed that this behavior can be combated by optimizing the worst-case loss function over all possible substitutions of training examples. However, this approach is prone to weighing unlikely substitutions more heavily, limiting the accuracy gain. In this paper, we study adversarial robustness through randomized perturbations, which has two immediate advantages: (1) by ensuring that substitution likelihood is weighted by proximity to the original word, we circumvent optimizing worst-case guarantees and achieve performance gains; and (2) the calibrated randomness imparts differentially private model training, which additionally improves robustness against adversarial attacks on the model outputs. Our approach uses a novel density-based mechanism based on truncated Gumbel noise, which ensures training on substitutions of both rare and dense words in the vocabulary while maintaining semantic similarity for model robustness.
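
One way to picture the density-based selection is a noisy argmax: score every candidate substitution by its proximity to the original word, add truncated Gumbel noise calibrated by epsilon, and release the highest-scoring candidate, so nearby words are more likely replacements than distant ones. The sketch below is a simplified stand-in for the paper's mechanism; the truncation bound and the 1/epsilon noise scale are assumptions.

    import numpy as np

    def truncated_gumbel(scale, bound, size, rng):
        # Inverse-CDF sampling of Gumbel(0, scale) truncated to (-inf, bound].
        u = rng.uniform(1e-12, np.exp(-np.exp(-bound / scale)), size=size)
        return -scale * np.log(-np.log(u))

    def randomized_substitution(word_idx, vecs, epsilon, bound=5.0, rng=None):
        # Score candidates by proximity to the original word, perturb the scores
        # with truncated Gumbel noise, and return the noisy argmax.
        rng = rng or np.random.default_rng()
        scores = -np.linalg.norm(vecs - vecs[word_idx], axis=1)
        noise = truncated_gumbel(1.0 / epsilon, bound, len(vecs), rng)
        return int(np.argmax(scores + noise))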


On Primes, Log-Loss Scores and (No) Privacy

arXiv.org Machine Learning

Membership inference attacks exploit the vulnerabilities created by exposing models trained on customer data to queries from an adversary. In a recently proposed implementation of an auditing tool for measuring privacy leakage from sensitive datasets, more refined aggregates such as log-loss scores are exposed both to simulate inference attacks and to assess the total privacy leakage based on the adversary's predictions. In this paper, we prove that this additional information enables the adversary to infer the membership of any number of datapoints with full accuracy in a single query, causing a complete breach of membership privacy. Our approach requires neither attack-model training nor access to side knowledge by the adversary. Moreover, our algorithms are agnostic to the model under attack and hence enable perfect membership inference even for models that do not memorize or overfit. In particular, our observations provide insight into the extent of information leakage from statistical aggregates and how it can be exploited.
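
The flavor of the single-query attack can be conveyed with a small numerical sketch: if the adversary assigns each target point a predicted probability of the form 1 / (1 + q_i) for distinct primes q_i, the reported log-loss equals a known constant plus sum_i y_i * log(q_i), and unique prime factorization recovers every membership bit y_i from that one number. The code below assumes the audit tool reports the mean log-loss over the queried points; it illustrates the idea rather than reproducing the paper's exact protocol.

    import math

    PRIMES = [2, 3, 5, 7, 11, 13, 17, 19]   # one distinct prime per target point

    def adversary_probs(n):
        # Predicted probability p_i = 1 / (1 + q_i): point i then contributes
        # log((1 + q_i) / q_i) + y_i * log(q_i) to the total log-loss.
        return [1.0 / (1.0 + q) for q in PRIMES[:n]]

    def mean_log_loss(labels, probs):
        # What the audit tool is assumed to report for the queried points.
        return -sum(y * math.log(p) + (1 - y) * math.log(1.0 - p)
                    for y, p in zip(labels, probs)) / len(labels)

    def decode_membership(reported_mean_loss, n):
        # Subtract the known constant, exponentiate, and read the membership
        # bits off the unique prime factorization of the resulting integer.
        const = sum(math.log((1.0 + q) / q) for q in PRIMES[:n])
        product = round(math.exp(reported_mean_loss * n - const))
        return [1 if product % q == 0 else 0 for q in PRIMES[:n]]

    # A single reported score reveals every membership bit exactly.
    membership = [1, 0, 1, 1, 0, 0, 1, 0]
    score = mean_log_loss(membership, adversary_probs(len(membership)))
    assert decode_membership(score, len(membership)) == membership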


Leveraging Hierarchical Representations for Preserving Privacy and Utility in Text

arXiv.org Machine Learning

Guaranteeing a certain level of user privacy in an arbitrary piece of text is a challenging issue. However, with this challenge comes the potential of unlocking access to vast data stores for training machine learning models and supporting data-driven decisions. We address this problem through the lens of dx-privacy, a generalization of Differential Privacy to non-Hamming distance metrics. In this work, we explore word representations in Hyperbolic space as a means of preserving privacy in text. We provide a proof that our mechanism satisfies dx-privacy, define a probability distribution in Hyperbolic space, and describe a way to sample from it in high dimensions. Privacy is provided by perturbing vector representations of words in high-dimensional Hyperbolic space to obtain a semantic generalization. We conduct a series of experiments to demonstrate the tradeoff between privacy and utility. Our privacy experiments illustrate protection against an authorship attribution algorithm, while our utility experiments highlight the minimal impact of our perturbations on several downstream machine learning models. Compared to the Euclidean baseline, we observe > 20x greater guarantees on expected privacy against comparable worst-case statistics.
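
A heavily simplified sketch of the perturb-then-project pipeline in Hyperbolic space is given below: it draws Laplace-like noise in the tangent space at a word's Poincare embedding, pushes it onto the ball with the exponential map, and performs a nearest-neighbour lookup under the Poincare distance. The paper defines its own probability distribution and high-dimensional sampler, so the tangent-space noise here is a stand-in assumption.

    import numpy as np

    def mobius_add(u, v):
        # Mobius addition on the unit Poincare ball (curvature -1).
        uv, uu, vv = u @ v, u @ u, v @ v
        return ((1 + 2 * uv + vv) * u + (1 - uu) * v) / (1 + 2 * uv + uu * vv)

    def exp_map(x, v):
        # Exponential map at x: pushes a tangent vector v onto the ball.
        lam = 2.0 / (1.0 - x @ x)
        nv = np.linalg.norm(v)
        return mobius_add(x, np.tanh(lam * nv / 2.0) * v / nv)

    def poincare_dist(u, v):
        return np.arccosh(1 + 2 * np.sum((u - v) ** 2) /
                          ((1 - u @ u) * (1 - v @ v)))

    def perturb_hyperbolic(word, vocab, ball_vecs, epsilon, rng=None):
        # Stand-in sampler: uniform direction and Gamma(d, 1/epsilon) radius in
        # the tangent space, mapped onto the ball, then a nearest-neighbour
        # lookup under the Poincare distance.
        rng = rng or np.random.default_rng()
        x = ball_vecs[vocab.index(word)]
        d = len(x)
        direction = rng.normal(size=d)
        direction /= np.linalg.norm(direction)
        noisy = exp_map(x, rng.gamma(shape=d, scale=1.0 / epsilon) * direction)
        idx = min(range(len(vocab)), key=lambda i: poincare_dist(ball_vecs[i], noisy))
        return vocab[idx]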


Privacy- and Utility-Preserving Textual Analysis via Calibrated Multivariate Perturbations

arXiv.org Machine Learning

Accurately learning from user data while providing quantifiable privacy guarantees offers an opportunity to build better ML models while maintaining user trust. This paper presents a formal approach to carrying out privacy-preserving text perturbation using the notion of dx-privacy designed to achieve geo-indistinguishability in location data. Our approach applies carefully calibrated noise to the vector representations of words in a high-dimensional space as defined by word embedding models. We present a privacy proof that satisfies dx-privacy, where the privacy parameter epsilon provides guarantees with respect to a distance metric defined by the word embedding space. We demonstrate how epsilon can be selected by analyzing plausible deniability statistics backed up by large-scale analysis on GloVe and fastText embeddings. We conduct privacy audit experiments against two baseline models and utility experiments on three datasets to demonstrate the tradeoff between privacy and utility for varying values of epsilon on different task types. Our results demonstrate practical utility (< 2% utility loss for training binary classifiers) while providing better privacy guarantees than baseline models.
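
A minimal sketch of this pipeline, assuming the usual way of sampling noise with density proportional to exp(-epsilon * ||z||) (a uniform direction scaled by a Gamma(d, 1/epsilon) radius), is shown below; the helper name dx_private_word is illustrative.

    import numpy as np

    def dx_private_word(word, vocab, vecs, epsilon, rng=None):
        # Sample z with density proportional to exp(-epsilon * ||z||), add it to
        # the word's embedding, and release the nearest word in the vocabulary.
        rng = rng or np.random.default_rng()
        d = vecs.shape[1]
        direction = rng.normal(size=d)
        direction /= np.linalg.norm(direction)
        z = rng.gamma(shape=d, scale=1.0 / epsilon) * direction
        noisy = vecs[vocab.index(word)] + z
        return vocab[int(np.argmin(np.linalg.norm(vecs - noisy, axis=1)))]

For small epsilon the noise radius is large and the released word is likely to change; for large epsilon the nearest word is usually the original one, which is the privacy-utility dial that the experiments sweep.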


Privacy-preserving Active Learning on Sensitive Data for User Intent Classification

arXiv.org Machine Learning

Active learning holds the promise of significantly reducing data annotation costs while maintaining reasonable model performance. However, it requires sending data to annotators for labeling, which presents a possible privacy leak when the training set includes sensitive user data. In this paper, we describe an approach for carrying out privacy-preserving active learning with quantifiable guarantees. We evaluate our approach by showing the tradeoff between privacy, utility, and annotation budget on a binary classification task in an active learning setting.