AITopics

2406.11206

Country:

North America > United States > Texas (0.14)
North America > United States > California (0.14)

Genre: Research Report (0.81)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

arXiv.org Artificial IntelligenceJun-7-2024

Perturb-and-Project: Differentially Private Similarities and Marginals

Cohen-Addad, Vincent, d'Orsi, Tommaso, Epasto, Alessandro, Mirrokni, Vahab, Zhong, Peilin

We revisit the input perturbations framework for differential privacy where noise is added to the input $A\in \mathcal{S}$ and the result is then projected back to the space of admissible datasets $\mathcal{S}$. Through this framework, we first design novel efficient algorithms to privately release pair-wise cosine similarities. Second, we derive a novel algorithm to compute $k$-way marginal queries over $n$ features. Prior work could achieve comparable guarantees only for $k$ even. Furthermore, we extend our results to $t$-sparse datasets, where our efficient algorithms yields novel, stronger guarantees whenever $t\le n^{5/6}/\log n\,.$ Finally, we provide a theoretical perspective on why \textit{fast} input perturbation algorithms works well in practice. The key technical ingredients behind our results are tight sum-of-squares certificates upper bounding the Gaussian complexity of sets of solutions.

artificial intelligence, data mining, machine learning, (17 more...)

2406.04868

Country:

North America > United States (0.28)
Europe (0.28)
Asia (0.28)

Genre: Research Report > New Finding (0.54)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)

arXiv.org Artificial IntelligenceFeb-9-2024

A Scalable Algorithm for Individually Fair K-means Clustering

Bateni, MohammadHossein, Cohen-Addad, Vincent, Epasto, Alessandro, Lattanzi, Silvio

We present a scalable algorithm for the individually fair ($p$, $k$)-clustering problem introduced by Jung et al. and Mahabadi et al. Given $n$ points $P$ in a metric space, let $\delta(x)$ for $x\in P$ be the radius of the smallest ball around $x$ containing at least $n / k$ points. A clustering is then called individually fair if it has centers within distance $\delta(x)$ of $x$ for each $x\in P$. While good approximation algorithms are known for this problem no efficient practical algorithms with good theoretical guarantees have been presented. We design the first fast local-search algorithm that runs in ~$O(nk^2)$ time and obtains a bicriteria $(O(1), 6)$ approximation. Then we show empirically that not only is our algorithm much faster than prior work, but it also produces lower-cost solutions.

artificial intelligence, data mining, machine learning, (17 more...)

2402.0673

Country:

Europe (0.93)
North America > United States > Colorado (0.28)
North America > United States > California > Los Angeles County > Long Beach (0.14)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

arXiv.org Artificial IntelligenceJan-7-2024

Differentially Private Clustering in Data Streams

Epasto, Alessandro, Mukherjee, Tamalika, Zhong, Peilin

The streaming model is an abstraction of computing over massive data streams, which is a popular way of dealing with large-scale modern data analysis. In this model, there is a stream of data points, one after the other. A streaming algorithm is only allowed one pass over the data stream, and the goal is to perform some analysis during the stream while using as small space as possible. Clustering problems (such as $k$-means and $k$-median) are fundamental unsupervised machine learning primitives, and streaming clustering algorithms have been extensively studied in the past. However, since data privacy becomes a central concern in many real-world applications, non-private clustering algorithms are not applicable in many scenarios. In this work, we provide the first differentially private streaming algorithms for $k$-means and $k$-median clustering of $d$-dimensional Euclidean data points over a stream with length at most $T$ using $poly(k,d,\log(T))$ space to achieve a constant multiplicative error and a $poly(k,d,\log(T))$ additive error. In particular, we present a differentially private streaming clustering framework which only requires an offline DP coreset or clustering algorithm as a blackbox. By plugging in existing results from DP clustering Ghazi, Kumar, Manurangsi 2020 and Kaplan, Stemmer 2018, we achieve (1) a $(1+\gamma)$-multiplicative approximation with $\tilde{O}_\gamma(poly(k,d,\log(T)))$ space for any $\gamma>0$, and the additive error is $poly(k,d,\log(T))$ or (2) an $O(1)$-multiplicative approximation with $\tilde{O}(k^{1.5} \cdot poly(d,\log(T)))$ space and $poly(k,d,\log(T))$ additive error. In addition, our algorithmic framework is also differentially private under the continual release setting, i.e., the union of outputs of our algorithms at every timestamp is always differentially private.

algorithm, artificial intelligence, machine learning, (17 more...)

2307.07449

Country:

North America > United States > Texas (0.28)
North America > United States > Maryland (0.28)

Genre: Research Report (0.64)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

arXiv.org Machine LearningNov-15-2023

Private estimation algorithms for stochastic block models and mixture models

Chen, Hongjie, Cohen-Addad, Vincent, d'Orsi, Tommaso, Epasto, Alessandro, Imola, Jacob, Steurer, David, Tiegel, Stefan

We introduce general tools for designing efficient private estimation algorithms, in the high-dimensional settings, whose statistical guarantees almost match those of the best known non-private algorithms. To illustrate our techniques, we consider two problems: recovery of stochastic block models and learning mixtures of spherical Gaussians. For the former, we present the first efficient $(\epsilon, \delta)$-differentially private algorithm for both weak recovery and exact recovery. Previously known algorithms achieving comparable guarantees required quasi-polynomial time. For the latter, we design an $(\epsilon, \delta)$-differentially private algorithm that recovers the centers of the $k$-mixture when the minimum separation is at least $ O(k^{1/t}\sqrt{t})$. For all choices of $t$, this algorithm requires sample complexity $n\geq k^{O(1)}d^{O(t)}$ and time complexity $(nd)^{O(t)}$. Prior work required minimum separation at least $O(\sqrt{k})$ as well as an explicit upper bound on the Euclidean norm of the centers.

algorithm, artificial intelligence, machine learning, (18 more...)

2301.04822

Country:

Europe (1.00)
North America > United States > California (0.28)

Genre:

Workflow (0.69)
Research Report (0.63)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Data Science (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Constraint-Based Reasoning (0.92)

arXiv.org Artificial IntelligenceJul-31-2023

Measuring Re-identification Risk

Carey, CJ, Dick, Travis, Epasto, Alessandro, Javanmard, Adel, Karlin, Josh, Kumar, Shankar, Medina, Andres Munoz, Mirrokni, Vahab, Nunes, Gabriel Henrique, Vassilvitskii, Sergei, Zhong, Peilin

In this work, we present a new theoretical framework to measure re-identification risk in such user representations. Our framework, based on hypothesis testing, formally bounds the probability that an attacker may be able to obtain the identity of a user from their representation. As an application, we show how our framework is general enough to model important real-world applications such as the Chrome's Topics API for interest-based advertising. We complement our theoretical bounds by showing provably good attack algorithms for re-identification that we use to estimate the re-identification risk in the Topics API. We believe this work provides a rigorous and interpretable notion of re-identification risk and a framework to measure it that can be used to inform real-world applications.

data mining, machine learning, natural language, (20 more...)

2304.0721

Country:

Asia (0.28)
North America > United States > California (0.14)
Africa > Middle East > Egypt (0.14)

Genre: Research Report > New Finding (0.46)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(3 more...)

arXiv.org Artificial IntelligenceMay-23-2023

Differentially-Private Hierarchical Clustering with Provable Approximation Guarantees

Imola, Jacob, Epasto, Alessandro, Mahdian, Mohammad, Cohen-Addad, Vincent, Mirrokni, Vahab

Hierarchical Clustering is a popular unsupervised machine learning method with decades of history and numerous applications. We initiate the study of differentially private approximation algorithms for hierarchical clustering under the rigorous framework introduced by (Dasgupta, 2016). We show strong lower bounds for the problem: that any $\epsilon$-DP algorithm must exhibit $O(|V|^2/ \epsilon)$-additive error for an input dataset $V$. Then, we exhibit a polynomial-time approximation algorithm with $O(|V|^{2.5}/ \epsilon)$-additive error, and an exponential-time algorithm that meets the lower bound. To overcome the lower bound, we focus on the stochastic block model, a popular model of graphs, and, with a separation assumption on the blocks, propose a private $1+o(1)$ approximation algorithm which also recovers the blocks exactly. Finally, we perform an empirical study of our algorithms and validate their performance.

algorithm, artificial intelligence, machine learning, (18 more...)

2302.00037

Country: North America > United States (0.46)

Genre: Research Report (0.82)

Industry: Information Technology > Security & Privacy (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

arXiv.org Machine LearningJun-18-2020

Fair Hierarchical Clustering

Ahmadian, Sara, Epasto, Alessandro, Knittel, Marina, Kumar, Ravi, Mahdian, Mohammad, Moseley, Benjamin, Pham, Philip, Vassilvitskii, Sergei, Wang, Yuyan

As machine learning has become more prevalent, researchers have begun to recognize the necessity of ensuring machine learning systems are fair. Recently, there has been an interest in defining a notion of fairness that mitigates over-representation in traditional clustering. In this paper we extend this notion to hierarchical clustering, where the goal is to recursively partition the data to optimize a specific objective. For various natural objectives, we obtain simple, efficient algorithms to find a provably good fair hierarchical clustering. Empirically, we show that our algorithms can find a fair hierarchical clustering, with only a negligible loss in the objective.

artificial intelligence, machine learning, vertex, (15 more...)

2006.10221

Country: North America > United States (0.46)

Genre: Research Report (0.81)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

arXiv.org Machine LearningMay-6-2019

Is a Single Embedding Enough? Learning Node Representations that Capture Multiple Social Contexts

Epasto, Alessandro, Perozzi, Bryan

Recent interest in graph embedding methods has focused on learning a single representation for each node in the graph. But can nodes really be best described by a single vector representation? In this work, we propose a method for learning multiple representations of the nodes in a graph (e.g., the users of a social network). Based on a principled decomposition of the ego-network, each representation encodes the role of the node in a different local community in which the nodes participate. These representations allow for improved reconstruction of the nuanced relationships that occur in the graph -- a phenomenon that we illustrate through state-of-the-art results on link prediction tasks on a variety of graphs, reducing the error by up to $90\%$. In addition, we show that these embeddings allow for effective visual analysis of the learned community structure.

graph, health & medicine, neural network, (21 more...)

doi: 10.1145/3308558.3313660

1905.02138

Country:

North America > United States > New York (0.28)
North America > United States > California > San Francisco County > San Francisco (0.16)

Genre: Research Report (1.00)

Industry: Information Technology > Services (0.49)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(2 more...)

AAAI ConferencesApr-19-2016

Reconstructing Hidden Permutations Using the Average-Precision (AP) Correlation Statistic

Stefani, Lorenzo De (Brown University) | Epasto, Alessandro (Brown University) | Upfal, Eli (Brown University) | Vandin, Fabio (Univeristy of Padova)

We study the problem of learning probabilistic models for permutations, where the order between highly ranked items in the observed permutations is more reliable (i.e., consistent in different rankings) than the order between lower ranked items, a typical phenomena observed in many applications such as web search results and product ranking. We introduce and study a variant of the Mallows model where the distribution is a function of the widely used Average-Precision (AP) Correlation statistic, instead of the standard Kendall’s tau distance. We present a generative model for constructing samples from this distribution and prove useful properties of that distribution. Using these properties we develop an efficient algorithm that provably computes an asymptotically unbiased estimate of the center permutation, and a faster algorithm that learns with high probability the hidden central permutation for a wide range of the parameters of the model. We complement our theoretical analysis with extensive experiments showing that unsupervised methods based on our model can precisely identify ground-truth clusters of rankings in real-world data. In particular, when compared to the Kendall’s tau based methods, our methods are less affected by noise in low-rank items.

artificial intelligence, health & medicine, permutation, (20 more...)

AAAI Conferences

Thirtieth AAAI Conference on Artificial Intelligence

Country: North America > United States (0.28)

Industry: Health & Medicine > Therapeutic Area (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)