AITopics | Chierichetti, Flavio

Approximating a RUM from Distributions on k-Slates

Chierichetti, Flavio, Giacchini, Mirko, Kumar, Ravi, Panconesi, Alessandro, Tomkins, Andrew

arXiv.org Artificial IntelligenceMay-22-2023

In this work we consider the problem of fitting Random Utility Models (RUMs) to user choices. Given the winner distributions of the subsets of size $k$ of a universe, we obtain a polynomial-time algorithm that finds the RUM that best approximates the given distribution on average. Our algorithm is based on a linear program that we solve using the ellipsoid method. Given that its corresponding separation oracle problem is NP-hard, we devise an approximate separation oracle that can be viewed as a generalization of the weighted feedback arc set problem to hypergraphs. Our theoretical result can also be made practical: we obtain a heuristic that is effective and scales to real-world datasets.

algorithm, artificial intelligence, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2305.13283

Country: North America > United States (0.67)

Genre: Research Report (0.64)

Industry:

Transportation (0.46)
Government > Regional Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Constraint-Based Reasoning (0.46)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)

Add feedback

A Reduction for Efficient LDA Topic Reconstruction

Almanza, Matteo, Chierichetti, Flavio, Panconesi, Alessandro, Vattani, Andrea

Neural Information Processing SystemsFeb-14-2020, 19:57:47 GMT

We present a novel approach for LDA (Latent Dirichlet Allocation) topic reconstruction. The main technical idea is to show that the distribution over the documents generated by LDA can be transformed into a distribution for a much simpler generative model in which documents are generated from {\em the same set of topics} but have a much simpler structure: documents are single topic and topics are chosen uniformly at random. Furthermore, this reduction is approximation preserving, in the sense that approximate distributions-- the only ones we can hope to compute in practice-- are mapped into approximate distribution in the simplified world. This opens up the possibility of efficiently reconstructing LDA topics in a roundabout way. Compute an approximate document distribution from the given corpus, transform it into an approximate distribution for the single-topic world, and run a reconstruction algorithm in the uniform, single topic world-- a much simpler task than direct LDA reconstruction.

artificial intelligence, machine learning, reconstruction, (6 more...)

Neural Information Processing Systems

Genre: Research Report (0.43)

Technology: Information Technology > Artificial Intelligence > Natural Language (0.88)

Add feedback

Mallows Models for Top-k Lists

Chierichetti, Flavio, Dasgupta, Anirban, Haddadan, Shahrzad, Kumar, Ravi, Lattanzi, Silvio

Neural Information Processing SystemsDec-31-2018

The classic Mallows model is a widely-used tool to realize distributions on per- mutations. Motivated by common practical situations, in this paper, we generalize Mallows to model distributions on top-k lists by using a suitable distance measure between top-k lists. Unlike many earlier works, our model is both analytically tractable and computationally efficient. We demonstrate this by studying two basic problems in this model, namely, sampling and reconstruction, from both algorithmic and experimental points of view.

artificial intelligence, machine learning, top-k list, (19 more...)

Neural Information Processing Systems

Country: Europe > Switzerland > Zürich > Zürich (0.14)

Industry: Media (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Communications > Social Media (0.68)

Add feedback

A Reduction for Efficient LDA Topic Reconstruction

Almanza, Matteo, Chierichetti, Flavio, Panconesi, Alessandro, Vattani, Andrea

Neural Information Processing SystemsDec-31-2018

We present a novel approach for LDA (Latent Dirichlet Allocation) topic reconstruction. The main technical idea is to show that the distribution over the documents generated by LDA can be transformed into a distribution for a much simpler generative model in which documents are generated from {\em the same set of topics} but have a much simpler structure: documents are single topic and topics are chosen uniformly at random. Furthermore, this reduction is approximation preserving, in the sense that approximate distributions-- the only ones we can hope to compute in practice-- are mapped into approximate distribution in the simplified world. This opens up the possibility of efficiently reconstructing LDA topics in a roundabout way. Compute an approximate document distribution from the given corpus, transform it into an approximate distribution for the single-topic world, and run a reconstruction algorithm in the uniform, single topic world-- a much simpler task than direct LDA reconstruction. Indeed, we show the viability of the approach by giving very simple algorithms for a generalization of two notable cases that have been studied in the literature, $p$-separability and Gibbs sampling for matrix-like topics.

artificial intelligence, machine learning, natural language, (18 more...)

Neural Information Processing Systems

Country:

North America > Canada > Quebec (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)

Genre: Research Report (0.48)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.34)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.34)

Add feedback

A Reduction for Efficient LDA Topic Reconstruction

Almanza, Matteo, Chierichetti, Flavio, Panconesi, Alessandro, Vattani, Andrea

Neural Information Processing SystemsDec-31-2018

We present a novel approach for LDA (Latent Dirichlet Allocation) topic reconstruction. The main technical idea is to show that the distribution over the documents generated by LDA can be transformed into a distribution for a much simpler generative model in which documents are generated from {\em the same set of topics} but have a much simpler structure: documents are single topic and topics are chosen uniformly at random. Furthermore, this reduction is approximation preserving, in the sense that approximate distributions-- the only ones we can hope to compute in practice-- are mapped into approximate distribution in the simplified world. This opens up the possibility of efficiently reconstructing LDA topics in a roundabout way. Compute an approximate document distribution from the given corpus, transform it into an approximate distribution for the single-topic world, and run a reconstruction algorithm in the uniform, single topic world-- a much simpler task than direct LDA reconstruction. Indeed, we show the viability of the approach by giving very simple algorithms for a generalization of two notable cases that have been studied in the literature, $p$-separability and Gibbs sampling for matrix-like topics.

algorithm, artificial intelligence, natural language, (16 more...)

Neural Information Processing Systems

Country:

North America > Canada > Quebec (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)

Genre: Research Report (0.48)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.34)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.34)

Add feedback

Mallows Models for Top-k Lists

Chierichetti, Flavio, Dasgupta, Anirban, Haddadan, Shahrzad, Kumar, Ravi, Lattanzi, Silvio

Neural Information Processing SystemsDec-31-2018

The classic Mallows model is a widely-used tool to realize distributions on per- mutations. Motivated by common practical situations, in this paper, we generalize Mallows to model distributions on top-k lists by using a suitable distance measure between top-k lists. Unlike many earlier works, our model is both analytically tractable and computationally efficient. We demonstrate this by studying two basic problems in this model, namely, sampling and reconstruction, from both algorithmic and experimental points of view.

artificial intelligence, machine learning, top-k list, (18 more...)

Neural Information Processing Systems

Country: Europe > Switzerland > Zürich > Zürich (0.14)

Industry: Media (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Communications > Social Media (0.68)

Add feedback

Fair Clustering Through Fairlets

Chierichetti, Flavio, Kumar, Ravi, Lattanzi, Silvio, Vassilvitskii, Sergei

arXiv.org Machine LearningFeb-15-2018

We study the question of fair clustering under the {\em disparate impact} doctrine, where each protected class must have approximately equal representation in every cluster. We formulate the fair clustering problem under both the $k$-center and the $k$-median objectives, and show that even with two protected classes the problem is challenging, as the optimum solution can violate common conventions---for instance a point may no longer be assigned to its nearest cluster center! En route we introduce the concept of fairlets, which are minimal sets that satisfy fair representation while approximately preserving the clustering objective. We show that any fair clustering problem can be decomposed into first finding good fairlets, and then using existing machinery for traditional clustering algorithms. While finding good fairlets can be NP-hard, we proceed to obtain efficient approximation algorithms based on minimum cost flow. We empirically quantify the value of fair clustering on real-world datasets with sensitive attributes.

artificial intelligence, decomposition, machine learning, (16 more...)

arXiv.org Machine Learning

1802.05733

Country: North America > United States (0.46)

Genre: Research Report (0.50)

Industry: Law (0.93)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback

Fair Clustering Through Fairlets

Chierichetti, Flavio, Kumar, Ravi, Lattanzi, Silvio, Vassilvitskii, Sergei

Neural Information Processing SystemsDec-31-2017

We study the question of fair clustering under the {\em disparate impact} doctrine, where each protected class must have approximately equal representation in every cluster. We formulate the fair clustering problem under both the k-center and the k-median objectives, and show that even with two protected classes the problem is challenging, as the optimum solution can violate common conventions---for instance a point may no longer be assigned to its nearest cluster center! En route we introduce the concept of fairlets, which are minimal sets that satisfy fair representation while approximately preserving the clustering objective. We show that any fair clustering problem can be decomposed into first finding good fairlets, and then using existing machinery for traditional clustering algorithms. While finding good fairlets can be NP-hard, we proceed to obtain efficient approximation algorithms based on minimum cost flow. We empirically demonstrate the \emph{price of fairness} by quantifying the value of fair clustering on real-world datasets with sensitive attributes.

artificial intelligence, decomposition, machine learning, (15 more...)

Neural Information Processing Systems

Country: North America > United States > California (0.14)

Industry: Law (0.93)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback