sr measure
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- North America > United States > California > Los Angeles County > Long Beach (0.04)
- Europe > Russia > Central Federal District > Moscow Oblast > Moscow (0.04)
Polynomial time algorithms for dual volume sampling
Chengtao Li, Stefanie Jegelka, Suvrit Sra
We study dual volume sampling, a method for selecting k columns from an n m short and wide matrix (n apple k apple m) such that the probability of selection is proportional to the volume spanned by the rows of the induced submatrix. This method was proposed by Avron and Boutsidis (2013), who showed it to be a promising method for column subset selection and its multiple applications. However, its wider adoption has been hampered by the lack of polynomial time sampling algorithms. We remove this hindrance by developing an exact (randomized) polynomial time sampling algorithm as well as its derandomization. Thereafter, we study dual volume sampling via the theory of real stable polynomials and prove that its distribution satisfies the "Strong Rayleigh" property. This result has numerous consequences, including a provably fast-mixing Markov chain sampler that makes dual volume sampling much more attractive to practitioners. This sampler is closely related to classical algorithms for popular experimental design methods that are to date lacking theoretical analysis but are known to empirically work well.
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- North America > United States > California > Los Angeles County > Long Beach (0.04)
- Europe > Russia > Central Federal District > Moscow Oblast > Moscow (0.04)
Sen
Semantic relatedness (SR) measures form the algorithmic foundation of intelligent technologies in domains ranging from artificial intelligence to human-computer interaction. Although SR has been researched for decades, this work has focused on developing general SR measures rooted in graph and text mining algorithms that perform reasonably well for many different types of concepts. This paper introduces domain-specific SR, which augments general SR by identifying, capturing, and synthesizing domain-specific relationships between concepts. Using the domain of geography as a case study, we show that domain-specific SR -- and even geography-specific signals alone (e.g.
Exponentiated Strongly Rayleigh Distributions
Mariet, Zelda E., Sra, Suvrit, Jegelka, Stefanie
Strongly Rayleigh (SR) measures are discrete probability distributions over the subsets of a ground set. They enjoy strong negative dependence properties, as a result of which they assign higher probability to subsets of diverse elements. We introduce in this paper Exponentiated Strongly Rayleigh (ESR) measures, which sharpen (or smoothen) the negative dependence property of SR measures via a single parameter (the exponent) that can intuitively understood as an inverse temperature. We develop efficient MCMC procedures for approximate sampling from ESRs, and obtain explicit mixing time bounds for two concrete instances: exponentiated versions of Determinantal Point Processes and Dual Volume Sampling. We illustrate some of the potential of ESRs, by applying them to a few machine learning tasks; empirical results confirm that beyond their theoretical appeal, ESR-based models hold significant promise for these tasks.
- North America > United States > California > San Francisco County > San Francisco (0.14)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- North America > United States > Wisconsin (0.04)
- (4 more...)
Polynomial time algorithms for dual volume sampling
Li, Chengtao, Jegelka, Stefanie, Sra, Suvrit
We study dual volume sampling, a method for selecting k columns from an n*m short and wide matrix (n <= k <= m) such that the probability of selection is proportional to the volume spanned by the rows of the induced submatrix. This method was proposed by Avron and Boutsidis (2013), who showed it to be a promising method for column subset selection and its multiple applications. However, its wider adoption has been hampered by the lack of polynomial time sampling algorithms. We remove this hindrance by developing an exact (randomized) polynomial time sampling algorithm as well as its derandomization. Thereafter, we study dual volume sampling via the theory of real stable polynomials and prove that its distribution satisfies the “Strong Rayleigh” property. This result has numerous consequences, including a provably fast-mixing Markov chain sampler that makes dual volume sampling much more attractive to practitioners. This sampler is closely related to classical algorithms for popular experimental design methods that are to date lacking theoretical analysis but are known to empirically work well.
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- North America > United States > California > Los Angeles County > Long Beach (0.04)
- Europe > Russia > Central Federal District > Moscow Oblast > Moscow (0.04)
Polynomial Time Algorithms for Dual Volume Sampling
Li, Chengtao, Jegelka, Stefanie, Sra, Suvrit
We study dual volume sampling, a method for selecting k columns from an n x m short and wide matrix (n <= k <= m) such that the probability of selection is proportional to the volume spanned by the rows of the induced submatrix. This method was proposed by Avron and Boutsidis (2013), who showed it to be a promising method for column subset selection and its multiple applications. However, its wider adoption has been hampered by the lack of polynomial time sampling algorithms. We remove this hindrance by developing an exact (randomized) polynomial time sampling algorithm as well as its derandomization. Thereafter, we study dual volume sampling via the theory of real stable polynomials and prove that its distribution satisfies the "Strong Rayleigh" property. This result has numerous consequences, including a provably fast-mixing Markov chain sampler that makes dual volume sampling much more attractive to practitioners. This sampler is closely related to classical algorithms for popular experimental design methods that are to date lacking theoretical analysis but are known to empirically work well.
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Europe > Russia > Central Federal District > Moscow Oblast > Moscow (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.15)
- Asia > Middle East > Jordan (0.04)
Towards Domain-Specific Semantic Relatedness: A Case Study from Geography
Sen, Shilad (Macalester College) | Johnson, Isaac (University of Minnesota) | Harper, Rebecca (Wilamette College) | Mai, Huy ( Brandeis University ) | Olsen, Samuel Horlbeck (Macalester College) | Mathers, Benjamin (Macalester College) | Vonessen, Laura Souza (University of Arizona) | Wright, Matthew (University of Minnesota) | Hecht, Brent (University of Minnesota)
Semantic relatedness (SR) measures form the algorithmic foundation of intelligent technologies in domains ranging from artificial intelligence to human-computer interaction. Although SR has been researched for decades, this work has focused on developing general SR measures rooted in graph and text mining algorithms that perform reasonably well for many different types of concepts. This paper introduces domain-specific SR, which augments general SR by identifying, capturing, and synthesizing domain-specific relationships between concepts. Using the domain of geography as a case study, we show that domain-specific SR — and even geography-specific signals alone (e.g. distance, containment) without sophisticated graph or text mining algorithms — significantly outperform the SR state-of-the-art for geographic concepts. In addition to substantially improving SR measures for geospatial technologies, an area that is rapidly increasing in importance, this work also unlocks an important new direction for SR research: SR measures that incorporate domain-specific customizations to increase accuracy.
- North America > United States > Minnesota (0.05)
- Asia > India (0.05)
- South America > Argentina > Pampas > Buenos Aires F.D. > Buenos Aires (0.04)
- (19 more...)
- Media (0.46)
- Leisure & Entertainment (0.46)
- Health & Medicine (0.46)