Reviews: Maximizing Induced Cardinality Under a Determinantal Point Process
In the setting of DPP-based recommendation, the authors remark that, once a DPP has been learned, the sample with maximum DPP likelihood (the so-called "MAP recommendation") does not lead to meaningful recommendations. The contributions are as follows: the authors introduce an alternative utility function, the maximum induced cardinality (MIC), and explain how to optimize it approximately using submodular approximations. Algorithms optimizing the MIC criterion are compared on synthetic datasets.
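As a sketch (this uses the standard expected-size identity for $L$-ensembles; it is not a formula quoted from the review itself): restricting the learned kernel $L$ to a candidate recommendation set $Y$, the induced-cardinality objective can be written as

$$ f(Y) \;=\; \mathbb{E}_{A \sim \mathrm{DPP}(L_Y)}\big[\,|A|\,\big] \;=\; \operatorname{tr}\!\big(L_Y (L_Y + I)^{-1}\big) \;=\; \sum_i \frac{\lambda_i(L_Y)}{\lambda_i(L_Y)+1}, $$

which is the quantity the submodular approximations then aim to maximize over $Y$.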
Graph Convolutional Neural Networks with Diverse Negative Samples via Decomposed Determinant Point Processes
Duan, Wei, Xuan, Junyu, Qiao, Maoying, Lu, Jie
Graph convolutional networks (GCNs) have achieved great success in graph representation learning by extracting high-level features from nodes and their topology. Since GCNs generally follow a message-passing mechanism, each node aggregates information from its first-order neighbours to update its representation. As a result, the representations of nodes with edges between them should be positively correlated, and such nodes can be considered positive samples. However, there are many more non-neighbouring nodes in the whole graph, which provide diverse and useful information for the representation update. Two non-adjacent nodes usually have different representations, so they can be seen as negative samples. Besides the node representations, the structural information of the graph is also crucial for learning. In this paper, we use the quality-diversity decomposition of determinantal point processes (DPPs) to obtain diverse negative samples. When defining a distribution on diverse subsets of all non-neighbouring nodes, we incorporate both graph structure information and node representations. Since the DPP sampling process requires matrix eigenvalue decomposition, we propose a new shortest-path-based method to improve computational efficiency. Finally, we incorporate the obtained negative samples into the graph convolution operation. The ideas are evaluated empirically in experiments on node classification tasks. These experiments show that the newly proposed methods not only improve the overall performance of standard representation learning but also significantly alleviate the over-smoothing problem.
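As a hedged illustration of the quality-diversity decomposition the abstract refers to (in the standard Kulesza-Taskar form; the paper's decomposed variant and its graph-structure terms are not reproduced here, and the function name is made up), an $L$-ensemble over candidate negative samples can be built as $L = \mathrm{diag}(q)\, S\, \mathrm{diag}(q)$:

```python
import numpy as np

def quality_diversity_kernel(embeddings, quality):
    """Build an L-ensemble L = diag(q) S diag(q), where S is a similarity
    matrix over normalized node embeddings (the diversity part) and q holds
    per-node quality scores. This is the standard quality-diversity
    decomposition of DPPs; the paper's graph-structure terms are omitted."""
    phi = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    S = phi @ phi.T                     # cosine similarity between nodes
    q = np.asarray(quality, dtype=float)
    return q[:, None] * S * q[None, :]  # L_ij = q_i * S_ij * q_j

# Toy usage: 5 candidate non-neighbouring nodes with 8-d representations.
rng = np.random.default_rng(0)
L = quality_diversity_kernel(rng.normal(size=(5, 8)), rng.uniform(0.5, 1.5, 5))
```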
A Faster Sampler for Discrete Determinantal Point Processes
Barthelmé, Simon, Tremblay, Nicolas, Amblard, Pierre-Olivier
Discrete Determinantal Point Processes (DPPs) have a wide array of potential applications for subsampling datasets. They are, however, held back in some cases by the high cost of sampling. In the worst-case scenario, the sampling cost scales as $\mathcal{O}(n^3)$, where $n$ is the number of elements of the ground set. A popular workaround to this prohibitive cost is to sample DPPs defined by low-rank kernels. In such cases, the cost of standard sampling algorithms scales as $\mathcal{O}(np^2 + nm^2)$, where $m$ is the (average) number of samples of the DPP (usually $m \ll n$) and $p$ is the rank of the kernel used to define the DPP ($m \leq p \leq n$). The first term, $\mathcal{O}(np^2)$, comes from an SVD-like step. We focus here on the second term of this cost, $\mathcal{O}(nm^2)$, and show that it can be brought down to $\mathcal{O}(nm + m^3 \log m)$ without any loss of exactness in the sampling. In practice, we observe very substantial speedups compared to the classical algorithm as soon as $n > 1000$. The algorithm described here is a close variant of the standard algorithm for sampling continuous DPPs, and uses rejection sampling. In the specific case of projection DPPs, we also show that any additional sample can be drawn in time $\mathcal{O}(m^3 \log m)$. Finally, an interesting by-product of the analysis is that a realisation from a DPP is typically contained in a subset of size $\mathcal{O}(m \log m)$ formed using leverage-score i.i.d. sampling.
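A hedged sketch of the leverage-score by-product mentioned at the end of the abstract (the function name and the oversampling constant `c` are illustrative, not from the paper): items are drawn i.i.d. with probabilities proportional to the marginal inclusion probabilities $K_{ii}$.

```python
import numpy as np

def leverage_score_superset(L, c=4.0, rng=None):
    """Draw ~ c * m * log(m) i.i.d. items with probabilities proportional to
    the DPP's marginal inclusion probabilities K_ii (its leverage scores).
    Per the paper's by-product, a DPP realisation is typically contained in
    such a subset; the constant c is an illustrative choice."""
    rng = np.random.default_rng(rng)
    K = L @ np.linalg.inv(L + np.eye(L.shape[0]))  # marginal kernel K = L(L+I)^-1
    lev = np.diag(K)                               # leverage scores; lev.sum() = E|sample| = m
    m = lev.sum()
    size = int(np.ceil(c * m * np.log(max(m, 2.0))))
    draws = rng.choice(L.shape[0], size=size, p=lev / m)
    return np.unique(draws)
```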
Determinantal Point Processes Implicitly Regularize Semi-parametric Regression Problems
Fanuel, Michaël, Schreurs, Joachim, Suykens, Johan A. K.
Semi-parametric regression models are used in several applications that require comprehensibility without sacrificing accuracy. Examples are spline interpolation in geophysics, or non-linear time series problems, where the system includes, for instance, a linear and a non-linear component. We discuss here the use of finite Determinantal Point Process (DPP) sampling for approximating semi-parametric models in two cases. On the one hand, in the case of large training data sets, DPP sampling is used to reduce the number of model parameters. On the other hand, DPPs can determine experimental designs in the case of optimal interpolation models. Recently, Barthelmé, Tremblay, Usevich, and Amblard introduced a novel representation of finite DPPs. They formulated extended $L$-ensembles, which can conveniently represent, for instance, partial-projection DPPs, and suggested their use for optimal interpolation. With the help of this formalism, we derive a key identity illustrating the implicit regularization effect of determinantal sampling for semi-parametric regression and interpolation. Also, a novel projected Nyström approximation is defined and used to derive a bound on the expected risk for the corresponding approximation of semi-parametric regression. This work naturally extends similar results obtained for kernel ridge regression.
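For context (this is the standard Nyström formula, not the paper's projected variant, whose precise definition is given in the paper): the plain Nyström approximation of a kernel matrix $K$ from a sampled landmark set $C$ reads

$$ \widehat{K} \;=\; K_{:,C}\, K_{C,C}^{+}\, K_{C,:}, $$

where $K_{C,C}^{+}$ is the pseudo-inverse of the landmark block; the paper draws $C$ with a DPP and analyzes a projected variant of this approximation for semi-parametric regression.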
On the Relationship Between Probabilistic Circuits and Determinantal Point Processes
Zhang, Honghua, Holtzen, Steven, Van den Broeck, Guy
Scaling probabilistic models to large realistic problems and datasets is a key challenge in machine learning. Central to this effort is the development of tractable probabilistic models (TPMs): models whose structure guarantees efficient probabilistic inference algorithms. The current landscape of TPMs is fragmented: there exist various kinds of TPMs with different strengths and weaknesses. Two of the most prominent classes of TPMs are determinantal point processes (DPPs) and probabilistic circuits (PCs). This paper provides the first systematic study of their relationship. We propose a unified analysis and shared language for discussing DPPs and PCs. Then we establish theoretical barriers for the unification of these two families, and prove that there are cases where DPPs have no compact representation as a class of PCs. We close with a perspective on the central problem of unifying these tractable models.
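For reference (standard DPP facts, not results from this paper): an $L$-ensemble assigns to every subset $A$ of the ground set the tractable likelihood

$$ P_L(A) \;=\; \frac{\det(L_A)}{\det(L + I)}, $$

so marginals, conditionals, and the normalizing constant all reduce to determinants; the paper asks when this determinantal form of tractability can be matched by a compact probabilistic circuit.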
Asymptotic Equivalence of Fixed-size and Varying-size Determinantal Point Processes
Barthelmé, Simon, Amblard, Pierre-Olivier, Tremblay, Nicolas
Determinantal Point Processes (DPPs) are popular models for point processes with repulsion. They appear in numerous contexts, from physics to graph theory, and display appealing theoretical properties. On the more practical side of things, since DPPs tend to select sets of points that are some distance apart (repulsion), they have been advocated as a way of producing random subsets with high diversity. DPPs come in two variants: fixed-size and varying-size. A sample from a varying-size DPP is a subset of random cardinality, while in fixed-size "$k$-DPPs" the cardinality is fixed. The latter makes more sense in many applications, but unfortunately their computational properties are less attractive, since, among other things, inclusion probabilities are harder to compute. In this work we show that as the size of the ground set grows, $k$-DPPs and DPPs become equivalent, meaning that their inclusion probabilities converge. As a by-product, we obtain saddlepoint formulas for inclusion probabilities in $k$-DPPs. These turn out to be extremely accurate, and suffer less from numerical difficulties than exact methods do. Our results also suggest that $k$-DPPs and DPPs have equivalent maximum likelihood estimators. Finally, we obtain results on asymptotic approximations of elementary symmetric polynomials which may be of independent interest.
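As a reminder of the two variants being compared (standard definitions from Kulesza and Taskar, not new to this paper): for an $L$-ensemble with eigenvalues $\lambda_1,\dots,\lambda_n$,

$$ P_L(A) \;=\; \frac{\det(L_A)}{\det(L+I)}, \qquad P_L^k(A) \;=\; \frac{\det(L_A)}{e_k(\lambda_1,\dots,\lambda_n)} \quad \text{for } |A| = k, $$

where $e_k$ is the $k$-th elementary symmetric polynomial; the appearance of $e_k$ in the normalizer is what makes $k$-DPP inclusion probabilities harder, and it is where the saddlepoint approximations enter.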
Optimized Algorithms to Sample Determinantal Point Processes
Tremblay, Nicolas, Barthelmé, Simon, Amblard, Pierre-Olivier
In this technical report, we discuss several sampling algorithms for Determinantal Point Processes (DPPs). DPPs have recently gained broad interest in the machine learning and statistics literature as random point processes with negative correlation, i.e., ones that can generate a "diverse" sample from a set of items. They are parametrized by a matrix $\mathbf{L}$, called the $L$-ensemble, that encodes the correlations between items. The standard sampling algorithm proceeds in three phases: 1) eigendecomposition of $\mathbf{L}$; 2) an eigenvector sampling phase, where each of $\mathbf{L}$'s eigenvectors is kept independently via a Bernoulli variable parametrized by its associated eigenvalue; 3) a Gram-Schmidt-type orthogonalisation procedure over the sampled eigenvectors. In a naive implementation, the computational cost of the third step is on average $\mathcal{O}(N\mu^3)$, where $\mu$ is the average number of samples of the DPP. We give an algorithm which runs in $\mathcal{O}(N\mu^2)$ and is extremely simple to implement. If memory is a constraint, we also describe a dual variant with reduced memory costs. In addition, we discuss implementation details often missing in the literature.
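A minimal numpy sketch of the standard three-phase sampler described above (a naive implementation: the QR re-orthonormalization in phase 3 is precisely the $\mathcal{O}(N\mu^3)$ bottleneck the report improves on, and the optimized $\mathcal{O}(N\mu^2)$ variant is not reproduced here):

```python
import numpy as np

def sample_dpp(L, rng=None):
    """Naive spectral sampler for an L-ensemble DPP (the three standard phases)."""
    rng = np.random.default_rng(rng)
    N = L.shape[0]
    # Phase 1: eigendecomposition of the symmetric matrix L.
    lam, vecs = np.linalg.eigh(L)
    # Phase 2: keep eigenvector n with probability lam_n / (lam_n + 1).
    V = vecs[:, rng.random(N) < lam / (lam + 1.0)]
    # Phase 3: pick one item per remaining eigenvector; re-orthonormalize
    # after each pick (this Gram-Schmidt-type step costs O(N mu^3) overall).
    Y = []
    while V.shape[1] > 0:
        p = (V ** 2).sum(axis=1)          # P(pick i) proportional to row norm^2
        p = np.clip(p, 0, None)
        i = rng.choice(N, p=p / p.sum())
        Y.append(i)
        # Remove the component along coordinate i, then re-orthonormalize.
        j = np.argmax(np.abs(V[i, :]))    # a column with V[i, j] != 0
        Vj = V[:, j].copy()
        V = np.delete(V, j, axis=1)
        V -= np.outer(Vj, V[i, :] / Vj[i])  # zero out row i of remaining columns
        if V.shape[1] > 0:
            V, _ = np.linalg.qr(V)          # restore orthonormal columns
    return sorted(Y)
```

Usage is simply `Y = sample_dpp(L)` for a symmetric positive semi-definite $\mathbf{L}$; the sample size is random, with mean $\sum_n \lambda_n / (\lambda_n + 1)$.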
Graphical structure of conditional independencies in determinantal point processes
Determinantal point processes have recently been used as models in machine learning, and this has raised questions regarding characterizations of conditional independence. In this paper we investigate such characterizations. We describe some conditional independencies through conditions on the kernel of a determinantal point process, and show that many can be obtained using the graph induced by the kernel of the L-ensemble.

In recent years there have been several machine learning papers about applications of determinantal point processes (DPPs) [4], [7], [8], [9]. An overview of theory, recent applications and open problems in learning DPPs is given in a recent extensive survey [6] by Kulesza and Taskar. In a private communication with Ben Taskar, one of the questions from survey [6] (see §7.3) that remains for future research was brought to the author's attention:

- Is there a simple characterization of the conditional independence relations encoded by a DPP?

This question arises naturally with conditional independence structure models (see [12]) in mind, such as the graphical models (see [11]) that are often used. It turns out that, from the mathematical viewpoint, elegant characterizations, similar to those in graphical models, exist. This paper provides two main characterizations:

- the corresponding block in a Schur complement of the kernel has to be a 0-block (Theorem 16, Proposition 17);
- we can use the structure of the graph induced by the kernel of the L-ensemble to read off many conditional independencies of the process (Theorem 28, Proposition 30).
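As a hedged illustration of the flavor of the first characterization (the unconditional case is a standard DPP fact; the exact conditional statement is the paper's Theorem 16): for a DPP with symmetric marginal kernel $K$, two disjoint sets $A$ and $B$ are independent if and only if the off-diagonal block vanishes,

$$ K_{A,B} = 0, $$

and conditioning on a third set $C$ replaces $K$ by a Schur-complement-type kernel, so that the conditional independence $A \perp B \mid C$ again reads off as a 0-block.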
Becoming More Robust to Label Noise with Classifier Diversity
Smith, Michael R., Martinez, Tony
It is widely known in the machine learning community that class noise can be (and often is) detrimental to inducing a model of the data. Many current approaches use a single, often biased, measurement to determine if an instance is noisy. A biased measure may work well on certain data sets, but it can also be less effective on a broader set of data sets. In this paper, we present noise identification using classifier diversity (NICD) -- a method for deriving a less biased noise measurement and integrating it into the learning process. To lessen the bias of the noise measure, NICD selects a diverse set of classifiers (based on their predictions of novel instances) to determine which instances are noisy. We examine NICD as a technique for filtering, instance weighting, and selecting the base classifiers of a voting ensemble. We compare NICD with several other noise handling techniques that do not consider classifier diversity on a set of 54 data sets and 5 learning algorithms. NICD significantly increases the classification accuracy over the other considered approaches and is effective across a broad set of data sets and learning algorithms.
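A hedged sketch of the diversity idea (illustrative only: NICD's actual classifier-selection, weighting, and filtering procedures are in the paper, and all names below are made up): pick a prediction-diverse subset of classifiers, then flag instances that most of them misclassify.

```python
import numpy as np

def pick_diverse(pred_matrix, k):
    """Greedily pick k classifiers (rows = classifiers, columns = instances,
    entries = predicted labels) that maximize pairwise prediction
    disagreement. An illustrative stand-in for diversity-based selection."""
    n = pred_matrix.shape[0]
    disagree = np.array([[np.mean(pred_matrix[a] != pred_matrix[b])
                          for b in range(n)] for a in range(n)])
    chosen = [int(np.argmax(disagree.sum(axis=1)))]
    while len(chosen) < k:
        gains = [disagree[c, chosen].sum() if c not in chosen else -1.0
                 for c in range(n)]
        chosen.append(int(np.argmax(gains)))
    return chosen

def flag_noisy(pred_matrix, y, chosen, threshold=0.5):
    """Flag instances misclassified by more than `threshold` of the chosen
    classifiers (a simple filtering rule, not the paper's exact criterion)."""
    errs = (pred_matrix[chosen] != y[None, :]).mean(axis=0)
    return np.where(errs > threshold)[0]
```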