graph-based learning
Analyzing the Harmonic Structure in Graph-Based Learning
We show that either explicitly or implicitly, various well-known graph-based models exhibit a common significant \emph{harmonic} structure in its target function -- the value of a vertex is approximately the weighted average of the values of its adjacent neighbors. Understanding of such structure and analysis of the loss defined over such structure help reveal important properties of the target function over a graph. In this paper, we show that the variation of the target function across a cut can be upper and lower bounded by the ratio of its harmonic loss and the cut cost. We use this to develop an analytical tool and analyze 5 popular models in graph-based learning: absorbing random walks, partially absorbing random walks, hitting times, pseudo-inverse of graph Laplacian, and eigenvectors of the Laplacian matrices. Our analysis well explains several open questions of these models reported in the literature. Furthermore, it provides theoretical justifications and guidelines for their practical use. Simulations on synthetic and real datasets support our analysis.
Learning with Partially Absorbing Random Walks
We propose a novel stochastic process that is with probability \alpha_i being absorbed at current state i, and with probability 1-\alpha_i follows a random edge out of it. We analyze its properties and show its potential for exploring graph structures. We prove that under proper absorption rates, a random walk starting from a set \mathcal{S} of low conductance will be mostly absorbed in \mathcal{S} . Moreover, the absorption probabilities vary slowly inside \mathcal{S}, while dropping sharply outside \mathcal{S}, thus implementing the desirable cluster assumption for graph-based learning. Remarkably, the partially absorbing process unifies many popular models arising in a variety of contexts, provides new insights into them, and makes it possible for transferring findings from one paradigm to another.
Learning with Partially Absorbing Random Walks
Wu, Xiao-ming, Li, Zhenguo, So, Anthony M., Wright, John, Chang, Shih-fu
We propose a novel stochastic process that is with probability $\alpha_i$ being absorbed at current state $i$, and with probability $1-\alpha_i$ follows a random edge out of it. We analyze its properties and show its potential for exploring graph structures. We prove that under proper absorption rates, a random walk starting from a set $\mathcal{S}$ of low conductance will be mostly absorbed in $\mathcal{S}$. Moreover, the absorption probabilities vary slowly inside $\mathcal{S}$, while dropping sharply outside $\mathcal{S}$, thus implementing the desirable cluster assumption for graph-based learning. Remarkably, the partially absorbing process unifies many popular models arising in a variety of contexts, provides new insights into them, and makes it possible for transferring findings from one paradigm to another.
Analyzing the Harmonic Structure in Graph-Based Learning
Wu, Xiao-Ming, Li, Zhenguo, Chang, Shih-Fu
We show that either explicitly or implicitly, various well-known graph-based models exhibit a common significant \emph{harmonic} structure in its target function -- the value of a vertex is approximately the weighted average of the values of its adjacent neighbors. Understanding of such structure and analysis of the loss defined over such structure help reveal important properties of the target function over a graph. In this paper, we show that the variation of the target function across a cut can be upper and lower bounded by the ratio of its harmonic loss and the cut cost. We use this to develop an analytical tool and analyze 5 popular models in graph-based learning: absorbing random walks, partially absorbing random walks, hitting times, pseudo-inverse of graph Laplacian, and eigenvectors of the Laplacian matrices. Our analysis well explains several open questions of these models reported in the literature.
On Consistency of Graph-based Semi-supervised Learning
Graph-based semi-supervised learning is one of the most popular methods in machine learning. Some of its theoretical properties such as bounds for the generalization error and the convergence of the graph Laplacian regularizer have been studied in computer science and statistics literatures. However, a fundamental statistical property, the consistency of the estimator from this method has not been proved. In this article, we study the consistency problem under a non-parametric framework. We prove the consistency of graph-based learning in the case that the estimated scores are enforced to be equal to the observed responses for the labeled data. The sample sizes of both labeled and unlabeled data are allowed to grow in this result. When the estimated scores are not required to be equal to the observed responses, a tuning parameter is used to balance the loss function and the graph Laplacian regularizer. We give a counterexample demonstrating that the estimator for this case can be inconsistent. The theoretical findings are supported by numerical studies.
- North America > United States > New York (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- North America > Canada > Alberta > Census Division No. 15 > Improvement District No. 9 > Banff (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Graph-based Learning with Unbalanced Clusters
Qian, Jing, Saligrama, Venkatesh, Zhao, Manqi
Graph construction is a crucial step in spectral clustering (SC) and graph-based semi-supervised learning (SSL). Spectral methods applied on standard graphs such as full-RBF, $\epsilon$-graphs and $k$-NN graphs can lead to poor performance in the presence of proximal and unbalanced data. This is because spectral methods based on minimizing RatioCut or normalized cut on these graphs tend to put more importance on balancing cluster sizes over reducing cut values. We propose a novel graph construction technique and show that the RatioCut solution on this new graph is able to handle proximal and unbalanced data. Our method is based on adaptively modulating the neighborhood degrees in a $k$-NN graph, which tends to sparsify neighborhoods in low density regions. Our method adapts to data with varying levels of unbalancedness and can be naturally used for small cluster detection. We justify our ideas through limit cut analysis. Unsupervised and semi-supervised experiments on synthetic and real data sets demonstrate the superiority of our method.
- South America > Paraguay > Asunción > Asunción (0.04)
- North America > United States > Massachusetts > Suffolk County > Boston (0.04)
- Research Report (0.64)
- Workflow (0.48)