
Active Learning for Non-Parametric Regression Using Purely Random Trees

Goetz, Jack, Tewari, Ambuj, Zimmerman, Paul

Neural Information Processing Systems

Active learning is the task of using labelled data to select additional points to label, with the goal of fitting the most accurate model with a fixed budget of labelled points. In binary classification active learning is known to produce faster rates than passive learning for a broad range of settings. However in regression restrictive structure and tailored methods were previously needed to obtain theoretically superior performance. In this paper we propose an intuitive tree based active learning algorithm for non-parametric regression with provable improvement over random sampling. When implemented with Mondrian Trees our algorithm is tuning parameter free, consistent and minimax optimal for Lipschitz functions.
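As a rough illustration of tree-based active sampling, the sketch below partitions the input space with label-independent random cuts, estimates within-leaf noise from a small seed sample, and then spends the label budget proportionally to each leaf's mass times its standard deviation. This allocation rule is one plausible reading of a risk-minimizing criterion, not the authors' exact algorithm, and the target function is a made-up example.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # Hypothetical noisy target function (an assumption of this sketch).
    return np.sin(6 * x) + 0.1 * rng.standard_normal(x.shape)

# Purely random partition of [0, 1]: cut points drawn independently of the data.
cuts = np.sort(rng.uniform(0.0, 1.0, size=9))
edges = np.concatenate(([0.0], cuts, [1.0]))

# Seed phase: a few uniform labels per leaf to estimate within-leaf spread.
leaf_mass, leaf_std = [], []
for lo, hi in zip(edges[:-1], edges[1:]):
    xs = rng.uniform(lo, hi, size=5)
    leaf_mass.append(hi - lo)
    leaf_std.append(f(xs).std())

# Allocate the remaining label budget proportionally to mass * std per leaf,
# giving each leaf at least one label.
weights = np.array(leaf_mass) * np.array(leaf_std)
budget = 200
alloc = np.maximum(1, np.round(budget * weights / weights.sum()).astype(int))
```

Leaves covering wide, noisy regions receive more labels than narrow, quiet ones, which is the intuition behind sampling where the estimator's variance is largest.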




Reviews: Active Learning for Non-Parametric Regression Using Purely Random Trees

Neural Information Processing Systems

Yet, I still think that their results (including those appearing in their response) do not show gains as significant as those seen in other active learning settings. The paper addresses the problem of active learning for regression with random trees. The authors introduce a simple 'oracle' algorithm, and a variance expression for the risk is provided for this criterion. The authors provide a risk-minimizing query criterion and claim it is better than the random selection criterion. An estimation algorithm is presented whose first stage provides the statistics needed to approximate the 'oracle' algorithm. Numerical results are presented showing improved performance over random and uncertainty sampling on simulated data, and no improvement on real UCI data.


The graph alignment problem: fundamental limits and efficient algorithms

Ganassali, Luca

arXiv.org Machine Learning

Similarly to many other inference problems in planted models, we are interested in understanding the fundamental information-theoretic limits as well as the computational hardness of graph alignment. First, we study the Gaussian setting, where the graphs are complete and the signal lies in correlated Gaussian edge weights. We prove that the exact recovery task exhibits a sharp information-theoretic threshold (and characterize it), and study a simple and natural spectral method for recovery, EIG1, which consists of aligning the leading eigenvectors of the adjacency matrices of the two graphs. While most recent work on the subject has been dedicated to recovering the hidden signal in dense graphs, we next explore graph alignment in the sparse regime, where the mean degree of the nodes is constant, not scaling with the graph size. In this particularly challenging setting, for sparse Erdős-Rényi graphs, only a fraction of the nodes can be correctly matched by any algorithm.
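The EIG1 idea above (matching nodes through the leading eigenvectors of the two adjacency matrices) can be sketched as follows. Matching by rank order of the eigenvector entries, and resolving the eigenvector's sign ambiguity by an edge-overlap score, are simplifying assumptions of this sketch rather than the paper's exact procedure.

```python
import numpy as np

rng = np.random.default_rng(1)

def eig1_alignment(A, B):
    # Leading eigenvectors of the two adjacency matrices (np.linalg.eigh
    # sorts eigenvalues in ascending order, so the last column is leading).
    u = np.linalg.eigh(A)[1][:, -1]
    v = np.linalg.eigh(B)[1][:, -1]
    best, best_score = None, -np.inf
    # An eigenvector's sign is arbitrary: try both orientations and keep
    # the matching with the larger edge-overlap score.
    for s in (1.0, -1.0):
        pi = np.empty(len(u), dtype=int)
        pi[np.argsort(u)] = np.argsort(s * v)   # match nodes by rank
        score = float(np.sum(A * B[np.ix_(pi, pi)]))
        if score > best_score:
            best, best_score = pi, score
    return best

# Toy check: complete graphs with correlated (here: identical) Gaussian
# weights, B being a relabelling of A by a hidden permutation.
n = 30
A = rng.standard_normal((n, n))
A = (A + A.T) / 2
perm = rng.permutation(n)
B = A[np.ix_(perm, perm)]
pi = eig1_alignment(A, B)   # maps each node of A to its copy in B
```

In this noiseless toy the hidden permutation is recovered exactly; with genuinely correlated (rather than identical) weights, recovery degrades as the correlation weakens.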


Strahler Number of Natural Language Sentences in Comparison with Random Trees

Tanaka-Ishii, Kumiko, Tanaka, Akira

arXiv.org Artificial Intelligence

The Strahler number was originally proposed to characterize the complexity of river bifurcation and has found various applications. This article proposes computing upper and lower limits of the Strahler number for the tree structures of natural language sentences. Through empirical measurements across grammatically annotated data, the Strahler number of natural language sentences is shown to be almost always 3 or 4, similarly to the case of river bifurcation as reported by Strahler (1957). From the theory behind the number, we show that it is a kind of lower limit on the amount of memory required to process sentences. We argue that the Strahler number explains reports that the number of memory areas required to parse sentences is 3 to 4 (Schuler et al., 2010), and reports of a psychological "magical number" of 3 to 5 (Cowan, 2001). An analytical and empirical analysis shows that the Strahler number is not constant but grows logarithmically; the Strahler number of sentences therefore derives from the range of sentence lengths. Furthermore, the Strahler number is no different for random trees, which suggests that its origin is not specific to natural language.
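The Strahler number itself has a simple recursive definition on rooted trees, sketched below; the encoding of a tree as nested child-lists is an assumption of this illustration.

```python
def strahler(children):
    # Strahler number of a rooted tree encoded as nested child-lists:
    # a leaf is []; an internal node is the list of its subtrees.
    if not children:
        return 1
    nums = sorted((strahler(c) for c in children), reverse=True)
    # Take the maximum child number, adding one when that maximum
    # is attained by at least two children.
    return nums[0] + 1 if len(nums) > 1 and nums[0] == nums[1] else nums[0]

# A perfect binary tree of depth d has Strahler number d + 1,
# which is why the number grows only logarithmically in tree size.
leaf = []
depth1 = [leaf, leaf]
depth2 = [depth1, depth1]
```

For example, `strahler(depth2)` is 3, while grafting a single extra leaf onto that tree leaves the number unchanged.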


Efficient inspection of underground galleries using k robots with limited energy

Bereg, Sergey, Caraballo, L. Evaristo, Díaz-Báñez, José Miguel

arXiv.org Artificial Intelligence

We study the problem of optimally inspecting an underground (underwater) gallery with k agents. We consider a gallery with a single opening and a tree topology rooted at the opening. Due to the small diameter of the pipes (caves), the agents are small robots with limited autonomy, and there is a supply station at the gallery's opening. Therefore, they are initially placed at the root and must periodically return to the supply station. Our goal is to design off-line strategies to efficiently cover the tree with k small robots. We consider two objective functions: the covering time (maximum collective time) and the covering distance (total traveled distance). The maximum collective time is the maximum time a robot needs to finish its assigned task (assuming that all the robots start at the same time); the total traveled distance is the sum of the lengths of all the covering walks. Since the problems are intractable for large trees, we propose approximation algorithms. Both the efficiency and accuracy of the suboptimal solutions are shown empirically for random trees through intensive numerical experiments.
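A toy version of the covering-time objective can be sketched as follows: weigh each branch hanging from the root by its total edge length, then assign branches to robots with a longest-processing-time heuristic. This ignores the periodic-return energy constraint and assumes each branch goes wholly to one robot, so it is an illustration of the load-balancing flavor of the problem, not the paper's algorithm.

```python
import heapq

def branch_weights(tree, lengths, root="root"):
    # Total edge length of each subtree hanging from the root.
    # `tree` maps a node to its children; `lengths` maps (parent, child)
    # edges to their lengths.
    weights = []
    for c in tree.get(root, []):
        w, stack = 0.0, [(root, c)]
        while stack:
            u, v = stack.pop()
            w += lengths[(u, v)]
            stack.extend((v, x) for x in tree.get(v, []))
        weights.append(w)
    return weights

def greedy_assignment(weights, k):
    # LPT-style heuristic: hand the heaviest unassigned branch to the
    # currently least-loaded robot.  A robot with load w walks roughly
    # 2*w (down each assigned edge and back); the covering distance is
    # 2 * sum(weights) regardless of k.
    heap = [(0.0, r) for r in range(k)]
    heapq.heapify(heap)
    loads = [0.0] * k
    for w in sorted(weights, reverse=True):
        load, r = heapq.heappop(heap)
        loads[r] = load + w
        heapq.heappush(heap, (loads[r], r))
    return loads
```

The covering-time objective then corresponds to minimizing the maximum load, which is where the makespan-style intractability for large trees comes from.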

