Collaborating Authors: Kpotufe, Samory


Self-Tuning Bandits over Unknown Covariate-Shifts

arXiv.org Machine Learning

Bandits with covariates, a.k.a. contextual bandits, address situations where the optimal actions (or arms) at a given time $t$ depend on a context $x_t$, e.g., a new patient's medical history or a consumer's past purchases. While it is understood that the distribution of contexts might change over time, e.g., due to seasonality or deployment to new environments, the bulk of studies concern the most adversarial such changes, resulting in regret bounds that are often worst-case in nature. Covariate-shift, on the other hand, has been considered in classification as a middle-ground formalism that can capture mild to relatively severe changes in distributions. We consider nonparametric bandits under such middle-ground scenarios, and derive new regret bounds that tightly capture a continuum of changes in context distribution. Furthermore, we show that these rates can be attained adaptively, without knowledge of either the time or the amount of shift.
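
As a toy illustration of the contextual-bandit setting only, and not of the paper's self-tuning procedure, the sketch below runs a simple discretized (binned) UCB learner while the context distribution shifts at a time unknown to it; the bin count, shift time, and reward functions are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
T, K, bins = 5000, 2, 10           # horizon, arms, context bins (illustrative choices)
shift_time = 2500                  # unknown to the learner in the paper's setting

def draw_context(t):
    # Covariate-shift: contexts drawn from P before the shift, from Q after.
    return rng.beta(2, 5) if t < shift_time else rng.beta(5, 2)

def mean_reward(x, a):
    # Toy reward functions; the optimal arm depends on the context x.
    return 0.7 * x if a == 0 else 0.7 * (1 - x)

counts = np.zeros((bins, K))
sums = np.zeros((bins, K))
regret = 0.0
for t in range(1, T + 1):
    x = draw_context(t)
    b = min(int(x * bins), bins - 1)                 # bin the context
    ucb = np.where(counts[b] > 0,
                   sums[b] / np.maximum(counts[b], 1)
                   + np.sqrt(2 * np.log(t) / np.maximum(counts[b], 1)),
                   np.inf)                           # optimistic value per arm in this bin
    a = int(np.argmax(ucb))
    r = mean_reward(x, a) + 0.1 * rng.standard_normal()
    counts[b, a] += 1
    sums[b, a] += r
    regret += max(mean_reward(x, 0), mean_reward(x, 1)) - mean_reward(x, a)

print(f"cumulative regret after {T} rounds: {regret:.1f}")
```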


A No-Free-Lunch Theorem for MultiTask Learning

arXiv.org Machine Learning

Multitask learning and related areas such as multi-source domain adaptation address modern settings where datasets from $N$ related distributions $\{P_t\}$ are to be combined towards improving performance on any single such distribution ${\cal D}$. A perplexing fact remains in the evolving theory on the subject: while we would hope for performance bounds that account for the contribution from multiple tasks, the vast majority of analyses result in bounds that improve at best in the number $n$ of samples per task, but most often do not improve in $N$. As such, it might seem at first that the distributional settings or aggregation procedures considered in such analyses are somehow unfavorable; however, as we show, the picture is more nuanced, with interestingly hard regimes that might appear otherwise favorable. In particular, we consider a seemingly favorable classification scenario where all tasks $P_t$ share a common optimal classifier $h^*$, and which can be shown to admit a broad range of regimes with improved oracle rates in terms of $N$ and $n$. Some of our main results are as follows:

$\bullet$ We show that, even though such regimes admit minimax rates accounting for both $n$ and $N$, no adaptive algorithm exists; that is, without access to distributional information, no algorithm can guarantee rates that improve with large $N$ for $n$ fixed.

$\bullet$ With a bit of additional information, namely a ranking of tasks $\{P_t\}$ according to their distance to a target ${\cal D}$, a simple rank-based procedure can achieve near-optimal aggregations of tasks' datasets, despite a search space exponential in $N$. Interestingly, the optimal aggregation might exclude certain tasks, even though they all share the same $h^*$.
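
The rank-based idea in the second bullet can be sketched as follows: given tasks pre-ranked by closeness to the target, train on the union of the top-$k$ task datasets and choose $k$ on a small labeled target sample. The data generators, the trivial threshold learner, and the selection rule below are illustrative assumptions rather than the paper's exact procedure; note that all tasks share the same optimal classifier, yet the selected aggregation may drop the noisiest tasks.

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_task(n, noise):
    # All tasks share the same optimal classifier h*(x) = sign(x[0]),
    # but differ in label noise (a stand-in for "distance to the target").
    X = rng.standard_normal((n, 2))
    y = np.sign(X[:, 0])
    flip = rng.random(n) < noise
    y[flip] *= -1
    return X, y

def fit_threshold(X, y):
    # Trivial learner: h(x) = sign(x[0] - theta), with theta chosen on a grid.
    thetas = np.linspace(-1, 1, 41)
    errs = [np.mean(np.sign(X[:, 0] - th) != y) for th in thetas]
    return thetas[int(np.argmin(errs))]

# Tasks already ranked by closeness to the target (noise grows with rank).
noises = [0.05, 0.10, 0.20, 0.35, 0.45]
tasks = [sample_task(50, nz) for nz in noises]
X_val, y_val = sample_task(30, noises[0])        # small labeled target sample

best_k, best_err = 1, np.inf
for k in range(1, len(tasks) + 1):
    X = np.vstack([Xt for Xt, _ in tasks[:k]])
    y = np.concatenate([yt for _, yt in tasks[:k]])
    th = fit_threshold(X, y)
    err = np.mean(np.sign(X_val[:, 0] - th) != y_val)
    if err < best_err:
        best_k, best_err = k, err

print(f"selected top-{best_k} tasks; validation error {best_err:.2f}")
```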


Fast, smooth and adaptive regression in metric spaces

Neural Information Processing Systems

It was recently shown that certain nonparametric regressors can escape the curse of dimensionality in the sense that their convergence rates adapt to the intrinsic dimension of data (\cite{BL:65, SK:77}). We prove some stronger results in more general settings. In particular, we consider a regressor which, by combining aspects of both tree-based regression and kernel regression, operates on a general metric space, yields a smooth function, and evaluates in time $O(\log n)$. We derive a tight convergence rate of the form $n^{-2/(2+d)}$ where $d$ is the Assouad dimension of the input space.
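
A minimal sketch of just the "kernel regression on a general metric space" ingredient; it omits the paper's tree-based structure responsible for the $O(\log n)$ evaluation time and the dimension adaptivity, and the metric, kernel, and bandwidth below are illustrative choices.

```python
import numpy as np

def kernel_regress(query, X, y, metric, h):
    # Nadaraya-Watson estimate with a smooth (Epanechnikov-like) kernel
    # defined through an arbitrary metric; falls back to the global mean
    # if no training point lies within bandwidth h of the query.
    d = np.array([metric(query, x) for x in X])
    w = np.maximum(1.0 - (d / h) ** 2, 0.0)
    return float(np.dot(w, y) / w.sum()) if w.sum() > 0 else float(y.mean())

rng = np.random.default_rng(2)
X = rng.uniform(-1, 1, size=(200, 3))
y = np.sin(2 * X[:, 0]) + 0.1 * rng.standard_normal(200)
euclidean = lambda a, b: float(np.linalg.norm(a - b))

print(kernel_regress(np.zeros(3), X, y, euclidean, h=0.4))
```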


Regression-tree Tuning in a Streaming Setting

Neural Information Processing Systems

We consider the problem of maintaining the data-structures of a partition-based regression procedure in a setting where the training data arrives sequentially over time. We prove that it is possible to maintain such a structure in time $O(\log n)$ at any time step $n$ while achieving a nearly-optimal regression rate of $\tilde{O}(n^{-2/(2+d)})$ in terms of the unknown metric dimension $d$. Finally, we prove a new regression lower-bound which is independent of a given data size, and hence is more appropriate for the streaming setting.
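
For intuition only, the toy class below maintains a partition-based regressor under sequentially arriving data by keeping per-cell sums and counts; unlike the paper's procedure, it uses a fixed cell width rather than adaptively tuning the partition to the unknown metric dimension $d$.

```python
import numpy as np
from collections import defaultdict

class StreamingGridRegressor:
    """Toy partition-based regressor updated one sample at a time.

    A fixed cell width is assumed for simplicity; the paper's procedure
    instead adapts the partition (and hence the effective cell size) as
    n grows, in order to track the unknown metric dimension d.
    """

    def __init__(self, cell_width=0.1):
        self.w = cell_width
        self.sums = defaultdict(float)
        self.counts = defaultdict(int)

    def _cell(self, x):
        return tuple(np.floor(np.asarray(x) / self.w).astype(int))

    def update(self, x, y):            # O(1) amortized per arriving sample
        c = self._cell(x)
        self.sums[c] += y
        self.counts[c] += 1

    def predict(self, x):
        c = self._cell(x)
        return self.sums[c] / self.counts[c] if self.counts[c] else 0.0

rng = np.random.default_rng(3)
model = StreamingGridRegressor(cell_width=0.1)
for _ in range(5000):                  # data arriving sequentially
    x = rng.uniform(0, 1, size=2)
    model.update(x, np.sin(4 * x[0]) + 0.1 * rng.standard_normal())
print(model.predict([0.25, 0.5]))
```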


Optimal rates for k-NN density and mode estimation

Neural Information Processing Systems

We present two related contributions of independent interest: (1) high-probability finite sample rates for $k$-NN density estimation, and (2) practical mode estimators -- based on $k$-NN -- which attain minimax-optimal rates under surprisingly general distributional conditions.
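
For reference, the usual $k$-NN density estimate is $\hat f_k(x) = k / (n\, v_d\, r_k(x)^d)$, where $r_k(x)$ is the distance from $x$ to its $k$-th nearest sample point and $v_d$ is the volume of the unit ball in $\mathbb{R}^d$; a natural mode estimate, in the spirit of the paper, is the sample point maximizing this estimate. The sketch below assumes Euclidean data and an illustrative choice of $k$.

```python
import numpy as np
from math import gamma, pi

def knn_density(X, k):
    # f_hat(x_i) = k / (n * v_d * r_k(x_i)^d), with r_k the distance
    # from x_i to its k-th nearest neighbor among the n sample points.
    n, d = X.shape
    v_d = pi ** (d / 2) / gamma(d / 2 + 1)         # volume of the unit ball in R^d
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    r_k = np.sort(dists, axis=1)[:, k]             # column 0 is the point itself
    return k / (n * v_d * r_k ** d)

rng = np.random.default_rng(4)
X = rng.normal(loc=2.0, scale=0.5, size=(500, 2))  # single mode near (2, 2)
dens = knn_density(X, k=20)
mode_estimate = X[int(np.argmax(dens))]            # sample point of highest k-NN density
print(mode_estimate)
```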


Kernel Sketching yields Kernel JL

arXiv.org Machine Learning

The main contribution of the paper is to show that Gaussian sketching of a kernel-Gram matrix $\mathbf{K}$ yields an operator whose counterpart in an RKHS $\mathcal{H}$ is a random projection operator, in the spirit of the Johnson-Lindenstrauss (JL) lemma. To be precise, given a random matrix $Z$ with i.i.d. Gaussian entries, we show that a sketch $Z\mathbf{K}$ corresponds to a particular random operator in (infinite-dimensional) Hilbert space $\mathcal{H}$ that maps functions $f \in \mathcal{H}$ to a low-dimensional space $\mathbb{R}^d$, while preserving a weighted RKHS inner-product of the form $\langle f, g \rangle_{\Sigma} \doteq \langle f, \Sigma^3 g \rangle_{\mathcal{H}}$, where $\Sigma$ is the \emph{covariance} operator induced by the data distribution. In particular, under similar assumptions as in kernel PCA (KPCA), or kernel $k$-means (K-$k$-means), well-separated subsets of feature-space $\{K(\cdot, x): x \in \cal X\}$ remain well-separated after such an operation, which suggests similar benefits as in KPCA and/or K-$k$-means, albeit at the much cheaper cost of a random projection. Moreover, our convergence rates suggest that, given a large dataset $\{X_i\}_{i=1}^N$ of size $N$, we can build the Gram matrix $\mathbf{K}$ on a much smaller subsample of size $n\ll N$, so that the sketch $Z\mathbf{K}$ is very cheap to obtain and subsequently apply as a projection operator on the original data $\{X_i\}_{i=1}^N$. We verify these insights empirically on synthetic data, and on real-world clustering applications.
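
A rough numerical sketch of the subsampling recipe from the last part of the abstract: build the Gram matrix on a subsample of size $n \ll N$, draw a Gaussian sketch, and map every original point through its kernel evaluations at the subsample. The RBF kernel, the scaling of $Z$, and the exact way the sketch is applied to out-of-sample points are assumptions made for illustration; the precise operator and normalization are as specified in the paper.

```python
import numpy as np

rng = np.random.default_rng(5)

def rbf_kernel(A, B, bandwidth=1.0):
    # K(a, b) = exp(-||a - b||^2 / (2 * bandwidth^2))
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2 * bandwidth ** 2))

N, n, d_out = 5000, 200, 30                           # full data, subsample n << N, sketch dim
X = np.vstack([rng.normal(-2, 1, (N // 2, 2)),        # two well-separated blobs
               rng.normal(2, 1, (N // 2, 2))])
sub = X[rng.choice(N, size=n, replace=False)]         # subsample used for the Gram matrix

K_sub = rbf_kernel(sub, sub)                          # n x n Gram matrix K
Z = rng.standard_normal((d_out, n)) / np.sqrt(d_out)  # i.i.d. Gaussian sketch matrix
sketch_ZK = Z @ K_sub                                 # the d_out x n sketch Z K

# Map every original point through its kernel evaluations at the subsample,
# then project with Z; this is one simplified reading of "applying the sketch
# as a projection operator on the original data".
K_cross = rbf_kernel(X, sub)                          # N x n kernel evaluations
embedding = K_cross @ Z.T                             # N x d_out low-dimensional features
print(sketch_ZK.shape, embedding.shape)
```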


PAC-Bayes Tree: weighted subtrees with guarantees

Neural Information Processing Systems

We present a weighted-majority classification approach over subtrees of a fixed tree, which provably achieves excess-risk of the same order as the best tree-pruning. Furthermore, the computational efficiency of pruning is maintained at both training and testing time, despite having to aggregate over an exponential number of subtrees. We believe this is the first subtree aggregation approach with such guarantees. The guarantees are obtained via a simple combination of insights from PAC-Bayes theory, which we believe should be of independent interest, as it generically implies consistency for weighted-voting classifiers w.r.t. Bayes, while, in contrast, usual PAC-Bayes approaches only establish consistency of Gibbs classifiers.
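
The following deliberately naive sketch illustrates weighted-majority voting over a handful of explicitly enumerated prunings of a fixed dyadic tree, with a prior favoring small prunings and a PAC-Bayes-style reweighting by training error. It enumerates prunings by hand, whereas the paper's contribution is to aggregate over the exponentially many subtrees implicitly, at the cost of ordinary pruning; the prior, temperature, and data are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 400
X = rng.uniform(0, 1, n)
y = np.where(X < 0.5, 1, -1)                  # true labels depend on a coarse split
y[rng.random(n) < 0.1] *= -1                  # 10% label noise

# Prunings of a fixed depth-2 dyadic tree over [0, 1), listed explicitly.
prunings = [
    [(0.0, 1.0)],                                          # root only
    [(0.0, 0.5), (0.5, 1.0)],                              # depth-1 pruning
    [(0.0, 0.25), (0.25, 0.5), (0.5, 1.0)],                # left subtree kept
    [(0.0, 0.5), (0.5, 0.75), (0.75, 1.0)],                # right subtree kept
    [(0.0, 0.25), (0.25, 0.5), (0.5, 0.75), (0.75, 1.0)],  # full tree
]

def fit_pruning(cells):
    # Majority label in each cell of the pruning.
    return {(lo, hi): (1 if y[(X >= lo) & (X < hi)].sum() >= 0 else -1)
            for (lo, hi) in cells}

def predict(cells, labels, x):
    for (lo, hi) in cells:
        if lo <= x < hi:
            return labels[(lo, hi)]
    return 1

models = [fit_pruning(c) for c in prunings]
train_err = [np.mean([predict(c, m, xi) != yi for xi, yi in zip(X, y)])
             for c, m in zip(prunings, models)]
prior = np.array([2.0 ** -len(c) for c in prunings])          # smaller prunings favored
weights = prior * np.exp(-2.0 * n * np.array(train_err))      # PAC-Bayes-style reweighting
weights /= weights.sum()

def weighted_majority(x):
    votes = sum(w * predict(c, m, x) for w, c, m in zip(weights, prunings, models))
    return 1 if votes >= 0 else -1

print(weighted_majority(0.3), weighted_majority(0.8))
```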


Quickshift++: Provably Good Initializations for Sample-Based Mean Shift

arXiv.org Machine Learning

We provide initial seedings to the Quick Shift clustering algorithm, which approximate the locally high-density regions of the data. Such seedings act as more stable and expressive cluster-cores than the singleton modes found by Quick Shift. We establish statistical consistency guarantees for this modification. We then show strong clustering performance on real datasets as well as promising applications to image segmentation.
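
A heavily simplified sketch of the cluster-core idea: estimate a $k$-NN density, keep points whose density is within a $(1-\beta)$ factor of the densest of their $k$ neighbors as core candidates, group nearby candidates into cluster-cores, and attach every remaining point to its nearest core. The density rule, the single-linkage grouping, the nearest-core assignment (which replaces the Quick Shift hill-climbing step), and all parameters are illustrative simplifications of Quickshift++.

```python
import numpy as np

def knn_density(X, k):
    # Unnormalized k-NN density: inversely proportional to r_k(x)^d.
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    r_k = np.sort(d, axis=1)[:, k]
    return 1.0 / (r_k ** X.shape[1] + 1e-12), d

def cluster_cores(X, k=15, beta=0.3, link_radius=0.5):
    dens, dists = knn_density(X, k)
    # Core candidates: density within a (1 - beta) factor of the densest
    # of their k nearest neighbors (a simplification of the paper's rule).
    nbrs = np.argsort(dists, axis=1)[:, 1:k + 1]
    is_core = dens >= (1 - beta) * dens[nbrs].max(axis=1)
    core_idx = np.where(is_core)[0]
    # Group core points into cluster-cores via single-linkage at a fixed radius.
    labels = -np.ones(len(X), dtype=int)
    current = 0
    for i in core_idx:
        if labels[i] == -1:
            labels[i] = current
            stack = [i]
            while stack:
                j = stack.pop()
                close = core_idx[(dists[j, core_idx] < link_radius)
                                 & (labels[core_idx] == -1)]
                labels[close] = current
                stack.extend(close.tolist())
            current += 1
    # Attach every non-core point to the label of its nearest core point.
    for i in np.where(~is_core)[0]:
        labels[i] = labels[core_idx[np.argmin(dists[i, core_idx])]]
    return labels

rng = np.random.default_rng(7)
X = np.vstack([rng.normal(-2, 0.4, (100, 2)), rng.normal(2, 0.4, (100, 2))])
print(np.bincount(cluster_cores(X)))        # sizes of the recovered clusters
```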


Marginal Singularity, and the Benefits of Labels in Covariate-Shift

arXiv.org Machine Learning

We present new minimax results that concisely capture the relative benefits of source and target labeled data under covariate-shift. Namely, we show that the benefits of target labels are controlled by a transfer-exponent $\gamma$ that encodes how singular $Q$ is locally w.r.t. $P$, and interestingly allows situations where transfer did not seem possible under previous insights. In fact, our new minimax analysis, in terms of $\gamma$, reveals a continuum of regimes ranging from situations where target labels have little benefit to regimes where target labels dramatically improve classification. We then show that a recently proposed semi-supervised procedure can be extended to adapt to the unknown $\gamma$, and therefore requests labels only when beneficial, while achieving minimax transfer rates.