AITopics

2410.0268

Country:

North America > United States > New York > New York County > New York City (0.14)
North America > United States > California > Alameda County > Berkeley (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.66)

Neural Information Processing SystemsMar-15-2024, 06:57:42 GMT

On U-processes and clustering performance

Many clustering techniques aim at optimizing empirical criteria that are of the form of a U-statistic of degree two. Given a measure of dissimilarity between pairs of observations, the goal is to minimize the within cluster point scatter over a class of partitions of the feature space. It is the purpose of this paper to define a general statistical framework, relying on the theory of U-processes, for studying the performance of such clustering methods.

partition, u-process, u-statistics, (15 more...)

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > France (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Neural Information Processing SystemsMar-14-2024, 19:37:09 GMT

Communication-Efficient Algorithms for Statistical Optimization John C. Duchi

We study two communication-efficient algorithms for distributed statistical optimization on large-scale data. The first algorithm is an averaging method that distributes the N data samples evenly to m machines, performs separate minimization on each subset, and then averages the estimates.

algorithm, dataset, theorem 1, (13 more...)

Country:

North America > United States > California > Alameda County > Berkeley (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Schuler, Alejandro, Li, Yi, van der Laan, Mark

Lassoed Tree Boosting

arXiv.org Machine LearningDec-8-2023

Gradient boosting performs exceptionally in most prediction problems and scales well to large datasets. In this paper we prove that a ``lassoed'' gradient boosted tree algorithm with early stopping achieves faster than $n^{-1/4}$ L2 convergence in the large nonparametric space of cadlag functions of bounded sectional variation. This rate is remarkable because it does not depend on the dimension, sparsity, or smoothness. We use simulation and real data to confirm our theory and demonstrate empirical performance and scalability on par with standard boosting. Our convergence proofs are based on a novel, general theorem on early stopping with empirical loss minimizers of nested Donsker classes.

algorithm, artificial intelligence, machine learning, (16 more...)

2205.10697

Country:

North America > United States > New York > New York County > New York City (0.14)
North America > United States > California > Alameda County > Berkeley (0.04)
North America > United States > Ohio (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.47)

arXiv.org Machine LearningFeb-25-2015

On aggregation for heavy-tailed classes

Mendelson, Shahar

We introduce an alternative to the notion of `fast rate' in Learning Theory, which coincides with the optimal error rate when the given class happens to be convex and regular in some sense. While it is well known that such a rate cannot always be attained by a learning procedure (i.e., a procedure that selects a function in the given class), we introduce an aggregation procedure that attains that rate under rather minimal assumptions -- for example, that the $L_q$ and $L_2$ norms are equivalent on the linear span of the class for some $q>2$, and the target random variable is square-integrable.

artificial intelligence, machine learning, probability, (17 more...)

1502.07097

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

arXiv.org Machine LearningOct-22-2014

Learning without Concentration

Mendelson, Shahar

We obtain sharp bounds on the performance of Empirical Risk Minimization performed in a convex class and with respect to the squared loss, without assuming that class members and the target are bounded functions or have rapidly decaying tails. Rather than resorting to a concentration-based argument, the method used here relies on a `small-ball' assumption and thus holds for classes consisting of heavy-tailed functions and for heavy-tailed targets. The resulting estimates scale correctly with the `noise level' of the problem, and when applied to the classical, bounded scenario, always improve the known bounds.

artificial intelligence, machine learning, probability, (17 more...)

1401.0304

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.46)

Zhang, Yuchen, Wainwright, Martin J., Duchi, John C.

Communication-Efficient Algorithms for Statistical Optimization

Neural Information Processing SystemsDec-31-2012

We study two communication-efficient algorithms for distributed statistical optimization on large-scale data. The first algorithm is an averaging method that distributes the $N$ data samples evenly to $m$ machines, performs separate minimization on each subset, and then averages the estimates. We provide a sharp analysis of this average mixture algorithm, showing that under a reasonable set of conditions, the combined parameter achieves mean-squared error that decays as $\order(N^{-1}+(N/m)^{-2})$. Whenever $m \le \sqrt{N}$, this guarantee matches the best possible rate achievable by a centralized algorithm having access to all $N$ samples. The second algorithm is a novel method, based on an appropriate form of the bootstrap. Requiring only a single round of communication, it has mean-squared error that decays as $\order(N^{-1}+(N/m)^{-3})$, and so is more robust to the amount of parallelization. We complement our theoretical results with experiments on large-scale problems from the Microsoft Learning to Rank dataset.

algorithm, artificial intelligence, machine learning, (13 more...)

Country: North America > United States > California (0.28)

Genre: Research Report (0.49)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Neural Information Processing SystemsDec-31-2011

On U-processes and clustering performance

Clémençcon, Stéphan J.

Many clustering techniques aim at optimizing empirical criteria that are of the form of a U-statistic of degree two. Given a measure of dissimilarity between pairs of observations, the goal is to minimize the within cluster point scatter over a class of partitions of the feature space. It is the purpose of this paper to define a general statistical framework, relying on the theory of U-processes, for studying the performance of such clustering methods. In this setup, under adequate assumptions on the complexity of the subsets forming the partition candidates, the excess of clustering risk is proved to be of the order O(1/\sqrt{n}). Based on recent results related to the tail behavior of degenerate U-processes, it is also shown how to establish tighter rate bounds. Model selection issues, related to the number of clusters forming the data partition in particular, are also considered.

artificial intelligence, machine learning, u-process, (17 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Kakade, Sham, Kalai, Adam Tauman

From Batch to Transductive Online Learning

Neural Information Processing SystemsDec-31-2006

It is well-known that everything that is learnable in the difficult online setting, where an arbitrary sequences of examples must be labeled one at a time, is also learnable in the batch setting, where examples are drawn independently from a distribution. We show a result in the opposite direction. We give an efficient conversion algorithm from batch to online that is transductive: it uses future unlabeled data. This demonstrates the equivalence between what is properly and efficiently learnable in a batch model and a transductive online model.

algorithm, batch, empirical minimizer, (15 more...)

Country:

North America > United States > New York (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)

Industry: Education > Educational Setting > Online (0.42)

Technology:

Information Technology > Enterprise Applications > Human Resources > Learning Management (0.42)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.30)

Kakade, Sham, Kalai, Adam Tauman

From Batch to Transductive Online Learning

Neural Information Processing SystemsDec-31-2006

It is well-known that everything that is learnable in the difficult online setting, where an arbitrary sequences of examples must be labeled one at a time, is also learnable in the batch setting, where examples are drawn independently from a distribution. We show a result in the opposite direction. We give an efficient conversion algorithm from batch to online that is transductive: it uses future unlabeled data. This demonstrates the equivalence between what is properly and efficiently learnable in a batch model and a transductive online model.

algorithm, batch, empirical minimizer, (15 more...)