AITopics

Technology: Information Technology > Artificial Intelligence (0.96)

Neural Information Processing Systems Feb-10-2026, 01:34:20 GMT

ba3e9b6a519cfddc560b5d53210df1bd-AuthorFeedback.pdf

We have 2 large datasets, HIGGS and Bosch (see reply to [R3]-1)). Table B highlights our differences. 3) Motivation: We provide a strong attack as a tool for evaluating the robustness of tree-based models. MILP uses a thin wrapper around the Gurobi Solver.

artificial intelligence, experiment, machine learning, (13 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.32)

Neural Information Processing Systems Aug-16-2025, 02:09:52 GMT

A and Model Statistics

covtype, perturbation, statistics, (15 more...)

Technology: Information Technology > Artificial Intelligence (0.96)

Neural Information Processing Systems Aug-16-2025, 02:09:34 GMT

In Table A, we repeat our experiments on 5000 test examples for each dataset (or the

We thank all reviewers for their valuable comments and suggestions. Table B highlights our differences. Methods in the bottom-left corner are better. We will enlarge figures and explain more. In Tables 2 and 3, HIGGS contains 10.5 million training examples and the ensemble [...] We additionally added Bosch (1.2 million examples, 968 features) in Table A. Both datasets are from [...] Our method is effective on both datasets.

dataset, experiment, test example, (16 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.55)

Munteanu, Alexander, Omlor, Simon, Peters, Christian

$p$-Generalized Probit Regression and Scalable Maximum Likelihood Estimation via Sketching and Coresets

arXiv.org Machine Learning Mar-25-2022

We study the $p$-generalized probit regression model, which is a generalized linear model for binary responses. It extends the standard probit model by replacing its link function, the standard normal cdf, by a $p$-generalized normal distribution for $p\in[1, \infty)$. The $p$-generalized normal distributions \citep{Sub23} are of special interest in statistical modeling because they fit data much more flexibly. Their tail behavior can be controlled by choice of the parameter $p$, which influences the model's sensitivity to outliers. Special cases include the Laplace, the Gaussian, and the uniform distributions. We further show how the maximum likelihood estimator for $p$-generalized probit regression can be approximated efficiently up to a factor of $(1+\varepsilon)$ on large data by combining sketching techniques with importance subsampling to obtain a small data summary called a coreset.
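The model in this abstract can be illustrated with a minimal sketch. It assumes the common parametrization of the $p$-generalized normal density, $f_p(t) = \frac{p^{1-1/p}}{2\Gamma(1/p)} \exp(-|t|^p/p)$, and uses its cdf as the link, so that $P(y=1 \mid x) = \Phi_p(\langle x, \beta\rangle)$. The function names and the Simpson-rule integration are illustrative choices, not taken from the paper, and the sketching and coreset steps are not shown.

```python
import math

def pgn_pdf(t: float, p: float) -> float:
    """Density of the p-generalized normal distribution (assumed parametrization)."""
    norm = p ** (1.0 - 1.0 / p) / (2.0 * math.gamma(1.0 / p))
    return norm * math.exp(-abs(t) ** p / p)

def pgn_cdf(t: float, p: float, steps: int = 2000) -> float:
    """CDF via composite Simpson integration on [0, |t|], using symmetry around 0.

    `steps` must be even.
    """
    if t == 0.0:
        return 0.5
    a = abs(t)
    h = a / steps
    total = pgn_pdf(0.0, p) + pgn_pdf(a, p)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pgn_pdf(i * h, p)
    half = total * h / 3.0
    return 0.5 + half if t > 0 else 0.5 - half

def neg_log_likelihood(beta, X, y, p):
    """Negative log-likelihood for labels y in {0, 1} under the p-generalized probit link."""
    nll = 0.0
    for xi, yi in zip(X, y):
        z = sum(b * v for b, v in zip(beta, xi))
        prob = min(max(pgn_cdf(z, p), 1e-12), 1.0 - 1e-12)  # clip to avoid log(0)
        nll -= math.log(prob if yi == 1 else 1.0 - prob)
    return nll
```

Under this parametrization, `p = 2` gives the standard normal density (the ordinary probit model) and `p = 1` gives the Laplace density, matching the special cases listed in the abstract.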

approximation ratio median approximation ratio, artificial intelligence, machine learning, (16 more...)

2203.13568

Country: Europe > Germany > North Rhine-Westphalia > Arnsberg Region > Dortmund (0.04)

Genre: Research Report (0.83)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Trofimov, Ilya, Genkin, Alexander

Distributed Coordinate Descent for Generalized Linear Models with Regularization

arXiv.org Machine Learning Jun-26-2017

The generalized linear model with $L_1$ and $L_2$ regularization is a widely used technique for solving classification, class probability estimation and regression problems. With the numbers of both features and examples growing rapidly in fields like text mining and clickstream data analysis, parallelization and the use of cluster architectures become important. We present a novel algorithm for fitting regularized generalized linear models in the distributed environment. The algorithm splits data between nodes by features, uses coordinate descent on each node and line search to merge results globally. Convergence proof is provided. A modification of the algorithm addresses the slow node problem. For an important particular case of logistic regression we empirically compare our program with several state-of-the-art approaches that rely on different algorithmic and data splitting methods. Experiments demonstrate that our approach is scalable and superior when training on large and sparse datasets.
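The three ingredients named in the abstract (feature-wise split, per-node coordinate descent, global line search) can be sketched in a simplified single-process form. This is not the authors' algorithm: it swaps in squared loss instead of a general GLM loss, simulates the nodes with a plain loop, and uses a simple halving line search. All function names and default values are hypothetical.

```python
def soft_threshold(z, t):
    """Soft-thresholding operator used by L1-regularized coordinate updates."""
    return z - t if z > t else z + t if z < -t else 0.0

def objective(X, y, beta, l1, l2):
    """Squared loss / (2n) plus L1 and L2 penalties."""
    n = len(y)
    loss = sum((yi - sum(b * v for b, v in zip(beta, xi))) ** 2
               for xi, yi in zip(X, y)) / (2 * n)
    return loss + l1 * sum(abs(b) for b in beta) + 0.5 * l2 * sum(b * b for b in beta)

def block_cd_pass(X, y, beta, block, l1, l2):
    """One coordinate descent pass over the features of one block (one simulated node)."""
    n = len(y)
    resid = [yi - sum(b * v for b, v in zip(beta, xi)) for xi, yi in zip(X, y)]
    delta = {}
    for j in block:
        col = [xi[j] for xi in X]
        sq = sum(c * c for c in col) / n
        if sq == 0.0:
            delta[j] = 0.0
            continue
        rho = sum(c * r for c, r in zip(col, resid)) / n + sq * beta[j]
        d = soft_threshold(rho, l1) / (sq + l2) - beta[j]
        resid = [r - c * d for r, c in zip(resid, col)]  # local view of the residual
        delta[j] = d
    return delta

def fit(X, y, blocks, l1=0.01, l2=0.01, iters=50):
    """Merge per-block updates with a halving line search on the global objective.

    `blocks` is assumed to be a disjoint partition of the feature indices.
    """
    beta = [0.0] * len(X[0])
    for _ in range(iters):
        direction = [0.0] * len(beta)
        for block in blocks:  # on a cluster, each block would run on its own node
            for j, d in block_cd_pass(X, y, beta, block, l1, l2).items():
                direction[j] = d
        base = objective(X, y, beta, l1, l2)
        alpha = 1.0
        while alpha > 1e-8:
            cand = [b + alpha * d for b, d in zip(beta, direction)]
            if objective(X, y, cand, l1, l2) <= base:
                beta = cand
                break
            alpha *= 0.5
    return beta
```

Because the objective is convex, a sufficiently small step along the merged direction never increases it, so the line search keeps the iteration monotone even when blocks contain correlated features.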

algorithm, artificial intelligence, machine learning, (18 more...)

1611.02101

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.35)

arXiv.org Machine Learning Jul-12-2016

Nystrom Method for Approximating the GMM Kernel

Li, Ping

The GMM (generalized min-max) kernel was recently proposed (Li, 2016) as a measure of data similarity and was demonstrated to be effective in machine learning tasks. In order to use the GMM kernel for large-scale datasets, the prior work resorted to (generalized) consistent weighted sampling (GCWS) to convert the GMM kernel to a linear kernel. We call this approach ``GMM-GCWS''. In the machine learning literature, there is a popular algorithm which we call ``RBF-RFF''. That is, one can use the ``random Fourier features'' (RFF) to convert the ``radial basis function'' (RBF) kernel to a linear kernel. It was empirically shown in (Li, 2016) that RBF-RFF typically requires substantially more samples than GMM-GCWS in order to achieve comparable accuracies. The Nystrom method is a general tool for computing nonlinear kernels, which again converts nonlinear kernels into linear kernels. We apply the Nystrom method for approximating the GMM kernel, a strategy which we name ``GMM-NYS''. In this study, our extensive experiments on a set of fairly large datasets confirm that GMM-NYS is also a strong competitor of RBF-RFF.
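For readers unfamiliar with the kernel itself, the following sketch computes the GMM similarity between two real-valued vectors, assuming the usual definition from the cited prior work: each coordinate is split into its positive and negative parts, and the kernel is the ratio of summed coordinate-wise minima to summed maxima. The function names are illustrative, and the Nystrom approximation step is not shown.

```python
def nonneg_transform(u):
    """Map a D-dim real vector to a 2D-dim nonnegative vector.

    A coordinate x becomes (x, 0) if x > 0 and (0, -x) otherwise.
    """
    out = []
    for x in u:
        out.extend((x, 0.0) if x > 0 else (0.0, -x))
    return out

def gmm_kernel(u, v):
    """Generalized min-max similarity: sum of minima divided by sum of maxima."""
    a, b = nonneg_transform(u), nonneg_transform(v)
    num = sum(min(x, y) for x, y in zip(a, b))
    den = sum(max(x, y) for x, y in zip(a, b))
    return num / den if den > 0 else 0.0  # two all-zero vectors: defined as 0 here
```

The value lies in [0, 1]: identical nonzero vectors give 1, and vectors whose signs disagree in every coordinate give 0.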

acc, artificial intelligence, machine learning, (17 more...)

1607.03475

Country: North America > United States (0.46)

Genre: Research Report > New Finding (0.89)

Industry: Education > Curriculum > Subject-Specific Education (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.90)

Li, Ping, Moore, Joshua, Konig, Christian

b-Bit Minwise Hashing for Large-Scale Linear SVM

arXiv.org Machine Learning May-22-2011

In this paper, we propose to (seamlessly) integrate b-bit minwise hashing with linear SVM to substantially improve the training (and testing) efficiency using much smaller memory, with essentially no loss of accuracy. Theoretically, we prove that the resemblance matrix, the minwise hashing matrix, and the b-bit minwise hashing matrix are all positive definite matrices (kernels). Interestingly, our proof for the positive definiteness of the b-bit minwise hashing kernel naturally suggests a simple strategy to integrate b-bit hashing with linear SVM. Our technique is particularly useful when the data cannot fit in memory, which is an increasingly critical issue in large-scale machine learning. Our preliminary experimental results on a publicly available webspam dataset (350K samples and 16 million dimensions) verified the effectiveness of our algorithm. For example, the training time was reduced to merely a few seconds. In addition, our technique can be easily extended to many other linear and nonlinear machine learning applications such as logistic regression.
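One way to picture the integration strategy is the sketch below: compute k min-hashes of a set of feature indices, keep only the lowest b bits of each, and one-hot encode each b-bit value so that the result is a sparse binary vector of length k * 2^b that a linear SVM can consume. The sketch assumes universal hash functions of the form (a * x + c) mod PRIME as a stand-in for random permutations; the function names and the constant are illustrative, not the paper's code.

```python
import random

PRIME = 2_147_483_647  # Mersenne prime 2^31 - 1, assumed larger than any feature index

def make_hashes(k, seed=0):
    """Draw k hash functions (a, c), each approximating a random permutation."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME)) for _ in range(k)]

def minhash(s, hashes):
    """Minimum hashed value of the nonempty set s under each hash function."""
    return [min((a * x + c) % PRIME for x in s) for a, c in hashes]

def b_bit_features(s, hashes, b):
    """Keep the lowest b bits of each min-hash and one-hot encode them.

    Returns a binary vector of length k * 2^b with exactly k ones.
    """
    width = 1 << b
    vec = [0] * (len(hashes) * width)
    for i, h in enumerate(minhash(s, hashes)):
        vec[i * width + (h & (width - 1))] = 1
    return vec
```

The inner product of two such vectors counts the hash positions where the b-bit values agree, which is the quantity a linear model sees in place of the original high-dimensional set representation.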

artificial intelligence, machine learning, webspam, (16 more...)