Mathematical & Statistical Methods


Reviews: Globally Convergent Newton Methods for Ill-conditioned Generalized Self-concordant Losses

Neural Information Processing Systems

This is obviously intended to be fleshed out in Section 2, but even there the differences between the proposal and the cited references are not made explicit. For example, I'm not sure how this paper differs from prior generalized-self-concordant work (e.g.


Globally Convergent Newton Methods for Ill-conditioned Generalized Self-concordant Losses

Neural Information Processing Systems

In this paper, we study large-scale convex optimization algorithms based on the Newton method applied to regularized generalized self-concordant losses, which include logistic regression and softmax regression. We first prove that our new simple scheme, based on a sequence of problems with decreasing regularization parameters, is globally convergent and that this convergence is linear, with a constant factor that scales only logarithmically with the condition number. In the parametric setting, we obtain an algorithm with the same scaling as regular first-order methods but with improved behavior, in particular on ill-conditioned problems.
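As a rough illustration of the decreasing-regularization idea (a minimal sketch under my own assumptions, not the paper's exact scheme), the snippet below runs damped Newton steps on an l2-regularized logistic loss while shrinking the regularization parameter toward its target, warm-starting each solve at the previous solution. The function names `newton_path` and `logistic_loss_grad_hess`, the damping rule, and the shrink factor are all illustrative choices.

```python
import numpy as np

def logistic_loss_grad_hess(w, X, y, lam):
    """L2-regularized logistic loss, its gradient, and its Hessian.

    Labels y are assumed to take values in {-1, +1}.
    """
    z = X @ w
    p = 1.0 / (1.0 + np.exp(-y * z))              # P(correct label)
    loss = np.mean(np.log1p(np.exp(-y * z))) + 0.5 * lam * (w @ w)
    grad = -X.T @ (y * (1.0 - p)) / len(y) + lam * w
    D = p * (1.0 - p)                             # per-sample curvature
    hess = (X.T * D) @ X / len(y) + lam * np.eye(X.shape[1])
    return loss, grad, hess

def newton_path(X, y, lam_target, lam_init=1.0, shrink=0.5, tol=1e-8):
    """Damped Newton on a sequence of problems with decreasing
    regularization, warm-starting each solve at the previous solution
    (illustrative only)."""
    w = np.zeros(X.shape[1])
    lam = max(lam_init, lam_target)
    while True:
        for _ in range(50):                       # damped Newton for current lam
            _, g, H = logistic_loss_grad_hess(w, X, y, lam)
            if np.linalg.norm(g) < tol:
                break
            step = np.linalg.solve(H, g)
            w = w - step / (1.0 + np.sqrt(step @ g))   # Newton-decrement damping
        if lam <= lam_target:
            return w
        lam = max(lam * shrink, lam_target)
```

The warm start across the decreasing sequence of regularization parameters is, as I read the abstract, what keeps the overall cost depending only logarithmically on the condition number.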


Reviews: Globally Convergent Newton Methods for Ill-conditioned Generalized Self-concordant Losses

Neural Information Processing Systems

The paper studies large-scale convex optimization algorithms based on the Newton method applied to regularized generalized self-concordant losses, in particular in ill-conditioned settings, providing new optimal generalization bounds and proofs of convergence. The reviewers found the contributions of high quality and were satisfied with the clarifications provided by the author response.


Off-Policy Evaluation via the Regularized Lagrangian

Neural Information Processing Systems

The recently proposed distribution correction estimation (DICE) family of estimators has advanced the state of the art in off-policy evaluation from behavior-agnostic data. While these estimators all perform some form of stationary distribution correction, they arise from different derivations and objective functions. In this paper, we unify these estimators as regularized Lagrangians of the same linear program. The unification allows us to expand the space of DICE estimators to new alternatives that demonstrate improved performance. More importantly, by analyzing the expanded space of estimators both mathematically and empirically we find that dual solutions offer greater flexibility in navigating the tradeoff between optimization stability and estimation bias, and generally provide superior estimates in practice.
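For context, here is a sketch of the linear program the abstract refers to, written in standard DICE-style notation as I understand it; the exact constraints, regularizers, and signs used in the paper may differ, so treat this as an illustration rather than the paper's objective.

```latex
% d(s,a): stationary (discounted) occupancy of the target policy \pi;
% the LP value equals the policy's normalized return.
\begin{align*}
\max_{d \ge 0} \quad & \sum_{s,a} d(s,a)\, r(s,a) \\
\text{s.t.} \quad & d(s,a) = (1-\gamma)\,\mu_0(s)\,\pi(a \mid s)
    + \gamma \sum_{s',a'} P(s \mid s',a')\,\pi(a \mid s)\, d(s',a')
    \qquad \forall\, s,a .
\end{align*}
% Dualizing the flow constraints with multipliers Q(s,a) and adding convex
% regularizers f_d, f_Q (the choices and signs here are illustrative) gives a
% regularized Lagrangian whose saddle points correspond to DICE estimators:
\begin{align*}
L(d, Q) = \sum_{s,a} d(s,a)\, r(s,a)
  + \sum_{s,a} Q(s,a) \Big[ (1-\gamma)\,\mu_0(s)\,\pi(a \mid s)
  + \gamma \sum_{s',a'} P(s \mid s',a')\,\pi(a \mid s)\, d(s',a') - d(s,a) \Big]
  - \alpha_d\, f_d(d) + \alpha_Q\, f_Q(Q).
\end{align*}
```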


Reviews: Minimal Variance Sampling in Stochastic Gradient Boosting

Neural Information Processing Systems

Update: I read the authors' response. RE: sampling rate does not tell the whole story - I was suggesting adding information about how many instances were used, on average, for each of the splits (because it is not equal to sampling rate * total dataset size). I am keeping my accept rating, hoping that the authors make the changes to improve the derivations/clarity in the final submission.

Summary: this paper is concerned with a common trick that a lot of GBDT implementations apply - subsampling instances in order to speed up the calculations for finding the best split. The authors formulate the choice of which instances to sample as an optimization problem and derive a modified sampling scheme that aims to mimic, using only the subsampled instances, the gain that would be assigned to a split on all of the data. The experiments demonstrate good results. The paper is well written and easy to follow, apart from a couple of places in the derivations (see my questions).
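To make the sampling idea concrete, here is a minimal sketch (my own construction, not the paper's derivation) of keeping instances with probability proportional to a regularized gradient score, capped at one, and re-weighting kept instances by the inverse probability so that subsampled split statistics remain unbiased; the score `sqrt(g^2 + lam * h^2)`, the bisection calibration, and the function name are assumptions.

```python
import numpy as np

def minimal_variance_sample(grad, hess, sample_rate, lam=1.0, rng=None):
    """Keep instance i with probability p_i = min(1, score_i / mu) and weight
    kept instances by 1/p_i, so subsampled split statistics stay unbiased.
    Score and calibration here are illustrative assumptions."""
    rng = np.random.default_rng() if rng is None else rng
    score = np.sqrt(grad ** 2 + lam * hess ** 2)

    # Bisect mu so that the expected sample size matches sample_rate * n.
    n = len(score)
    lo, hi = 0.0, score.max() / max(sample_rate, 1e-12)
    for _ in range(50):
        mu = 0.5 * (lo + hi)
        p = np.minimum(1.0, score / max(mu, 1e-12))
        if p.sum() > sample_rate * n:
            lo = mu        # too many expected samples: raise the threshold
        else:
            hi = mu
    keep = rng.random(n) < p
    weights = np.where(keep, 1.0 / np.maximum(p, 1e-12), 0.0)
    return keep, weights
```

In a GBDT implementation, `grad` and `hess` would be the per-instance first and second derivatives of the loss at the current boosting iteration, and the returned weights would multiply the statistics used to score candidate splits.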


Reviews: Minimal Variance Sampling in Stochastic Gradient Boosting

Neural Information Processing Systems

The authors propose a non-uniform sampling strategy for stochastic gradient boosted decision trees. In particular, the sampling probabilities of the training instances are optimized to maximize the estimation accuracy of the splitting scores of the decision trees. The optimization problem admits an approximate closed-form solution. Experimental results demonstrate the superior performance of the proposed strategy. The reviewers agree that the paper not only helps to understand sampling within GBDT from a more rigorous perspective but can also improve GBDT implementations in practice.


On the Universality of Graph Neural Networks on Large Random Graphs

Neural Information Processing Systems

We study the approximation power of Graph Neural Networks (GNNs) on latent position random graphs. In the large graph limit, GNNs are known to converge to certain "continuous" models known as c-GNNs, which directly enables a study of their approximation power on random graph models. In the absence of input node features however, just as GNNs are limited by the Weisfeiler-Lehman isomorphism test, c-GNNs will be severely limited on simple random graph models. For instance, they will fail to distinguish the communities of a well-separated Stochastic Block Model (SBM) with constant degree function. Thus, we consider recently proposed architectures that augment GNNs with unique node identifiers, referred to as Structural GNNs here (SGNNs). We study the convergence of SGNNs to their continuous counterpart (c-SGNNs) in the large random graph limit, under new conditions on the node identifiers. We then show that c-SGNNs are strictly more powerful than c-GNNs in the continuous limit, and prove their universality on several random graph models of interest, including most SBMs and a large class of random geometric graphs. Our results cover both permutation-invariant and permutation-equivariant architectures.
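The SBM failure mode mentioned in the abstract is easy to reproduce numerically. The snippet below is an illustration I constructed (not code from the paper): it samples a two-block SBM whose blocks have identical expected degree and runs a feature-less GNN with constant input features; since degrees carry no community information, the two blocks end up with statistically identical embeddings, which is the symmetry that unique node identifiers (SGNNs) are meant to break.

```python
import numpy as np

def sbm_adjacency(sizes, P, rng):
    """Sample a symmetric SBM adjacency matrix for block sizes `sizes`
    and block connection-probability matrix `P`."""
    labels = np.repeat(np.arange(len(sizes)), sizes)
    probs = P[labels[:, None], labels[None, :]]
    upper = np.triu((rng.random(probs.shape) < probs).astype(float), 1)
    return upper + upper.T, labels

rng = np.random.default_rng(0)
# Two well-separated communities with identical expected degree
# (0.10 within, 0.05 across, equal block sizes): a constant degree function,
# so degrees carry no community information.
A, labels = sbm_adjacency([500, 500], np.array([[0.10, 0.05],
                                                [0.05, 0.10]]), rng)

# Feature-less GNN: constant input features, sum aggregation scaled by the
# (common) expected degree, tanh nonlinearity, three layers.
expected_degree = 500 * 0.10 + 500 * 0.05
H = np.ones((A.shape[0], 1))
for _ in range(3):
    H = np.tanh(A @ H / expected_degree)

print("block-0 mean embedding:", float(H[labels == 0].mean()))
print("block-1 mean embedding:", float(H[labels == 1].mean()))
# The two block averages agree up to sampling noise: without node
# identifiers this architecture cannot separate the communities.
```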


Quasi-Newton Methods for Saddle Point Problems

Neural Information Processing Systems

The design and analysis of the proposed algorithm are based on estimating the square of the indefinite Hessian matrix, which differs from classical quasi-Newton methods in convex optimization.
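A brief sketch of why the square of the Hessian is the natural object to estimate here; the notation is mine and illustrative, not necessarily the paper's.

```latex
For a convex-concave saddle point problem $\min_{x}\max_{y} f(x,y)$, write
$z=(x,y)$ and $H(z)=\nabla^{2} f(z)$.  The Hessian is symmetric but indefinite,
\[
  \nabla^{2}_{xx} f \succeq 0, \qquad \nabla^{2}_{yy} f \preceq 0,
\]
so the positive-definite approximations built by classical quasi-Newton
updates in convex optimization (BFGS, DFP) cannot target $H$ directly.
Its square, however, is always positive semidefinite,
\[
  H(z)^{2} = H(z)^{\top} H(z) \succeq 0,
\]
which is why the scheme estimates $H^{2}$ rather than $H$ itself.
```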

