AITopics | xnull

Collaborating Authors

xnull

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

DP_Stochastic_Optimization__New_Results_in_Convex_and_Non_convex_Settings-1.pdf

Neural Information Processing Systems, Apr-25-2026, 19:58:36 GMT

artificial intelligence, log 2, machine learning, (17 more...)

Neural Information Processing Systems

Genre: Research Report > New Finding (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.50)

f6185f0ef02dcaec414a3171cd01c697-Supplemental.pdf

Neural Information Processing Systems, Feb-11-2026, 03:40:20 GMT

nullh, nullr, triplet, (17 more...)

Neural Information Processing Systems

Country:

North America > Canada (0.04)
Europe > Germany (0.04)
Asia > China (0.04)

Genre: Personal > Honors (0.48)

Industry: Leisure & Entertainment (0.69)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.64)
Information Technology > Artificial Intelligence > Representation & Reasoning > Semantic Networks (0.62)

Guided by the Experts: Provable Feature Learning Dynamic of Soft-Routed Mixture-of-Experts

Liao, Fangshuo, Kyrillidis, Anastasios

arXiv.org Artificial Intelligence, Oct-9-2025

Mixture-of-Experts (MoE) architectures have emerged as a cornerstone of modern AI systems. In particular, MoEs route inputs dynamically to specialized experts whose outputs are aggregated through weighted summation. Despite their widespread application, theoretical understanding of MoE training dynamics remains limited to either separate expert-router optimization or only top-1 routing scenarios with carefully constructed datasets. This paper advances MoE theory by providing convergence guarantees for joint training of soft-routed MoE models with non-linear routers and experts in a student-teacher framework. We prove that, with moderate over-parameterization, the student network undergoes a feature learning phase, where the router's learning process is "guided" by the experts, that recovers the teacher's parameters. Moreover, we show that post-training pruning can effectively eliminate redundant neurons, followed by a provably convergent fine-tuning process that reaches global optimality. To our knowledge, our analysis is the first to bring novel insights into the optimization landscape of the MoE architecture.
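
The soft routing described in this abstract (every expert contributes, and outputs are combined by weighted summation) can be sketched in a few lines. This is a minimal illustration only, not the paper's model: the softmax router, the tanh experts, the scalar output, and all parameter values are assumptions chosen for readability.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def soft_moe(x, router_weights, expert_weights):
    """Soft-routed mixture of experts with a scalar output.

    router_weights[k] and expert_weights[k] are hypothetical parameter
    vectors for expert k; tanh stands in for a generic non-linear expert.
    """
    gates = softmax([dot(r, x) for r in router_weights])
    expert_outputs = [math.tanh(dot(w, x)) for w in expert_weights]
    # Weighted summation: every expert contributes, scaled by its gate.
    output = sum(g * o for g, o in zip(gates, expert_outputs))
    return output, gates


# Illustrative values only.
x = [1.0, -0.5]
routers = [[2.0, 0.0], [0.0, 2.0], [-1.0, 1.0]]
experts = [[0.5, 0.5], [1.0, -1.0], [0.0, 1.0]]
y, gates = soft_moe(x, routers, experts)
```

Unlike top-1 routing, no expert is dropped here: the gates are strictly positive and sum to one, so the router and all experts receive gradient signal during joint training.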

artificial intelligence, machine learning, nullnull null, (15 more...)

arXiv.org Artificial Intelligence

2510.07205

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.45)

Improved Approximation of Sensor Network Performance for Seabed Acoustic Sensors

Kim, Mingyu, Stilwell, Daniel J., Yetkin, Harun, Jimenez, Jorge

arXiv.org Artificial Intelligence, May-5-2025

Sensor locations to detect Poisson-distributed targets, such as seabed sensors that detect shipping traffic, can be selected to maximize the so-called void probability, which is the probability of detecting all targets. Because evaluation of void probability is computationally expensive, we propose a new approximation of void probability that can greatly reduce the computational cost of selecting locations for a network of sensors. We build upon prior work that approximates void probability using Jensen's inequality. Our new approach better accommodates uncertainty in the (Poisson) target model and yields a sharper error bound. The proposed method is evaluated using historical ship traffic data from the Hampton Roads Channel, Virginia, demonstrating a reduction in the approximation error compared to the previous approach. The results validate the effectiveness of the improved approximation for maritime surveillance applications.
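
To make the role of Jensen's inequality concrete, here is a small sketch under stated assumptions. It assumes the number of undetected targets is Poisson with mean Lambda, so the void probability is exp(-Lambda), and that uncertainty in the target model is represented by a few equally likely values of Lambda. Both the formulation and the numbers are hypothetical and may differ from the paper's setup; the sketch only shows why replacing the expectation of exp(-Lambda) with exp of the expected -Lambda gives a cheaper lower bound.

```python
import math


def void_probability(mean_undetected):
    """P(no undetected targets) for a Poisson count with the given mean."""
    return math.exp(-mean_undetected)


# Hypothetical, equally likely values of the expected number of
# undetected targets (one per scenario of the uncertain target model).
scenario_means = [0.2, 1.0, 3.0]

# Exact value: average the void probability over the scenarios.
exact = sum(void_probability(m) for m in scenario_means) / len(scenario_means)

# Jensen-style approximation: evaluate once at the average mean.
jensen = void_probability(sum(scenario_means) / len(scenario_means))

# exp(-t) is convex, so the approximation never exceeds the exact value.
gap = exact - jensen
```

The gap grows with the spread of the scenario means, which is one way to see why a model with substantial uncertainty benefits from a sharper approximation.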

approximation, artificial intelligence, void probability, (13 more...)

arXiv.org Artificial Intelligence

2505.00804

Country: North America > United States > Virginia (0.25)

Genre: Research Report (0.50)

Industry:

Government > Military (0.49)
Government > Regional Government > North America Government > United States Government (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.69)
Information Technology > Communications > Networks > Sensor Networks (0.52)

Tight Generalization Bounds for Large-Margin Halfspaces

Larsen, Kasper Green, Schalburg, Natascha

arXiv.org Artificial Intelligence, Feb-19-2025

We prove the first generalization bound for large-margin halfspaces that is asymptotically tight in the tradeoff between the margin, the fraction of training points with the given margin, the failure probability and the number of training points.

nullh, nullw, nullx, (16 more...)

arXiv.org Artificial Intelligence

2502.13692

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > Connecticut > New Haven County > New Haven (0.04)
Europe > Denmark (0.04)

Genre: Research Report (0.63)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)

Analysis of feature learning in weight-tied autoencoders via the mean field lens

Nguyen, Phan-Minh

arXiv.org Machine Learning, Feb-16-2021

Autoencoders are among the earliest introduced nonlinear models for unsupervised learning. Although they are widely adopted beyond research, it has been a longstanding open problem to understand mathematically the feature extraction mechanism that trained nonlinear autoencoders provide. In this work, we make progress on this problem by analyzing a class of two-layer weight-tied nonlinear autoencoders in the mean field framework. Upon a suitable scaling, in the regime of a large number of neurons, the models trained with stochastic gradient descent are shown to admit a mean field limiting dynamics. This limiting description reveals an asymptotically precise picture of feature learning by these models: their training dynamics exhibit different phases that correspond to the learning of different principal subspaces of the data, with varying degrees of nonlinear shrinkage dependent on the $\ell_{2}$-regularization and stopping time. While we prove these results under an idealized assumption of (correlated) Gaussian data, experiments on real-life data demonstrate an interesting match with the theory. The autoencoder setup of interest poses a nontrivial mathematical challenge to proving these results. In this setup, the "Lipschitz" constants of the models grow with the data dimension $d$. Consequently, an adaptation of previous analyses requires a number of neurons $N$ that is at least exponential in $d$. Our main technical contribution is a new argument which proves that the required $N$ is only polynomial in $d$. We conjecture that $N\gg d$ is sufficient and that $N$ is necessarily larger than a data-dependent intrinsic dimension, a behavior that is fundamentally different from previously studied setups.
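
For readers unfamiliar with weight tying, the following is a minimal sketch of a two-layer weight-tied autoencoder with an $\ell_2$ penalty. The 1/N averaging, the tanh activation, and the exact form of the penalty are assumptions made for illustration and are not taken from the paper.

```python
import math


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def reconstruct(x, weights, activation=math.tanh):
    """Weight-tied reconstruction: each neuron's vector w is used twice."""
    n_neurons = len(weights)
    x_hat = [0.0] * len(x)
    for w in weights:
        code = activation(dot(w, x))  # encoder: project x onto w
        for j, w_j in enumerate(w):
            # decoder: reuse the same w, averaged over the N neurons
            x_hat[j] += w_j * code / n_neurons
    return x_hat


def regularized_loss(x, weights, reg):
    """Squared reconstruction error plus an l2 penalty on the weights."""
    x_hat = reconstruct(x, weights)
    fit = 0.5 * sum((a - b) ** 2 for a, b in zip(x_hat, x))
    penalty = reg * sum(dot(w, w) for w in weights) / len(weights)
    return fit + penalty


# Illustrative check: with all-zero weights the reconstruction is zero,
# so the loss reduces to half the squared norm of x.
x = [1.0, 2.0]
zero_weights = [[0.0, 0.0] for _ in range(4)]
baseline = regularized_loss(x, zero_weights, reg=0.1)
```

Because encoder and decoder share parameters, the number of trainable vectors equals the number of neurons N, which is the quantity the abstract's scaling results are stated in.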

autoencoder, nullnull, xnull, (14 more...)

arXiv.org Machine Learning

2102.08373

Country:

North America > United States > New York (0.04)
North America > United States > California > San Diego County > San Diego (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.68)

On the Sample Complexity and Optimization Landscape for Quadratic Feasibility Problems

Thaker, Parth, Dasarathy, Gautam, Nedić, Angelia

arXiv.org Machine Learning, Feb-3-2020

We consider the problem of recovering a complex vector $\mathbf{x}\in \mathbb{C}^n$ from $m$ quadratic measurements $\{\langle A_i\mathbf{x}, \mathbf{x}\rangle\}_{i=1}^m$. This problem, known as quadratic feasibility, encompasses the well-known phase retrieval problem and has applications in a wide range of important areas including power system state estimation and x-ray crystallography. In general, not only is the quadratic feasibility problem NP-hard to solve, but it may in fact be unidentifiable. In this paper, we establish conditions under which this problem becomes identifiable, and further prove isometry properties in the case when the matrices $\{A_i\}_{i=1}^m$ are Hermitian matrices sampled from a complex Gaussian distribution. Moreover, we explore a nonconvex optimization formulation of this problem, and establish salient features of the associated optimization landscape that enable gradient algorithms with an arbitrary initialization to converge to a globally optimal point with high probability. Our results also reveal sample complexity requirements for successfully identifying a feasible solution in these contexts.
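
The measurement model in this abstract can be illustrated with a short sketch. It builds random Hermitian matrices, computes the quadratic measurements, and evaluates a least-squares loss of the kind a nonconvex formulation might minimize. The scaling of the random matrices, the specific loss, and the problem sizes are assumptions for illustration; the paper's exact formulation may differ. The sketch also shows that multiplying the vector by a unit-modulus phase leaves every measurement unchanged, so recovery can only be expected up to a global phase.

```python
import cmath
import random


def random_hermitian(n, rng):
    """Hermitian matrix built from complex Gaussian entries (illustrative scaling)."""
    a = [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]
         for _ in range(n)]
    # Averaging with the conjugate transpose makes the matrix Hermitian.
    return [[(a[i][j] + a[j][i].conjugate()) / 2 for j in range(n)]
            for i in range(n)]


def quadratic_measurement(a, x):
    """<A x, x> = sum_i (A x)_i * conj(x_i); real-valued when A is Hermitian."""
    n = len(x)
    ax = [sum(a[i][j] * x[j] for j in range(n)) for i in range(n)]
    return sum(ax[i] * x[i].conjugate() for i in range(n))


def loss(z, matrices, targets):
    """Mean squared mismatch between the measurements of z and the targets."""
    return sum(abs(quadratic_measurement(a, z) - y) ** 2
               for a, y in zip(matrices, targets)) / len(matrices)


rng = random.Random(0)
n, m = 3, 8  # hypothetical problem sizes
x_true = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]
matrices = [random_hermitian(n, rng) for _ in range(m)]
targets = [quadratic_measurement(a, x_true) for a in matrices]

# A global phase rotation of x_true yields the same measurements.
phase = cmath.exp(1j * 0.7)
x_rotated = [phase * v for v in x_true]
```

Both `x_true` and `x_rotated` are global minimizers of this loss, which is why statements about a "globally optimal point" are naturally read modulo phase.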

matrix, nullnull, xx yy null 2, (13 more...)

arXiv.org Machine Learning

2002.01066

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > Arizona (0.04)
North America > Canada (0.04)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.46)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.34)

Transport Gaussian Processes for Regression

Rios, Gonzalo

arXiv.org Machine Learning, Jan-30-2020

Gaussian process (GP) priors are non-parametric generative models with appealing modelling properties for Bayesian inference: they can model non-linear relationships through noisy observations, have closed-form expressions for training and inference, and are governed by interpretable hyperparameters. However, GP models rely on Gaussianity, an assumption that does not hold in several real-world scenarios, e.g., when observations are bounded or have extreme-value dependencies, a natural phenomenon in physics, finance and social sciences. Although beyond-Gaussian stochastic processes have caught the attention of the GP community, a principled definition and rigorous treatment are still lacking. In this regard, we propose a methodology to construct stochastic processes, which include GPs, warped GPs, Student-t processes and several others under a single unified approach. We also provide formulas and algorithms for training and inference of the proposed models in the regression problem. Our approach is inspired by layers-based models, where each proposed layer changes a specific property over the generated stochastic process. That, in turn, allows us to push forward a standard Gaussian white noise prior towards other more expressive stochastic processes, for which marginals and copulas need not be Gaussian, while retaining the appealing properties of GPs. We validate the proposed model through experiments with real-world data.
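
The idea of pushing white noise forward through successive layers can be sketched as follows. This is an illustration under assumptions, not the paper's construction: the two layers used here (a Cholesky factor of an RBF kernel to introduce correlation, then an elementwise exponential to change the marginals) and their names are our own choices, and the result is simply a warped GP with log-normal marginals.

```python
import math
import random


def rbf_kernel(xs, length_scale=1.0, jitter=1e-6):
    """RBF covariance matrix with a small diagonal jitter for stability."""
    n = len(xs)
    return [[math.exp(-0.5 * ((xs[i] - xs[j]) / length_scale) ** 2)
             + (jitter if i == j else 0.0)
             for j in range(n)] for i in range(n)]


def cholesky(a):
    """Lower-triangular factor L with L * L^T = a (a must be positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def sample_warped_process(xs, rng, warp=math.exp):
    """Push white noise through a covariance layer, then a marginal layer."""
    noise = [rng.gauss(0, 1) for _ in xs]  # standard Gaussian white noise
    low = cholesky(rbf_kernel(xs))
    # Covariance layer: correlate the noise, giving a GP sample.
    gp = [sum(low[i][k] * noise[k] for k in range(i + 1))
          for i in range(len(xs))]
    # Marginal layer: elementwise warp, giving non-Gaussian marginals.
    return [warp(v) for v in gp]


xs = [0.0, 0.5, 1.0, 1.5, 2.0]  # hypothetical input locations
sample = sample_warped_process(xs, random.Random(0))
```

With `warp=math.exp` every sampled value is positive, a simple example of a bounded-support process that a plain GP prior cannot represent.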

artificial intelligence, bayesian inference, machine learning, (18 more...)

arXiv.org Machine Learning

2001.11473

Country:

South America > Chile (0.04)
North America > United States > New Jersey (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(2 more...)

Genre: Research Report (0.50)

Industry:

Banking & Finance > Economy (0.67)
Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Neural Networks Learning and Memorization with (almost) no Over-Parameterization

Daniely, Amit

arXiv.org Machine Learning, Nov-22-2019

Many results in recent years established polynomial time learnability of various models via neural network algorithms. However, unless the model is linearly separable, or the activation is a polynomial, these results require very large networks -- much more than what is needed for the mere existence of a good predictor. In this paper we prove that SGD on depth-two neural networks can memorize samples, learn polynomials with bounded weights, and learn certain kernel spaces, with near-optimal network size, sample complexity, and runtime. In particular, we show that SGD on a depth-two network with $\tilde{O}\left(\frac{m}{d}\right)$ hidden neurons (and hence $\tilde{O}(m)$ parameters) can memorize $m$ random labeled points in $\mathbb{S}^{d-1}$.
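
The step from $\tilde{O}(m/d)$ hidden neurons to $\tilde{O}(m)$ parameters is simple arithmetic, sketched below. The sketch assumes a scalar-output depth-two network without biases and ignores the constants and logarithmic factors hidden in the $\tilde{O}$ notation; the values of $m$ and $d$ are hypothetical.

```python
import math


def hidden_neurons(m, d):
    """Hidden width on the order of m / d (constants and log factors ignored)."""
    return math.ceil(m / d)


def parameter_count(q, d):
    """Depth-two network, scalar output, no biases: q*d input weights + q output weights."""
    return q * d + q


m, d = 10_000, 100  # hypothetical: 10,000 points on the unit sphere in R^100
q = hidden_neurons(m, d)
params = parameter_count(q, d)
```

Each hidden neuron carries roughly d parameters, so m/d neurons give about m parameters in total, matching the number of labels to be memorized up to lower-order terms.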

arxiv preprint arxiv, kernel space, neural network, (13 more...)

arXiv.org Machine Learning

1911.09873

Country: Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.65)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (0.42)
