AITopics | equation follow

Collaborating Authors

equation follow

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Batch Normalization for Neural Networks on Complex Domains

Nguyen, Xuan Son, Grozavu, Nistor

arXiv.org Machine LearningMay-4-2026

Riemannian neural networks have proven effective in solving a variety of machine learning tasks. The key to their success lies in the development of principled Riemannian analogs of fundamental building blocks in deep neural networks (DNNs). Among those, Riemannian batch normalization (BN) layers have shown to enhance training stability and improve accuracy. In this paper, we propose BN layers for neural networks on complex domains. The proposed layers have close connections with existing Riemannian BN layers. We derive essential components for practical implementations of BN layers on some complex domains which are less studied in previous works, e.g., the Siegel disk domain. We conduct experiments on radar clutter classification, node classification, and action recognition demonstrating the efficacy of our method.

artificial intelligence, machine learning, neural network, (15 more...)

arXiv.org Machine Learning

2605.00467

Country: Asia (0.28)

Genre: Research Report > New Finding (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

9afe487de556e59e6db6c862adfe25a4-Supplemental.pdf

Neural Information Processing SystemsFeb-9-2026, 12:36:02 GMT

inequality, lemma, lemma 5, (14 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.46)

Add feedback

0d00a699f60e642b310eb04b76cc7731-Supplemental-Conference.pdf

Neural Information Processing SystemsNov-13-2025, 12:41:12 GMT

ak 2, artificial intelligence, lemma 3, (18 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence (0.46)

Add feedback

On the Complexity Theory of Masked Discrete Diffusion: From $\mathrm{poly}(1/ε)$ to Nearly $ε$-Free

Huang, Xunpeng, Lin, Yingyu, Jain, Nishant, Wang, Kaibo, Zou, Difan, Ma, Yian, Zhang, Tong

arXiv.org Artificial IntelligenceSep-29-2025

We study masked discrete diffusion -- a flexible paradigm for text generation in which tokens are progressively corrupted by special mask symbols before being denoised. Although this approach has demonstrated strong empirical performance, its theoretical complexity in high-dimensional settings remains insufficiently understood. Existing analyses largely focus on uniform discrete diffusion, and more recent attempts addressing masked diffusion either (1) overlook widely used Euler samplers, (2) impose restrictive bounded-score assumptions, or (3) fail to showcase the advantages of masked discrete diffusion over its uniform counterpart. To address this gap, we show that Euler samplers can achieve $ε$-accuracy in total variation (TV) with $\tilde{O}(d^{2}ε^{-3/2})$ discrete score evaluations, thereby providing the first rigorous analysis of typical Euler sampler in masked discrete diffusion. We then propose a Mask-Aware Truncated Uniformization (MATU) approach that both removes bounded-score assumptions and preserves unbiased discrete score approximation. By exploiting the property that each token can be unmasked at most once, MATU attains a nearly $ε$-free complexity of $O(d\,\ln d\cdot (1-ε^2))$. This result surpasses existing uniformization methods under uniform discrete diffusion, eliminating the $\ln(1/ε)$ factor and substantially speeding up convergence. Our findings not only provide a rigorous theoretical foundation for masked discrete diffusion, showcasing its practical advantages over uniform diffusion for text generation, but also pave the way for future efforts to analyze diffusion-based language models developed under masking paradigm.

discrete diffusion, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2509.21835

Country: North America > United States (0.45)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.68)

Add feedback

9afe487de556e59e6db6c862adfe25a4-Supplemental.pdf

Neural Information Processing SystemsAug-15-2025, 08:47:05 GMT

inequality, lemma, lemma 5, (15 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.46)

Add feedback

Unbiased least squares regression via averaged stochastic gradient descent

Kahalé, Nabil

arXiv.org Machine LearningJun-26-2024

We consider an on-line least squares regression problem with optimal solution $\theta^*$ and Hessian matrix H, and study a time-average stochastic gradient descent estimator of $\theta^*$. For $k\ge2$, we provide an unbiased estimator of $\theta^*$ that is a modification of the time-average estimator, runs with an expected number of time-steps of order k, with O(1/k) expected excess risk. The constant behind the O notation depends on parameters of the regression and is a poly-logarithmic function of the smallest eigenvalue of H. We provide both a biased and unbiased estimator of the expected excess risk of the time-average estimator and of its unbiased counterpart, without requiring knowledge of either H or $\theta^*$. We describe an "average-start" version of our estimators with similar properties. Our approach is based on randomized multilevel Monte Carlo. Our numerical experiments confirm our theoretical findings.

equation follow, estimator, unbiased estimator, (15 more...)

arXiv.org Machine Learning

2406.18623

Country:

North America > United States > New York (0.04)
Europe > France > Île-de-France > Paris > Paris (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Add feedback

Continuous Treatment Effects with Surrogate Outcomes

Zeng, Zhenghao, Arbour, David, Feller, Avi, Addanki, Raghavendra, Rossi, Ryan, Sinha, Ritwik, Kennedy, Edward H.

arXiv.org Artificial IntelligenceJan-31-2024

In many causal inference applications, the primary outcomes are missing for a non-trivial number of observations. For instance, in studies on long-term health effects of medical interventions, some measurements require expensive testing and a loss to follow-up is common (Hogan et al., 2004). In evaluating commercial online ad effectiveness, some individuals may drop out from the panel because they use multiple devices (Shankar et al., 2023), leading to missing revenue measures. In many of these studies, however, there often exist short-term outcomes that are easier and faster to measure, e.g., short-term health measures or an online ad's click-through rate, that are observed for a greater share of the sample. These outcomes, which are typically informative about the primary outcomes themselves, are refered to as surrogate outcomes or surrogates. There is a rich causal inference literature addressing missing outcome data. Simply restricting to data with observed primary outcomes may induce strong bias (Hernán and Robins, 2010). Ignoring unlabeled data also reduces the effective sample size for estimating the treatment effects and inflates the variance. Chakrabortty et al. (2022) considered the missing completely at random (MCAR) setting and showed that incorporating unlabeled data reduces variance.

doubly robust estimator, estimation, estimator, (14 more...)

arXiv.org Artificial Intelligence

2402.00168

Country: North America > United States > California > Alameda County > Berkeley (0.04)

Genre: Research Report > Experimental Study (1.00)

Industry:

Law (0.68)
Education (0.68)
Health & Medicine (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)

Add feedback

Long-term Causal Inference Under Persistent Confounding via Data Combination

Imbens, Guido, Kallus, Nathan, Mao, Xiaojie, Wang, Yuhao

arXiv.org Machine LearningFeb-15-2022

We study the identification and estimation of long-term treatment effects when both experimental and observational data are available. Since the long-term outcome is observed only after a long delay, it is not measured in the experimental data, but only recorded in the observational data. However, both types of data include observations of some short-term outcomes. In this paper, we uniquely tackle the challenge of persistent unmeasured confounders, i.e., some unmeasured confounders that can simultaneously affect the treatment, short-term outcomes and the long-term outcome, noting that they invalidate identification strategies in previous literature. To address this challenge, we exploit the sequential structure of multiple short-term outcomes, and develop three novel identification strategies for the average long-term treatment effect. We further propose three corresponding estimators and prove their asymptotic consistency and asymptotic normality. We finally apply our methods to estimate the effect of a job training program on long-term employment using semi-synthetic data. We numerically show that our proposals outperform existing methods that fail to handle persistent confounders.

bridge function, equation, estimator, (17 more...)

arXiv.org Machine Learning

2202.07234

Country:

North America > United States > California > San Diego County > San Diego (0.04)
North America > United States > California > Los Angeles County > Los Angeles (0.04)
North America > United States > California > Alameda County > Alameda (0.04)
(3 more...)

Genre:

Research Report > Experimental Study (1.00)
Research Report > Strength High (0.68)

Industry: Education (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)

Add feedback

Least-squares regressions via randomized Hessians

Kahale, Nabil

arXiv.org Machine LearningJun-1-2020

The recent availability of massive volumes of data fosters the need to design computationally efficient algorithms for optimization in high dimensions. In large-scale machine learning, stochastic gradient descent algorithms are among the most effective optimization methods (Bottou, Curtis and Nocedal 2018). For general smooth convex functions, averaged SGD achieves the rate of convergence of O(1/ k) after k iterations (Nemirovski, Juditsky, Lan and Shapiro 2009). For strongly-convex functions, i.e. when the smallest eigenvalue of the Hessian matrix is bounded away from 0, the convergence rate after k iterations is O(1/k) (Nemirovski, Juditsky, Lan and Shapiro 2009). Variance-reduced SGD algorithms that optimize the sum of n convex functions are described in (Schmidt, Le Roux and Bach 2017, Shalev-Shwartz and Zhang 2013, Johnson and Zhang 2013), and related accelerated methods are analysed in (Shalev-Shwartz and Zhang 2014, Nitanda 2014, Lan and Zhou 2018, Scieur, dAspremont and Bach 2018). These methods enjoy linear convergence(a convergence rate that decreases exponentially with the number of iterations) in the strongly-convex case. For general smooth convex functions, the stochastic average gradient method (SAG) of Schmidt, Le Roux and Bach (2017) yields a convergence rate of O( n/k) after k iterations. This paper focuses on the least-squares regression, which often arises in scientific computing and data analysis, and is widely used for inference and prediction. Many of the modern machine learning techniques such as the logistic and ridge regressions, the lasso method and neural networks can be considered as extensions of the least-squares regression technique.

algorithm, artificial intelligence, machine learning, (17 more...)

arXiv.org Machine Learning

2006.01017

Country:

North America > United States > New York (0.04)
Europe > France > Île-de-France > Paris > Paris (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.56)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.34)

Add feedback