AITopics | diam 2

Collaborating Authors

diam 2

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Local Averaging Accurately Distills Manifold Structure From Noisy Data

Shen, Yihan, Wang, Shiyu, Lamy, Arnaud, Avagyan, Mariam, Wright, John

arXiv.org Machine LearningJun-24-2025

High-dimensional data are ubiquitous, with examples ranging from natural images to scientific datasets, and often reside near low-dimensional manifolds. Leveraging this geometric structure is vital for downstream tasks, including signal denoising, reconstruction, and generation. However, in practice, the manifold is typically unknown and only noisy samples are available. A fundamental approach to uncovering the manifold structure is local averaging, which is a cornerstone of state-of-the-art provable methods for manifold fitting and denoising. However, to the best of our knowledge, there are no works that rigorously analyze the accuracy of local averaging in a manifold setting in high-noise regimes. In this work, we provide theoretical analyses of a two-round mini-batch local averaging method applied to noisy samples drawn from a $d$-dimensional manifold $\mathcal M \subset \mathbb{R}^D$, under a relatively high-noise regime where the noise size is comparable to the reach $τ$. We show that with high probability, the averaged point $\hat{\mathbf q}$ achieves the bound $d(\hat{\mathbf q}, \mathcal M) \leq σ\sqrt{d\left(1+\frac{κ\mathrm{diam}(\mathcal {M})}{\log(D)}\right)}$, where $σ, \mathrm{diam(\mathcal M)},κ$ denote the standard deviation of the Gaussian noise, manifold's diameter and a bound on its extrinsic curvature, respectively. This is the first analysis of local averaging accuracy over the manifold in the relatively high noise regime where $σ\sqrt{D} \approx τ$. The proposed method can serve as a preprocessing step for a wide range of provable methods designed for lower-noise regimes. Additionally, our framework can provide a theoretical foundation for a broad spectrum of denoising and dimensionality reduction methods that rely on local averaging techniques.

artificial intelligence, diam, machine learning, (17 more...)

arXiv.org Machine Learning

2506.18761

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Add feedback

Adaptive Split Balancing for Optimal Random Forest

Zhang, Yuqian, Ji, Weijie, Bradic, Jelena

arXiv.org Machine LearningFeb-17-2024

While random forests are commonly used for regression problems, existing methods often lack adaptability in complex situations or lose optimality under simple, smooth scenarios. In this study, we introduce the adaptive split balancing forest (ASBF), capable of learning tree representations from data while simultaneously achieving minimax optimality under the Lipschitz class. To exploit higher-order smoothness levels, we further propose a localized version that attains the minimax rate under the H\"older class $\mathcal{H}^{q,\beta}$ for any $q\in\mathbb{N}$ and $\beta\in(0,1]$. Rather than relying on the widely-used random feature selection, we consider a balanced modification to existing approaches. Our results indicate that an over-reliance on auxiliary randomness may compromise the approximation power of tree models, leading to suboptimal results. Conversely, a less random, more balanced approach demonstrates optimality. Additionally, we establish uniform upper bounds and explore the application of random forests in average treatment effect estimation problems. Through simulation studies and real-data applications, we demonstrate the superior empirical performance of the proposed methods over existing random forests.

adaptive split, random forest, splitting direction, (16 more...)

arXiv.org Machine Learning

2402.11228

Country:

Asia > China (0.14)
South America > Paraguay > Asunción > Asunción (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
(3 more...)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine (0.45)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)

Add feedback

Boosting Gradient Ascent for Continuous DR-submodular Maximization

Zhang, Qixin, Wan, Zongqi, Deng, Zengde, Chen, Zaiyi, Sun, Xiaoming, Zhang, Jialin, Yang, Yu

arXiv.org Artificial IntelligenceJan-16-2024

Projected Gradient Ascent (PGA) is the most commonly used optimization scheme in machine learning and operations research areas. Nevertheless, numerous studies and examples have shown that the PGA methods may fail to achieve the tight approximation ratio for continuous DR-submodular maximization problems. To address this challenge, we present a boosting technique in this paper, which can efficiently improve the approximation guarantee of the standard PGA to \emph{optimal} with only small modifications on the objective function. The fundamental idea of our boosting technique is to exploit non-oblivious search to derive a novel auxiliary function $F$, whose stationary points are excellent approximations to the global maximum of the original DR-submodular objective $f$. Specifically, when $f$ is monotone and $\gamma$-weakly DR-submodular, we propose an auxiliary function $F$ whose stationary points can provide a better $(1-e^{-\gamma})$-approximation than the $(\gamma^2/(1+\gamma^2))$-approximation guaranteed by the stationary points of $f$ itself. Similarly, for the non-monotone case, we devise another auxiliary function $F$ whose stationary points can achieve an optimal $\frac{1-\min_{\boldsymbol{x}\in\mathcal{C}}\|\boldsymbol{x}\|_{\infty}}{4}$-approximation guarantee where $\mathcal{C}$ is a convex constraint set. In contrast, the stationary points of the original non-monotone DR-submodular function can be arbitrarily bad~\citep{chen2023continuous}. Furthermore, we demonstrate the scalability of our boosting technique on four problems. In all of these four problems, our resulting variants of boosting PGA algorithm beat the previous standard PGA in several aspects such as approximation ratio and efficiency. Finally, we corroborate our theoretical findings with numerical experiments, which demonstrate the effectiveness of our boosting PGA methods.

algorithm, maximization, stationary point, (16 more...)

arXiv.org Artificial Intelligence

2401.0833

Country:

Asia > China > Hong Kong (0.04)
Africa > Sudan (0.04)

Genre: Research Report (1.00)

Industry:

Information Technology (0.45)
Leisure & Entertainment > Sports (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Private Adaptive Gradient Methods for Convex Optimization

Asi, Hilal, Duchi, John, Fallah, Alireza, Javidbakht, Omid, Talwar, Kunal

arXiv.org Machine LearningJun-25-2021

While the success of stochastic gradient methods for solving empirical risk minimization has motivated their adoption across much of machine learning, increasing privacy risks in data-intensive tasks have made applying them more challenging [DMNS06]: gradients can leak users' data, intermediate models can compromise individuals, and even final trained models may be non-private without substantial care. This motivates a growing line of work developing private variants of stochastic gradient descent (SGD), where algorithms guarantee differential privacy by perturbing individual gradients with random noise [DJW13; ST13b; ACGMMTZ16; DJW18; BFTT19; FKT20]. Yet these noise addition procedures typically fail to reflect the geometry underlying the optimization problem, which in non-private cases is essential: for high-dimensional problems with sparse parameters, mirror descent and its variants [BT03; NJLS09] are essential, while in the large-scale stochastic settings prevalent in deep learning, AdaGrad and other adaptive variants [DHS11] provide stronger theoretical and practical performance. Even more, methods that do not adapt (or do not leverage geometry) can be provably sub-optimal, in that there exist problems where their convergence is much slower than adaptive variants that reflect appropriate geometry [LD19]. To address these challenges, we introduce Pagan (Private AdaGrad with Adaptive Noise), a new differentially private variant of stochastic gradient descent and AdaGrad.

algorithm, diam 2, gradient, (13 more...)

arXiv.org Machine Learning

2106.13756

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.04)
Asia > Middle East > Jordan (0.04)
North America > United States > New York (0.04)
(2 more...)

Genre: Research Report > New Finding (0.93)

Industry: Information Technology > Security & Privacy (0.87)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)

Add feedback

Learning finite-dimensional coding schemes with nonlinear reconstruction maps

Lee, Jaeho, Raginsky, Maxim

arXiv.org Machine LearningDec-23-2018

The problem of lossy compression is about constructing succinct representations of high-dimensional random vectors that retain the features of the data that are relevant for some subsequent task, such as reconstruction subject to a fidelity criterion or statistical inference. When the compressed representation is digital, with constraints imposed by the limitations on the speed of digital transmission oron the available storage space, the corresponding problem of lossy compression falls within the purview of rate-distortion theory[6] and the theory of vector quantization[15]. On the other hand, given recent advances in machine learning using deep neural nets[17], it is of interest to consider'analog' schemes for lossy compression that map the original high-dimensional data to a continuous representation of lower dimensionality[5], and where the reconstruction operations that send the compressed representation back to the original high-dimensional space are implemented bynonlinear maps with a given structure.

artificial intelligence, machine learning, reconstruction map, (18 more...)

arXiv.org Machine Learning

1812.09658

Country:

North America > United States > Illinois (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > Georgia > Fulton County > Atlanta (0.04)
(3 more...)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)

Add feedback

Clustering with t-SNE, provably

Linderman, George C., Steinerberger, Stefan

arXiv.org Machine LearningJun-8-2017

t-distributed Stochastic Neighborhood Embedding (t-SNE), a clustering and visualization method proposed by van der Maaten & Hinton in 2008, has rapidly become a standard tool in a number of natural sciences. Despite its overwhelming success, there is a distinct lack of mathematical foundations and the inner workings of the algorithm are not well understood. The purpose of this paper is to prove that t-SNE is able to recover well-separated clusters; more precisely, we prove that t-SNE in the `early exaggeration' phase, an optimization technique proposed by van der Maaten & Hinton (2008) and van der Maaten (2014), can be rigorously analyzed. As a byproduct, the proof suggests novel ways for setting the exaggeration parameter $\alpha$ and step size $h$. Numerical examples illustrate the effectiveness of these rules: in particular, the quality of embedding of topological structures (e.g. the swiss roll) improves. We also discuss a connection to spectral clustering methods.

diam 2, early exaggeration phase, spectral method, (12 more...)

arXiv.org Machine Learning

1706.02582

Country: North America > United States > Connecticut > New Haven County > New Haven (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.49)

Add feedback