AITopics

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.57)

Neural Information Processing SystemsNov-20-2025, 22:13:18 GMT

Learning Overparameterized Neural Networks via Stochastic Gradient Descent on Structured Data

Neural networks have many successful applications, while much less theoretical understanding has been gained. Towards bridging this gap, we study the problem of learning a two-layer overparameterized ReLU neural network for multi-class classification via stochastic gradient descent (SGD) from random initialization. In the overparameterized setting, when the data comes from mixtures of well-separated distributions, we prove that SGD learns a network with a small generalization error, albeit the network has enough capacity to fit arbitrary labels. Furthermore, the analysis provides interesting insights into several aspects of learning neural networks and can be verified based on empirical studies on synthetic data and on the MNIST dataset.

learning overparameterized neural network, name change, stochastic gradient descent, (2 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.82)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.71)

Kumar, Arlen, Palkhouski, Leanid

AI Answer Engine Citation Behavior An Empirical Analysis of the GEO16 Framework

arXiv.org Artificial IntelligenceSep-16-2025

AI answer engines increasingly mediate access to domain knowledge by generating responses and citing web sources. We introduce GEO-16, a 16 pillar auditing framework that converts on page quality signals into banded pillar scores and a normalized GEO score G that ranges from 0 to 1. Using 70 product intent prompts, we collected 1,702 citations across three engines (Brave Summary, Google AI Overviews, and Perplexity) and audited 1,100 unique URLs. In our corpus, the engines differed in the GEO quality of the pages they cited, and pillars related to Metadata and Freshness, Semantic HTML, and Structured Data showed the strongest associations with citation. Logistic models with domain clustered standard errors indicate that overall page quality is a strong predictor of citation, and simple operating points (for example, G at least 0.70 combined with at least 12 pillar hits) align with substantially higher citation rates in our data. We report per engine contrasts, vertical effects, threshold analysis, and diagnostics, then translate findings into a practical playbook for publishers. The study is observational and focuses on English language B2B SaaS pages; we discuss limitations, threats to validity, and reproducibility considerations.

engine, large language model, natural language, (14 more...)

2509.10762

Country: North America > United States > California (0.15)

Genre:

Research Report > Experimental Study (0.69)
Research Report > New Finding (0.69)

Industry: Information Technology (0.50)

Technology:

Information Technology > Information Management (0.91)
Information Technology > Communications > Web (0.70)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.48)

Neural Information Processing SystemsMay-27-2025, 13:16:47 GMT

Gradient-Based Feature Learning under Structured Data

Recent works have demonstrated that the sample complexity of gradient-based learning of single index models, i.e. functions that depend on a 1-dimensional projection of the input data, is governed by their information exponent. However, these results are only concerned with isotropic data, while in practice the input often contains additional structure which can implicitly guide the algorithm. In this work, we investigate the effect of a spiked covariance structure and reveal several interesting phenomena. First, we show that in the anisotropic setting, the commonly used spherical gradient dynamics may fail to recover the true direction, even when the spike is perfectly aligned with the target direction. Next, we show that appropriate weight normalization that is reminiscent of batch normalization can alleviate this issue.

artificial intelligence, gradient-based feature learning, machine learning, (5 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.73)

arXiv.org Artificial IntelligenceMar-26-2025

Assessing Generative Models for Structured Data

Cannon, Reilly, Laird, Nicolette M., Vazquez, Caesar, Lin, Andy, Wagler, Amy, Chiang, Tony

Synthetic tabular data generation has emerged as a promising method to address limited data availability and privacy concerns. With the sharp increase in the performance of large language models in recent years, researchers have been interested in applying these models to the generation of tabular data. However, little is known about the quality of the generated tabular data from large language models. The predominant method for assessing the quality of synthetic tabular data is the train-synthetic-test-real approach, where the artificial examples are compared to the original by how well machine learning models, trained separately on the real and synthetic sets, perform in some downstream tasks. This method does not directly measure how closely the distribution of generated data approximates that of the original. This paper introduces rigorous methods for directly assessing synthetic tabular data against real data by looking at inter-column dependencies within the data. We find that large language models (GPT-2), both when queried via few-shot prompting and when fine-tuned, and GAN (CTGAN) models do not produce data with dependencies that mirror the original real data. Results from this study can inform future practice in synthetic data generation to improve data quality.

generative model, large language model, natural language, (2 more...)

2503.20903

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.93)

Neural Information Processing SystemsJan-20-2025, 04:23:23 GMT

Reviews: Learning Overparameterized Neural Networks via Stochastic Gradient Descent on Structured Data

This paper studies learning over-parametrized single hidden layer ReLU neural networks for multi-class classification via SGD and the corresponding generalization error. They consider a mixture data distribution where each class has well-separated and compact support. The authors show SGD applied on the considered learning model achieves good prediction error with high probability under suitable assumptions. As a result even in severely over-parametrized models, SGD can generalize well although the network has enough capacity to fit arbitrary labels. The main insight in the theoretical analysis appears to be the observation that in the over-parametrized case, many ReLU neurons don't change their activation pattern when initialized randomly.

learning overparameterized neural network, stochastic gradient descent, structured data, (4 more...)

Genre: Research Report (0.59)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.85)

Neural Information Processing SystemsJan-20-2025, 00:47:24 GMT

Gradient-Based Feature Learning under Structured Data

Recent works have demonstrated that the sample complexity of gradient-based learning of single index models, i.e. functions that depend on a 1-dimensional projection of the input data, is governed by their information exponent. However, these results are only concerned with isotropic data, while in practice the input often contains additional structure which can implicitly guide the algorithm. In this work, we investigate the effect of a spiked covariance structure and reveal several interesting phenomena. First, we show that in the anisotropic setting, the commonly used spherical gradient dynamics may fail to recover the true direction, even when the spike is perfectly aligned with the target direction. Next, we show that appropriate weight normalization that is reminiscent of batch normalization can alleviate this issue.

gradient-based feature learning, sample complexity, structured data, (3 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.73)

arXiv.org Artificial IntelligenceOct-17-2024

Representation Learning of Structured Data for Medical Foundation Models

Dwivedi, Vijay Prakash, Schlegel, Viktor, Liu, Andy T., Nguyen, Thanh-Tung, Kashyap, Abhinav Ramesh, Wei, Jeng, Yin, Wei-Hsian, Winkler, Stefan, Tan, Robby T.

Large Language Models (LLMs) have demonstrated remarkable performance across various domains, including healthcare. However, their ability to effectively represent structured non-textual data, such as the alphanumeric medical codes used in records like ICD-10 or SNOMED-CT, is limited and has been particularly exposed in recent research. This paper examines the challenges LLMs face in processing medical codes due to the shortcomings of current tokenization methods. As a result, we introduce the UniStruct architecture to design a multimodal medical foundation model of unstructured text and structured data, which addresses these challenges by adapting subword tokenization techniques specifically for the structured medical codes. Our approach is validated through model pre-training on both an extensive internal medical database and a public repository of structured medical records. Trained on over 1 billion tokens on the internal medical database, the proposed model achieves up to a 23% improvement in evaluation metrics, with around 2% gain attributed to our proposed tokenization. Additionally, when evaluated on the EHRSHOT public benchmark with a 1/1000 fraction of the pre-training data, the UniStruct model improves performance on over 42% of the downstream tasks. Our approach not only enhances the representation and generalization capabilities of patient-centric models but also bridges a critical gap in representation learning models' ability to handle complex structured medical data, alongside unstructured text.

large language model, machine learning, medical code, (17 more...)

2410.13351

Country:

Asia > Singapore (0.04)
Europe > United Kingdom > England > Greater Manchester > Manchester (0.04)
Asia > Taiwan (0.04)
Asia > India (0.04)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area (1.00)
Health & Medicine > Health Care Technology > Medical Record (0.89)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Kögler, Kevin, Shevchenko, Alexander, Hassani, Hamed, Mondelli, Marco

Compression of Structured Data with Autoencoders: Provable Benefit of Nonlinearities and Depth

arXiv.org Artificial IntelligenceFeb-7-2024

Autoencoders are a prominent model in many empirical branches of machine learning and lossy data compression. However, basic theoretical questions remain unanswered even in a shallow two-layer setting. In particular, to what degree does a shallow autoencoder capture the structure of the underlying data distribution? For the prototypical case of the 1-bit compression of sparse Gaussian data, we prove that gradient descent converges to a solution that completely disregards the sparse structure of the input. Namely, the performance of the algorithm is the same as if it was compressing a Gaussian source - with no sparsity. For general data distributions, we give evidence of a phase transition phenomenon in the shape of the gradient descent minimizer, as a function of the data sparsity: below the critical sparsity level, the minimizer is a rotation taken uniformly at random (just like in the compression of non-sparse data); above the critical sparsity, the minimizer is the identity (up to a permutation). Finally, by exploiting a connection with approximate message passing algorithms, we show how to improve upon Gaussian performance for the compression of sparse data: adding a denoising function to a shallow architecture already reduces the loss provably, and a suitable multi-layer decoder leads to a further improvement. We validate our findings on image datasets, such as CIFAR-10 and MNIST.

compression, lemma, provable benefit, (15 more...)

2402.05013

Country:

North America > United States > Pennsylvania (0.04)
Europe > Austria (0.04)

Genre: Research Report > New Finding (0.65)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.68)

arXiv.org Artificial IntelligenceMay-13-2023

Efficient Asynchronize Stochastic Gradient Algorithm with Structured Data

Song, Zhao, Ye, Mingquan

Deep learning has achieved impressive success in a variety of fields because of its good generalization. However, it has been a challenging problem to quickly train a neural network with a large number of layers. The existing works utilize the locality-sensitive hashing technique or some data structures on space partitioning to alleviate the training cost in each iteration. In this work, we try accelerating the computations in each iteration from the perspective of input data points. Specifically, for a two-layer fully connected neural network, when the training data have some special properties, e.g., Kronecker structure, each iteration can be completed in sublinear time in the data dimension.

artificial intelligence, efficient asynchronize stochastic gradient algorithm, machine learning, (1 more...)

2305.08001

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.73)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.40)