A Proofs

To make the GAN objectives amenable to gradient-based optimization, we first rewrite them in forms whose derivatives are easy to calculate.

Proposition 1. For any fixed generator, given a data [...]

Proposition 2. For any continuous and differentiable function f whose domain is X, we have: [...] Readers are encouraged to refer to the original proof in [57] for more details.

Theorem 2. Given the optimal classifier [...] Please see Appendix A.2 for details.

Theorem 3. The objective function for the generator of SSGAN-LA, given the optimal label-augmented discriminator, boils down to: [...]

Theorem 4. At the equilibrium point of DAGAN, the optimal generator implies [...] We first prove the first claim of this theorem, and then the second.
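For orientation, the single-discriminator analogue of Theorem 3 is the classical result of Goodfellow et al. (2014), which can be stated precisely; note this is the standard GAN derivation, not the SSGAN-LA-specific objective, so the label-augmented quantities are deliberately left out:

```latex
% Classical GAN analogue of Theorem 3 (Goodfellow et al., 2014), for reference.
% For a fixed generator G, the optimal discriminator is
%     D^*(x) = \frac{p_{\mathrm{data}}(x)}{p_{\mathrm{data}}(x) + p_g(x)},
% and substituting D^* back into the value function gives
\min_G \; V(G, D^*)
    = \mathbb{E}_{x \sim p_{\mathrm{data}}}\bigl[\log D^*(x)\bigr]
    + \mathbb{E}_{x \sim p_g}\bigl[\log\bigl(1 - D^*(x)\bigr)\bigr]
    = 2\,\mathrm{JSD}\bigl(p_{\mathrm{data}} \,\Vert\, p_g\bigr) - \log 4,
% so the optimal generator minimizes the Jensen-Shannon divergence between
% the real and generated distributions.
```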
Weak-to-Strong Diffusion with Reflection
Bai, Lichen, Sugiyama, Masashi, Xie, Zeke
The goal of diffusion generative models is to align the learned distribution with the real data distribution through gradient score matching. However, inherent limitations in training data quality, modeling strategies, and architectural design lead to an inevitable gap between generated outputs and real data. To reduce this gap, we propose Weak-to-Strong Diffusion (W2SD), a novel framework that utilizes the estimated difference between existing weak and strong models (i.e., the weak-to-strong difference) to approximate the gap between an ideal model and a strong model. By employing a reflective operation that alternates between denoising and inversion with the weak-to-strong difference, we show theoretically that W2SD steers latent variables along sampling trajectories toward regions of the real data distribution. W2SD is highly flexible and broadly applicable, enabling diverse improvements through the strategic selection of weak-to-strong model pairs (e.g., DreamShaper vs. SD1.5, good experts vs. bad experts in MoE). Extensive experiments demonstrate that W2SD significantly improves human preference, aesthetic quality, and prompt adherence, achieving SOTA performance across various modalities (e.g., image, video), architectures (e.g., UNet-based, DiT-based, MoE), and benchmarks. For example, Juggernaut-XL with W2SD achieves an HPSv2 winning rate of up to 90% over the original results. Moreover, the performance gains achieved by W2SD markedly outweigh its additional computational overhead, while the cumulative improvements from different weak-to-strong differences further solidify its practical utility and deployability.
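To make the reflective operation concrete, here is a minimal sketch of one W2SD-style sampling loop, inferred only from the abstract: a strong-model denoising step followed by a weak-model inversion step keeps the latent at the same noise level while injecting the weak-to-strong difference. The functions `strong_denoise_step` and `weak_invert_step` are hypothetical one-step DDIM denoise/inversion operators, not the authors' API.

```python
import torch

def w2sd_reflect(latent: torch.Tensor, t: int,
                 strong_denoise_step, weak_invert_step) -> torch.Tensor:
    """One reflection: denoise t -> t-1 with the strong model, then
    invert t-1 -> t with the weak model. The composed map returns to
    noise level t while nudging the latent by the weak-to-strong
    difference (a sketch, assuming DDIM-style one-step operators)."""
    x_prev = strong_denoise_step(latent, t)   # strong model: x_t -> x_{t-1}
    x_back = weak_invert_step(x_prev, t - 1)  # weak model:   x_{t-1} -> x_t
    return x_back

def w2sd_sample(x_T: torch.Tensor, timesteps,
                strong_denoise_step, weak_invert_step,
                n_reflections: int = 1) -> torch.Tensor:
    """Ordinary strong-model sampling with a reflection at each step."""
    x = x_T
    for t in timesteps:
        for _ in range(n_reflections):        # reflect: stays at level t
            x = w2sd_reflect(x, t, strong_denoise_step, weak_invert_step)
        x = strong_denoise_step(x, t)         # then take the denoising move
    return x
```

The design point the abstract emphasizes is that the two operators come from different models; if the weak and strong models were identical, each reflection would be (approximately) the identity and the trajectory would be unchanged.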
DiffLM: Controllable Synthetic Data Generation via Diffusion Language Models
Zhou, Ying, Wang, Xinyao, Niu, Yulei, Shen, Yaojie, Tang, Lexin, Chen, Fan, He, Ben, Sun, Le, Wen, Longyin
Recent advancements in large language models (LLMs) have significantly enhanced their knowledge and generative capabilities, leading to a surge of interest in leveraging LLMs for high-quality data synthesis. However, synthetic data generation via prompting LLMs remains challenging due to LLMs' limited understanding of target data distributions and the complexity of prompt engineering, especially for structured formatted data. To address these issues, we introduce DiffLM, a controllable data synthesis framework based on a variational autoencoder (VAE), which further (1) leverages diffusion models to preserve more information about the original distribution and format structure in the learned latent distribution and (2) decouples the learning of target distribution knowledge from the LLM's generative objectives via a plug-and-play latent feature injection module. As we observed significant discrepancies between the VAE's latent representations and the real data distribution, the latent diffusion module is introduced into our framework to learn a fully expressive latent distribution. Evaluations on seven real-world datasets with structured formatted data (i.e., Tabular, Code and Tool data) demonstrate that DiffLM generates high-quality data, with performance on downstream tasks surpassing that of real data by 2%-7% in certain cases. The data and code will be publicly available upon completion of internal review.

Data synthesis has become an indispensable technique in current machine learning research, enabling rapid generation and modification of datasets (Bauer et al., 2024) and allowing researchers to experiment with various scenarios and model architectures without the extensive processes associated with real-world data collection. Meanwhile, with the rapid advancements in large language models (LLMs), recent research in natural language processing (NLP) has increasingly focused on leveraging LLMs for synthetic data generation. Early efforts attempted to fine-tune LLMs to align with real data distributions (Keskar et al., 2019; Anaby-Tavor et al., 2020; Borisov et al., 2023). As the in-context learning capabilities of LLMs have improved, some studies have explored zero-shot or few-shot prompting of LLMs to generate synthetic data (Ye et al., 2022a; Wei et al., 2024).
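The three-stage pipeline the abstract describes (VAE encoding, latent diffusion, latent injection into a frozen LLM) can be sketched as below. All names here (`vae`, `latent_diffusion`, `llm`, and their methods) are hypothetical placeholders standing in for the components the abstract names, not the authors' code.

```python
import torch

def difflm_synthesize(records, vae, latent_diffusion, llm, n_samples: int):
    """A minimal sketch of the DiffLM flow described in the abstract:
    (1) encode real structured records into a VAE latent space;
    (2) fit a diffusion model on those latents, since the abstract notes
        the VAE's latent representations diverge from the real data
        distribution and diffusion bridges that gap;
    (3) inject sampled latents into a frozen LLM decoder through a
        plug-and-play feature-injection module."""
    # Step 1: latents of the real data (the VAE is trained beforehand).
    with torch.no_grad():
        z_real = vae.encode(records)

    # Step 2: learn a fully expressive latent distribution, then sample.
    latent_diffusion.fit(z_real)
    z_synth = latent_diffusion.sample(n_samples)

    # Step 3: the injection module keeps the LLM frozen, decoupling
    # distribution learning from the LLM's generative objective.
    return [llm.decode(latent=z) for z in z_synth]
```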