AITopics | data generation

Collaborating Authors

data generation

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Machine Learning for Network Attacks Classification and Statistical Evaluation of Adversarial Learning Methodologies for Synthetic Data Generation

Zarkadis, Iakovos-Christos, Douligeris, Christos

arXiv.org Machine LearningApr-3-2026

Supervised detection of network attacks has always been a critical part of network intrusion detection systems (NIDS). Nowadays, in a pivotal time for artificial intelligence (AI), with even more sophisticated attacks that utilize advanced techniques, such as generative artificial intelligence (GenAI) and reinforcement learning, it has become a vital component if we wish to protect our personal data, which are scattered across the web. In this paper, we address two tasks, in the first unified multi-modal NIDS dataset, which incorporates flow-level data, packet payload information and temporal contextual features, from the reprocessed CIC-IDS-2017, CIC-IoT-2023, UNSW-NB15 and CIC-DDoS-2019, with the same feature space. In the first task we use machine learning (ML) algorithms, with stratified cross validation, in order to prevent network attacks, with stability and reliability. In the second task we use adversarial learning algorithms to generate synthetic data, compare them with the real ones and evaluate their fidelity, utility and privacy using the SDV framework, f-divergences, distinguishability and non-parametric statistical tests. The findings provide stable ML models for intrusion detection and generative models with high fidelity and utility, by combining the Synthetic Data Vault framework, the TRTS and TSTR tests, with non-parametric statistical tests and f-divergence measures.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

arXiv.org Machine Learning

2603.17717

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > United States > New York (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(3 more...)

Genre: Research Report > Experimental Study (0.48)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.34)

Add feedback

EmDT: Embedding Diffusion Transformer for Tabular Data Generation in Fraud Detection

Kuo, En-Ya, Motsch, Sebastien

arXiv.org Machine LearningMar-17-2026

Imbalanced datasets pose a difficulty in fraud detection, as classifiers are often biased toward the majority class and perform poorly on rare fraudulent transactions. Synthetic data generation is therefore commonly used to mitigate this problem. In this work, we propose the Clustered Embedding Diffusion-Transformer (EmDT), a diffusion model designed to generate fraudulent samples. Our key innovation is to leverage UMAP clustering to identify distinct fraudulent patterns, and train a Transformer denoising network with sinusoidal positional embeddings to capture feature relationships throughout the diffusion process. Once the synthetic data has been generated, we employ a standard decision-tree-based classifier (e.g., XGBoost) for classification, as this type of model remains better suited to tabular datasets. Experiments on a credit card fraud detection dataset demonstrate that EmDT significantly improves downstream classification performance compared to existing oversampling and generative methods, while maintaining comparable privacy protection and preserving feature correlations present in the original data.

artificial intelligence, deep learning, machine learning, (18 more...)

arXiv.org Machine Learning

2603.13566

Country:

North America > United States > Arizona > Maricopa County > Tempe (0.04)
Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)

Genre: Research Report (1.00)

Industry: Law Enforcement & Public Safety > Fraud (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)

Add feedback

aec33ab89b5986605cd7c331396e7e5c-Paper-Datasets_and_Benchmarks_Track.pdf

Neural Information Processing SystemsFeb-17-2026, 11:21:11 GMT

large language model, machine learning, natural language, (18 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)
Asia > Indonesia > Bali (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine > Diagnostic Medicine (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Vision (0.68)

Add feedback

ae9500c4f5607caf2eff033c67daa9d7-Paper-Datasets_and_Benchmarks.pdf

Neural Information Processing SystemsFeb-16-2026, 13:18:55 GMT

chatgpt, dataset, simprompt, (16 more...)

Neural Information Processing Systems

Country:

North America > United States > District of Columbia > Washington (0.04)
North America > Mexico (0.04)
Africa (0.04)
(6 more...)

Genre:

Research Report > New Finding (0.92)
Personal (0.67)

Industry:

Media > Film (1.00)
Leisure & Entertainment > Games > Computer Games (1.00)
Law (1.00)
(14 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

On the Generative Utility of Cyclic Conditionals Chang Liu 1, Haoyue T ang 2, Tao Qin

Neural Information Processing SystemsFeb-12-2026, 01:37:04 GMT

Work done during an internship at Microsoft Research Asia.

artificial intelligence, international conference, machine learning, (13 more...)

Neural Information Processing Systems

Country:

Asia > China > Beijing > Beijing (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)
North America > Canada > Quebec > Montreal (0.04)
(8 more...)

Add feedback

ImplicitGenerativeCopulas

Neural Information Processing SystemsFeb-11-2026, 11:05:01 GMT

Many approaches are available to reliably model univariate distributions of real-valued variables.

artificial intelligence, copula, machine learning, (19 more...)

Neural Information Processing Systems

Country: Europe > Germany > Hesse > Darmstadt Region > Darmstadt (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.97)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.70)

Add feedback

EPIC: Effective Prompting for Imbalanced-Class Data Synthesis in Tabular Data Classification via Large Language Models

Neural Information Processing SystemsFeb-11-2026, 04:52:31 GMT

Our source code for our work is available here.

large language model, machine learning, natural language, (18 more...)

Neural Information Processing Systems

Country:

Asia > South Korea > Daejeon > Daejeon (0.04)
North America > United States > California (0.04)
North America > Nicaragua (0.04)
North America > Montserrat (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area (1.00)
Information Technology (0.92)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

ba9fab001f67381e56e410575874d967-Paper.pdf

Neural Information Processing SystemsFeb-10-2026, 22:05:49 GMT

decaf, fairness, synthetic data, (13 more...)

Neural Information Processing Systems

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > United States > California > Los Angeles County > Los Angeles (0.14)
North America > United States > Texas > Tarrant County > Fort Worth (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)

Genre: Research Report (0.68)

Industry: Banking & Finance (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Add feedback

872f0e04ef95be7970d9a9d74b198fdf-Paper-Datasets_and_Benchmarks.pdf

Neural Information Processing SystemsFeb-10-2026, 12:30:08 GMT

Hence, extensiveresearch has been proposed to improvethisbound through richer distributions [55,53,36,13].

artificial intelligence, autoencoder, machine learning, (18 more...)

Neural Information Processing Systems

Country:

Europe > France (0.14)
North America > United States > California > Alameda County > Berkeley (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

7 Checklist

Neural Information Processing SystemsFeb-10-2026, 02:20:06 GMT

For all authors... (a) Do the main claims made in the abstract and introduction accurately reflect the paper's contributions and scope? If you ran experiments... (a) Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Y es] We release the code and the models If you used crowdsourcing or conducted research with human subjects... (a) Did you include the full text of instructions given to participants and screenshots, if applicable? [Y es] We included the instructions given to participants in appendix F. In this appendix, we describe the neural network architecture used for our agents.Figure 2: Transformer encoder (left) used in both policy proposal network (center) and value network (right). Our model architecture is shown in Figure 2. It is essentially identical to the architecture in [11], except that it replaces the specialized graph-convolution-based encoder with a much simpler transformer encoder, removes all dropout layers, and uses separate policy and value networks. Aside from the encoder, the other aspects of the architecture are the same, notably the LSTM policy decoder, which decodes orders through sequential attention over each successive location in the encoder output to produce an action. The input to our new encoder is also identical to that of [11], consisting of the same representation of the current board state, previous board state, and a recent order embedding. Rather than processing various parts of this input in two parallel trunks before combining them into a shared encoder trunk, we take the simpler approach of concatenating all features together at the start, resulting in 146 feature channels across each of 81 board locations (75 region + 6 coasts). We pass this through a linear layer, add pointwise a learnable per-position per-channel bias, and then pass this to a standard transformer encoder architecture.

artificial intelligence, deep learning, machine learning, (16 more...)

Neural Information Processing Systems

Country: