AITopics

doi: 10.18653/v1/2025.iwslt-1.37

2505.02518

Country:

Europe (1.00)
North America > United States (0.29)
Asia > Indonesia (0.28)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Speech (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Kazemi, Arefeh, Kalaivendan, Sri Balaaji Natarajan, Wagner, Joachim, Qadeer, Hamza, Davis, Brian

Synthetic vs. Gold: The Role of LLM-Generated Labels and Data in Cyberbullying Detection

arXiv.org Artificial Intelligence, Feb-21-2025

This study investigates the role of LLM-generated synthetic data in cyberbullying detection. We conduct a series of experiments where we replace some or all of the authentic data with synthetic data, or augment the authentic data with synthetic data. We find that synthetic cyberbullying data can be the basis for training a classifier for harm detection that reaches performance close to that of a classifier trained with authentic data. Combining authentic with synthetic data improves test-set performance over the baseline of training on authentic data alone for all three LLMs tried. These results highlight the viability of synthetic data as a scalable, ethically viable alternative in cyberbullying detection while emphasizing the critical impact of LLM selection on performance outcomes.
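The three data regimes the abstract describes (authentic only, partial or full replacement, and augmentation) can be sketched as below. This is only an illustration under assumptions: the function name `build_training_set`, the `mode` strings, and the `replace_fraction` parameter are hypothetical and are not taken from the paper.

```python
import random

def build_training_set(authentic, synthetic, mode, replace_fraction=0.5, seed=0):
    """Assemble a training list under one of three hypothetical regimes.

    mode="authentic": authentic examples only (the baseline).
    mode="replace":   swap a fraction of authentic examples for synthetic ones.
    mode="augment":   keep all authentic examples and add all synthetic ones.
    """
    rng = random.Random(seed)
    authentic = list(authentic)
    synthetic = list(synthetic)
    if mode == "authentic":
        return authentic
    if mode == "augment":
        return authentic + synthetic
    if mode == "replace":
        # Number of authentic examples to drop in favour of synthetic ones.
        n_swap = round(len(authentic) * replace_fraction)
        kept = rng.sample(authentic, len(authentic) - n_swap)
        swapped_in = rng.sample(synthetic, min(n_swap, len(synthetic)))
        return kept + swapped_in
    raise ValueError(f"unknown mode: {mode}")
```

Setting `replace_fraction=1.0` corresponds to training on synthetic data alone, provided at least as many synthetic as authentic examples are available.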

authentic data, dataset, synthetic data, (16 more...)

2502.1586

Country:

North America > United States (0.04)
North America > Dominican Republic (0.04)
Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
(3 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Marchesi, Raffaele, Micheletti, Nicolo, Jurman, Giuseppe, Osmani, Venet

Mitigating Health Data Poverty: Generative Approaches versus Resampling for Time-series Clinical Data

arXiv.org Artificial Intelligence, Oct-26-2022

Several approaches have been developed to mitigate algorithmic bias stemming from health data poverty, where minority groups are underrepresented in training datasets. Augmenting the minority class using resampling (such as SMOTE) is a widely used approach due to the simplicity of the algorithms. However, these algorithms decrease data variability and may introduce correlations between samples, giving rise to the use of generative approaches based on GANs. Generating high-dimensional, time-series, authentic data that provides wide distribution coverage of the real data remains a challenging task for both resampling and GAN-based approaches. In this work we propose the CA-GAN architecture, which addresses some of the shortcomings of the current approaches, and we provide a detailed comparison with both SMOTE and WGAN-GP*, using a high-dimensional, time-series, real dataset of 3343 hypotensive Caucasian and Black patients. We show that our approach is better at both generating authentic data of the minority class and remaining within the original distribution of the real data.
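To see why SMOTE-style resampling limits variability, it helps to look at how one new point is made: it is interpolated on the line segment between an existing minority sample and one of its nearest minority neighbours. The sketch below is a simplified, standard-library illustration of that idea, not the paper's pipeline or a library implementation; the name `smote_like_sample` and its parameters are hypothetical.

```python
import random

def smote_like_sample(minority, k=2, seed=0):
    """Create one synthetic point by SMOTE-style interpolation.

    `minority` is a list of at least two equal-length numeric vectors.
    """
    rng = random.Random(seed)
    base = rng.choice(minority)
    # Rank the other minority points by squared Euclidean distance to `base`.
    others = [p for p in minority if p is not base]
    others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
    neighbour = rng.choice(others[:k])
    # Pick a random position on the segment from `base` towards `neighbour`.
    gap = rng.random()
    return [a + gap * (b - a) for a, b in zip(base, neighbour)]
```

Every generated point lies between two existing samples, so the new data cannot leave the region already spanned by the minority class and is correlated with the samples it was built from.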

artificial intelligence, bioinformatics, machine learning, (19 more...)

2210.13958

Country:

Africa > Malawi (0.04)
North America > United States (0.04)
Europe > Italy > Trentino-Alto Adige/Südtirol > Trentino Province > Trento (0.04)

Genre: Research Report > Experimental Study (0.40)

Industry: Health & Medicine > Diagnostic Medicine (0.64)

Technology:

Information Technology > Biomedical Informatics > Clinical Informatics (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.71)

Robinson, Nathaniel, Ogayo, Perez, Gangu, Swetha, Mortensen, David R., Watanabe, Shinji

When Is TTS Augmentation Through a Pivot Language Useful?

arXiv.org Artificial Intelligence, Jul-20-2022

Developing Automatic Speech Recognition (ASR) for low-resource languages is a challenge due to the small amount of transcribed audio data. For many such languages, audio and text are available separately, but not audio with transcriptions. Using text, speech can be synthetically produced via text-to-speech (TTS) systems. However, many low-resource languages do not have quality TTS systems either. We propose an alternative: produce synthetic audio by running text from the target language through a trained TTS system for a higher-resource pivot language. We investigate when and how this technique is most effective in low-resource settings. In our experiments, using several thousand synthetic TTS text-speech pairs and duplicating authentic data to balance yields optimal results. Our findings suggest that searching over a set of candidate pivot languages can lead to marginal improvements and that, surprisingly, ASR performance can be harmed by increases in measured TTS quality. Application of these findings improves ASR by 64.5% and 45.0% character error reduction rate (CERR) respectively for two low-resource languages: Guaraní and Suba.
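The CERR figures can be read with a small worked example. The sketch below assumes that CERR means the relative reduction in character error rate (CER) against a baseline system, and that CER is character-level edit distance divided by reference length; the abstract does not spell out either definition, and the function names are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two strings, one row at a time."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

def cer(ref, hyp):
    """Character error rate of a hypothesis against a non-empty reference."""
    return edit_distance(ref, hyp) / len(ref)

def relative_cer_reduction(baseline_cer, new_cer):
    """Percentage by which the new system reduces the baseline CER."""
    return 100.0 * ((baseline_cer - new_cer) / baseline_cer)
```

Under this reading, a system that lowers CER from 0.40 to 0.20 achieves a relative reduction of about 50%.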

experiment, kiswahili, pivot language, (15 more...)

2207.09889

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)
Europe > Finland > Uusimaa > Helsinki (0.05)
South America > Paraguay (0.04)
(4 more...)

Genre: Research Report > New Finding (1.00)

Technology: Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)

#artificialintelligence, Oct-5-2021, 13:10:43 GMT

Can Synthetic Data Make AI Better? Discover the Benefits of Synthetic Data

Although artificial intelligence (AI) is getting more advanced due to an exponential rate of development, limitations to this modern technology still exist. So, can synthetic data be the solution for all AI-related concerns? In the fourth industrial revolution, every industry sector has discovered the potential of modern technologies such as AI and machine learning (ML). Almost every other organization is deploying AI to create more efficient business processes and to ensure better customer satisfaction. But startups, SOHOs, and small and medium businesses (SMBs) face a major issue while adopting AI: the cold start problem.

ai system, application, synthetic data, (14 more...)

Country: North America > United States > Minnesota (0.05)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.69)

#artificialintelligence, Sep-27-2021, 09:25:43 GMT

Discovering the Benefits of Synthetic Data

Although artificial intelligence (AI) is getting more advanced due to an exponential rate of development, limitations to this modern technology still exist. So, can synthetic data be the solution for all AI-related concerns? In the fourth industrial revolution, every industry sector has discovered the potential of modern technologies such as AI and ML. Almost every other organization is deploying AI to create more efficient business processes and to ensure better customer satisfaction. But startups, SOHOs, and small and medium businesses (SMBs) face a major issue while adopting AI: the cold start problem.

ai system, application, synthetic data, (14 more...)

Country: North America > United States > Minnesota (0.05)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (0.34)
Information Technology > Artificial Intelligence > Machine Learning (0.30)

#artificialintelligence, Aug-14-2021, 02:55:38 GMT

EETimes - What Is Synthetic Data and Why Is It Critical for the Future of AI?

Advanced AI development today is still deeply rooted in 1950s computer science philosophies, including the phrase "garbage in, garbage out." The adage reminds us that an AI model is only as good as the data it's trained on. For everything from advanced cancer screenings to suggesting a new movie, data scientists need large and diverse datasets to train AI models. This can be a significant challenge with real-world data. Often protected for privacy reasons, authentic data can be hard to come by and can also be expensive to source, and potentially not as diverse as desired.

ai model, simulation, synthetic data, (14 more...)

Industry: Health & Medicine > Therapeutic Area > Oncology (0.55)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.50)
Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (0.42)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.34)

#artificialintelligence, Nov-7-2018, 02:16:08 GMT

Does Synthetic Data Hold The Secret To Artificial Intelligence?

Could synthetic data be the solution to rapidly train artificial intelligence (AI) algorithms? There are advantages and disadvantages to synthetic data; however, many technology experts believe that synthetic data is the key to democratizing machine learning and to accelerating the testing and adoption of artificial intelligence algorithms in our daily lives. Data that a computer artificially manufactures, rather than measures and collects from real-world situations, is called synthetic data. The data is anonymized and created based on user-specified parameters so that it is as close as possible to the properties of data from real-world scenarios. One way to create synthetic data is to use real-world data but strip the identifying aspects, such as names, emails, social security numbers and addresses, from the data set so that it is anonymized.
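The stripping approach mentioned at the end of the passage can be sketched in a few lines. The field names below (`name`, `email`, `ssn`, `address`) and the function name `strip_identifiers` are assumptions chosen to mirror the examples in the text, not a schema from the article, and the sketch removes direct identifiers only.

```python
def strip_identifiers(records, identifying_fields=("name", "email", "ssn", "address")):
    """Return copies of the records with the identifying fields removed."""
    blocked = set(identifying_fields)
    return [
        {key: value for key, value in record.items() if key not in blocked}
        for record in records
    ]

# Hypothetical usage with a single made-up record.
patients = [
    {"name": "Jane Roe", "email": "jane@example.com", "age": 41, "diagnosis": "asthma"},
]
print(strip_identifiers(patients))
```

The original records are left unmodified, so the raw and stripped versions can be kept under different access controls.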

artificial intelligence, machine learning, synthetic data, (9 more...)

Genre: Research Report (0.32)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)