Iterative Augmentation with Summarization Refinement (IASR) Evaluation for Unstructured Survey Data Modeling and Analysis

Payal Bhattad, Sai Manoj Pudukotai Dinakarrao, Anju Gupta

arXiv.org Artificial Intelligence 

Text data augmentation is a widely used strategy for mitigating data sparsity in natural language processing (NLP), particularly in low-resource settings where limited samples hinder effective semantic modeling. While augmentation can improve input diversity and downstream interpretability, existing techniques often lack mechanisms to ensure semantic preservation during large-scale or iterative generation, leading to redundancy and instability. This work introduces a principled evaluation framework for large language model (LLM) based text augmentation, comprising two components: (1) Scalability Analysis, which measures semantic consistency as augmentation volume increases, and (2) Iterative Augmentation with Summarization Refinement (IASR), which evaluates semantic drift across recursive paraphrasing cycles. Empirical evaluations across state-of-the-art LLMs show that GPT-3.5 Turbo achieved the best balance of semantic fidelity, diversity, and generation efficiency. Applied to a real-world topic modeling task using BERTopic with GPT-enhanced few-shot labeling, the proposed approach results in a 400% increase in topic granularity and complete elimination of topic overlaps. These findings validated the utility of the proposed frameworks for structured evaluation of LLM-based augmentation in practical NLP pipelines.
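The semantic-drift evaluation that IASR performs can be sketched as follows. This is a minimal illustration of the idea (the naming and structure are assumptions, not the authors' implementation): text is paraphrased recursively, and similarity to the original is tracked at each cycle. A real pipeline would use an LLM paraphraser and sentence embeddings; here a toy word-dropping "paraphraser" and a bag-of-words cosine similarity stand in so the sketch is self-contained.

```python
import math
import random
from collections import Counter

def cosine(a: str, b: str) -> float:
    """Bag-of-words cosine similarity (stand-in for an embedding model)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    na = math.sqrt(sum(v * v for v in va.values()))
    nb = math.sqrt(sum(v * v for v in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

def toy_paraphrase(text: str, rng: random.Random) -> str:
    """Crude lexical perturbation standing in for an LLM paraphrase call."""
    words = text.split()
    if len(words) > 3:
        words.pop(rng.randrange(len(words)))  # drop one word at random
    return " ".join(words)

def semantic_drift(original: str, cycles: int = 5, seed: int = 0) -> list:
    """Similarity to the original after each recursive paraphrasing cycle;
    a falling curve signals semantic drift."""
    rng = random.Random(seed)
    current, sims = original, []
    for _ in range(cycles):
        current = toy_paraphrase(current, rng)
        sims.append(cosine(original, current))
    return sims

sims = semantic_drift(
    "text augmentation mitigates data sparsity in low resource nlp", cycles=4
)
```

Because the toy paraphraser only deletes words, the similarity curve here decreases monotonically; with a real LLM paraphraser the curve is noisier, and IASR-style evaluation would summarize it across many seed texts.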