LLM aided semi-supervision for Extractive Dialog Summarization
Mishra, Nishant, Sahu, Gaurav, Calixto, Iacer, Abu-Hanna, Ameen, Laradji, Issam H.
–arXiv.org Artificial Intelligence
Generating high-quality summaries for chat dialogs often requires large labeled datasets. We propose a method to efficiently use unlabeled data for extractive summarization of customer-agent dialogs. In our method, we frame summarization as a question-answering problem and use state-of-the-art large language models (LLMs) to generate pseudo-labels for a dialog. We then use these pseudo-labels to fine-tune a chat summarization model, effectively transferring knowledge from the LLM into a smaller specialized model. We demonstrate our method on the TweetSumm dataset, and show that using 10% of the original labeled training data we can achieve 65.9/57.0/61.0 ROUGE-1/-2/-L, whereas the current state of the art trained on the entire training set obtains 65.16/55.81/64.37 ROUGE-1/-2/-L. In other words, in the worst case (i.e., ROUGE-L) we still retain 94.7% of the performance while using only 10% of the data.
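The pseudo-labeling step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the prompt wording, the function names (`build_qa_prompt`, `pseudo_labels`), and the answer format (comma-separated utterance indices) are all assumptions; in practice the answer string would come from an LLM.

```python
# Hypothetical sketch: frame extractive dialog summarization as QA and
# convert an LLM's answer into per-utterance binary pseudo-labels that a
# smaller extractive model could then be fine-tuned on.

def build_qa_prompt(dialog):
    """Pose extractive summarization as a question over numbered utterances."""
    numbered = "\n".join(f"{i}: {u}" for i, u in enumerate(dialog))
    return ("Which utterances best summarize this customer-agent dialog?\n"
            f"{numbered}\n"
            "Answer with the utterance numbers, comma-separated.")

def pseudo_labels(dialog, llm_answer):
    """Turn the LLM's answer (e.g. '0, 2') into a binary label per utterance."""
    chosen = {int(tok) for tok in llm_answer.split(",")}
    return [1 if i in chosen else 0 for i in range(len(dialog))]

dialog = [
    "Customer: My package never arrived.",
    "Agent: Sorry to hear that, let me check your order.",
    "Agent: A replacement ships tomorrow.",
]
prompt = build_qa_prompt(dialog)
# Placeholder for an actual LLM call on `prompt`:
labels = pseudo_labels(dialog, "0, 2")
print(labels)  # → [1, 0, 1]
```

These binary labels stand in for gold extractive annotations, so the smaller summarization model can be fine-tuned with an ordinary supervised objective on otherwise unlabeled dialogs.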
Nov-23-2023