Samei, Borhan (University of Memphis) | Eshtiagh, Marzieh (Shiraz University, Shiraz, Iran) | Keshtkar, Fazel (Southeast Missouri State University) | Hashemi, Sattar (Shiraz University, Shiraz, Iran)
Text summarization is an important field within natural language processing and text mining. This paper proposes an extraction-based model that uses graph-based and information-theoretic concepts for multi-document summarization. Our method constructs a directed weighted graph from the original text by adding a vertex for each sentence and computing a weighted edge between each pair of sentences based on distortion measures. The model thus combines the two approaches: the input is represented as a graph, distortion measures serve as the edge-weight function, and a ranking algorithm is applied to identify the most important sentences to include in the summary. With a suitable distortion measure and ranking algorithm, the model achieves promising results on DUC2002, a well-known real-world data set; its ROUGE-1 scores are close to those of other successful models.
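To make the pipeline in this abstract concrete, here is a minimal sketch of graph-based extractive summarization: one vertex per sentence, edge weights derived from an information-theoretic measure, and a PageRank-style iteration for ranking. The KL-divergence-based distortion and the inverse-distortion weighting below are illustrative assumptions, not the paper's exact formulation.

```python
import math
from collections import Counter

def word_dist(sentence):
    """Unigram probability distribution over a sentence's tokens."""
    counts = Counter(sentence.lower().split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def distortion(p, q, eps=1e-9):
    """KL-style distortion between two word distributions
    (an illustrative stand-in for the paper's measures)."""
    vocab = set(p) | set(q)
    return sum(p.get(w, eps) * math.log(p.get(w, eps) / q.get(w, eps))
               for w in vocab)

def rank_sentences(sentences, damping=0.85, iters=50):
    """PageRank-style scores over a directed weighted sentence graph."""
    n = len(sentences)
    dists = [word_dist(s) for s in sentences]
    # Edge weight: inverse distortion, so similar sentences reinforce each other.
    w = [[0.0 if i == j else 1.0 / (1.0 + distortion(dists[i], dists[j]))
          for j in range(n)] for i in range(n)]
    out = [sum(w[j]) or 1.0 for j in range(n)]  # out-degree normalizers
    scores = [1.0 / n] * n
    for _ in range(iters):
        scores = [(1 - damping) / n +
                  damping * sum(scores[j] * w[j][i] / out[j] for j in range(n))
                  for i in range(n)]
    return scores

def summarize(sentences, k=2):
    """Return the k top-ranked sentences, in original order."""
    scores = rank_sentences(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

The key design choice is that lower distortion (more shared content) yields a heavier edge, so redundant, central sentences accumulate rank, which is exactly what an extractive summarizer wants to surface.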
Forum threads are lengthy and rich in content. Concise thread summaries benefit both newcomers seeking information and those who participate in the discussion. Few studies, however, have examined the task of forum thread summarization. In this work we make the first attempt to adapt hierarchical attention networks to thread summarization. The model draws on recent developments in neural attention mechanisms to build sentence and thread representations and uses them for summarization. Our results indicate that the proposed approach outperforms a range of competitive baselines; further, a redundancy removal step is crucial for achieving the best results.
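The two-level pooling that hierarchical attention performs can be sketched in a few lines of NumPy: word-level attention builds sentence vectors, sentence-level attention builds a thread vector, and the sentence-level attention weights themselves suggest which sentences to extract. This toy uses dot-product attention with random vectors; the actual model learns its attention parameters end-to-end.

```python
import numpy as np

def attention_pool(vectors, query):
    """Softmax-weighted average of vectors, scored against a query."""
    scores = vectors @ query
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ vectors, weights

rng = np.random.default_rng(0)
# Toy thread: 3 sentences, each 4 word embeddings of dimension 8.
thread = [rng.normal(size=(4, 8)) for _ in range(3)]
word_query = rng.normal(size=8)   # learned parameters in the real model
sent_query = rng.normal(size=8)

# Word-level attention -> sentence vectors; sentence-level -> thread vector.
sent_vecs = np.stack([attention_pool(s, word_query)[0] for s in thread])
thread_vec, sent_weights = attention_pool(sent_vecs, sent_query)

# Extractive view: the sentences attended to most are summary candidates.
ranking = np.argsort(-sent_weights)
```

In the extractive setting, a redundancy removal pass would then walk down `ranking` and skip any sentence too similar to those already selected.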
Microblogging sites, such as Twitter, have become increasingly popular in recent years for reporting details of real-world events via the Web. Smartphone apps enable people to communicate with a global audience, expressing opinions and commenting on ongoing situations - often while geographically proximal to the event. Because the data is heterogeneous and large-scale, and because some messages are more salient than others for understanding risks to human safety and managing event-related disruption, automatic summarization of event-related microblogs is a non-trivial and important problem. In this paper we tackle the task of automatic summarization of Twitter posts, and present three methods that produce summaries by selecting the most representative posts from real-world tweet-event clusters. To evaluate our approaches, we compare them against state-of-the-art summarization systems and human-generated summaries. Our results show that our proposed methods outperform the other summarization systems on both English and non-English corpora.
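One simple way to select a "most representative" post from a tweet cluster, in the spirit of the approach this abstract describes, is a medoid-style pick: choose the post with the highest average similarity to the rest of the cluster. The term-frequency vectors and cosine similarity below are a generic illustration, not the paper's three specific methods.

```python
import math
from collections import Counter

def tf_vector(text):
    """Bag-of-words term-frequency vector for a post."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b.get(w, 0) for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def most_representative(cluster):
    """Pick the post most similar, on average, to the rest of the
    cluster (a medoid-style cluster summary)."""
    vecs = [tf_vector(t) for t in cluster]
    best, best_score = None, -1.0
    for i, v in enumerate(vecs):
        score = sum(cosine(v, u) for j, u in enumerate(vecs) if j != i)
        if score > best_score:
            best, best_score = cluster[i], score
    return best
```

Because the similarity is purely lexical, this baseline is language-agnostic, which matters for the non-English corpora the evaluation mentions.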
This function will take the sentence scores we generated above, as well as a value k for the number of top-scoring sentences to use for summarization. It will return a string summary of the concatenated top sentences, along with the scores of the sentences used in the summarization. Let's use the function to generate the summary. And let's check out the summary sentence scores for good measure. The summary seems reasonable at a quick pass, given the text of the article. Try out this simple method on some other text for further evidence.
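A function matching that description might look like the sketch below. The name `summarize_text` and the dict-of-scores input format are assumptions, since the scoring step above isn't shown here.

```python
def summarize_text(sentence_scores, k):
    """Concatenate the top-k highest-scoring sentences into a summary.

    sentence_scores: dict mapping each sentence to its score
    (assumed to have been computed earlier, e.g. from word frequencies).
    Returns the summary string and the scores of the chosen sentences.
    """
    top = sorted(sentence_scores, key=sentence_scores.get, reverse=True)[:k]
    summary = ' '.join(top)
    top_scores = {s: sentence_scores[s] for s in top}
    return summary, top_scores

# Example usage with made-up scores:
scores = {"Sentence one.": 0.9, "Sentence two.": 0.4, "Sentence three.": 0.7}
summary, used = summarize_text(scores, k=2)
```

Note that this joins sentences in score order; if you prefer the summary to follow the article's original sentence order, re-sort `top` by each sentence's position in the source text before joining.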
"Aren't you two ever going to read Hogwarts, A History?" How many times throughout the Harry Potter series does Hermione bug Harry and Ron to read the enormous tome Hogwarts, A History? Hint: it's a lot. How many nights do the three of them spend in the library, reading through every book they can find to figure out who Nicolas Flamel is, or how to survive underwater, or preparing for their O.W.L.s? The mistake they're all making is to try to read everything themselves. Remember when you were in school and stumbled upon the CliffsNotes summary of that book you never read but were supposed to write an essay about? That's basically what text summarization does: provide the CliffsNotes version for any large document.