AITopics

2410.01183

Country:

North America > United States > Michigan > Washtenaw County > Ann Arbor (0.05)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report > New Finding (0.67)

Industry: Government (0.46)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Gregório, Fabio; Castro, Rafaela; Belloze, Kele; Lopes, Rui Pedro; Bezerra, Eduardo

GLARE: Guided LexRank for Advanced Retrieval in Legal Analysis

arXiv.org Artificial Intelligence, Sep-10-2024

The Brazilian Constitution, known as the Citizen's Charter, provides mechanisms for citizens to petition the Judiciary, including the so-called special appeal. This type of appeal aims to standardize the interpretation of Brazilian federal legislation in cases where a decision contradicts federal law. Handling special appeals is a daily task in the Judiciary and places a significant, recurring workload on its courts. We propose GLARE, a new method based on unsupervised machine learning, to help legal analysts classify a special appeal under one of the topics listed by Brazil's Superior Court of Justice (STJ). As part of this method, we propose a modification of the graph-based LexRank algorithm, which we call Guided LexRank, to generate a summary of the special appeal. The similarity between this summary and each candidate topic is then scored with the BM25 algorithm, and the method outputs a ranking of the topics most relevant to the analyzed appeal. The method does not require prior labeling of the text to be evaluated and eliminates the need for large volumes of training data. We evaluate its effectiveness on a corpus of special appeals previously classified by human experts.
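To make the ranking step concrete, here is a minimal Python sketch of Okapi BM25 scoring a summary against candidate topic descriptions. It assumes the Guided LexRank summary is already available as plain text, uses the common defaults k1 = 1.2 and b = 0.75, and tokenizes naively. The function names and the example topics are hypothetical and are not taken from GLARE or from the STJ topic list.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Naive tokenizer: lowercase, keep runs of word characters.
    # A real pipeline for Portuguese legal text would add stopword removal
    # and stemming (omitted in this sketch).
    return re.findall(r"\w+", text.lower())


def bm25_rank(query: str, documents: list[str], k1: float = 1.2, b: float = 0.75):
    """Score every document against the query with Okapi BM25.

    Returns (document index, score) pairs sorted from best to worst.
    """
    docs = [tokenize(d) for d in documents]
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    # Document frequency: in how many documents each term appears.
    df = Counter()
    for d in docs:
        df.update(set(d))

    results = []
    for i, doc in enumerate(docs):
        tf = Counter(doc)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            # Non-negative idf variant (the "+ 1" inside the log).
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            f = tf[term]
            norm = f + k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * f * (k1 + 1) / norm
        results.append((i, score))
    return sorted(results, key=lambda pair: pair[1], reverse=True)


# Hypothetical topic descriptions and appeal summary, for illustration only.
topics = [
    "income tax exemption",
    "consumer protection bank contract interest",
    "criminal sentence reduction",
]
summary = "appeal about interest charged by a bank under a consumer contract"
ranking = bm25_rank(summary, topics)
print(ranking[0][0])  # prints 1, the consumer/bank topic
```

The sketch treats the summary as the query and each topic as a document, which matches the idea of ranking topics for one appeal. Whether GLARE orients the comparison exactly this way is an assumption.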

2409.15348

Country:

North America > Canada (0.14)
Asia > Singapore (0.04)
South America > Brazil > Rio de Janeiro > Rio de Janeiro (0.04)

Genre: Research Report > New Finding (1.00)

Industry:

Law > Statutes (0.88)
Law > Government & the Courts (0.75)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)

Sakib, Md Nazmus; Islam, Md Athikul; Arifin, Md Mashrur

Automatic Pull Request Description Generation Using LLMs: A T5 Model Approach

arXiv.org Artificial Intelligence, Aug-1-2024

Developers write pull request (PR) descriptions to summarize their changes and explain the motivation behind them. These descriptions help reviewers and fellow developers quickly understand the updates. Despite their importance, some developers omit them. To address this problem, we propose an automated method for generating PR descriptions from commit messages and source code comments. The method frames the task as text summarization, which we tackle with the T5 (Text-to-Text Transfer Transformer) model. We fine-tuned a pre-trained T5 model on a dataset of 33,466 PRs and assessed its effectiveness with ROUGE metrics, which are recognized for their strong alignment with human evaluations. Our results show that the T5 model significantly outperforms LexRank, which served as our baseline.
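For reference, a simplified ROUGE-N computation, based on clipped n-gram overlap between a generated and a reference description, can be sketched as follows. This is an illustrative version that makes several assumptions. It lowercases and splits on whitespace, with no stemming or stopword handling, so its numbers will not match the official ROUGE toolkit or the paper's results. The example strings are hypothetical.

```python
from collections import Counter


def ngrams(tokens: list[str], n: int) -> Counter:
    # Multiset of contiguous n-grams.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate: str, reference: str, n: int = 1) -> tuple[float, float, float]:
    """Return (precision, recall, F1) from clipped n-gram overlap."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    overlap = sum((cand & ref).values())  # '&' keeps the minimum count per n-gram
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical generated vs. human-written PR description.
p, r, f = rouge_n("fix null check in parser", "fix the null check in the parser")
print(round(p, 3), round(r, 3), round(f, 3))  # 1.0 0.714 0.833
```

In this example, every generated unigram appears in the reference, so precision is 1.0. Recall is 5/7 because the reference contains two occurrences of "the" that the candidate lacks.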

commit message, dataset, pr description

2408.00921

Country:

North America > United States > Idaho > Ada County > Boise (0.05)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Texas > Travis County > Austin (0.04)

Genre:

Overview (0.86)
Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Hsu, Ting-Yao; Suhara, Yoshi; Wang, Xiaolan

Summarizing Community-based Question-Answer Pairs

arXiv.org Artificial Intelligence, Nov-17-2022

Community-based Question Answering (CQA), which lets users obtain the information they want, has become an increasingly essential component of online services in domains such as e-commerce, travel, and dining. However, the sheer number of CQA pairs makes it difficult for users without a specific intent to find useful information scattered across them. To help users quickly digest the key information, we propose a novel task, CQA summarization, which aims to create a concise summary from CQA pairs. To this end, we first design a multi-stage data annotation process and create a benchmark dataset, CoQASUM, based on the Amazon QA corpus. We then compare a collection of extractive and abstractive summarization methods and establish a strong baseline, DedupLED, for the task. Our experiments further confirm two key challenges for CQA summarization: sentence-type transfer and deduplication. Our data and code are publicly available.
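Deduplication, one of the two challenges named above, can be illustrated with a greedy near-duplicate filter over candidate summary sentences. This is a minimal sketch and not DedupLED. It assumes token-set Jaccard similarity and a hypothetical threshold of 0.6, and the example sentences are invented.

```python
def jaccard(a: str, b: str) -> float:
    # Token-set Jaccard similarity between two sentences.
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 1.0


def dedup_sentences(sentences: list[str], threshold: float = 0.6) -> list[str]:
    """Keep a sentence only if it is not too similar to any sentence already kept."""
    kept: list[str] = []
    for s in sentences:
        if all(jaccard(s, k) < threshold for k in kept):
            kept.append(s)
    return kept


# Hypothetical answer sentences gathered from several QA pairs.
answers = [
    "The battery lasts about ten hours",
    "Battery lasts about ten hours",
    "It works with the iPhone charger",
]
print(dedup_sentences(answers))
# ['The battery lasts about ten hours', 'It works with the iPhone charger']
```

The first two answers share 5 of 6 distinct tokens (Jaccard about 0.83), so the second one is dropped. A lexical filter like this misses paraphrases with different wording, which is part of why deduplication remains a challenge for the task.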

machine learning, natural language, question answering

2211.09892

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > Pennsylvania (0.04)
North America > Dominican Republic (0.04)

Genre: Research Report (0.64)

Industry: Information Technology > Services > e-Commerce Services (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Communications (0.93)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.58)

#artificialintelligence, Nov-25-2019, 09:34:36 GMT

Text Summarization on the Books of Harry Potter

"Aren't you two ever going to read Hogwarts, A History?" How many times throughout the Harry Potter series does Hermione bug Harry and Ron to read the enormous tome Hogwarts, A History? Hint: it's a lot. How many nights do the three of them spend in the library, reading through every book they can find to figure out who Nicolas Flamel is or how to survive underwater, or to prepare for their O.W.L.s? The mistake they all make is trying to read everything themselves. Remember when you were in school and stumbled upon the CliffsNotes summary of that book you never read but were supposed to write an essay about? That's basically what text summarization does: it provides the CliffsNotes version of any large document.

algorithm, summarization, text summarization algorithm

#artificialintelligence

Country: Europe > Ukraine > Sumy Oblast > Sumy (0.06)

Industry:

Media > Film (0.62)
Leisure & Entertainment (0.62)

Technology: Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.33)

Ribeiro, R.; Martins de Matos, D.

Centrality-as-Relevance: Support Sets and Similarity as Geometric Proximity

Journal of Artificial Intelligence Research, Nov-4-2011

In automatic summarization, centrality-as-relevance means that the most important content of an information source, or of a collection of information sources, corresponds to the most central passages, given a representation in which such a notion makes sense (graph, spatial, etc.). We assess the main paradigms and introduce a new centrality-based relevance model for automatic summarization that relies on support sets to better estimate the relevant content. Geometric proximity is used to compute semantic relatedness. Centrality (relevance) is determined by considering the whole input source, not only local information, and by taking into account minor topics or lateral subjects in the information sources to be summarized. The method consists of creating, for each passage of the input source, a support set containing only the most semantically related passages. The most relevant content is then determined by selecting the passages that occur in the largest number of support sets. This model produces extractive summaries that are generic and language- and domain-independent. A thorough automatic evaluation shows that the method achieves state-of-the-art performance in both written-text and automatically transcribed speech summarization, including when compared to considerably more complex approaches.
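The support-set procedure can be sketched in Python as follows. The sketch makes several assumptions:

- Passages are represented as bag-of-words vectors.
- Geometric proximity is measured with plain cosine similarity.
- Each support set holds the k nearest other passages that have positive similarity.

The paper studies its own choices for these components, and this sketch does not reproduce them. The function names are hypothetical.

```python
import math
from collections import Counter


def cosine(u: Counter, v: Counter) -> float:
    # Cosine similarity between sparse term-frequency vectors.
    dot = sum(u[t] * v[t] for t in u if t in v)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def support_set_summary(passages: list[str], k: int = 1, top: int = 1) -> list[str]:
    """Pick the passages that appear in the most support sets."""
    vecs = [Counter(p.lower().split()) for p in passages]
    votes = Counter()
    for i, vi in enumerate(vecs):
        # Support set of passage i: its k closest other passages
        # (only those with positive similarity).
        sims = [(cosine(vi, vecs[j]), j) for j in range(len(vecs)) if j != i]
        sims = sorted((pair for pair in sims if pair[0] > 0), reverse=True)
        for _, j in sims[:k]:
            votes[j] += 1
    # Passages that occur in more support sets are considered more central.
    ranked = sorted(range(len(passages)), key=lambda j: votes[j], reverse=True)
    return [passages[j] for j in ranked[:top]]


# Hypothetical passages: three on one topic, one lateral subject.
passages = [
    "cats chase mice",
    "cats chase mice at night",
    "mice fear cats",
    "stocks fell sharply today",
]
print(support_set_summary(passages, k=1))  # ['cats chase mice']
```

The first passage is the nearest neighbor of both the second and third passages, so it collects the most votes. The unrelated fourth passage builds an empty support set and receives no votes.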

fractional, input source, summarization

doi: 10.1613/jair.3387

AI Access Foundation

10727

Country:

Europe > Austria > Vienna (0.14)
Europe > Portugal > Lisbon > Lisbon (0.05)
South America > Brazil (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Media > News (0.69)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)

Journal of Artificial Intelligence Research, Dec-1-2004

LexRank: Graph-based Lexical Centrality as Salience in Text Summarization

Erkan, G.; Radev, D. R.

We introduce a stochastic graph-based method for computing the relative importance of textual units for Natural Language Processing. We test the technique on the problem of Text Summarization (TS). Extractive TS relies on the concept of sentence salience to identify the most important sentences in a document or set of documents. Salience is typically defined in terms of the presence of particular important words or in terms of similarity to a centroid pseudo-sentence. We consider a new approach, LexRank, for computing sentence importance based on the concept of eigenvector centrality in a graph representation of sentences. In this model, a connectivity matrix based on inter-sentence cosine similarity is used as the adjacency matrix of the graph representation of sentences. Our system, based on LexRank, ranked first in more than one task in the recent DUC 2004 evaluation. In this paper, we present a detailed analysis of our approach and apply it to a larger data set, including data from earlier DUC evaluations. We discuss several methods to compute centrality using the similarity graph. The results show that degree-based methods (including LexRank) outperform both centroid-based methods and other systems participating in DUC in most cases. Furthermore, the LexRank with threshold method outperforms the other degree-based techniques, including continuous LexRank. We also show that our approach is quite insensitive to noise in the data that may result from an imperfect topical clustering of documents.
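Here is a minimal sketch of LexRank with threshold. It works in three steps:

1. Build a binary graph that links sentences whose cosine similarity exceeds a threshold.
2. Normalize each row of the adjacency matrix by the node's degree.
3. Run power iteration with a PageRank-style damping term.

The sketch makes several assumptions. It uses raw term-frequency vectors, whereas the paper uses an idf-modified cosine. The threshold of 0.1 and the damping value of 0.85 are chosen for illustration only. The function names are hypothetical.

```python
import math
from collections import Counter


def cosine(u: Counter, v: Counter) -> float:
    # Cosine similarity between sparse term-frequency vectors.
    dot = sum(u[t] * v[t] for t in u if t in v)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def lexrank(sentences: list[str], threshold: float = 0.1, damping: float = 0.85,
            tol: float = 1e-10, max_iter: int = 200) -> list[float]:
    """Return one centrality score per (non-empty) sentence; scores sum to 1."""
    n = len(sentences)
    vecs = [Counter(s.lower().split()) for s in sentences]
    # Binary adjacency: link i and j when similarity exceeds the threshold.
    # The diagonal is included, since a sentence is similar to itself.
    adj = [[1.0 if cosine(vecs[i], vecs[j]) > threshold else 0.0 for j in range(n)]
           for i in range(n)]
    # Divide each row by the node's degree to obtain a row-stochastic matrix.
    for i in range(n):
        deg = sum(adj[i]) or 1.0  # avoid division by zero for empty sentences
        adj[i] = [x / deg for x in adj[i]]
    # Power iteration: with probability `damping` follow an edge,
    # otherwise jump to a uniformly random sentence.
    p = [1.0 / n] * n
    for _ in range(max_iter):
        new_p = [(1 - damping) / n + damping * sum(adj[i][j] * p[i] for i in range(n))
                 for j in range(n)]
        converged = sum(abs(a - b) for a, b in zip(new_p, p)) < tol
        p = new_p
        if converged:
            break
    return p


# Hypothetical sentences: the first one shares a word with each of the others.
scores = lexrank(["cats dogs", "cats purr", "dogs bark"])
print([round(s, 3) for s in scores])  # [0.416, 0.292, 0.292]
```

Using a binary, thresholded graph rather than weighted edges corresponds to the "LexRank with threshold" variant that the abstract reports as the strongest. Continuous LexRank would instead keep the cosine values as edge weights.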

centrality, lexrank, summarization

doi: 10.1613/jair.1523

AI Access Foundation

10396

Country:

Europe > United Kingdom (0.28)
North America > United States > Michigan > Washtenaw County > Ann Arbor (0.14)
Asia > Middle East > Iraq > Baghdad Governorate > Baghdad (0.04)

Genre: Research Report > New Finding (0.66)

Industry:

Government > Military (0.93)
Government > Regional Government (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.66)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.47)