A Appendix

A.1 Summary of Commonly Used Metrics for Text Generation

Table 1: Summary of commonly used metrics for text generation. For settings and tasks, we only list the ones justified by the original paper for each metric.

We conduct experiments on WMT19, and the results are shown in Tab. 2. We don't observe

A.3 Prompt Set

In Tab. 3, we list the full prompt set for both the s→h direction and the h→r direction.

Table 3: Full prompt set.
s→h: Last; Tersely; Succinctly; In summation; To put it succinctly; After; In brief; All in all; To summarize; Bringing up the rear; Behind; In short; In outline; In a nutshell; To come to the point; Lastly; Concisely; In closing; In conclusion; In the final analysis; In sum; In precis; In passing; In winding up; Without wasting words; To end; In a word; To conclude; Last in order; At the end of the day; Curtly; Compactly; Summarising; In a few words; Without waste of words; Crisply; Summarily; In the rear; As a final point; Finally yet importantly; At last; To sum up; Summarizing; Not least of all; To put it in a nutshell; Pithily; Basically; Laconically; To put it briefly; When all is said and done; Shortly; In the end; At the rear; Not to mince words; To cut a long story short; In fine; At the end; To be brief; Last but not least; Not to beat about the bush; Finally; In essence; Last of all; Just as importantly; In drawing things to a close; Briefly; Ultimately; Elliptically; To put it concisely; Not to put too fine a point on it
h→r: As; To wit; As it were; Case in point; As an illustration; sc.; That is; Especially; That is to say; To give an example; i.e.
JurisTCU: A Brazilian Portuguese Information Retrieval Dataset with Query Relevance Judgments
Fernandes, Leandro Carísio, Ribeiro, Leandro dos Santos, de Castro, Marcos Vinícius Borela, Pacheco, Leonardo Augusto da Silva, Sandes, Edans Flávius de Oliveira
This paper introduces JurisTCU, a Brazilian Portuguese dataset for legal information retrieval (LIR). The dataset is freely available and consists of 16,045 jurisprudential documents from the Brazilian Federal Court of Accounts, along with 150 queries annotated with relevance judgments. It addresses the scarcity of Portuguese-language LIR datasets with query relevance annotations. The queries are organized into three groups: real user keyword-based queries, synthetic keyword-based queries, and synthetic question-based queries. Relevance judgments were produced through a hybrid approach combining LLM-based scoring with validation by domain experts. We evaluated JurisTCU in 14 experiments covering lexical search (document expansion methods) and semantic search (BERT-based and OpenAI embeddings). We show that document expansion methods significantly improve the performance of standard BM25 search on this dataset, with gains exceeding 45% in P@10, R@10, and nDCG@10 when evaluating short keyword-based queries. Among the embedding models, the OpenAI models produced the best results, with improvements of approximately 70% in P@10, R@10, and nDCG@10 for short keyword-based queries, suggesting that these dense embeddings capture semantic relationships in this domain rather than relying on lexical overlap. Besides offering the Portuguese-language IR research community a dataset suitable for evaluating search systems, the results also contribute to improving a search system highly relevant to Brazilian citizens.
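Since the reported gains are all expressed in P@10, R@10, and nDCG@10, a minimal sketch of these metrics may be useful to readers reproducing the evaluation; the function names and data layout below are illustrative assumptions, not taken from the paper's code.

```python
import math

def precision_at_k(ranked_ids, relevant_ids, k=10):
    """P@k: fraction of the top-k retrieved documents that are relevant."""
    return sum(1 for d in ranked_ids[:k] if d in relevant_ids) / k

def recall_at_k(ranked_ids, relevant_ids, k=10):
    """R@k: fraction of all relevant documents retrieved in the top k."""
    if not relevant_ids:
        return 0.0
    return sum(1 for d in ranked_ids[:k] if d in relevant_ids) / len(relevant_ids)

def ndcg_at_k(ranked_ids, relevance, k=10):
    """nDCG@k with graded relevance judgments (unjudged documents score 0)."""
    dcg = sum(relevance.get(d, 0) / math.log2(i + 2)
              for i, d in enumerate(ranked_ids[:k]))
    ideal = sorted(relevance.values(), reverse=True)[:k]
    idcg = sum(g / math.log2(i + 2) for i, g in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0
```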
- South America > Brazil (0.28)
- Europe (0.28)
- North America > United States (0.14)
- Law (1.00)
- Government (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.47)
Instructive Dialogue Summarization with Query Aggregations
Wang, Bin, Liu, Zhengyuan, Chen, Nancy F.
Conventional dialogue summarization methods directly generate summaries and do not consider users' specific interests. This poses challenges when users are focused on particular topics or aspects. With the advancement of instruction-finetuned language models, we introduce instruction tuning to dialogues to expand the capability set of dialogue summarization models. To overcome the scarcity of instructive dialogue summarization data, we propose a three-step approach to synthesize high-quality query-based summarization triples. This process involves summary-anchored query generation, query filtering, and query-based summary generation. By training a unified model called InstructDS (Instructive Dialogue Summarization) on three summarization datasets with multi-purpose instructive triples, we expand the capability of dialogue summarization models. We evaluate our method on four datasets, covering dialogue summarization and dialogue reading comprehension. Experimental results show that our approach outperforms state-of-the-art models, even those with larger sizes. Additionally, our model exhibits higher generalizability and faithfulness, as confirmed by human subjective evaluations.
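The three-step triple synthesis is the core of the data construction, so a schematic sketch may help; the llm() helper and all prompts below are hypothetical stand-ins, not the authors' implementation.

```python
def llm(prompt: str) -> str:
    """Placeholder for any instruction-following LLM call."""
    raise NotImplementedError("plug in a model of your choice")

def synthesize_triples(dialogue: str, summary: str, n_queries: int = 5):
    """Produce (dialogue, query, query-focused summary) training triples."""
    # Step 1: summary-anchored query generation.
    queries = [
        llm(f"Write a question answerable from this summary:\n{summary}")
        for _ in range(n_queries)
    ]
    # Step 2: query filtering -- drop queries the dialogue cannot answer.
    kept = [
        q for q in queries
        if llm(f"Can this dialogue answer '{q}'? Answer yes or no.\n{dialogue}")
        .strip().lower().startswith("yes")
    ]
    # Step 3: query-based summary generation for each surviving query.
    return [
        (dialogue, q, llm(f"Summarize the dialogue to answer '{q}':\n{dialogue}"))
        for q in kept
    ]
```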
- North America > United States (0.46)
- Asia > China (0.14)
- Asia > Middle East > UAE (0.14)
- Education (0.88)
- Consumer Products & Services (0.68)
- Energy > Oil & Gas (0.47)
10 databases supporting in-database machine learning
In my October 2022 article, "How to choose a cloud machine learning platform," my first guideline for choosing a platform was, "Be close to your data." Keeping the code near the data is necessary to keep the latency low, since the speed of light limits transmission speeds. After all, machine learning -- especially deep learning -- tends to go through all your data multiple times (each time through is called an epoch). The ideal case for very large data sets is to build the model where the data already resides, so that no mass data transmission is needed. Several databases support that to a limited extent.
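In-database ML goes a step further by running the model inside the database engine itself; the sketch below only illustrates the weaker form of the same principle: streaming bounded batches out of the database rather than exporting the whole table, with multiple passes (epochs) over a hypothetical training_data table.

```python
import sqlite3
import numpy as np
from sklearn.linear_model import SGDClassifier

conn = sqlite3.connect("warehouse.db")   # stands in for any SQL database
model = SGDClassifier()                  # incremental learner supporting partial_fit

for epoch in range(3):                   # each full pass over the data is one epoch
    cursor = conn.execute("SELECT f1, f2, f3, label FROM training_data")
    while True:
        rows = cursor.fetchmany(10_000)  # bounded batches instead of a bulk export
        if not rows:
            break
        batch = np.array(rows, dtype=float)
        X, y = batch[:, :3], batch[:, 3]
        model.partial_fit(X, y, classes=[0.0, 1.0])
```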
Data-driven Approaches to Surrogate Machine Learning Model Development
Jones, H. Rhys, Mu, Tingting, Popescu, Andrei C., Sulehman, Yusuf
We demonstrate the adaptation of three established methods to the field of surrogate machine learning model development: data augmentation, custom loss functions, and transfer learning. Each of these methods has seen widespread use in machine learning; here we apply them specifically to surrogate model development. The machine learning model that forms the basis of this work was intended to surrogate a traditional engineering model used in the UK nuclear industry, and its performance has previously been hampered by limited training data. Here, we demonstrate that a combination of these techniques can significantly improve model performance. We show that each technique has utility in its own right and in combination with the others, but we find them best applied as part of a transfer learning operation. Five pre-trained surrogate models produced prior to this research were further trained with an augmented dataset and our custom loss function. Through the combination of all three techniques, we see an improvement of at least 38% in performance across the five models.
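As a rough illustration of two of the three techniques, the PyTorch sketch below fine-tunes a pre-trained surrogate with a custom loss; the architecture, loss weighting, checkpoint path, and tensors are placeholders, since the paper's exact formulation is not given here.

```python
import torch
import torch.nn as nn

def weighted_mse(pred, target, weight=2.0, threshold=0.8):
    """Example custom loss: weight errors on high-valued targets more
    heavily -- a common pattern for engineering surrogates, and only a
    stand-in for the paper's actual loss."""
    err = (pred - target) ** 2
    scale = torch.where(target > threshold,
                        torch.full_like(target, weight),
                        torch.ones_like(target))
    return (scale * err).mean()

# Transfer learning: start from a pre-trained surrogate and fine-tune it
# on the (augmented) dataset using the custom loss.
surrogate = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 1))
# surrogate.load_state_dict(torch.load("pretrained_surrogate.pt"))  # hypothetical checkpoint
optimizer = torch.optim.Adam(surrogate.parameters(), lr=1e-4)

X = torch.randn(256, 8)   # stand-in for augmented training inputs
y = torch.randn(256, 1)   # stand-in for engineering-model outputs
for step in range(100):
    optimizer.zero_grad()
    loss = weighted_mse(surrogate(X), y)
    loss.backward()
    optimizer.step()
```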
Template-based Abstractive Microblog Opinion Summarisation
Bilal, Iman Munire, Wang, Bo, Tsakalidis, Adam, Nguyen, Dong, Procter, Rob, Liakata, Maria
We introduce the task of microblog opinion summarisation (MOS) and share a dataset of 3100 gold-standard opinion summaries to facilitate research in this domain. The dataset contains summaries of tweets spanning a 2-year period and covers more topics than any other public Twitter summarisation dataset. The summaries are abstractive in nature and were created by journalists skilled in summarising news articles, following a template that separates factual information (the main story) from author opinions. Our method differs from previous work on generating gold-standard summaries from social media, which usually involves selecting representative posts and thus favours extractive summarisation models. To showcase the dataset's utility and challenges, we benchmark a range of state-of-the-art abstractive and extractive summarisation models and achieve good performance, with the former outperforming the latter. We also show that fine-tuning is necessary to improve performance and investigate the benefits of using different sample sizes.
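The two-part template, a factual main story plus author opinions, suggests a simple data shape; the sketch below is an assumed representation, field names included, and not the dataset's actual schema.

```python
from dataclasses import dataclass, field

@dataclass
class MOSSummary:
    """Assumed shape of a template-based opinion summary: a factual
    'main story' followed by the authors' opinions."""
    main_story: str
    opinions: list[str] = field(default_factory=list)

    def render(self) -> str:
        """Concatenate the factual part and the opinion part."""
        return " ".join([self.main_story, *self.opinions])
```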
- North America > United States > New York > New York County > New York City (0.04)
- Asia > Middle East > Syria (0.04)
- Oceania > Australia > New South Wales > Sydney (0.04)