AITopics | Bražinskas, Arthur

Collaborating Authors

Bražinskas, Arthur

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Small Language Models Improve Giants by Rewriting Their Outputs

Vernikos, Giorgos, Bražinskas, Arthur, Adamek, Jakub, Mallinson, Jonathan, Severyn, Aliaksei, Malmi, Eric

arXiv.org Artificial IntelligenceMay-22-2023

Large language models (LLMs) have demonstrated impressive few-shot learning capabilities, but they often underperform compared to fine-tuned models on challenging tasks. Furthermore, their large size and restricted access only through APIs make task-specific fine-tuning impractical. Moreover, LLMs are sensitive to different aspects of prompts (e.g., the selection and order of demonstrations) and can thus require time-consuming prompt engineering. In this light, we propose a method to correct LLM outputs without relying on their weights. First, we generate a pool of candidates by few-shot prompting an LLM. Second, we refine the LLM-generated outputs using a smaller model, the LM-corrector (LMCor), which is trained to rank, combine and rewrite the candidates to produce the final target output. Our experiments demonstrate that even a small LMCor model (250M) substantially improves the few-shot performance of LLMs (62B) across diverse tasks. Moreover, we illustrate that the LMCor exhibits robustness against different prompts, thereby minimizing the need for extensive prompt engineering. Finally, we showcase that the LMCor can be seamlessly integrated with different LLMs at inference time, serving as a plug-and-play module to improve their performance.

corrector, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2305.13514

Genre:

Research Report (0.82)
Personal > Honors (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Interactive-Chain-Prompting: Ambiguity Resolution for Crosslingual Conditional Generation with Interaction

Pilault, Jonathan, Garcia, Xavier, Bražinskas, Arthur, Firat, Orhan

arXiv.org Artificial IntelligenceJan-24-2023

Crosslingual conditional generation (e.g., machine translation) has long enjoyed the benefits of scaling. Nonetheless, there are still issues that scale alone may not overcome. A source query in one language, for instance, may yield several translation options in another language without any extra context. Only one translation could be acceptable however, depending on the translator's preferences and goals. Choosing the incorrect option might significantly affect translation usefulness and quality. We propose a novel method interactive-chain prompting -- a series of question, answering and generation intermediate steps between a Translator model and a User model -- that reduces translations into a list of subproblems addressing ambiguities and then resolving such subproblems before producing the final text to be translated. To check ambiguity resolution capabilities and evaluate translation quality, we create a dataset exhibiting different linguistic phenomena which leads to ambiguities at inference for four languages. To encourage further exploration in this direction, we release all datasets. We note that interactive-chain prompting, using eight interactions as exemplars, consistently surpasses prompt-based methods with direct access to background information to resolve ambiguities.

artificial intelligence, machine translation, natural language, (14 more...)

arXiv.org Artificial Intelligence

2301.10309

Country:

Europe (1.00)
Asia (0.67)
North America > United States > Minnesota (0.28)

Genre:

Research Report > New Finding (0.67)
Research Report > Experimental Study (0.46)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Add feedback

Learning Opinion Summarizers by Selecting Informative Reviews

Bražinskas, Arthur, Lapata, Mirella, Titov, Ivan

arXiv.org Artificial IntelligenceSep-9-2021

Opinion summarization has been traditionally approached with unsupervised, weakly-supervised and few-shot learning techniques. In this work, we collect a large dataset of summaries paired with user reviews for over 31,000 products, enabling supervised training. However, the number of reviews per product is large (320 on average), making summarization - and especially training a summarizer - impractical. Moreover, the content of many reviews is not reflected in the human-written summaries, and, thus, the summarizer trained on random review subsets hallucinates. In order to deal with both of these challenges, we formulate the task as jointly learning to select informative subsets of reviews and summarizing the opinions expressed in these subsets. The choice of the review subset is treated as a latent variable, predicted by a small and simple selector. The subset is then fed into a more powerful summarizer. For joint training, we use amortized variational inference and policy gradient methods. Our experiments demonstrate the importance of selecting informative reviews resulting in improved quality of summaries and reduced hallucinations.

computational linguistics, neural network, social media, (20 more...)

arXiv.org Artificial Intelligence

2109.04325

Country:

Europe (1.00)
Asia (0.67)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre:

Research Report (0.64)
Overview (0.46)
Instructional Material > Course Syllabus & Notes (0.46)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

Add feedback

Transductive Learning for Abstractive News Summarization

Bražinskas, Arthur, Liu, Mengwen, Nallapati, Ramesh, Ravi, Sujith, Dreyer, Markus

arXiv.org Artificial IntelligenceApr-17-2021

Pre-trained language models have recently advanced abstractive summarization. These models are further fine-tuned on human-written references before summary generation in test time. In this work, we propose the first application of transductive learning to summarization. In this paradigm, a model can learn from the test set's input before inference. To perform transduction, we propose to utilize input document summarizing sentences to construct references for learning in test time. These sentences are often compressed and fused to form abstractive summaries and provide omitted details and additional context to the reader. We show that our approach yields state-of-the-art results on CNN/DM and NYT datasets. For instance, we achieve over 1 ROUGE-L point improvement on CNN/DM. Further, we show the benefits of transduction from older to more recent news. Finally, through human and automatic evaluation, we show that our summaries become more abstractive and coherent.

artificial intelligence, summarization, text processing, (17 more...)

arXiv.org Artificial Intelligence

2104.095

Country: North America > United States (0.28)

Genre: Research Report (1.00)

Industry: Energy > Power Industry > Utilities (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Few-Shot Learning for Opinion Summarization

Bražinskas, Arthur, Lapata, Mirella, Titov, Ivan

arXiv.org Machine LearningOct-10-2020

Opinion summarization is the automatic creation of text reflecting subjective information expressed in multiple documents, such as user reviews of a product. The task is practically important and has attracted a lot of attention. However, due to the high cost of summary production, datasets large enough for training supervised models are lacking. Instead, the task has been traditionally approached with extractive methods that learn to select text fragments in an unsupervised or weakly-supervised way. Recently, it has been shown that abstractive summaries, potentially more fluent and better at reflecting conflicting information, can also be produced in an unsupervised fashion. However, these models, not being exposed to actual summaries, fail to capture their essential properties. In this work, we show that even a handful of summaries is sufficient to bootstrap generation of the summary text with all expected properties, such as writing style, informativeness, fluency, and sentiment preservation. We start by training a conditional Transformer language model to generate a new product review given other available reviews of the product. The model is also conditioned on review properties that are directly related to summaries; the properties are derived from reviews with no manual effort. In the second stage, we fine-tune a plug-in module that learns to predict property values on a handful of summaries. This lets us switch the generator to the summarization mode. We show on Amazon and Yelp datasets that our approach substantially outperforms previous extractive and abstractive methods in automatic and human evaluation.

neural network, social media, summarization, (21 more...)

arXiv.org Machine Learning

2004.14884

Country:

Europe (0.46)
North America > Canada > Alberta (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Embedding Words as Distributions with a Bayesian Skip-gram Model

Bražinskas, Arthur, Havrylov, Serhii, Titov, Ivan

arXiv.org Artificial IntelligenceJun-10-2018

We introduce a method for embedding words as probability densities in a low-dimensional space. Rather than assuming that a word embedding is fixed across the entire text collection, as in standard word embedding methods, in our Bayesian model we generate it from a word-specific prior density for each occurrence of a given word. Intuitively, for each word, the prior density encodes the distribution of its potential 'meanings'. These prior densities are conceptually similar to Gaussian embeddings. Interestingly, unlike the Gaussian embeddings, we can also obtain context-specific densities: they encode uncertainty about the sense of a word given its context and correspond to posterior distributions within our model. The context-dependent densities have many potential applications: for example, we show that they can be directly used in the lexical substitution task. We describe an effective estimation method based on the variational autoencoding framework. We also demonstrate that our embeddings achieve competitive results on standard benchmarks.

bayesian inference, neural network, proceedings, (20 more...)

arXiv.org Artificial Intelligence

1711.11027

Country: Europe (0.46)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
(2 more...)

Add feedback