AITopics | Domingues, Pedro Henrique

Collaborating Authors

Domingues, Pedro Henrique

Information about AI from the News, Publications, and Conferences


If you are looking for an answer to the question "What is Artificial Intelligence?" and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Sentence-level Aggregation of Lexical Metrics Correlate Stronger with Human Judgements than Corpus-level Aggregation

Cavalin, Paulo, Domingues, Pedro Henrique, Pinhanez, Claudio

arXiv.org Artificial Intelligence, Jul-3-2024

In this paper we show that corpus-level aggregation considerably hinders the capability of lexical metrics to accurately evaluate machine translation (MT) systems. With empirical experiments we demonstrate that averaging individual segment-level scores can make metrics such as BLEU and chrF correlate much more strongly with human judgements and make them behave considerably more similarly to neural metrics such as COMET and BLEURT. We show that this difference exists because corpus- and segment-level aggregation differ considerably, owing to the classical average-of-ratios versus ratio-of-averages mathematical problem. Moreover, as we also show, this difference considerably affects the statistical robustness of corpus-level aggregation. Considering that neural metrics currently cover only a small set of sufficiently resourced languages, the results in this paper can help make the evaluation of MT systems for low-resource languages more trustworthy.
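A toy precision score can illustrate the average-of-ratios versus ratio-of-averages distinction. The sketch below is not BLEU or chrF. As a simplifying assumption it uses only clipped unigram precision, with no higher-order n-grams and no brevity penalty. The function names `clipped_unigram_stats`, `corpus_level` and `segment_level` are hypothetical. Corpus-level scoring pools match and token counts over all segments before dividing once. Segment-level scoring divides per segment and then averages.

```python
from collections import Counter
from typing import List, Tuple

def clipped_unigram_stats(hyp: str, ref: str) -> Tuple[int, int]:
    """Return (matched, total) clipped unigram counts for one segment.

    Simplified assumption: whitespace tokenization, unigrams only.
    """
    hyp_counts = Counter(hyp.split())
    ref_counts = Counter(ref.split())
    # Each hypothesis word is credited at most as often as it occurs in the reference.
    matched = sum(min(c, ref_counts[w]) for w, c in hyp_counts.items())
    total = sum(hyp_counts.values())
    return matched, total

def corpus_level(pairs: List[Tuple[str, str]]) -> float:
    # Ratio of averages: pool counts over all segments, then divide once.
    stats = [clipped_unigram_stats(h, r) for h, r in pairs]
    return sum(m for m, _ in stats) / sum(t for _, t in stats)

def segment_level(pairs: List[Tuple[str, str]]) -> float:
    # Average of ratios: score each segment separately, then take the mean.
    scores = []
    for h, r in pairs:
        m, t = clipped_unigram_stats(h, r)
        scores.append(m / t if t else 0.0)
    return sum(scores) / len(scores)

# Hypothetical data: one long perfect segment and one short wrong segment.
pairs = [
    ("the cat sat on the mat today with a hat",
     "the cat sat on the mat today with a hat"),
    ("dog", "a cat"),
]
print(f"corpus-level (ratio of averages): {corpus_level(pairs):.3f}")
print(f"segment-level (average of ratios): {segment_level(pairs):.3f}")
```

Pooling lets the long segment dominate, giving 10/11 ≈ 0.909. Averaging per-segment ratios weights both segments equally, giving 0.500.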

artificial intelligence, correlation, natural language, (15 more...)

arXiv.org Artificial Intelligence

2407.12832

Country:

North America > United States > Pennsylvania (0.14)
Europe > Middle East > Malta (0.14)
Asia > Middle East > UAE (0.14)
Asia > Middle East > Qatar (0.14)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)


PeLLE: Encoder-based language models for Brazilian Portuguese based on open data

de Mello, Guilherme Lamartine, Finger, Marcelo, Serras, Felipe, Carpi, Miguel de Mello, Jose, Marcos Menon, Domingues, Pedro Henrique, Cavalim, Paulo

arXiv.org Artificial Intelligence, Feb-29-2024

In this paper we present PeLLE, a family of large language models for Brazilian Portuguese based on the RoBERTa architecture and trained on curated, open data from the Carolina corpus. Aiming at reproducible results, we describe the details of the models' pretraining. We also evaluate PeLLE models against a set of existing multilingual and PT-BR refined pretrained Transformer-based LLM encoders, contrasting the performance of large versus smaller-but-curated pretrained models on several downstream tasks. We conclude that several tasks perform better with larger models, but some tasks benefit from smaller-but-curated data in their pretraining.
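PeLLE builds on the RoBERTa architecture. The sketch below is a hedged illustration of the standard RoBERTa masked-language-modelling recipe, not PeLLE's actual pretraining configuration, which the abstract does not give. It applies dynamic masking: about 15% of positions are selected. Each selected token becomes `<mask>` 80% of the time, becomes a random token 10% of the time, and stays unchanged 10% of the time. `MASK_ID`, `VOCAB_SIZE`, `IGNORE` and `dynamic_mask` are hypothetical names chosen for the example. A real tokenizer defines its own mask id and vocabulary size.

```python
import random
from typing import List, Tuple

MASK_ID = 4      # hypothetical id of the <mask> token
VOCAB_SIZE = 50  # hypothetical vocabulary size
IGNORE = -100    # label for positions the model is not asked to predict

def dynamic_mask(token_ids: List[int], rng: random.Random,
                 mask_prob: float = 0.15) -> Tuple[List[int], List[int]]:
    """Apply BERT/RoBERTa-style 80/10/10 masking to one token sequence.

    Returns (inputs, labels): labels hold the original id at selected
    positions and IGNORE everywhere else.
    """
    inputs, labels = [], []
    for tok in token_ids:
        if rng.random() < mask_prob:
            labels.append(tok)                            # model must recover this token
            r = rng.random()
            if r < 0.8:
                inputs.append(MASK_ID)                    # 80%: replace with <mask>
            elif r < 0.9:
                inputs.append(rng.randrange(VOCAB_SIZE))  # 10%: random token
            else:
                inputs.append(tok)                        # 10%: keep original
        else:
            labels.append(IGNORE)
            inputs.append(tok)
    return inputs, labels

# "Dynamic" masking: a fresh mask is drawn each time a sequence is seen,
# so two passes over the same sentence usually mask different positions.
rng = random.Random(0)
sentence = list(range(10, 30))  # hypothetical token ids
epoch1 = dynamic_mask(sentence, rng)
epoch2 = dynamic_mask(sentence, rng)
```

Drawing the mask at batch time rather than once during preprocessing is what distinguishes RoBERTa's dynamic masking from the original BERT static masking.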

artificial intelligence, large language model, natural language, (18 more...)

arXiv.org Artificial Intelligence

2402.19204

Country:

Europe > France (0.14)
Europe > Portugal (0.14)
South America > Brazil (0.14)
Asia > Japan (0.14)

Genre: Research Report (0.40)

Industry: Law (0.47)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.91)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.68)
