 Martins, Bruno


Explainable ICD Coding via Entity Linking

arXiv.org Artificial Intelligence

Clinical coding is a critical task in healthcare, but traditional methods for automating it may not provide sufficient explicit evidence for coders in production environments. This evidence is crucial, as medical coders have to make sure there exists at least one explicit passage in the input health record that justifies the attribution of a code. We therefore propose to reframe the task as an entity linking problem, in which each document is annotated with its set of codes and the respective textual evidence, enabling better human-machine collaboration. By leveraging parameter-efficient fine-tuning of Large Language Models (LLMs), together with constrained decoding, we introduce three approaches to this problem that prove effective at disambiguating clinical mentions and that perform well in few-shot scenarios.
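
As a hedged illustration of the constrained-decoding component only (not the authors' implementation, and omitting the parameter-efficient fine-tuning step), the sketch below restricts a Hugging Face causal LM to emit only strings from a fixed ICD code set, using a prefix trie together with the library's prefix_allowed_tokens_fn hook. The model checkpoint, prompt format, and toy code list are all assumptions for illustration.

# Minimal sketch: trie-constrained decoding over a closed set of ICD codes.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # stand-in for a larger LLM
model = AutoModelForCausalLM.from_pretrained("gpt2")

valid_codes = ["I10", "E11.9", "J45.909"]  # toy subset of the ICD code set

# Build a prefix trie over the token ids of every valid code string.
trie = {}
for code in valid_codes:
    node = trie
    for tok in tokenizer.encode(code, add_special_tokens=False):
        node = node.setdefault(tok, {})
    node[tokenizer.eos_token_id] = {}  # allow stopping after a full code

prompt = "Evidence: patient with essential hypertension.\nICD-10 code:"
inputs = tokenizer(prompt, return_tensors="pt")
prompt_len = inputs.input_ids.shape[1]

def allowed_tokens(batch_id, input_ids):
    # Walk the trie along the tokens generated so far; only the current
    # node's children are valid continuations.
    node = trie
    for tok in input_ids[prompt_len:].tolist():
        node = node.get(tok, {})
    return list(node.keys()) or [tokenizer.eos_token_id]

out = model.generate(**inputs, prefix_allowed_tokens_fn=allowed_tokens,
                     max_new_tokens=8)
print(tokenizer.decode(out[0][prompt_len:], skip_special_tokens=True))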


From TOWER to SPIRE: Adding the Speech Modality to a Text-Only LLM

arXiv.org Artificial Intelligence

Large language models (LLMs) have shown remarkable performance and generalization capabilities across multiple languages and tasks, making them very attractive targets for multi-modality integration (e.g., images or speech). In this work, we extend an existing LLM to the speech modality via speech discretization and continued pre-training. In particular, we are interested in multilingual LLMs, such as TOWER, as their pre-training setting allows us to treat discretized speech input as an additional translation language. The resulting open-source model, SPIRE, is able to transcribe and translate English speech input while maintaining TOWER's original performance on translation-related tasks, showing that integrating discretized speech input as an additional language is feasible during LLM adaptation. We make our code and models available to the community.
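
A minimal sketch of what "discretized speech as an additional translation language" can look like at the data level, assuming HuBERT-style k-means cluster ids as the discrete units; the pseudo-token names and the prompt template below are illustrative assumptions, not SPIRE's exact format.

def units_to_text(units):
    """Collapse consecutive repeats and render units as pseudo-tokens."""
    deduped = [u for i, u in enumerate(units) if i == 0 or u != units[i - 1]]
    return " ".join(f"<unit_{u}>" for u in deduped)

# Frame-level cluster ids for one utterance (toy example).
units = [41, 41, 41, 7, 7, 502, 502, 502, 13]
src = units_to_text(units)  # the "speech language" side
tgt = "hello world"         # the English transcription side

# One continued-pre-training example, formatted like a translation pair.
example = f"Speech: {src}\nEnglish: {tgt}"
print(example)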


Evaluation of Multilingual Image Captioning: How far can we get with CLIP models?

arXiv.org Artificial Intelligence

The evaluation of image captions, looking at both linguistic fluency and semantic correspondence to visual contents, has received significant attention. Still, despite advancements such as the CLIPScore metric, multilingual captioning evaluation has remained relatively unexplored. This work presents several strategies, and extensive experiments, for evaluating CLIPScore variants in multilingual settings. To address the lack of multilingual test data, we consider two different strategies: (1) using quality-aware machine-translated datasets with human judgements, and (2) re-purposing multilingual datasets that target semantic inference and reasoning. Our results highlight the potential of finetuned multilingual models to generalize across languages and to handle complex linguistic challenges. Tests with machine-translated data show that multilingual CLIPScore models can maintain a high correlation with human judgements across different languages, and additional tests with natively multilingual and multicultural data further attest to their high-quality assessments.
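
For reference, the original CLIPScore (Hessel et al., 2021) is 2.5 * max(cos(image, caption), 0). A multilingual variant can be sketched with an off-the-shelf multilingual CLIP text encoder paired with its matching image encoder, as below; the checkpoints and the input files are stand-ins for illustration, not necessarily the models evaluated in the paper.

import numpy as np
from PIL import Image
from sentence_transformers import SentenceTransformer

img_model = SentenceTransformer("clip-ViT-B-32")
txt_model = SentenceTransformer("sentence-transformers/clip-ViT-B-32-multilingual-v1")

image_emb = img_model.encode([Image.open("photo.jpg")])[0]          # placeholder image
caption_emb = txt_model.encode(["Un chien court sur la plage"])[0]  # French caption

cos = np.dot(image_emb, caption_emb) / (
    np.linalg.norm(image_emb) * np.linalg.norm(caption_emb))
clipscore = 2.5 * max(cos, 0.0)  # CLIPScore rescaling, clipped at zero
print(f"CLIPScore: {clipscore:.3f}")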


Efficient Architectures for High Resolution Vision-Language Models

arXiv.org Artificial Intelligence

Vision-Language Models (VLMs) have recently experienced significant advancements. However, challenges persist in the accurate recognition of fine details within high-resolution images, which limits performance in multiple tasks. This work introduces Pheye, a novel architecture that efficiently processes high-resolution images while training fewer parameters than similarly sized VLMs. Notably, Pheye achieves high efficiency while maintaining strong performance, particularly in tasks that demand fine-grained image understanding and/or the handling of scene text.
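
The abstract does not spell out the architecture; one common recipe for feeding high-resolution images to a VLM, sketched below purely as an assumption, is to combine a resized global view with a grid of local crops, each encoded by a frozen vision encoder. Pheye's actual design may differ.

import torch
from PIL import Image
from torchvision import transforms

to_tensor = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

def make_views(image, grid=2):
    """Return a resized global view plus a grid x grid set of local crops."""
    views = [to_tensor(image)]
    w, h = image.size
    cw, ch = w // grid, h // grid
    for i in range(grid):
        for j in range(grid):
            crop = image.crop((j * cw, i * ch, (j + 1) * cw, (i + 1) * ch))
            views.append(to_tensor(crop))
    return torch.stack(views)  # (1 + grid*grid, 3, 224, 224)

views = make_views(Image.new("RGB", (896, 896)))  # toy high-resolution input
print(views.shape)  # each view would go through a frozen vision encoder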


Evaluation of Code LLMs on Geospatial Code Generation

arXiv.org Artificial Intelligence

Software development support tools have been studied for a long time, with recent approaches using Large Language Models (LLMs) for code generation. These models can generate Python code for data science and machine learning applications. LLMs are helpful for software engineers because they increase productivity in daily work. An LLM can also serve as a "mentor" for inexperienced software developers, and be a viable learning aid. High-quality code generation with LLMs can also be beneficial in geospatial data science. However, this domain poses different challenges, and code generation LLMs are typically not evaluated on geospatial tasks. Here, we show how we constructed an evaluation benchmark for code generation models, based on a selection of geospatial tasks. We categorised geospatial tasks based on their complexity and required tools. Then, we created a dataset with tasks that test model capabilities in spatial reasoning, spatial data processing, and the usage of geospatial tools. The dataset consists of specific coding problems that were manually created to ensure high quality. For every problem, we proposed a set of test scenarios that make it possible to automatically check the generated code for correctness. In addition, we evaluated a selection of existing code generation LLMs in the geospatial domain. We share our dataset and reproducible evaluation code in a public GitHub repository, arguing that it can serve as an evaluation benchmark for new LLMs in the future. Our dataset will hopefully contribute to the development of new models capable of solving geospatial coding tasks with high accuracy. These models will enable the creation of coding assistants tailored for geospatial applications.
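
A hedged sketch of how such test scenarios can automatically check generated code: execute the model's answer in an isolated namespace and assert its outputs against expected values. The task, function name, coordinates, and tolerance below are illustrative, not taken from the benchmark.

# One toy scenario: great-circle distance between Lisbon and Porto (~274 km).
tests = [
    {"args": ((38.7223, -9.1393), (41.1579, -8.6291)),
     "expected_km": 274.0, "tol": 10.0},
]

generated_code = """
from math import radians, sin, cos, asin, sqrt
def haversine_km(p, q):
    lat1, lon1, lat2, lon2 = map(radians, (*p, *q))
    a = sin((lat2 - lat1) / 2) ** 2 \\
        + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))
"""

namespace = {}
exec(generated_code, namespace)  # run the model's answer in isolation
for t in tests:
    result = namespace["haversine_km"](*t["args"])
    assert abs(result - t["expected_km"]) <= t["tol"], result
print("all scenarios passed")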


Dwell in the Beginning: How Language Models Embed Long Documents for Dense Retrieval

arXiv.org Artificial Intelligence

This study investigates the existence of positional biases in Transformer-based models for text representation learning, particularly in the context of web document retrieval. We build on previous research that demonstrated loss of information in the middle of input sequences for causal language models, extending it to the domain of representation learning. We examine positional biases at various stages of training for an encoder-decoder model, including language model pre-training, contrastive pre-training, and contrastive fine-tuning. Experiments with the MS-MARCO document collection reveal that, after contrastive pre-training, the model already generates embeddings that better capture the early contents of the input, with fine-tuning further aggravating this effect.
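
A simple way to probe this kind of positional bias, sketched below with an off-the-shelf retrieval encoder rather than the paper's encoder-decoder model, is to place the same relevant passage at different offsets inside filler text and compare query-document similarities.

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("sentence-transformers/msmarco-distilbert-base-v4")

query = "what causes ocean tides"
passage = "Tides are caused by the gravitational pull of the moon and the sun."
filler = ("The committee met on Tuesday to review the quarterly budget. " * 20).split()
half = len(filler) // 2

docs = {
    "start":  passage + " " + " ".join(filler),
    "middle": " ".join(filler[:half]) + " " + passage + " " + " ".join(filler[half:]),
    "end":    " ".join(filler) + " " + passage,
}

q = model.encode(query, convert_to_tensor=True)
for name, doc in docs.items():
    d = model.encode(doc, convert_to_tensor=True)
    # A position-biased encoder scores "start" highest for the same content.
    print(name, round(float(util.cos_sim(q, d)), 4))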


Accurate and Well-Calibrated ICD Code Assignment Through Attention Over Diverse Label Embeddings

arXiv.org Artificial Intelligence

Although the International Classification of Diseases (ICD) has been adopted worldwide, manually assigning ICD codes to clinical text is time-consuming, error-prone, and expensive, motivating the development of automated approaches. This paper describes a novel approach for automated ICD coding, combining several ideas from previous related work. We specifically employ a strong Transformer-based model as a text encoder and, to handle lengthy clinical narratives, we explore either (a) adapting the base encoder model into a Longformer, or (b) dividing the text into chunks and processing each chunk independently. The representations produced by the encoder are combined with a label embedding mechanism that explores diverse ICD code synonyms. Experiments with different splits of the MIMIC-III dataset show that the proposed approach outperforms the current state-of-the-art models in ICD coding, with the label embeddings contributing significantly to the good performance. Our approach also leads to properly calibrated classification results, which can effectively inform downstream tasks such as quantification.
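
A hedged sketch of label-wise attention with per-synonym label embeddings, using random tensors and illustrative dimensions rather than the paper's exact configuration: each code synonym attends over the encoder's token representations, and the per-synonym contexts are pooled into one logit per code.

import torch

hidden, num_codes, syn_per_code, seq_len = 768, 50, 4, 1024
token_reps = torch.randn(seq_len, hidden)                   # encoder outputs
label_embs = torch.randn(num_codes, syn_per_code, hidden)   # synonym embeddings

# Each synonym attends over the tokens; max-pool over a code's synonyms.
scores = torch.einsum("ksh,th->kst", label_embs, token_reps)  # (codes, syns, tokens)
attn = scores.softmax(dim=-1)
ctx = torch.einsum("kst,th->ksh", attn, token_reps)           # per-synonym context
code_ctx = ctx.max(dim=1).values                              # (codes, hidden)

# Per-code logit from the dot product with its mean synonym embedding.
logits = (code_ctx * label_embs.mean(dim=1)).sum(dim=-1)
probs = torch.sigmoid(logits)  # multi-label probabilities, one per ICD code
print(probs.shape)             # torch.Size([50])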


Towards a Fully Unsupervised Framework for Intent Induction in Customer Support Dialogues

arXiv.org Artificial Intelligence

The evolution of technology has allowed the automation of several processes across diversified engineering industry fields, such as customer support services, which have drastically evolved with the advances in Natural Language Processing and Machine Learning. One of the major challenges of these systems is to identify users' intentions, a complex Natural Language Understanding task, as intents vary across domains. With the evolution of Deep Learning architectures, recent works focused on modelling intentions and creating a taxonomy of intents, so they can be fed to powerful supervised clustering algorithms (Haponchyk et al., 2020; Chatterjee and Sengupta, 2021). However, these systems have the bottleneck of requiring labelled data for training and deployment, and thus cannot be easily transferred to real-world customer support services, where the available data for a commercial chatbot usually consists of no more than a dataset of interactions between clients and operators. As labelling hundreds of utterances with intent labels can be time-consuming, laborious, expensive and, sometimes, even require someone with expertise, it is not straightforward to apply current state-of-the-art supervised models to new domains (Chatterjee and Sengupta, 2020).
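
A minimal fully unsupervised baseline for this setting, assuming only a dataset of raw utterances: embed each utterance with a sentence encoder and cluster the embeddings, treating each cluster as a candidate induced intent. The encoder checkpoint and the fixed cluster count are illustrative choices, not the paper's method.

from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

utterances = [
    "I want to cancel my subscription",
    "How do I reset my password?",
    "Please cancel my plan",
    "I forgot my login password",
]

embeddings = SentenceTransformer("all-MiniLM-L6-v2").encode(utterances)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(embeddings)
for utt, lab in zip(utterances, labels):
    print(lab, utt)  # each cluster id is a candidate induced intent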


Stress Testing BERT Anaphora Resolution Models for Reaction Extraction in Chemical Patents

arXiv.org Artificial Intelligence

The high volume of published chemical patents and the importance of timely acquisition of their information motivate automating information extraction from chemical patents. Anaphora resolution is an important component of comprehensive information extraction, and is critical for extracting reactions. In chemical patents, there are five anaphoric relations of interest: co-reference, transformed, reaction associated, work up, and contained. Our goal is to investigate how the performance of anaphora resolution models for reaction texts in chemical patents differs between noise-free and noisy environments, and to what extent we can improve the model's robustness to noise.
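
A hedged sketch of the kind of noise injection such a stress test can apply, using random character-level corruptions of reaction text; the paper's actual noise types (e.g. OCR-style errors) may differ.

import random

def corrupt(text, rate=0.05, seed=0):
    """Randomly delete, duplicate, or substitute characters at the given rate."""
    rng = random.Random(seed)
    out = []
    for c in text:
        r = rng.random()
        if r < rate / 3:
            continue                  # random deletion
        elif r < 2 * rate / 3:
            out.append(c + c)         # random duplication
        elif r < rate:
            out.append(rng.choice("abcdefghijklmnopqrstuvwxyz"))  # substitution
        else:
            out.append(c)
    return "".join(out)

clean = "The mixture was stirred at 80 C and the product was filtered."
print(corrupt(clean))  # noisy variant fed to the model alongside the clean one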


LMCap: Few-shot Multilingual Image Captioning by Retrieval Augmented Language Model Prompting

arXiv.org Artificial Intelligence

Multilingual image captioning has recently been tackled by training with large-scale machine-translated data, which is an expensive, noisy, and time-consuming process. Without requiring any multilingual caption data, we propose LMCap, an image-blind few-shot multilingual captioning model that works by prompting a language model with retrieved captions. Specifically, instead of following the standard encoder-decoder paradigm, given an image, LMCap first retrieves the captions of similar images using a multilingual CLIP encoder. These captions are then combined into a prompt for an XGLM decoder, in order to generate captions in the desired language. In other words, the generation model does not directly process the image, instead processing retrieved captions. Experiments on the XM3600 dataset of geographically diverse images show that our model is competitive with fully-supervised multilingual captioning models, without requiring any supervised training on any captioning data.
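
A minimal sketch of the image-blind recipe, assuming a toy datastore of captioned images: retrieve the captions of the most similar images with a CLIP encoder, then build a prompt for a multilingual LM such as XGLM. The prompt wording, file names, and checkpoints below are illustrative, not LMCap's exact setup.

from PIL import Image
from sentence_transformers import SentenceTransformer, util

img_model = SentenceTransformer("clip-ViT-B-32")

# Toy datastore of images with known captions.
datastore = [
    ("beach.jpg", "a dog running on the beach"),
    ("park.jpg", "children playing football in a park"),
    ("kitchen.jpg", "a man cooking pasta in a kitchen"),
]
index = img_model.encode([Image.open(p) for p, _ in datastore],
                         convert_to_tensor=True)

# Retrieve the captions attached to the most similar datastore images.
query_emb = img_model.encode(Image.open("query.jpg"), convert_to_tensor=True)
hits = util.semantic_search(query_emb, index, top_k=2)[0]
retrieved = [datastore[h["corpus_id"]][1] for h in hits]

# Build a prompt asking for a caption in the target language; an XGLM
# decoder (e.g. facebook/xglm-564M) would then generate from this prompt.
prompt = "Similar image captions:\n" + "\n".join(f"- {c}" for c in retrieved)
prompt += "\nCaption in Portuguese:"
print(prompt)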