 portugal


Overview of the 17th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management

Interactive AI Magazine

IC3K 2025 (17th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management) received 163 paper submissions from 40 countries. Each submission was evaluated by the Program Committee in a double-blind review. After a stringent selection process, 31 papers were published and presented as full papers, i.e. completed work (12 pages/25' oral presentation), and 81 papers were accepted as short papers (54 as oral presentations). The organizing committee included the IC3K Conference Chairs, Ricardo da Silva Torres (Artificial Intelligence Group, Wageningen University & Research, Netherlands) and Jorge Bernardino (Polytechnic University of Coimbra, Portugal), and the IC3K 2025 Program Chairs: Le Gruenwald (University of Oklahoma, School of Computer Science, United States), Frans Coenen (University of Liverpool, United Kingdom), Jesualdo Tomás Fernández-Breis (University of Murcia, Spain), Lars Nolle (Jade University of Applied Sciences, Germany), Elio Masciari (University of Napoli Federico II, Italy) and David Aveiro (University of Madeira, NOVA-LINCS and ARDITI, Portugal). At the closing session, the conference recognized papers considered excellent in their class, presenting a "Best Paper Award", "Best Student Paper Award", and "Best Poster Award" for each of the co-located conferences.


Reflections on the Reproducibility of Commercial LLM Performance in Empirical Software Engineering Studies

Angermeir, Florian, Amougou, Maximilian, Kreitz, Mark, Bauer, Andreas, Linhuber, Matthias, Fucci, Davide, C., Fabiola Moyón, Mendez, Daniel, Gorschek, Tony

arXiv.org Artificial Intelligence

Large Language Models (LLMs) have gained remarkable interest in industry and academia. The growing academic interest is reflected in the number of publications on the topic in recent years. For instance, at ICSE 2024 alone, 78 of the roughly 425 publications performed experiments with LLMs. Conducting empirical studies with LLMs remains challenging and raises questions about how to achieve reproducible results, for both researchers and practitioners. An important step towards excelling in empirical research on LLMs and their application is to first understand to what extent current research results are reproducible and what factors may impede reproducibility. This investigation is the scope of our work. We contribute an analysis of the reproducibility of LLM-centric studies, provide insights into the factors impeding reproducibility, and discuss suggestions on how to improve the current state. In particular, we studied the 85 articles describing LLM-centric studies published at ICSE 2024 and ASE 2024. Of the 85 articles, 18 provided research artefacts and used OpenAI models. We attempted to replicate those 18 studies. Of the 18 studies, only five were sufficiently complete and executable. We were unable to fully reproduce the results of any of the five studies. Two studies seemed to be partially reproducible, and three studies did not seem to be reproducible. Our results highlight not only the need for stricter research artefact evaluations but also for more robust study designs to ensure the reproducible value of future publications.
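The funnel reported in this abstract can be checked with a few lines of arithmetic. The sketch below uses only the counts stated above; the stage labels, variable names, and the `funnel_shares` helper are illustrative and not part of the study.

```python
# Counts taken from the abstract above; labels and names are illustrative.
funnel = [
    ("LLM-centric articles (ICSE 2024 + ASE 2024)", 85),
    ("provided artefacts and used OpenAI models", 18),
    ("sufficiently complete and executable", 5),
    ("fully reproducible", 0),
]


def funnel_shares(stages):
    """Return (label, count, percent of the first stage) for each stage."""
    total = stages[0][1]
    return [(label, n, round(100 * n / total, 1)) for label, n in stages]


shares = funnel_shares(funnel)
for label, n, pct in shares:
    print(f"{label}: {n} ({pct}%)")
```

Expressing every stage as a share of the initial 85 articles makes the attrition at each step directly comparable.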


Gastronomists study 100 years of menus to reveal food's political power

Popular Science

Menus from 457 diplomatic meals served in Portugal reveal how food can make and break alliances. A nice, warm meal is one of the great unifiers. Food communicates everything from love and tradition, as in a home-cooked dinner with all of the trimmings, to political stances. At a state dinner, food has the power to cultivate understanding across cultures, or potentially create tensions.


Benchmarking for Practice: Few-Shot Time-Series Crop-Type Classification on the EuroCropsML Dataset

Reuss, Joana, Macdonald, Jan, Becker, Simon, Gikalo, Ekaterina, Schultka, Konrad, Richter, Lorenz, Körner, Marco

arXiv.org Artificial Intelligence

Accurate crop-type classification from satellite time series is essential for agricultural monitoring. While various machine learning algorithms have been developed to enhance performance on data-scarce tasks, their evaluation often lacks real-world scenarios. Consequently, their efficacy in challenging practical applications has not yet been thoroughly assessed. To facilitate future research in this domain, we present the first comprehensive benchmark for evaluating supervised and self-supervised learning (SSL) methods for crop-type classification under real-world conditions. This benchmark study relies on the EuroCropsML time-series dataset, which combines farmer-reported crop data with Sentinel-2 satellite observations from Estonia, Latvia, and Portugal. Our findings indicate that MAML-based meta-learning algorithms achieve slightly higher accuracy than supervised transfer learning and SSL methods. However, compared to simpler transfer learning, the improvement from meta-learning comes at the cost of increased computational demands and training time. Moreover, supervised methods benefit most when pre-trained and fine-tuned on geographically close regions. In addition, while SSL generally lags behind meta-learning, it demonstrates advantages over training from scratch, particularly in capturing fine-grained features essential for real-world crop-type classification, and also surpasses standard transfer learning. This highlights its practical value when labeled pre-training crop data is scarce. Our insights underscore the trade-offs between accuracy and computational demand in selecting supervised machine learning methods for real-world crop-type classification tasks and highlight the difficulties of knowledge transfer across diverse geographic regions.


Playpen: An Environment for Exploring Learning Through Conversational Interaction

Horst, Nicola, Mazzaccara, Davide, Schmidt, Antonia, Sullivan, Michael, Momentè, Filippo, Franceschetti, Luca, Sadler, Philipp, Hakimov, Sherzod, Testoni, Alberto, Bernardi, Raffaella, Fernández, Raquel, Koller, Alexander, Lemon, Oliver, Schlangen, David, Giulianelli, Mario, Suglia, Alessandro

arXiv.org Artificial Intelligence

Interaction between learner and feedback-giver has come into focus recently for post-training of Large Language Models (LLMs), through the use of reward models that judge the appropriateness of a model's response. In this paper, we investigate whether Dialogue Games -- goal-directed and rule-governed activities driven predominantly by verbal actions -- can also serve as a source of feedback signals for learning. We introduce Playpen, an environment for off- and online learning through Dialogue Game self-play, and investigate a representative set of post-training methods: supervised fine-tuning; direct alignment (DPO); and reinforcement learning with GRPO. We experiment with post-training a small LLM (Llama-3.1-8B-Instruct), evaluating performance on unseen instances of training games as well as unseen games, and on standard benchmarks. We find that imitation learning through SFT improves performance on unseen instances, but negatively impacts other skills, while interactive learning with GRPO shows balanced improvements without loss of skills. We release the framework and the baseline training setups to foster research in the promising new direction of learning in (synthetic) interaction.


Project Riley: Multimodal Multi-Agent LLM Collaboration with Emotional Reasoning and Voting

Ortigoso, Ana Rita, Vieira, Gabriel, Fuentes, Daniel, Frazão, Luis, Costa, Nuno, Pereira, António

arXiv.org Artificial Intelligence

This paper presents Project Riley, a novel multimodal and multi-model conversational AI architecture oriented towards the simulation of reasoning influenced by emotional states. Drawing inspiration from Pixar's Inside Out, the system comprises five distinct emotional agents - Joy, Sadness, Fear, Anger, and Disgust - that engage in structured multi-round dialogues to generate, criticise, and iteratively refine responses. A final reasoning mechanism synthesises the contributions of these agents into a coherent output that either reflects the dominant emotion or integrates multiple perspectives. The architecture incorporates both textual and visual large language models (LLMs), alongside advanced reasoning and self-refinement processes. A functional prototype was deployed locally in an offline environment, optimised for emotional expressiveness and computational efficiency. From this initial prototype, another one emerged, called Armando, which was developed for use in emergency contexts, delivering emotionally calibrated and factually accurate information through the integration of Retrieval-Augmented Generation (RAG) and cumulative context tracking. The Project Riley prototype was evaluated through user testing, in which participants interacted with the chatbot and completed a structured questionnaire assessing three dimensions: Emotional Appropriateness, Clarity and Utility, and Naturalness and Human-likeness. The results indicate strong performance in structured scenarios, particularly with respect to emotional alignment and communicative clarity.


The Transformative Power of Inspiration

Communications of the ACM

Growing up as a teenager in the 1980s, I witnessed the birth and rise of personal computers firsthand. The Commodore 64 was the first computer to enter our home, and apart from the myriad games we played endlessly, it also made me experiment with BASIC (and basic) programming. Despite my early engagement with computing, at school I was more interested in languages and media (I also wasn't strong enough in maths). So, when it was time to go to university, I chose to study communication sciences at the Faculty of Social Sciences at KU Leuven, Belgium. During my studies, my interest in computers never faded, especially as it coincided with the rise of the Internet and the start of the World Wide Web, an evolution I eagerly followed.


Evaluating AI for Finance: Is AI Credible at Assessing Investment Risk?

Chawla, Divij, Bhutada, Ashita, Anh, Do Duc, Raghunathan, Abhinav, SP, Vinod, Guo, Cathy, Liew, Dar Win, Gupta, Prannaya, Bhardwaj, Rishabh, Bhardwaj, Rajat, Poria, Soujanya

arXiv.org Artificial Intelligence

We assess whether AI systems can credibly evaluate investment risk appetite, a task that must be thoroughly validated before automation. Our analysis was conducted on proprietary systems (GPT, Claude, Gemini) and open-weight models (LLaMA, DeepSeek, Mistral), using carefully curated user profiles that reflect real users with varying attributes such as country and gender. We find that the models exhibit significant variance in score distributions when user attributes that should not influence risk computation, such as country or gender, are changed. For example, GPT-4o assigns higher risk scores to Nigerian and Indonesian profiles. While some models align closely with expected scores in the Low- and Mid-risk ranges, none maintain consistent scores across regions and demographics, thereby violating AI and finance regulations.
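One way to picture the kind of check described here is to hold a profile fixed, vary only an attribute that should not matter, and measure how much the scores move. The sketch below is hypothetical: `attribute_spread`, the profile fields, and the two toy scorers are assumptions made for illustration, standing in for real model calls, and are not the authors' evaluation protocol.

```python
from statistics import pstdev
from typing import Callable, Dict, List

Profile = Dict[str, object]


def attribute_spread(profile: Profile, attribute: str, values: List[str],
                     score_fn: Callable[[Profile], float]) -> float:
    """Population standard deviation of scores when only `attribute` varies."""
    scores = [score_fn({**profile, attribute: v}) for v in values]
    return pstdev(scores)


def toy_fair_scorer(p: Profile) -> float:
    # Hypothetical scorer that depends only on a risk-relevant field.
    return 0.5 * float(p["age"])


def toy_biased_scorer(p: Profile) -> float:
    # Hypothetical scorer that also shifts the score by country (undesired).
    return toy_fair_scorer(p) + (5.0 if p["country"] == "B" else 0.0)


base = {"age": 40, "country": "A"}
fair_spread = attribute_spread(base, "country", ["A", "B"], toy_fair_scorer)
biased_spread = attribute_spread(base, "country", ["A", "B"], toy_biased_scorer)
print(fair_spread, biased_spread)
```

A spread of zero means the score is invariant to the swapped attribute; any nonzero spread flags a dependence that would need closer inspection.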


Revealed: What the most stereotypical MEN around the world look like, according to AI - so, do you think they're accurate?

Daily Mail - Science & tech

If you were asked to visualise a stereotypical British man, what would you think of? According to AI, the answer is an overweight man wearing a football shirt. Instagram account @reimagineuk asked AI to create videos of the most stereotypical men around the world - with hilarious results. While the British man looks casual in his football shirt, men from other countries are depicted with fancier outfits. The stereotypical man from Portugal sports a white shirt and a waistcoat, while the man from Nigeria can be seen wearing a bright orange suit.


Presumed Cultural Identity: How Names Shape LLM Responses

Pawar, Siddhesh, Arora, Arnav, Kaffee, Lucie-Aimée, Augenstein, Isabelle

arXiv.org Artificial Intelligence

Names are deeply tied to human identity. They can serve as markers of individuality, cultural heritage, and personal history. However, using names as a core indicator of identity can lead to over-simplification of complex identities. When interacting with LLMs, user names are an important point of information for personalisation. Names can enter chatbot conversations through direct user input (requested by chatbots), as part of task contexts such as CV reviews, or as built-in memory features that store user information for personalisation. We study biases associated with names by measuring cultural presumptions in the responses generated by LLMs when presented with common suggestion-seeking queries, which might involve making assumptions about the user. Our analyses demonstrate strong assumptions about cultural identity associated with names present in LLM generations across multiple cultures. Our work has implications for designing more nuanced personalisation systems that avoid reinforcing stereotypes while maintaining meaningful customisation.