Terragni, Silvia
Evaluating Cost-Accuracy Trade-offs in Multimodal Search Relevance Judgements
Terragni, Silvia, Cuong, Hoang, Daiber, Joachim, Gudipati, Pallavi, Mendes, Pablo N.
Large Language Models (LLMs) have demonstrated potential as effective search relevance evaluators. However, there is a lack of comprehensive guidance on which models consistently perform optimally across various contexts or within specific use cases. In this paper, we assess several LLMs and Multimodal Language Models (MLLMs) in terms of their alignment with human judgments across multiple multimodal search scenarios. Our analysis investigates the trade-offs between cost and accuracy, highlighting that model performance varies significantly depending on the context. Interestingly, in smaller models, the inclusion of a visual component may hinder performance rather than enhance it. These findings underscore the complexities involved in selecting the most appropriate model for practical applications.
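As a rough illustration of the alignment measurement the abstract describes, the sketch below scores agreement between human relevance labels and LLM-produced labels using Cohen's kappa; the example labels and the choice of metric are assumptions for illustration, not details from the paper.

# Illustrative sketch: measuring agreement between human relevance judgments
# and LLM-produced judgments for the same query-item pairs (hypothetical data).
from sklearn.metrics import cohen_kappa_score

human_labels = [1, 0, 1, 1, 0, 1, 0, 0]   # human relevance judgments (1 = relevant)
llm_labels   = [1, 0, 1, 0, 0, 1, 1, 0]   # judgments produced by an LLM evaluator

# Cohen's kappa corrects raw agreement for chance; one plausible way to
# quantify "alignment with human judgments" per model and per use case.
kappa = cohen_kappa_score(human_labels, llm_labels)
print(f"agreement (Cohen's kappa): {kappa:.2f}")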
Reliable LLM-based User Simulator for Task-Oriented Dialogue Systems
Sekulić, Ivan, Terragni, Silvia, Guimarães, Victor, Khau, Nghia, Guedes, Bruna, Filipavicius, Modestas, Manso, André Ferreira, Mathis, Roland
The field of dialogue systems has seen a notable surge in the utilization of user simulation approaches, primarily for the evaluation and enhancement of conversational search systems (Owoicho et al., 2023) and task-oriented dialogue (TOD) systems (Terragni et al., 2023). User simulation plays a pivotal role in replicating the nuanced interactions of real users with these systems, enabling a wide range of applications such as synthetic data augmentation, error detection, and evaluation (Wan et al., 2022; Sekulić et al., 2022; Li et al., 2022; Balog and Zhai, 2023; Ji et al., 2022).

In this paper, we introduce DAUS, a generative user simulator for TOD systems. As depicted in Figure 1, once initialized with the user goal description, DAUS engages with the system across multiple turns, providing information to fulfill the user's objectives. Our aim is to minimize the commonly observed user simulator hallucinations and incorrect responses (right-hand side of Figure 1), with an ultimate objective of enabling detection of common errors in TOD systems (left-hand side of Figure 1). Our approach is straightforward yet effective: we build upon the foundation of LLM-based …
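The following is a schematic sketch of the goal-conditioned interaction loop described above, not the authors' DAUS implementation; llm_user_turn and tod_system_turn are hypothetical stand-ins for the underlying LLM call and the dialogue system under test.

# Schematic goal-conditioned user-simulator loop for a TOD system (illustrative only).

def llm_user_turn(history, goal):
    # Stand-in for an LLM prompted with the goal description and the dialogue so far.
    return f"(user turn conditioned on goal: {goal})"

def tod_system_turn(history):
    # Stand-in for the task-oriented dialogue system being evaluated.
    return "(system response)"

def simulate(goal, max_turns=5):
    history = []
    for _ in range(max_turns):
        history.append("USER: " + llm_user_turn(history, goal))      # simulator provides goal information
        history.append("SYSTEM: " + tod_system_turn(history))        # system reply fed back into the loop
    return history

for turn in simulate("book a table for two at an Italian restaurant tonight"):
    print(turn)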
In-Context Learning User Simulators for Task-Oriented Dialog Systems
Terragni, Silvia, Filipavicius, Modestas, Khau, Nghia, Guedes, Bruna, Manso, André, Mathis, Roland
This paper presents a novel application of large language models in user simulation for task-oriented dialog systems, specifically focusing on an in-context learning approach. By harnessing the power of these models, the proposed approach generates diverse utterances based on user goals and limited dialog examples. Unlike traditional simulators, this method eliminates the need for labor-intensive rule definition or extensive annotated data, making it more efficient and accessible. Additionally, an error analysis of the interaction between the user simulator and dialog system uncovers common mistakes, providing valuable insights into areas that require improvement. Our implementation is available at https://github.com/telepathylabsai/prompt-based-user-simulator.
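As a rough sketch of the in-context approach described above, the snippet below assembles a few-shot prompt from a user goal and a couple of example dialogs; the template wording and example data are illustrative and not taken from the released repository.

# Illustrative prompt assembly for a few-shot (in-context) user simulator.
# The template and example dialogs below are hypothetical.

EXAMPLE_DIALOGS = [
    {"goal": "book a table for two tonight",
     "dialog": "USER: I'd like a table for two tonight.\nSYSTEM: At what time?"},
    {"goal": "schedule a car service appointment",
     "dialog": "USER: I need to book a service appointment.\nSYSTEM: Which day works for you?"},
]

def build_prompt(user_goal):
    # Concatenate the limited dialog examples, then append the new user goal.
    shots = "\n\n".join(f"Goal: {ex['goal']}\n{ex['dialog']}" for ex in EXAMPLE_DIALOGS)
    return ("You are a user talking to a task-oriented dialog system.\n\n"
            f"{shots}\n\n"
            f"Goal: {user_goal}\nUSER:")

print(build_prompt("reschedule my appointment to Friday morning"))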
Contrastive language and vision learning of general fashion concepts
Chia, Patrick John, Attanasio, Giuseppe, Bianchi, Federico, Terragni, Silvia, Magalhães, Ana Rita, Goncalves, Diogo, Greco, Ciro, Tagliabue, Jacopo
The extraordinary growth of online retail - as of 2020, 4 trillion dollars per year (Cramer-Flood, 2020) - had a profound impact on the fashion industry, with 1 out of 4 transactions now happening online (McKinsey, 2019). The combination of large amounts of data and variety of use cases supported by growing investments has made e-commerce fertile for the application of cutting-edge machine learning models, with NLP involved in recommendations (de Souza Pereira Moreira et al., 2019; Guo et al., 2020; Goncalves et al., 2021), information retrieval (IR) (Ai and Narayanan.R, 2021), product …

… The model is trained on over 700k <image, text> pairs from the inventory of Farfetch, one of the largest fashion luxury retailers in the world, and is applied to use cases known to be crucial in a vast global market; 2. we evaluate FashionCLIP in a variety of tasks, showing that fine-tuning helps capture domain-specific concepts and generalize them in zero-shot scenarios; we supplement quantitative tests with qualitative analyses, and offer preliminary insights of how concepts grounded in a visual space unlock linguistic …
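As a brief illustration of the zero-shot usage such a model enables, the sketch below scores a product image against candidate fashion descriptions through the generic Hugging Face CLIP interface; the checkpoint id and image path are assumptions for illustration, not details from the paper.

# Zero-shot matching of a product image against fashion concepts using the
# generic Hugging Face CLIP interface. Checkpoint id and image path are assumed.
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("patrickjohncyh/fashion-clip")       # assumed public FashionCLIP checkpoint
processor = CLIPProcessor.from_pretrained("patrickjohncyh/fashion-clip")

image = Image.open("product.jpg")                                      # placeholder product photo
labels = ["a red evening dress", "a pair of running shoes", "a leather handbag"]

inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
probs = model(**inputs).logits_per_image.softmax(dim=-1)[0]            # image-text similarity as probabilities

for label, p in zip(labels, probs.tolist()):
    print(f"{label}: {p:.2f}")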