Discourse & Dialogue
ttta: Tools for Temporal Text Analysis
Lange, Kai-Robin, Benner, Niklas, Grönberg, Lars, Hachcham, Aymane, Kolli, Imene, Rieger, Jonas, Jentsch, Carsten
In its current state, the ttta package includes diachronic embeddings, dynamic topic modeling, and document scaling. These tools can be used to track changes in language use, identify emerging topics, and explore how the meaning of words and phrases has evolved over time. Our dynamic topic model approach is based on the model RollingLDA (Rieger et al., 2021), which is a modification of the classic Latent Dirichlet Allocation (Blei et al., 2003), that allows for the estimation of topics over time using a rolling window approach. We additionally implemented the model LDAPrototype (Rieger et al., 2020), serving as a more consistent foundation for RollingLDA than a common LDA. With these models, users can uncover and analyze topics of discussion in temporal data sets and track even rapid changes, which other dynamic topic models struggle with. This ability to track rapid changes in topics is further used in the Topical Changes model put forth by Rieger et al. (2022) and Lange et al. (2022) that identifies change points in the word topic distribution of RollingLDA. Figure 1 visualizes the changes observed by the Topical Changes model in speeches from the German Bundestag (Lange & Jentsch, 2023), which can be analyzed further using leave-one-out word impacts provided by the model or, as Lange et al. (2025) proposed, by asking Large Language Models to interpret the change and relate it to a possible narrative shift.
Talking Turns: Benchmarking Audio Foundation Models on Turn-Taking Dynamics
Arora, Siddhant, Lu, Zhiyun, Chiu, Chung-Cheng, Pang, Ruoming, Watanabe, Shinji
The recent wave of audio foundation models (FMs) could provide new capabilities for conversational modeling. However, there have been limited efforts to evaluate these audio FMs comprehensively on their ability to have natural and interactive conversations. To engage in meaningful conversation with the end user, we would want the FMs to additionally perform a fluent succession of turns without too much overlapping speech or long stretches of silence. Inspired by this, we ask whether the recently proposed audio FMs can understand, predict, and perform turn-taking events? To answer this, we propose a novel evaluation protocol that can assess spoken dialog system's turn-taking capabilities using a supervised model as a judge that has been trained to predict turn-taking events in human-human conversations. Using this protocol, we present the first comprehensive user study that evaluates existing spoken dialogue systems on their ability to perform turn-taking events and reveal many interesting insights, such as they sometimes do not understand when to speak up, can interrupt too aggressively and rarely backchannel. We further evaluate multiple open-source and proprietary audio FMs accessible through APIs on carefully curated test benchmarks from Switchboard to measure their ability to understand and predict turn-taking events and identify significant room for improvement. We will open source our evaluation platform to promote the development of advanced conversational AI systems.
Market-Derived Financial Sentiment Analysis: Context-Aware Language Models for Crypto Forecasting
Moradi-Kamali, Hamid, Rajabi-Ghozlou, Mohammad-Hossein, Ghazavi, Mahdi, Soltani, Ali, Sattarzadeh, Amirreza, Entezari-Maleki, Reza
Financial Sentiment Analysis (FSA) traditionally relies on human-annotated sentiment labels to infer investor sentiment and forecast market movements. However, inferring the potential market impact of words based on their human-perceived intentions is inherently challenging. We hypothesize that the historical market reactions to words, offer a more reliable indicator of their potential impact on markets than subjective sentiment interpretations by human annotators. To test this hypothesis, a market-derived labeling approach is proposed to assign tweet labels based on ensuing short-term price trends, enabling the language model to capture the relationship between textual signals and market dynamics directly. A domain-specific language model was fine-tuned on these labels, achieving up to an 11% improvement in short-term trend prediction accuracy over traditional sentiment-based benchmarks. Moreover, by incorporating market and temporal context through prompt-tuning, the proposed context-aware language model demonstrated an accuracy of 89.6% on a curated dataset of 227 impactful Bitcoin-related news events with significant market impacts. Aggregating daily tweet predictions into trading signals, our method outperformed traditional fusion models (which combine sentiment-based and price-based predictions). It challenged the assumption that sentiment-based signals are inferior to price-based predictions in forecasting market movements. Backtesting these signals across three distinct market regimes yielded robust Sharpe ratios of up to 5.07 in trending markets and 3.73 in neutral markets. Our findings demonstrate that language models can serve as effective short-term market predictors. This paradigm shift underscores the untapped capabilities of language models in financial decision-making and opens new avenues for market prediction applications.
Fine-tuning BERT with Bidirectional LSTM for Fine-grained Movie Reviews Sentiment Analysis
Nkhata, Gibson, Gauch, Susan, Anjum, Usman, Zhan, Justin
Sentiment Analysis (SA) is instrumental in understanding peoples viewpoints facilitating social media monitoring recognizing products and brands and gauging customer satisfaction. Consequently SA has evolved into an active research domain within Natural Language Processing (NLP). Many approaches outlined in the literature devise intricate frameworks aimed at achieving high accuracy, focusing exclusively on either binary sentiment classification or fine-grained sentiment classification. In this paper our objective is to fine-tune the pre-trained BERT model with Bidirectional LSTM (BiLSTM) to enhance both binary and fine-grained SA specifically for movie reviews. Our approach involves conducting sentiment classification for each review followed by computing the overall sentiment polarity across all reviews. We present our findings on binary classification as well as fine-grained classification utilizing benchmark datasets. Additionally we implement and assess two accuracy improvement techniques Synthetic Minority Oversampling Technique (SMOTE) and NLP Augmenter (NLPAUG) to bolster the models generalization in fine-grained sentiment classification. Finally a heuristic algorithm is employed to calculate the overall polarity of predicted reviews from the BERT+BiLSTM output vector. Our approach performs comparably with state-of-the-art (SOTA) techniques in both classifications. For instance in binary classification we achieve 97.67% accuracy surpassing the leading SOTA model NB-weighted-BON+dv-cosine by 0.27% on the renowned IMDb dataset. Conversely for five-class classification on SST-5 while the top SOTA model RoBERTa+large+Self-explaining attains 55.5% accuracy our model achieves 59.48% accuracy surpassing the BERT-large baseline by 3.6%.
I see what you mean: Co-Speech Gestures for Reference Resolution in Multimodal Dialogue
Ghaleb, Esam, Khaertdinov, Bulat, Özyürek, Aslı, Fernández, Raquel
In face-to-face interaction, we use multiple modalities, including speech and gestures, to communicate information and resolve references to objects. However, how representational co-speech gestures refer to objects remains understudied from a computational perspective. In this work, we address this gap by introducing a multimodal reference resolution task centred on representational gestures, while simultaneously tackling the challenge of learning robust gesture embeddings. We propose a self-supervised pre-training approach to gesture representation learning that grounds body movements in spoken language. Our experiments show that the learned embeddings align with expert annotations and have significant predictive power. Moreover, reference resolution accuracy further improves when (1) using multimodal gesture representations, even when speech is unavailable at inference time, and (2) leveraging dialogue history. Overall, our findings highlight the complementary roles of gesture and speech in reference resolution, offering a step towards more naturalistic models of human-machine interaction.
Polish-ASTE: Aspect-Sentiment Triplet Extraction Datasets for Polish
Lango, Marta, Naglik, Borys, Lango, Mateusz, Naglik, Iwo
Aspect-Sentiment Triplet Extraction (ASTE) is one of the most challenging and complex tasks in sentiment analysis. It concerns the construction of triplets that contain an aspect, its associated sentiment polarity, and an opinion phrase that serves as a rationale for the assigned polarity. Despite the growing popularity of the task and the many machine learning methods being proposed to address it, the number of datasets for ASTE is very limited. In particular, no dataset is available for any of the Slavic languages. In this paper, we present two new datasets for ASTE containing customer opinions about hotels and purchased products expressed in Polish. We also perform experiments with two ASTE techniques combined with two large language models for Polish to investigate their performance and the difficulty of the assembled datasets. The new datasets are available under a permissive licence and have the same file format as the English datasets, facilitating their use in future research.
Sentiment Analysis of Movie Reviews Using BERT
Nkhata, Gibson, Anjum, Usman, Zhan, Justin
Sentiment Analysis (SA) or opinion mining is analysis of emotions and opinions from any kind of text. SA helps in tracking peoples viewpoints and it is an important factor when it comes to social media monitoring product and brand recognition customer satisfaction customer loyalty advertising and promotions success and product acceptance. That is why SA is one of the active research areas in Natural Language Processing (NLP). SA is applied on data sourced from various media platforms to mine sentiment knowledge from them. Various approaches have been deployed in the literature to solve the problem. Most techniques devise complex and sophisticated frameworks in order to attain optimal accuracy. This work aims to finetune Bidirectional Encoder Representations from Transformers (BERT) with Bidirectional Long Short-Term Memory (BiLSTM) for movie reviews sentiment analysis and still provide better accuracy than the State-of-The-Art (SOTA) methods. The paper also shows how sentiment analysis can be applied if someone wants to recommend a certain movie for example by computing overall polarity of its sentiments predicted by the model. That is our proposed method serves as an upper-bound baseline in prediction of a predominant reaction to a movie. To compute overall polarity a heuristic algorithm is applied to BERTBiLSTM output vector. Our model can be extended to three-class four-class or any fine-grained classification and apply overall polarity computation again. This is intended to be exploited in future work.
LLM-Enhanced Dialogue Management for Full-Duplex Spoken Dialogue Systems
Zhang, Hao, Li, Weiwei, Chen, Rilin, Kothapally, Vinay, Yu, Meng, Yu, Dong
Achieving full-duplex communication in spoken dialogue systems (SDS) requires real-time coordination between listening, speaking, and thinking. This paper proposes a semantic voice activity detection (VAD) module as a dialogue manager (DM) to efficiently manage turn-taking in full-duplex SDS. Implemented as a lightweight (0.5B) LLM fine-tuned on full-duplex conversation data, the semantic VAD predicts four control tokens to regulate turn-switching and turn-keeping, distinguishing between intentional and unintentional barge-ins while detecting query completion for handling user pauses and hesitations. By processing input speech in short intervals, the semantic VAD enables real-time decision-making, while the core dialogue engine (CDE) is only activated for response generation, reducing computational overhead. This design allows independent DM optimization without retraining the CDE, balancing interaction accuracy and inference efficiency for scalable, next-generation full-duplex SDS.
SentiFormer: Metadata Enhanced Transformer for Image Sentiment Analysis
Feng, Bin, Ruan, Shulan, Yang, Mingzheng, Han, Dongxuan, Liu, Huijie, Zhang, Kai, Liu, Qi
As more and more internet users post images online to express their daily emotions, image sentiment analysis has attracted increasing attention. Recently, researchers generally tend to design different neural networks to extract visual features from images for sentiment analysis. Despite the significant progress, metadata, the data (e.g., text descriptions and keyword tags) for describing the image, has not been sufficiently explored in this task. In this paper, we propose a novel Metadata Enhanced Transformer for sentiment analysis (SentiFormer) to fuse multiple metadata and the corresponding image into a unified framework. Specifically, we first obtain multiple metadata of the image and unify the representations of diverse data. To adaptively learn the appropriate weights for each metadata, we then design an adaptive relevance learning module to highlight more effective information while suppressing weaker ones. Moreover, we further develop a cross-modal fusion module to fuse the adaptively learned representations and make the final prediction. Extensive experiments on three publicly available datasets demonstrate the superiority and rationality of our proposed method.
Stories that (are) Move(d by) Markets: A Causal Exploration of Market Shocks and Semantic Shifts across Different Partisan Groups
Drinkall, Felix, Zohren, Stefan, McMahon, Michael, Pierrehumbert, Janet B.
Macroeconomic fluctuations and the narratives that shape them form a mutually reinforcing cycle: public discourse can spur behavioural changes leading to economic shifts, which then result in changes in the stories that propagate. We show that shifts in semantic embedding space can be causally linked to financial market shocks -- deviations from the expected market behaviour. Furthermore, we show how partisanship can influence the predictive power of text for market fluctuations and shape reactions to those same shocks. We also provide some evidence that text-based signals are particularly salient during unexpected events such as COVID-19, highlighting the value of language data as an exogenous variable in economic forecasting. Our findings underscore the bidirectional relationship between news outlets and market shocks, offering a novel empirical approach to studying their effect on each other.