Tuzla Canton
Evaluating Large Language Models Against Human Annotators in Latent Content Analysis: Sentiment, Political Leaning, Emotional Intensity, and Sarcasm
Bojic, Ljubisa, Zagovora, Olga, Zelenkauskaite, Asta, Vukovic, Vuk, Cabarkapa, Milan, Jerkovic, Selma Veseljević, Jovančevic, Ana
In the era of rapid digital communication, vast amounts of textual data are generated daily, demanding efficient methods for latent content analysis to extract meaningful insights. Large Language Models (LLMs) offer potential for automating this process, yet comprehensive assessments comparing their performance to human annotators across multiple dimensions are lacking. This study evaluates the reliability, consistency, and quality of seven state-of-the-art LLMs, including variants of OpenAI's GPT-4, Gemini, Llama, and Mixtral, relative to human annotators in analyzing sentiment, political leaning, emotional intensity, and sarcasm detection. A total of 33 human annotators and eight LLM variants assessed 100 curated textual items, generating 3,300 human and 19,200 LLM annotations, with LLMs evaluated across three time points to examine temporal consistency. Inter-rater reliability was measured using Krippendorff's alpha, and intra-class correlation coefficients assessed consistency over time. The results reveal that both humans and LLMs exhibit high reliability in sentiment analysis and political leaning assessments, with LLMs demonstrating higher internal consistency than humans. In emotional intensity, LLMs displayed higher agreement compared to humans, though humans rated emotional intensity significantly higher. Both groups struggled with sarcasm detection, evidenced by low agreement. LLMs showed excellent temporal consistency across all dimensions, indicating stable performance over time. This research concludes that LLMs, especially GPT-4, can effectively replicate human analysis in sentiment and political leaning, although human expertise remains essential for emotional intensity interpretation. The findings demonstrate the potential of LLMs for consistent and high-quality performance in certain areas of latent content analysis.
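The inter-rater reliability measure named above can be illustrated with a minimal sketch of Krippendorff's alpha. The sketch assumes the nominal (categorical) variant with missing ratings simply omitted. The abstract does not say which variant the authors used, and the function name `krippendorff_alpha_nominal` and the toy labels are hypothetical.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal labels (illustrative sketch).

    `units` is a list of lists; each inner list holds the labels that the
    raters assigned to one item. Items with fewer than two labels carry no
    agreement information and are skipped.
    """
    # Coincidence matrix: every ordered pair of labels within an item
    # contributes 1 / (m - 1), where m is the number of labels on that item.
    coincidences = Counter()
    for labels in units:
        m = len(labels)
        if m < 2:
            continue
        for a, b in permutations(labels, 2):
            coincidences[(a, b)] += 1.0 / (m - 1)

    # Marginal totals per label and the overall number of pairable values.
    totals = Counter()
    for (a, _b), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())
    if n <= 1:
        raise ValueError("need at least one item with two or more labels")

    # Observed and expected disagreement (both scaled by n, which cancels).
    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(
        totals[a] * totals[b] for a in totals for b in totals if a != b
    ) / (n - 1)
    if expected == 0:
        raise ValueError("alpha is undefined when only one label occurs")
    return 1.0 - observed / expected


perfect = krippendorff_alpha_nominal([["pos", "pos"], ["neg", "neg"]])
opposed = krippendorff_alpha_nominal([["pos", "neg"], ["pos", "neg"]])
```

With these toy inputs, full agreement gives an alpha of 1.0 and systematic disagreement gives a negative value. Ordinal or interval ratings would need a different distance function in place of the simple "labels differ" test.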
Call Me When Necessary: LLMs can Efficiently and Faithfully Reason over Structured Environments
Cheng, Sitao, Zhuang, Ziyuan, Xu, Yong, Yang, Fangkai, Zhang, Chaoyun, Qin, Xiaoting, Huang, Xiang, Chen, Ling, Lin, Qingwei, Zhang, Dongmei, Rajmohan, Saravan, Zhang, Qi
Large Language Models (LLMs) have shown potential in reasoning over structured environments, e.g., knowledge graphs and tables. Such tasks typically require multi-hop reasoning, i.e., matching natural language utterances with instances in the environment. Previous methods leverage LLMs to incrementally build a reasoning path, where the LLMs either invoke tools or pick up schemas by interacting with the environment step by step. We propose Reasoning-Path-Editing (Readi), a novel framework where LLMs can efficiently and faithfully reason over structured environments. In Readi, LLMs initially generate a reasoning path given a query, and edit the path only when necessary. We instantiate the path on structured environments and provide feedback to edit the path if anything goes wrong. Experimental results on three KGQA and two TableQA datasets show the effectiveness of Readi, significantly surpassing previous LLM-based methods (by 9.1% Hit@1 on WebQSP, 12.4% on MQA-3H and 9.5% on WTQ), comparable with state-of-the-art fine-tuned methods (67% on CWQ and 74.7% on WebQSP) and substantially boosting the vanilla LLMs (by 14.9% on CWQ). Our code will be available on https://aka.ms/readi.
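The generate-then-edit loop described above can be sketched on a toy knowledge graph. This is an illustration under stated assumptions, not the authors' implementation: the graph `KG`, the functions `instantiate` and `reason`, and the stubbed `propose` and `edit` callbacks (which stand in for LLM calls) are all hypothetical.

```python
# Toy knowledge graph: (entity, relation) -> set of neighbouring entities.
KG = {
    ("Alice", "works_at"): {"Acme"},
    ("Acme", "headquartered_in"): {"Berlin"},
}


def instantiate(start, path):
    """Follow `path` from `start` through KG.

    Returns (answers, None) on success, or (None, index) where `index` is
    the first step at which no entity could be reached.
    """
    frontier = {start}
    for i, relation in enumerate(path):
        reached = set()
        for entity in frontier:
            reached |= KG.get((entity, relation), set())
        if not reached:
            return None, i
        frontier = reached
    return frontier, None


def reason(start, propose, edit, max_edits=3):
    """Generate a full path once, then edit it only when grounding fails."""
    path = propose()
    for attempt in range(max_edits + 1):
        answers, failed_at = instantiate(start, path)
        if failed_at is None:
            return answers, path
        if attempt == max_edits:
            break
        # Feedback (here just the failing step index) drives the edit.
        path = edit(path, failed_at)
    return None, path


# Stand-ins for the LLM: an initial guess and a lookup-based repair.
edit_calls = []
SYNONYMS = {"employer": "works_at"}


def propose():
    return ["employer", "headquartered_in"]


def edit(path, failed_at):
    edit_calls.append(failed_at)
    fixed = list(path)
    fixed[failed_at] = SYNONYMS.get(fixed[failed_at], fixed[failed_at])
    return fixed


answers, final_path = reason("Alice", propose, edit)
```

In this toy run the first path fails at step 0, one edit repairs it, and the second grounding attempt succeeds, so the "LLM" is consulted once for the path and once for the edit rather than at every hop.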
A Long-Short-Term Mixed-Integer Formulation for Highway Lane Change Planning
Reiter, Rudolf, Nurkanovic, Armin, Bernadini, Daniele, Diehl, Moritz, Bemporad, Alberto
This work considers the problem of optimal lane changing in a structured multi-agent road environment. The long-term decision variables account for selecting gaps between SVs on each lane. These lane transitions are used for [...] transition gaps on consecutive lanes are modeled by disjunctive [...]. LTF are formulated consistently, i.e., a transition point constrains the point-mass model trajectory to the corresponding [...]. Contrary to strict hierarchical decomposition, the coarser approximation of the high-level plan cannot be infeasible for the low-level planner. The STF aims at optimizing a four-state discrete-time trajectory of a point-mass model including [...]. Within the formulation of the LTF, the locations of transitions in time and position are continuous. In recent years many approaches have been proposed for vehicle motion planning in structured multi-lane road environments. In fact, even deterministic two-dimensional motion planning problems with rectangular obstacles are NP-hard [1], [2]. This work proposes a novel iterative planning algorithm, referred to as long-short-term motion planner (LSTMP) that [...].
Analyzing An After-Sales Service Process Using Object-Centric Process Mining: A Case Study
Park, Gyunam, Aydin, Sevde, Ugur, Cuneyt, van der Aalst, Wil M. P.
Process mining, a technique turning event data into business process insights, has traditionally operated on the assumption that each event corresponds to a singular case or object. However, many real-world processes are intertwined with multiple objects, making them object-centric. This paper focuses on the emerging domain of object-centric process mining, highlighting its potential yet underexplored benefits in actual operational scenarios. Through an in-depth case study of Borusan Cat's after-sales service process, this study emphasizes the capability of object-centric process mining to capture entangled business process details. Utilizing an event log of approximately 65,000 events, our analysis underscores the importance of embracing this paradigm for richer business insights and enhanced operational improvements.
Geographic Adaptation of Pretrained Language Models
Hofmann, Valentin, Glavaš, Goran, Ljubešić, Nikola, Pierrehumbert, Janet B., Schütze, Hinrich
Geographic features are commonly used to improve the performance of pretrained language models (PLMs) on NLP tasks where they are intuitively beneficial (e.g., geolocation prediction, dialect feature prediction). Existing methods, however, leverage geographic information in task-specific fine-tuning and fail to integrate it into the geo-linguistic knowledge encoded by PLMs, which would make it transferable across different tasks. In this paper, we introduce an approach to task-agnostic geoadaptation of PLMs that forces them to learn associations between linguistic phenomena and geographic locations. Geoadaptation is an intermediate training step that couples language modeling and geolocation prediction in a multi-task learning setup. In our main set of experiments, we geoadapt BERTić, a PLM for Bosnian-Croatian-Montenegrin-Serbian (BCMS), using a corpus of geotagged BCMS tweets. Evaluation on three tasks, namely fine-tuned as well as zero-shot geolocation prediction and zero-shot prediction of dialect features, shows that geoadaptation is very effective: e.g., we obtain state-of-the-art performance in supervised geolocation prediction and report massive gains over geographically uninformed PLMs on zero-shot geolocation prediction. Moreover, in follow-up experiments we successfully geoadapt two other PLMs, specifically ScandiBERT on Norwegian, Swedish, and Danish tweets and GermanBERT on Jodel posts in German from Austria, Germany, and Switzerland, proving that the benefits of geoadaptation are not limited to a particular language area and PLM.
Asymptotically Optimal Multi-Armed Bandit Policies under a Cost Constraint
Burnetas, Apostolos N., Kanavetas, Odysseas, Katehakis, Michael N.
We develop asymptotically optimal policies for the multi-armed bandit (MAB) problem under a cost constraint. This model is applicable in situations where each sample (or activation) from a population (bandit) incurs a known bandit-dependent cost. Successive samples from each population are iid random variables with unknown distribution. The objective is to design a feasible policy for deciding which population to sample from, so as to maximize the expected sum of outcomes of $n$ total samples or, equivalently, to minimize the regret due to lack of information on sample distributions. For this problem we consider the class of feasible uniformly fast (f-UF) convergent policies that satisfy the cost constraint sample-path wise. We first establish a necessary asymptotic lower bound for the rate of increase of the regret function of f-UF policies. Then we construct a class of f-UF policies and provide conditions under which they are asymptotically optimal within the class of f-UF policies, achieving this asymptotic lower bound. Finally, we provide the explicit form of such policies for the case in which the unknown distributions are Normal with unknown means and known variances.
Efficient HEX-Program Evaluation Based on Unfounded Sets
Eiter, T., Fink, M., Krennwallner, T., Redl, C., Schüller, P.
HEX-programs extend logic programs under the answer set semantics with external computations through external atoms. As reasoning from ground Horn programs with nonmonotonic external atoms of polynomial complexity is already on the second level of the polynomial hierarchy, minimality checking of answer set candidates needs special attention. To this end, we present an approach based on unfounded sets as a generalization of related techniques for ASP programs. The unfounded set detection is expressed as a propositional SAT problem, for which we provide two different encodings and optimizations to them. We then integrate our approach into a previously developed evaluation framework for HEX-programs, which is enriched by additional learning techniques that aim at avoiding the reconstruction of the same or related unfounded sets. Furthermore, we provide a syntactic criterion that allows one to skip the minimality check in many cases. An experimental evaluation shows that the new approach significantly decreases runtime.
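The role of unfounded sets in minimality checking can be illustrated with a minimal sketch for ordinary ground normal logic programs. It uses the classical unfounded-set condition and a brute-force subset search. It does not handle external atoms and is not the SAT encoding the abstract refers to. The names `Rule`, `is_unfounded` and `find_unfounded_subset` are hypothetical.

```python
from itertools import combinations
from typing import NamedTuple


class Rule(NamedTuple):
    head: str
    pos: frozenset  # body atoms that must be true
    neg: frozenset  # body atoms under default negation (must be false)


def is_unfounded(candidate, rules, interpretation):
    """Classical check for ground normal programs.

    `candidate` is unfounded w.r.t. `interpretation` (the set of true atoms)
    if every rule whose head is in `candidate` either has a false body or
    depends positively on some atom of `candidate` itself.
    """
    for rule in rules:
        if rule.head not in candidate:
            continue
        body_false = (not rule.pos <= interpretation) or bool(
            rule.neg & interpretation
        )
        depends_on_candidate = bool(rule.pos & candidate)
        if not (body_false or depends_on_candidate):
            return False
    return True


def find_unfounded_subset(rules, interpretation):
    """Brute-force search for a nonempty unfounded subset of the true atoms.

    Returns the first such subset found (smallest first), or None. A model
    that admits such a subset is not an answer set of a normal program.
    """
    atoms = sorted(interpretation)
    for size in range(1, len(atoms) + 1):
        for subset in combinations(atoms, size):
            if is_unfounded(frozenset(subset), rules, interpretation):
                return frozenset(subset)
    return None


# a :- b.  b :- a.   The model {a, b} is only self-supported.
loop = [
    Rule("a", frozenset({"b"}), frozenset()),
    Rule("b", frozenset({"a"}), frozenset()),
]
# a :- not b.   The model {a} has external support.
supported = [Rule("a", frozenset(), frozenset({"b"}))]

loop_result = find_unfounded_subset(loop, {"a", "b"})
supported_result = find_unfounded_subset(supported, {"a"})
```

For the cyclic program the search returns `{a, b}`, so the candidate `{a, b}` is rejected, while for the second program no unfounded subset exists. The exponential subset enumeration is only for illustration; encoding the search as a SAT problem, as the abstract describes, avoids enumerating subsets explicitly.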
Generating Explanations for Biomedical Queries
We introduce novel mathematical models and algorithms to generate (shortest or k different) explanations for biomedical queries, using answer set programming. We implement these algorithms and integrate them in BIOQUERY-ASP. We illustrate the usefulness of these methods with some complex biomedical queries related to drug discovery, over the biomedical knowledge resources PHARMGKB, DRUGBANK, BIOGRID, CTD, SIDER, DISEASE ONTOLOGY and ORPHADATA. To appear in Theory and Practice of Logic Programming (TPLP).
Annotating Answer-Set Programs in LANA
De Vos, Marina, Kıza, Doğa Gizem, Oetsch, Johannes, Pührer, Jörg, Tompits, Hans
While past research in answer-set programming (ASP) mainly focused on theory, ASP solver technology, and applications, the present work situates itself in the context of a quite recent research trend: development support for ASP. In particular, we propose to augment answer-set programs with additional meta-information formulated in a dedicated annotation language, called LANA. This language allows grouping rules into coherent blocks and specifying language signatures, types, pre- and postconditions, as well as unit tests for such blocks. While these annotations are invisible to an ASP solver, as they take the form of program comments, they can be interpreted by tools for documentation, testing, and verification purposes, as well as to eliminate sources of common programming errors by realising syntax checking or code completion features. To demonstrate its versatility, we introduce two such tools, viz. (i) ASPDOC, for generating HTML documentation for a program based on the annotated information, and (ii) ASPUNIT, for running and monitoring unit tests on program blocks. LANA is also exploited in the SeaLion system, an integrated development environment for ASP based on Eclipse. To appear in Theory and Practice of Logic Programming.