Media
A look at adversarial attacks on radio waveforms from discrete latent space
Garuso, Attanasia, Kokalj-Filipovic, Silvija, Kaasaragadda, Yagna
Having designed a VQVAE that maps digital radio waveforms into discrete latent space, and yields a perfectly classifiable reconstruction of the original data, we here analyze the attack suppressing properties of VQVAE when an adversarial attack is performed on high-SNR radio-frequency (RF) data-points. To target amplitude modulations from a subset of digitally modulated waveform classes, we first create adversarial attacks that preserve the phase between the in-phase and quadrature component whose values are adversarially changed. We compare them with adversarial attacks of the same intensity where phase is not preserved. We test the classification accuracy of such adversarial examples on a classifier trained to deliver 100% accuracy on the original data. To assess the ability of VQVAE to suppress the strength of the attack, we evaluate the classifier accuracy on the reconstructions by VQVAE of the adversarial datapoints and show that VQVAE substantially decreases the effectiveness of the attack. We also compare the I/Q plane diagram of the attacked data, their reconstructions and the original data. Finally, using multiple methods and metrics, we compare the probability distribution of the VQVAE latent space with and without attack. Varying the attack strength, we observe interesting properties of the discrete space, which may help detect the attacks.
From Symbolic to Neural and Back: Exploring Knowledge Graph-Large Language Model Synergies
ล krlj, Blaลพ, Koloski, Boshko, Pollak, Senja, Lavraฤ, Nada
Integrating structured knowledge from Knowledge Graphs (KGs) into Large Language Models (LLMs) enhances factual grounding and reasoning capabilities. This survey paper systematically examines the synergy between KGs and LLMs, categorizing existing approaches into two main groups: KG-enhanced LLMs, which improve reasoning, reduce hallucinations, and enable complex question answering; and LLM-augmented KGs, which facilitate KG construction, completion, and querying. Through comprehensive analysis, we identify critical gaps and highlight the mutual benefits of structured knowledge integration. Compared to existing surveys, our study uniquely emphasizes scalability, computational efficiency, and data quality. Finally, we propose future research directions, including neuro-symbolic integration, dynamic KG updating, data reliability, and ethical considerations, paving the way for intelligent systems capable of managing more complex real-world knowledge tasks.
Improving LLM Agent Planning with In-Context Learning via Atomic Fact Augmentation and Lookahead Search
Holt, Samuel, Luyten, Max Ruiz, Pouplin, Thomas, van der Schaar, Mihaela
Large Language Models (LLMs) are increasingly capable but often require significant guidance or extensive interaction history to perform effectively in complex, interactive environments. Existing methods may struggle with adapting to new information or efficiently utilizing past experiences for multi-step reasoning without fine-tuning. We introduce a novel LLM agent framework that enhances planning capabilities through in-context learning, facilitated by atomic fact augmentation and a recursive lookahead search. Our agent learns to extract task-critical ``atomic facts'' from its interaction trajectories. These facts dynamically augment the prompts provided to LLM-based components responsible for action proposal, latent world model simulation, and state-value estimation. Planning is performed via a depth-limited lookahead search, where the LLM simulates potential trajectories and evaluates their outcomes, guided by the accumulated facts and interaction history. This approach allows the agent to improve its understanding and decision-making online, leveraging its experience to refine its behavior without weight updates. We provide a theoretical motivation linking performance to the quality of fact-based abstraction and LLM simulation accuracy. Empirically, our agent demonstrates improved performance and adaptability on challenging interactive tasks, achieving more optimal behavior as it accumulates experience, showcased in tasks such as TextFrozenLake and ALFWorld.
Can LLMs Ground when they (Don't) Know: A Study on Direct and Loaded Political Questions
Lachenmaier, Clara, Sieker, Judith, Zarrieร, Sina
Communication among humans relies on conversational grounding, allowing interlocutors to reach mutual understanding even when they do not have perfect knowledge and must resolve discrepancies in each other's beliefs. This paper investigates how large language models (LLMs) manage common ground in cases where they (don't) possess knowledge, focusing on facts in the political domain where the risk of misinformation and grounding failure is high. We examine the ability of LLMs to answer direct knowledge questions and loaded questions that presuppose misinformation. We evaluate whether loaded questions lead LLMs to engage in active grounding and correct false user beliefs, in connection to their level of knowledge and their political bias. Our findings highlight significant challenges in LLMs' ability to engage in grounding and reject false user beliefs, raising concerns about their role in mitigating misinformation in political discourse.
Durghotona GPT: A Web Scraping and Large Language Model Based Framework to Generate Road Accident Dataset Automatically in Bangladesh
Chowdhury, MD Thamed Bin Zaman, Hossain, Moazzem, Islam, Md. Ridwanul
Road accidents pose significant concerns globally. They lead to large financial losses, injuries, disabilities, and societal challenges. Accurate and timely accident data is essential for predicting and mitigating these events. This paper presents a novel framework named 'Durghotona GPT' that integrates web scraping and Large Language Models (LLMs) to automate the generation of comprehensive accident datasets from prominent national dailies in Bangladesh. The authors collected accident reports from three major newspapers: Prothom Alo, Dhaka Tribune, and The Daily Star. The collected news was then processed using the newest available LLMs: GPT-4, GPT-3.5, and Llama-3. The framework efficiently extracts relevant information, categorizes reports, and compiles detailed datasets. Thus, this framework overcomes limitations of manual data collection methods such as delays, errors, and communication gaps. The authors' evaluation demonstrates that Llama-3, an open-source model, performs comparably to GPT-4. It achieved 89% accuracy in the authors' evaluation. Therefore, it can be considered a cost-effective alternative for similar tasks. The results suggest that the framework developed by the authors can drastically enhance the quality and availability of accident data. As a result, it can support critical applications in traffic safety analysis, urban planning, and public health. The authors also developed an interface for 'Durghotona GPT' for ease of use as part of this paper. Future work will focus on expanding data collection methods and refining LLMs to further increase dataset accuracy and applicability.
Unveiling the Hidden: Movie Genre and User Bias in Spoiler Detection
Zhang, Haokai, Zhang, Shengtao, Cai, Zijian, Wang, Heng, Zhu, Ruixuan, Zeng, Zinan, Luo, Minnan
Spoilers in movie reviews are important on platforms like IMDb and Rotten Tomatoes, offering benefits and drawbacks. They can guide some viewers' choices but also affect those who prefer no plot details in advance, making effective spoiler detection essential. Existing spoiler detection methods mainly analyze review text, often overlooking the impact of movie genres and user bias, limiting their effectiveness. To address this, we analyze movie review data, finding genre-specific variations in spoiler rates and identifying that certain users are more likely to post spoilers. Based on these findings, we introduce a new spoiler detection framework called GUSD (G enre-aware and User-specific S poiler Detection), which incorporates genre-specific data and user behavior bias. User bias is calculated through dynamic graph modeling of review history. Additionally, the R2GFormer module combines Ret-GAT (Retentive Graph Attention Network) for graph information and GenreFormer for genre-specific aggregation. The GMoE (Genre-Aware Mixture of Experts) model further assigns reviews to specialized experts based on genre. Extensive testing on benchmark datasets shows that GUSD achieves state-of-the-art results.
PersonaLens: A Benchmark for Personalization Evaluation in Conversational AI Assistants
Zhao, Zheng, Vania, Clara, Kayal, Subhradeep, Khan, Naila, Cohen, Shay B., Yilmaz, Emine
Large language models (LLMs) have advanced conversational AI assistants. However, systematically evaluating how well these assistants apply personalization--adapting to individual user preferences while completing tasks--remains challenging. Existing personalization benchmarks focus on chit-chat, non-conversational tasks, or narrow domains, failing to capture the complexities of personalized task-oriented assistance. To address this, we introduce PersonaLens, a comprehensive benchmark for evaluating personalization in task-oriented AI assistants. Our benchmark features diverse user profiles equipped with rich preferences and interaction histories, along with two specialized LLM-based agents: a user agent that engages in realistic task-oriented dialogues with AI assistants, and a judge agent that employs the LLM-as-a-Judge paradigm to assess personalization, response quality, and task success. Through extensive experiments with current LLM assistants across diverse tasks, we reveal significant variability in their personalization capabilities, providing crucial insights for advancing conversational AI systems.
Dataset of News Articles with Provenance Metadata for Media Relevance Assessment
Peterka, Tomas, Bohacek, Matyas
Out-of-context and misattributed imagery is the leading form of media manipulation in today's misinformation and disinformation landscape. The existing methods attempting to detect this practice often only consider whether the semantics of the imagery corresponds to the text narrative, missing manipulation so long as the depicted objects or scenes somewhat correspond to the narrative at hand. To tackle this, we introduce News Media Provenance Dataset, a dataset of news articles with provenance-tagged images. We formulate two tasks on this dataset, location of origin relevance (LOR) and date and time of origin relevance (DTOR), and present baseline results on six large language models (LLMs). We identify that, while the zero-shot performance on LOR is promising, the performance on DTOR hinders, leaving room for specialized architectures and future work.
Learning Efficient and Generalizable Graph Retriever for Knowledge-Graph Question Answering
Yao, Tianjun, Li, Haoxuan, Shen, Zhiqiang, Li, Pan, Liu, Tongliang, Zhang, Kun
Large Language Models (LLMs) have shown strong inductive reasoning ability across various domains, but their reliability is hindered by the outdated knowledge and hallucinations. Retrieval-Augmented Generation mitigates these issues by grounding LLMs with external knowledge; however, most existing RAG pipelines rely on unstructured text, limiting interpretability and structured reasoning. Knowledge graphs, which represent facts as relational triples, offer a more structured and compact alternative. Recent studies have explored integrating knowledge graphs with LLMs for knowledge graph question answering (KGQA), with a significant proportion adopting the retrieve-then-reasoning paradigm. In this framework, graph-based retrievers have demonstrated strong empirical performance, yet they still face challenges in generalization ability. In this work, we propose RAPL, a novel framework for efficient and effective graph retrieval in KGQA. RAPL addresses these limitations through three aspects: (1) a two-stage labeling strategy that combines heuristic signals with parametric models to provide causally grounded supervision; (2) a model-agnostic graph transformation approach to capture both intra- and inter-triple interactions, thereby enhancing representational capacity; and (3) a path-based reasoning strategy that facilitates learning from the injected rational knowledge, and supports downstream reasoner through structured inputs. Empirically, RAPL outperforms state-of-the-art methods by $2.66\%-20.34\%$, and significantly reduces the performance gap between smaller and more powerful LLM-based reasoners, as well as the gap under cross-dataset settings, highlighting its superior retrieval capability and generalizability. Codes are available at: https://github.com/tianyao-aka/RAPL.
Binary classification for perceived quality of headlines and links on worldwide news websites, 2018-2024
McCutcheon, Austin, de Oliveira, Thiago E. A., Zheleznov, Aleksandr, Brogly, Chris
The proliferation of online news enables potential widespread publication of perceived low-quality news headlines/links. As a result, we investigated whether it was possible to automatically distinguish perceived lower-quality news headlines/links from perceived higher-quality headlines/links. We evaluated twelve machine learning models on a binary, balanced dataset of 57,544,214 worldwide news website links/headings from 2018-2024 (28,772,107 per class) with 115 extracted linguistic features. Binary labels for each text were derived from scores based on expert consensus regarding the respective news domain quality. Traditional ensemble methods, particularly the bagging classifier, had strong performance (88.1% accuracy, 88.3% F1, 80/20 train/test split). Fine-tuned DistilBERT achieved the highest accuracy (90.3%, 80/20 train/test split) but required more training time. The results suggest that both NLP features with traditional classifiers and deep learning models can effectively differentiate perceived news headline/link quality, with some trade-off between predictive performance and train time.