LLM Hallucination Detection: HSAD
Li, JinXin, Tu, Gang, Hu, JunJie
Although Large Language Models have demonstrated powerful capabilities in a wide range of tasks such as language understanding and code generation, the frequent occurrence of hallucinations during the generation process has become a significant impediment to their deployment in critical application scenarios. Current mainstream hallucination detection methods rely on factual consistency verification or static hidden layer features. The former is constrained by the scope of knowledge coverage, while the latter struggles to capture reasoning biases during the inference process. To address these issues, and inspired by signal analysis methods in cognitive neuroscience, this paper proposes a hallucination detection method based on the frequency-domain analysis of hidden layer temporal signals, named HSAD (\textbf{H}idden \textbf{S}ignal \textbf{A}nalysis-based \textbf{D}etection). First, by treating the LLM's reasoning process as a cognitive journey that unfolds over time, we propose modeling and simulating the human process of signal perception and discrimination in a deception-detection scenario through hidden layer temporal signals. Next, the Fast Fourier Transform is applied to map these temporal signals into the frequency domain to construct spectral features, which are used to capture anomalies that arise during the reasoning process; analysis experiments on these spectral features demonstrate the effectiveness of this approach. Finally, a hallucination detection algorithm is designed based on these spectral features to identify hallucinations in the generated content. By effectively combining the modeling of the reasoning process with frequency-domain feature extraction, the HSAD method overcomes the limitations of existing approaches in terms of knowledge coverage and the detection of reasoning biases, demonstrating higher detection accuracy and robustness.
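The core idea (FFT over a hidden-layer temporal signal to expose anomalies) can be sketched as follows. This is an illustrative toy, not the authors' pipeline: the 1-D signal, the synthetic sine-wave inputs, and the coarse spectral binning are all assumptions standing in for whatever hidden-state statistic the paper actually tracks per generated token.

```python
import numpy as np

def spectral_features(hidden_signal, n_bins=8):
    """Toy HSAD-style features: map a hidden-layer temporal signal into
    the frequency domain and summarize the magnitude spectrum in coarse
    bins. `hidden_signal` is a 1-D series, e.g. one scalar hidden-state
    statistic per generated token (an assumption for illustration)."""
    x = np.asarray(hidden_signal, dtype=float)
    x = x - x.mean()                      # drop the DC component
    spectrum = np.abs(np.fft.rfft(x))     # magnitude spectrum
    bins = np.array_split(spectrum, n_bins)
    return np.array([b.mean() for b in bins])  # coarse spectral profile

# A smooth "reasoning" signal vs. one with a high-frequency anomaly:
t = np.linspace(0, 1, 64)
smooth = np.sin(2 * np.pi * 2 * t)
anomalous = smooth + 0.5 * np.sin(2 * np.pi * 20 * t)
fs = spectral_features(smooth)
fa = spectral_features(anomalous)
```

A downstream detector would then classify generations from such spectral profiles; here the anomalous signal simply shows more energy in the higher-frequency bins.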
A Survey on Training-free Alignment of Large Language Models
Pan, Birong, Li, Yongqi, Zhang, Weiyu, Lu, Wenpeng, Xu, Mayi, Zhou, Shen, Zhu, Yuanyuan, Zhong, Ming, Qian, Tieyun
The alignment of large language models (LLMs) aims to ensure their outputs adhere to human values, ethical standards, and legal norms. Traditional alignment methods often rely on resource-intensive fine-tuning (FT), which may suffer from knowledge degradation and face challenges in scenarios where model accessibility or computational resources are constrained. In contrast, training-free (TF) alignment techniques--leveraging in-context learning, decoding-time adjustments, and post-generation corrections--offer a promising alternative by enabling alignment without heavily retraining LLMs, making them adaptable to both open-source and closed-source environments. This paper presents the first systematic review of TF alignment methods, categorizing them by stages of pre-decoding, in-decoding, and post-decoding. For each stage, we provide a detailed examination from the viewpoint of LLMs and multimodal LLMs (MLLMs), highlighting their mechanisms and limitations. Furthermore, we identify key challenges and future directions, paving the way for more inclusive and effective TF alignment techniques. By synthesizing and organizing the rapidly growing body of research, this survey offers guidance for practitioners and advances the development of safer and more reliable LLMs.
Bias-Adjusted LLM Agents for Human-Like Decision-Making via Behavioral Economics
Kitadai, Ayato, Fukasawa, Yusuke, Nishino, Nariaki
Large language models (LLMs) are increasingly used to simulate human decision-making, but their intrinsic biases often diverge from real human behavior--limiting their ability to reflect population-level diversity. We address this challenge with a persona-based approach that leverages individual-level behavioral data from behavioral economics to adjust model biases. Applying this method to the ultimatum game--a standard but difficult benchmark for LLMs--we observe improved alignment between simulated and empirical behavior, particularly on the responder side. While further refinement of trait representations is needed, our results demonstrate the promise of persona-conditioned LLMs for simulating human-like decision patterns at scale.
Deep Learning-Based Forecasting of Hotel KPIs: A Cross-City Analysis of Global Urban Markets
Atapattu, C. J., Cui, Xia, Abeynayake, N. R
This study employs Long Short-Term Memory (LSTM) networks to forecast key performance indicators (KPIs), Occupancy (OCC), Average Daily Rate (ADR), and Revenue per Available Room (RevPAR), across five major cities: Manchester, Amsterdam, Dubai, Bangkok, and Mumbai. The cities were selected for their diverse economic profiles and hospitality dynamics. Monthly data from 2018 to 2025 were used, with 80% for training and 20% for testing. Advanced time series decomposition and machine learning techniques enabled accurate forecasting and trend identification. Results show that Manchester and Mumbai exhibited the highest predictive accuracy, reflecting stable demand patterns, while Dubai and Bangkok demonstrated higher variability due to seasonal and event-driven influences. The findings validate the effectiveness of LSTM models for urban hospitality forecasting and provide a comparative framework for data-driven decision-making. The models' generalisability across global cities highlights their potential utility for tourism stakeholders and urban planners.
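The data setup described above (monthly KPI series, chronological 80/20 split) can be sketched as follows. This is a minimal sketch under stated assumptions: the 12-month lookback window and the synthetic seasonal OCC series are illustrative choices, not details taken from the paper.

```python
import numpy as np

def make_windows(series, lookback=12):
    """Turn a monthly KPI series into supervised (X, y) pairs: each
    input is `lookback` consecutive months, the target is the next one,
    as an LSTM forecaster would consume it."""
    X, y = [], []
    for i in range(len(series) - lookback):
        X.append(series[i:i + lookback])
        y.append(series[i + lookback])
    return np.array(X), np.array(y)

def chronological_split(X, y, train_frac=0.8):
    """80/20 split in time order (no shuffling for time series)."""
    cut = int(len(X) * train_frac)
    return (X[:cut], y[:cut]), (X[cut:], y[cut:])

# Toy stand-in for one city's OCC: ~90 monthly points with yearly seasonality
occ = 0.7 + 0.1 * np.sin(2 * np.pi * np.arange(90) / 12)
X, y = make_windows(occ, lookback=12)
(train_X, train_y), (test_X, test_y) = chronological_split(X, y)
```

The resulting `train_X`/`test_X` arrays would feed a recurrent model such as an LSTM; the windowing and chronological split are the part that generalizes across frameworks.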
Investigating the Effects of Cognitive Biases in Prompts on Large Language Model Outputs
This paper investigates the influence of cognitive biases on the outputs of large language models (LLMs). Cognitive biases, such as confirmation and availability biases, can distort user inputs through prompts, potentially leading to unfaithful and misleading outputs from LLMs. Using a systematic framework, our study introduces various cognitive biases into prompts and assesses their impact on LLM accuracy across multiple benchmark datasets, including general and financial Q&A scenarios. The results demonstrate that even subtle biases can significantly alter LLM answer choices, highlighting a critical need for bias-aware prompt design and mitigation strategies. Additionally, our attention weight analysis highlights how these biases can alter the internal decision-making processes of LLMs, affecting the attention distribution in ways that are associated with output inaccuracies. This research has implications for AI developers and users in enhancing the robustness and reliability of AI applications in diverse domains.
Prediction of Bank Credit Ratings using Heterogeneous Topological Graph Neural Networks
Bank credit ratings, assigned by agencies like Standard & Poor's, Moody's, and Fitch, evaluate a bank's financial health based on factors such as asset quality, profitability, and market position (White 2010). These ratings are critical indicators of a bank's ability to repay debt and significantly influence economic players: for businesses, they affect borrowing costs and market trust; for economies, they impact financial system stability. Sudden rating changes can trigger volatile capital flows and market fluctuations, influencing economic growth and financial stability. During financial market instability, predicting bank credit ratings, especially for the upcoming quarter, becomes crucial. These predictions provide the data needed for informed decision-making, prompt regulatory adjustments, and the protection of investors and the public. The 2023 bankruptcy of Silicon Valley Bank (SVB), which triggered collapses like those of Signature Bank and First Republic Bank, underscores the resulting financial turmoil (Aharon et al. 2023). Graph neural networks (GNNs) have become a pivotal technology in financial risk prediction, particularly excelling in node classification and link prediction tasks (Wu et al. 2022). These models effectively leverage edge information to represent the propagation of financial risk within networks.
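How a GNN lets risk propagate along edges can be sketched with one message-passing step. This is a hypothetical toy (mean neighbour aggregation, a single linear transform, a made-up 4-bank adjacency matrix), not the heterogeneous topological model the paper builds:

```python
import numpy as np

def gnn_layer(H, A, W):
    """One message-passing step: each bank's feature vector is updated
    with the mean of its neighbours' features (risk flowing along
    interbank edges), then linearly transformed with a ReLU."""
    deg = A.sum(axis=1, keepdims=True)
    deg[deg == 0] = 1.0                         # isolated nodes: no messages
    messages = (A @ H) / deg                    # mean over neighbours
    return np.maximum(0.0, (H + messages) @ W)  # combine self + neighbours

# 4 banks, 3 features each; edges encode (symmetric) lending relationships
A = np.array([[0, 1, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 0],
              [0, 1, 0, 0]], dtype=float)
H = np.random.default_rng(0).normal(size=(4, 3))
W = np.eye(3)
H1 = gnn_layer(H, A, W)
```

Stacking such layers spreads information over multi-hop neighbourhoods; a final classification head over the node embeddings would yield per-bank rating predictions.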
Advancing Sequential Numerical Prediction in Autoregressive Models
Fei, Xiang, Lu, Jinghui, Sun, Qi, Feng, Hao, Wang, Yanjie, Shi, Wei, Wang, An-Lan, Tang, Jingqun, Huang, Can
Autoregressive models have become the de facto choice for sequence generation tasks, but standard approaches treat digits as independent tokens and apply cross-entropy loss, overlooking the coherent structure of numerical sequences. This paper introduces Numerical Token Integrity Loss (NTIL) to address this gap. NTIL operates at two levels: (1) token-level, where it extends the Earth Mover's Distance (EMD) to preserve ordinal relationships between numerical values, and (2) sequence-level, where it penalizes the overall discrepancy between the predicted and actual sequences. This dual approach improves numerical prediction and integrates effectively with LLMs/MLLMs. Extensive experiments show significant performance improvements with NTIL.
LLM Alignment for the Arabs: A Homogenous Culture or Diverse Ones?
Large language models (LLMs) have the potential to be useful tools that can automate tasks and assist humans. However, these models are more fluent in English and more aligned with Western cultures, norms, and values. Arabic-specific LLMs are being developed to better capture the nuances of the Arabic language, as well as the views of the Arabs. Yet, Arabs are sometimes assumed to share the same culture. In this position paper, I discuss the limitations of this assumption and provide preliminary thoughts for how to build systems that can better represent the cultural diversity within the Arab world. The invalidity of the cultural homogeneity assumption might seem obvious, yet it is widely adopted in developing multilingual and Arabic-specific LLMs. I hope that this paper will encourage the NLP community to be considerate of the cultural diversity within various communities speaking the same language.
Q-STRUM Debate: Query-Driven Contrastive Summarization for Recommendation Comparison
Saad, George-Kirollos, Sanner, Scott
Query-driven recommendation with unknown items poses a challenge for users to understand why certain items are appropriate for their needs. Query-driven Contrastive Summarization (QCS) is a methodology designed to address this issue by leveraging language-based item descriptions to clarify contrasts between them. However, existing state-of-the-art contrastive summarization methods such as STRUM-LLM fall short of this goal. To overcome these limitations, we introduce Q-STRUM Debate, a novel extension of STRUM-LLM that employs debate-style prompting to generate focused and contrastive summarizations of item aspects relevant to a query. Leveraging modern large language models (LLMs) as powerful tools for generating debates, Q-STRUM Debate provides enhanced contrastive summaries. Experiments across three datasets demonstrate that Q-STRUM Debate yields significant performance improvements over existing methods on key contrastive summarization criteria, thus introducing a novel and performant debate prompting methodology for QCS.
M-IFEval: Multilingual Instruction-Following Evaluation
Dussolle, Antoine, Díaz, Andrea Cardeña, Sato, Shota, Devine, Peter
Instruction following is a core capability of modern large language models (LLMs), making evaluating this capability essential to understanding these models. The Instruction Following Evaluation (IFEval) benchmark from the literature does this using objective criteria, offering a measure of LLM performance without subjective AI or human judgement. However, it only includes English instructions, limiting its ability to assess LLMs in other languages. We propose the Multilingual Instruction Following Evaluation (M-IFEval) benchmark, expanding the evaluation to French, Japanese, and Spanish, with both general and language-specific instructions. Applying this benchmark to 8 state-of-the-art LLMs, we find that benchmark performance across languages and instruction types can vary widely, underscoring the importance of a multilingual benchmark for evaluating LLMs in a diverse cultural context.
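"Objective criteria" here means checks that can be verified programmatically rather than judged. A minimal sketch of that style of check, with hypothetical constraints of my own choosing (not instructions taken from IFEval or M-IFEval):

```python
def check_min_words(response, min_words):
    """Verifiable constraint: the response contains at least
    `min_words` words. No LLM or human judge involved."""
    return len(response.split()) >= min_words

def check_all_lowercase(response):
    """Verifiable constraint: the response is entirely lowercase."""
    return response == response.lower()

ok = check_min_words("this has four words", 3)       # satisfied
bad = check_all_lowercase("All lowercase? No.")      # violated
```

A benchmark score is then just the fraction of such checks a model's responses satisfy, which is what makes the evaluation reproducible across languages.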