Goto

Collaborating Authors

 Azerbaijan



Knowledge-Augmented Reasoning Distillation for Small Language Models in Knowledge-Intensive Tasks

Neural Information Processing Systems

Large Language Models (LLMs) have shown promising performance in knowledge-intensive reasoning tasks that require a compound understanding of knowledge. However, deployment of the LLMs in real-world applications can be challenging due to their high computational requirements and concerns on data privacy.


Israel becomes first country to recognize Somaliland; Trump 'not ready'

FOX News

This material may not be published, broadcast, rewritten, or redistributed. Quotes displayed in real-time or delayed by at least 15 minutes. Market data provided by Factset . Powered and implemented by FactSet Digital Solutions . Mutual Fund and ETF data provided by Refinitiv Lipper .


Q-Sat AI: Machine Learning-Based Decision Support for Data Saturation in Qualitative Studies

Tutar, Hasan, Erden, Caner, Şentürk, Ümit

arXiv.org Artificial Intelligence

The determination of sample size in qualitative research has traditionally relied on the subjective and often ambiguous principle of data saturation, which can lead to inconsistencies and threaten methodological rigor. This study introduces a new, systematic model based on machine learning (ML) to make this process more objective. Utilizing a dataset derived from five fundamental qualitative research approaches - namely, Case Study, Grounded Theory, Phenomenology, Narrative Research, and Ethnographic Research - we developed an ensemble learning model. Ten critical parameters, including research scope, information power, and researcher competence, were evaluated using an ordinal scale and used as input features. After thorough preprocessing and outlier removal, multiple ML algorithms were trained and compared. The K-Nearest Neighbors (KNN), Gradient Boosting (GB), Random Forest (RF), XGBoost, and Decision Tree (DT) algorithms showed the highest explanatory power (Test R2 ~ 0.85), effectively modeling the complex, non-linear relationships involved in qualitative sampling decisions. Feature importance analysis confirmed the vital roles of research design type and information power, providing quantitative validation of key theoretical assumptions in qualitative methodology. The study concludes by proposing a conceptual framework for a web-based computational application designed to serve as a decision support system for qualitative researchers, journal reviewers, and thesis advisors. This model represents a significant step toward standardizing sample size justification, enhancing transparency, and strengthening the epistemological foundation of qualitative inquiry through evidence-based, systematic decision-making.


Democratic or Authoritarian? Probing a New Dimension of Political Biases in Large Language Models

Piedrahita, David Guzman, Strauss, Irene, Schölkopf, Bernhard, Mihalcea, Rada, Jin, Zhijing

arXiv.org Artificial Intelligence

As Large Language Models (LLMs) become increasingly integrated into everyday life and information ecosystems, concerns about their implicit biases continue to persist. While prior work has primarily examined socio-demographic and left--right political dimensions, little attention has been paid to how LLMs align with broader geopolitical value systems, particularly the democracy--authoritarianism spectrum. In this paper, we propose a novel methodology to assess such alignment, combining (1) the F-scale, a psychometric tool for measuring authoritarian tendencies, (2) FavScore, a newly introduced metric for evaluating model favorability toward world leaders, and (3) role-model probing to assess which figures are cited as general role-models by LLMs. We find that LLMs generally favor democratic values and leaders, but exhibit increased favorability toward authoritarian figures when prompted in Mandarin. Further, models are found to often cite authoritarian figures as role models, even outside explicit political contexts. These results shed light on ways LLMs may reflect and potentially reinforce global political ideologies, highlighting the importance of evaluating bias beyond conventional socio-political axes. Our code is available at: https://github.com/irenestrauss/Democratic-Authoritarian-Bias-LLMs.


ArtistMus: A Globally Diverse, Artist-Centric Benchmark for Retrieval-Augmented Music Question Answering

Kwon, Daeyong, Doh, SeungHeon, Nam, Juhan

arXiv.org Artificial Intelligence

Recent advances in large language models (LLMs) have transformed open-domain question answering, yet their effectiveness in music-related reasoning remains limited due to sparse music knowledge in pretraining data. While music information retrieval and computational musicology have explored structured and multimodal understanding, few resources support factual and contextual music question answering (MQA) grounded in artist metadata or historical context. We introduce MusWikiDB, a vector database of 3.2M passages from 144K music-related Wikipedia pages, and ArtistMus, a benchmark of 1,000 questions on 500 diverse artists with metadata such as genre, debut year, and topic. These resources enable systematic evaluation of retrieval-augmented generation (RAG) for MQA. Experiments show that RAG markedly improves factual accuracy; open-source models gain up to +56.8 percentage points (for example, Qwen3 8B improves from 35.0 to 91.8), approaching proprietary model performance. RAG-style fine-tuning further boosts both factual recall and contextual reasoning, improving results on both in-domain and out-of-domain benchmarks. MusWikiDB also yields approximately 6 percentage points higher accuracy and 40% faster retrieval than a general-purpose Wikipedia corpus. We release MusWikiDB and ArtistMus to advance research in music information retrieval and domain-specific question answering, establishing a foundation for retrieval-augmented reasoning in culturally rich domains such as music.


SA-IQA: Redefining Image Quality Assessment for Spatial Aesthetics with Multi-Dimensional Rewards

Gao, Yuan, Song, Jin

arXiv.org Artificial Intelligence

In recent years, Image Quality Assessment (IQA) for AIgenerated images (AIGI) has advanced rapidly; however, existing methods primarily target portraits and artistic images, lacking a systematic evaluation of interior scenes. W e introduce Spatial Aesthetics, a paradigm that assesses the aesthetic quality of interior images along four dimensions: layout, harmony, lighting, and distortion. W e construct SA-BENCH, the first benchmark for spatial aesthetics, comprising 18,000 images and 50,000 precise annotations. Employing SA-BENCH, we systematically evaluate current IQA methodologies and develop SA-IQA, through MLLM fine-tuning and a multidimensional fusion approach, as a comprehensive reward framework for assessing spatial aesthetics. W e apply SA-IQA to two downstream tasks: (1) serving as a reward signal integrated with GRPO reinforcement learning to optimize the AIGC generation pipeline, and (2) Best-of-N selection to filter high-quality images and improve generation quality. Experiments indicate that SA-IQA significantly outperforms existing methods on SA-BENCH, setting a new standard for spatial aesthetics evaluation. Code and dataset will be open-sourced to advance research and applications in this domain.


VITAL: Vision-Encoder-centered Pre-training for LMMs in Visual Quality Assessment

Jia, Ziheng, Cao, Linhan, Han, Jinliang, Zhang, Zicheng, Qian, Jiaying, Wang, Jiarui, Chen, Zijian, Zhai, Guangtao, Min, Xiongkuo

arXiv.org Artificial Intelligence

Developing a robust visual quality assessment (VQualA) large multi-modal model (LMM) requires achieving versatility, powerfulness, and transferability. However, existing VQualA LMMs typically focus on a single task and rely on full-parameter fine-tuning, which makes them prone to overfitting on specific modalities or task types, thereby limiting their generalization capacity and transferability. To address this, we propose a vision-encoder-centered generative pre-training pipeline and develop the VITAL-Series LMMs. (1) We adopt a machine-executed annotation-scrutiny paradigm, constructing over 4.5M vision-language (VL) pairs-the largest VQualA training dataset to date. (2) We employ a multi-task training workflow that simultaneously enhances the model's quantitative scoring precision and strengthens its capability for quality interpretation across both image and video modalities. (3) Building upon the vision encoder, we realize an efficient model zoo extension: the model zoo exhibits strong zero-shot performance, and each paired decoder requires only a swift warm-up using less than 1/1000 of the pre-training data to achieve performance comparable to the fully trained counterpart. Overall, our work lays a cornerstone for advancing toward the foundation LMM for VQualA.