qualitative research
Q-Sat AI: Machine Learning-Based Decision Support for Data Saturation in Qualitative Studies
Tutar, Hasan, Erden, Caner, Şentürk, Ümit
The determination of sample size in qualitative research has traditionally relied on the subjective and often ambiguous principle of data saturation, which can lead to inconsistencies and threaten methodological rigor. This study introduces a new, systematic model based on machine learning (ML) to make this process more objective. Utilizing a dataset derived from five fundamental qualitative research approaches - namely, Case Study, Grounded Theory, Phenomenology, Narrative Research, and Ethnographic Research - we developed an ensemble learning model. Ten critical parameters, including research scope, information power, and researcher competence, were evaluated using an ordinal scale and used as input features. After thorough preprocessing and outlier removal, multiple ML algorithms were trained and compared. The K-Nearest Neighbors (KNN), Gradient Boosting (GB), Random Forest (RF), XGBoost, and Decision Tree (DT) algorithms showed the highest explanatory power (Test R2 ~ 0.85), effectively modeling the complex, non-linear relationships involved in qualitative sampling decisions. Feature importance analysis confirmed the vital roles of research design type and information power, providing quantitative validation of key theoretical assumptions in qualitative methodology. The study concludes by proposing a conceptual framework for a web-based computational application designed to serve as a decision support system for qualitative researchers, journal reviewers, and thesis advisors. This model represents a significant step toward standardizing sample size justification, enhancing transparency, and strengthening the epistemological foundation of qualitative inquiry through evidence-based, systematic decision-making.
- Asia > Middle East > Republic of Türkiye (0.14)
- Europe > Switzerland (0.04)
- Asia > Azerbaijan (0.04)
- Research Report > New Finding (0.93)
- Research Report > Experimental Study (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.68)
Can machines perform a qualitative data analysis? Reading the debate with Alan Turing
This paper reflects on the literature that rejects the use of Large Language Models (LLMs) in qualitative data analysis. It illustrates through empirical evidence as well as critical reflections why the current critical debate is focusing on the wrong problems . The paper proposes that the focus of researching the use of the LLMs for qualitative analysis is not the method per se, but rather the empirical investigation of an artificial system performing an analysis . The paper bui lds on the seminal work of Alan Turing and reads the current debate using key ideas from Turing's "Computing Machinery and Intelligence". Th is paper therefore reframes the debate on qualitative analysis with LLMs and states that ra ther than asking whether machines can perform qualitative analysis in principle, we should ask whether with LLMs we can produce analyses that are sufficiently comparable to human analysts. In the final part the contrary views to performing qualitative analysis with LLMs are analysed using the same writing and rhetorical style that Turing used in his seminal work, to discuss the contrary views to the main question.
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- North America > United States > Illinois > Champaign County > Champaign (0.04)
- Europe > United Kingdom > England > Essex > Colchester (0.04)
- Health & Medicine > Therapeutic Area > Psychiatry/Psychology (1.00)
- Health & Medicine > Consumer Health (0.67)
- Information Technology > Security & Privacy (0.66)
Not Everything That Counts Can Be Counted: A Case for Safe Qualitative AI
Artificial intelligence (AI) and large language models (LLM) are reshaping science, with most recent advances culminating in fully-automated scientific discovery pipelines. But qualitative research has been left behind. Researchers in qualitative methods are hesitant about AI adoption. Yet when they are willing to use AI at all, they have little choice but to rely on general-purpose tools like ChatGPT to assist with interview interpretation, data annotation, and topic modeling - while simultaneously acknowledging these system's well-known limitations of being biased, opaque, irreproducible, and privacy-compromising. This creates a critical gap: while AI has substantially advanced quantitative methods, the qualitative dimensions essential for meaning-making and comprehensive scientific understanding remain poorly integrated. We argue for developing dedicated qualitative AI systems built from the ground up for interpretive research. Such systems must be transparent, reproducible, and privacy-friendly. We review recent literature to show how existing automated discovery pipelines could be enhanced by robust qualitative capabilities, and identify key opportunities where safe qualitative AI could advance multidisciplinary and mixed-methods research.
- Asia > Japan > Honshū > Kantō > Kanagawa Prefecture > Yokohama (0.04)
- North America > United States > New York (0.04)
- North America > United States > California > Ventura County > Thousand Oaks (0.04)
- (4 more...)
- Information Technology > Security & Privacy (1.00)
- Health & Medicine (0.68)
Generative Artificial Intelligence in Qualitative Research Methods: Between Hype and Risks?
Teixeira, Maria Couto, Tschopp, Marisa, Jobin, Anna
As Artificial Intelligence (AI) is increasingly promoted and used in qualitative research, it also raises profound methodological issues. This position paper critically interrogates the role of generative AI (genAI) in the context of qualitative coding methodologies. Despite widespread hype and claims of efficiency, we propose that genAI is not methodologically valid within qualitative inquiries, and its use risks undermining the robustness and trustworthiness of qualitative research. The lack of meaningful documentation, commercial opacity, and the inherent tendencies of genAI systems to produce incorrect outputs all contribute to weakening methodological rigor. Overall, the balance between risk and benefits does not support the use of genAI in qualitative research, and our position paper cautions researchers to put sound methodology before technological novelty.
- Europe > Switzerland > Zürich > Zürich (0.14)
- Europe > Switzerland > Fribourg > Fribourg (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- (3 more...)
A Confidence-Diversity Framework for Calibrating AI Judgement in Accessible Qualitative Coding Tasks
LLMs enable qualitative coding at large scale, but assessing reliability remains challenging where human experts seldom agree. We investigate confidence-diversity calibration as a quality assessment framework for accessible coding tasks where LLMs already demonstrate strong performance but exhibit overconfidence. Analysing 5,680 coding decisions from eight state-of-the-art LLMs across ten categories, we find that mean self-confidence tracks inter-model agreement closely (Pearson r=0.82). Adding model diversity quantified as normalised Shannon entropy produces a dual signal explaining agreement almost completely (R-squared=0.979), though this high predictive power likely reflects task simplicity for current LLMs. The framework enables a three-tier workflow auto-accepting 35 percent of segments with less than 5 percent error, cutting manual effort by 65 percent. Cross-domain validation confirms transferability (kappa improvements of 0.20 to 0.78). While establishing a methodological foundation for AI judgement calibration, the true potential likely lies in more challenging scenarios where LLMs may demonstrate comparative advantages over human cognitive limitations.
Computationally Intensive Research: Advancing a Role for Secondary Analysis of Qualitative Data
This paper draws attention to the potential of computational methods in reworking data generated in past qualitative studies. While qualitative inquiries often produce rich data through rigorous and resource-intensive processes, much of this data usually remains unused. In this paper, we first make a general case for secondary analysis of qualitative data by discussing its benefits, distinctions, and epistemological aspects. We then argue for opportunities with computationally intensive secondary analysis, highlighting the possibility of drawing on data assemblages spanning multiple contexts and timeframes to address cross-contextual and longitudinal research phenomena and questions. We propose a scheme to perform computationally intensive secondary analysis and advance ideas on how this approach can help facilitate the development of innovative research designs. Finally, we enumerate some key challenges and ongoing concerns associated with qualitative data sharing and reuse.
- Asia > Middle East > Jordan (0.14)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- North America > United States > Alabama (0.04)
- (12 more...)
- Research Report > Experimental Study (1.00)
- Overview (0.93)
- Information Technology (1.00)
- Health & Medicine > Therapeutic Area (1.00)
The AI Co-Ethnographer: How Far Can Automation Take Qualitative Research?
Retkowski, Fabian, Sudmann, Andreas, Waibel, Alexander
Qualitative research often involves labor-intensive processes that are difficult to scale while preserving analytical depth. This paper introduces The AI Co-Ethnographer (AICoE), a novel end-to-end pipeline developed for qualitative research and designed to move beyond the limitations of simply automating code assignments, offering a more integrated approach. AICoE organizes the entire process, encompassing open coding, code consolidation, code application, and even pattern discovery, leading to a comprehensive analysis of qualitative data.
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Asia > Singapore (0.04)
- North America > United States > Michigan > Wayne County > Detroit (0.04)
- (11 more...)
- Research Report (1.00)
- Personal > Interview (1.00)
Optimizing Generative AI's Accuracy and Transparency in Inductive Thematic Analysis: A Human-AI Comparison
Nyaaba, Matthew, SungEun, Min, Apam, Mary Abiswin, Acheampong, Kwame Owoahene, Dwamena, Emmanuel
This study explores the use of OpenAI's API for inductive thematic analysis, employing a stepwise strategy to enhance transparency and traceability in GenAI-generated coding. A five-phase analysis and evaluation process were followed. Using the stepwise prompt, GenAI effectively generated codes with supporting statements and references, categorized themes, and developed broader interpretations by linking them to real-world contexts. While GenAI performed at a comparable level to human coders in coding and theming, it exhibited a more generalized and conceptual approach to interpretation, whereas human coders provided more specific, theme-based interpretations. Mapping these processes onto Naeem et al.'s (2023) six-step thematic analysis framework, GenAI covered four out of the six steps, while human coders followed three steps. Although GenAI's coding, theming, and interpretation align with keywording, coding, theming, and interpretation in Naeem et al.'s framework, human coders' interpretations were more closely tied to themes rather than broader conceptualization. This study positions GenAI as a viable tool for conducting inductive thematic analysis with minimal human intervention, offering an efficient and structured approach to qualitative data analysis. Future research should explore the development of specialized prompts that align GenAI's inductive thematic analysis with established qualitative research frameworks.
- Europe > United Kingdom (0.30)
- North America > United States > Georgia > Clarke County > Athens (0.14)
- Asia (0.14)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (0.67)
You Can't Get There From Here: Redefining Information Science to address our sociotechnical futures
Current definitions of Information Science are inadequate to comprehensively describe the nature of its field of study and for addressing the problems that are arising from intelligent technologies. The ubiquitous rise of artificial intelligence applications and their impact on society demands the field of Information Science acknowledge the socio-technical nature of these technologies. Previous definitions of Information Science over the last six decades have inadequately addressed the environmental, human, and social aspects of these technologies. This perspective piece advocates for an expanded definition of Information Science that fully includes the socio-technical impacts information has on the conduct of research in this field. Proposing an expanded definition of Information Science that includes the socio-technical aspects of this field should stimulate both conversation and widen the interdisciplinary lens necessary to address how intelligent technologies may be incorporated into society and our lives more fairly.
- North America > United States > New York (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Europe > Netherlands (0.04)
- (8 more...)
- Law (1.00)
- Information Technology > Security & Privacy (1.00)
- Media (0.68)
- (2 more...)
A Computational Method for Measuring "Open Codes" in Qualitative Analysis
Chen, John, Lotsos, Alexandros, Zhao, Lexie, Wang, Caiyi, Hullman, Jessica, Sherin, Bruce, Wilensky, Uri, Horn, Michael
Qualitative analysis is critical to understanding human datasets in many social science disciplines. Open coding is an inductive qualitative process that identifies and interprets "open codes" from datasets. Yet, meeting methodological expectations (such as "as exhaustive as possible") can be challenging. While many machine learning (ML)/generative AI (GAI) studies have attempted to support open coding, few have systematically measured or evaluated GAI outcomes, increasing potential bias risks. Building on Grounded Theory and Thematic Analysis theories, we present a computational method to measure and identify potential biases from "open codes" systematically. Instead of operationalizing human expert results as the "ground truth," our method is built upon a team-based approach between human and machine coders. We experiment with two HCI datasets to establish this method's reliability by 1) comparing it with human analysis, and 2) analyzing its output stability. We present evidence-based suggestions and example workflows for ML/GAI to support open coding.
- North America > United States > California > Ventura County > Thousand Oaks (0.14)
- North America > United States > California > San Francisco County > San Francisco (0.14)
- North America > United States > Florida > Hillsborough County > University (0.05)
- (12 more...)
- Education (1.00)
- Information Technology > Security & Privacy (0.67)