AITopics | cluster analysis

2510.07821

Country:

Europe > Germany (0.14)
Europe > France (0.14)
North America > United States > Missouri (0.04)
(9 more...)

Genre:

Research Report > New Finding (0.93)
Research Report > Experimental Study (0.68)

Industry:

Government > Voting & Elections (1.00)
Government > Regional Government > North America Government > United States Government (1.00)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.94)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.71)
(2 more...)

Jauhiainen, Jussi S., Toppari, Aurora

Generative Artificial Intelligence and Agents in Research and Teaching

arXiv.org Artificial Intelligence | Aug-27-2025

This study provides a comprehensive analysis of the development, functioning, and application of generative artificial intelligence (GenAI) and large language models (LLMs), with an emphasis on their implications for research and education. It traces the conceptual evolution from artificial intelligence (AI) through machine learning (ML) and deep learning (DL) to transformer architectures, which constitute the foundation of contemporary generative systems. Technical aspects, including prompting strategies, word embeddings, and probabilistic sampling methods (temperature, top-k, and top-p), are examined alongside the emergence of autonomous agents. These elements are considered in relation to both the opportunities they create and the limitations and risks they entail. The work critically evaluates the integration of GenAI across the research process, from ideation and literature review to research design, data collection, analysis, interpretation, and dissemination. While particular attention is given to geographical research, the discussion extends to wider academic contexts. A parallel strand addresses the pedagogical applications of GenAI, encompassing course and lesson design, teaching delivery, assessment, and feedback, with geography education serving as a case example. Central to the analysis are the ethical, social, and environmental challenges posed by GenAI. Issues of bias, intellectual property, governance, and accountability are assessed, alongside the ecological footprint of LLMs and emerging technological strategies for mitigation. The concluding section considers near- and long-term futures of GenAI, including scenarios of sustained adoption, regulation, and potential decline. By situating GenAI within both scholarly practice and educational contexts, the study contributes to critical debates on its transformative potential and societal responsibilities.
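
The probabilistic sampling methods named in the abstract (temperature, top-k, and top-p) can be illustrated with a minimal sketch. It is not taken from the paper: the function name `sample_token`, its parameters, and the order in which the filters are applied are assumptions made for illustration, and real LLM libraries differ in detail.

```python
import math
import random


def sample_token(logits, temperature=1.0, top_k=None, top_p=None, rng=random):
    """Pick one token index from raw scores (hypothetical helper, not from the paper).

    temperature must be > 0; values below 1 sharpen the distribution,
    values above 1 flatten it.
    """
    # Temperature scaling followed by a numerically stable softmax.
    scaled = [score / temperature for score in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Rank token indices from most to least probable.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)

    # top-k: keep only the k most probable tokens.
    if top_k is not None:
        order = order[:top_k]

    # top-p (nucleus): keep the smallest prefix whose cumulative probability
    # reaches top_p. Assumption: the sum uses the pre-filter probabilities.
    if top_p is not None:
        kept, cumulative = [], 0.0
        for i in order:
            kept.append(i)
            cumulative += probs[i]
            if cumulative >= top_p:
                break
        order = kept

    # random.choices renormalizes the surviving weights before drawing.
    weights = [probs[i] for i in order]
    return rng.choices(order, weights=weights, k=1)[0]
```

With `top_k=1` the draw is greedy: the highest-scoring token is always returned. Passing a seeded `random.Random` instance as `rng` makes the draws reproducible.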

artificial intelligence, large language model, machine learning, (21 more...)

2508.16701

Country:

Europe > Estonia (0.14)
Europe > Ukraine (0.04)
Europe > Russia (0.04)
(17 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)
Overview (1.00)
Instructional Material > Course Syllabus & Notes (1.00)

Industry:

Social Sector (1.00)
Leisure & Entertainment (1.00)
Law (1.00)
(14 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (1.00)

Liu, Ruying, Becerik-Gerber, Burçin, Lucas, Gale M.

Investigating Role of Personal Factors in Shaping Responses to Active Shooter Incident using Machine Learning

arXiv.org Artificial Intelligence | Feb-17-2025

This study bridges the knowledge gap on how personal factors affect building occupants' responses in active shooter situations by applying interpretable machine learning methods to data from 107 participants. The personal factors studied are training method, prior training experience, sense of direction, and gender. The response performance measurements consist of decisions (run, hide, multiple), vulnerability (corresponding to the time a participant is visible to a shooter), and pre-evacuation time. The results indicate that the propensity to run significantly determines overall response strategies, overshadowing vulnerability and pre-evacuation time. The training method is a critical factor: VR-based training leads to better responses than video-based training. A better sense of direction and previous training experience are correlated with a greater propensity to run and less vulnerability. Gender slightly influences decisions and vulnerability but significantly impacts pre-evacuation time, with females evacuating more slowly, potentially due to higher risk perception. This study underscores the importance of personal factors in shaping responses to active shooter incidents.

evacuation time, personal factor, training experience, (13 more...)

2503.05719

Country: North America > United States > California > Los Angeles County > Los Angeles (0.15)

Genre:

Research Report > Experimental Study (0.73)
Research Report > New Finding (0.68)

Industry: Education > Educational Setting (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.50)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.30)

arXiv.org Machine Learning | Nov-1-2024

Classification problem in liability insurance using machine learning models: a comparative study

Qazvini, Marjan

Insurance companies use different factors to classify policyholders. In this study, we apply several machine learning models, such as nearest neighbour and logistic regression, to the Actuarial Challenge dataset used by Qazvini (2019) to classify liability insurance policies into two groups: (1) policies with claims and (2) policies without claims. The applications of Machine Learning (ML) models and Artificial Intelligence (AI) in areas such as medical diagnosis, economics, banking, fraud detection, and agriculture have been known for many years. ML models have changed these industries remarkably. However, despite their high predictive power and their capability to identify nonlinear transformations and interactions between variables, they are being introduced only slowly into the insurance industry and actuarial fields.

artificial intelligence, machine learning, policy and claim frequency, (16 more...)

2411.00354

Country:

North America > United States > Maine (0.04)
Europe > United Kingdom > England > West Midlands > Birmingham (0.04)
Europe > France > Île-de-France > Val-de-Marne (0.04)
(40 more...)

Genre: Research Report > New Finding (0.70)

Industry:

Banking & Finance > Insurance (1.00)
Transportation > Ground > Road (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.37)

arXiv.org Artificial Intelligence | May-26-2024

Renal digital pathology visual knowledge search platform based on language large model and book knowledge

Lv, Xiaomin, Lai, Chong, Ding, Liya, Lai, Maode, Sun, Qingrong

Renal pathology images play an important role in the diagnosis of renal diseases. We segmented images and paired them with corresponding text descriptions drawn from 60 renal pathology books, ran clustering analysis on all image and text-description features extracted by large models, and ultimately built a retrieval system based on the semantic features of large models. Based on the above analysis, we established a knowledge base of 10,317 renal pathology images with paired text descriptions, and then evaluated the semantic feature capabilities of 4 large models (GPT2, Gemma, Llama and Qwen) and the image-based feature capabilities of the DINOv2 large model. Furthermore, we built a semantic retrieval system, named RppD (aidp.zjsru.edu.cn), to retrieve pathological images based on text descriptions. Key Words: large model, renal pathology, renal knowledge base, semantic features. Introduction: Histopathology holds a preeminent position within the diagnostic framework of a multitude of renal afflictions[1], ranging from acute kidney injury[2] to chronic glomerular inflammation[3, 4], renal organ transplantation[5], and renal malignancies[6]. Given the pivotal role that histopathological analysis plays in informing therapeutic strategies and prognostic assessments, seasoned investigators and clinicians have devoted substantial effort to compiling exhaustive books of prototypical histological samples, documenting the histopathological hallmarks distinctive to each disease phenotype. While numerous books provide a wealth of cases for study and research, readers often cannot promptly retrieve relevant images for real-time clinical cases to give a precise diagnosis in the practical diagnostic process. The advent of large language models has revolutionized the rapid construction and retrieval of knowledge bases, offering a more efficient approach.

large language model, machine learning, natural language, (16 more...)

2406.18556

Country: Asia > China > Zhejiang Province > Hangzhou (0.06)

Genre: Research Report (0.83)

Industry:

Health & Medicine > Therapeutic Area > Nephrology (1.00)
Health & Medicine > Diagnostic Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (0.97)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.72)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.37)

Turfah, Ali, Wen, Xiaoquan

Interpretable Clustering with the Distinguishability Criterion

arXiv.org Machine Learning | Apr-25-2024

Cluster analysis is a popular unsupervised learning tool used in many disciplines to identify heterogeneous sub-populations within a sample. However, validating cluster analysis results and determining the number of clusters in a data set remains an outstanding problem. In this work, we present a global criterion called the Distinguishability criterion to quantify the separability of identified clusters and validate inferred cluster configurations. Our computational implementation of the Distinguishability criterion corresponds to the Bayes risk of a randomized classifier under the 0-1 loss. We propose a combined loss function-based computational framework that integrates the Distinguishability criterion with many commonly used clustering procedures, such as hierarchical clustering, k-means, and finite mixture models. We present these new algorithms as well as the results from comprehensive data analysis based on simulation studies and real data applications.

artificial intelligence, distinguishability criterion, machine learning, (13 more...)

2404.15967

Country:

North America > United States > Michigan > Washtenaw County > Ann Arbor (0.14)
Europe > Middle East (0.04)
Asia > Middle East (0.04)
(5 more...)

Genre: Research Report (1.00)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

arXiv.org Machine Learning | Apr-11-2024

Quality check of a sample partition using multinomial distribution

Modak, Soumita

In this paper, we advocate a novel measure for checking the quality of a cluster partition of a sample into several distinct classes, and thus for determining the unknown true number of clusters present in the provided data set. Our objective leads us to develop an approach that applies the multinomial distribution to the distances of the data members clustered in a group from their respective cluster representatives. This procedure is carried out independently for each of the clusters, and the resulting statistics are combined to form our targeted measure. Each cluster has its own category-wise probabilities, which correspond to the different positions of its members relative to a typical member (the cluster centroid, medoid, or mode), referred to as the cluster representative. Our method is robust in the sense that it is distribution-free: it is devised irrespective of the parent distribution of the underlying sample. It has a coveted quality that is rare among existing cluster accuracy measures: it can investigate whether the sample contains any inherent clusters at all, as opposed to a single group of all members. Our measure's simple concept, easy algorithm, fast runtime, good performance, and wide usefulness, demonstrated through extensive simulation and diverse case studies, make it appealing.

artificial intelligence, data mining, machine learning, (18 more...)

2404.07778

Country:

Asia > India > West Bengal > Kolkata (0.14)
North America > United States > New Jersey (0.04)
North America > United States > New York (0.04)
(4 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Forster, Carlos Henrique Q., de Castro, Paulo André Lima, Ramalho, Andrei

A Methodology for Questionnaire Analysis: Insights through Cluster Analysis of an Investor Competition Data

arXiv.org Artificial Intelligence | Feb-9-2024

In this paper, we propose a methodology for the analysis of questionnaire data, along with its application to discovering insights from investor data, motivated by a day trading competition. The questionnaire includes categorical questions, which are reduced to binary 'yes' or 'no' questions. The methodology reduces dimensionality by grouping questions and participants with similar responses using clustering analysis. Rule discovery was performed using a conversion rate metric. Innovative visual representations were proposed to validate the cluster analysis and the relations discovered between questions. Cross-referencing with financial data revealed additional insights related to the identified clusters.

artificial intelligence, machine learning, questionnaire, (17 more...)

2402.06759

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
South America > Brazil (0.04)
North America > United States > New York (0.04)
Europe > Montenegro (0.04)

Genre:

Research Report (1.00)
Questionnaire & Opinion Survey (1.00)

Industry: Banking & Finance > Trading (1.00)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Nandini, S., R, Sanjjushri Varshini

Estimating Countries with Similar Maternal Mortality Rate using Cluster Analysis and Pairing Countries with Identical MMR

arXiv.org Artificial Intelligence | Dec-7-2023

In an evolving world, the younger generation must flourish for a country to become developed. Much of the world's population is unaware of the complications associated with the routines followed during pregnancy and of how hospital facilities affect maternal health. Maternal mortality is the death of a pregnant woman due to complications related to pregnancy, underlying conditions exacerbated by the pregnancy, or the management of these conditions. It is crucial to consider the Maternal Mortality Rate (MMR) in diverse locations and to determine which routines and hospital facilities reduce it. This research aims to identify the countries that face the greatest MMR risk and the countries that have similar MMR. Data are collected and examined for various countries and consist of observations from earlier years. Unsupervised machine learning is used to perform cluster analysis, which identifies the pairs of countries with similar MMR as well as the pair with the most dissimilar MMR.

artificial intelligence, machine learning, maternal mortality rate, (11 more...)

2312.04275

Country:

Asia > Indonesia (0.15)
North America > United States (0.14)
Asia > Afghanistan (0.05)
(3 more...)

Genre: Research Report (0.83)

Industry:

Health & Medicine > Therapeutic Area > Obstetrics/Gynecology (1.00)
Health & Medicine > Public Health > Maternal Health (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Metzner, Claus, Schilling, Achim, Krauss, Patrick

Beyond Labels: Advancing Cluster Analysis with the Entropy of Distance Distribution (EDD)

arXiv.org Machine Learning | Nov-28-2023

In the evolving landscape of data science, the accurate quantification of clustering in high-dimensional data sets remains a significant challenge, especially in the absence of predefined labels. This paper introduces a novel approach, the Entropy of Distance Distribution (EDD), which represents a paradigm shift in label-free clustering analysis. Traditional methods, reliant on discrete labels, often struggle to discern intricate cluster patterns in unlabeled data. EDD, however, leverages the characteristic differences in pairwise point-to-point distances to discern clustering tendencies, independent of data labeling. Our method employs the Shannon information entropy to quantify the 'peakedness' or 'flatness' of distance distributions in a data set. This entropy measure, normalized against its maximum value, effectively distinguishes between strongly clustered data (indicated by pronounced peaks in distance distribution) and more homogeneous, non-clustered data sets. This label-free quantification is resilient against global translations and permutations of data points, and with an additional dimension-wise z-scoring, it becomes invariant to data set scaling. We demonstrate the efficacy of EDD through a series of experiments involving two-dimensional data spaces with Gaussian cluster centers. Our findings reveal a monotonic increase in the EDD value with the widening of cluster widths, moving from well-separated to overlapping clusters. This behavior underscores the method's sensitivity and accuracy in detecting varying degrees of clustering. EDD's potential extends beyond conventional clustering analysis, offering a robust, scalable tool for unraveling complex data structures without reliance on pre-assigned labels.
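
The recipe described in the abstract (dimension-wise z-scoring, pairwise point-to-point distances, and Shannon entropy of the distance distribution normalized by its maximum value) can be sketched roughly as follows. This is an illustrative reading, not the authors' implementation: the histogram estimator, the default of 30 bins, and the names `zscore_columns` and `edd` are assumptions.

```python
import math
from itertools import combinations


def zscore_columns(points):
    """Standardize each dimension to zero mean and unit variance."""
    n, dims = len(points), len(points[0])
    means = [sum(p[d] for p in points) / n for d in range(dims)]
    # A constant dimension has zero spread; fall back to 1.0 to avoid dividing by zero.
    stds = [
        math.sqrt(sum((p[d] - means[d]) ** 2 for p in points) / n) or 1.0
        for d in range(dims)
    ]
    return [[(p[d] - means[d]) / stds[d] for d in range(dims)] for p in points]


def edd(points, bins=30):
    """Normalized entropy of the pairwise distance histogram, in [0, 1].

    The bin count is an assumed free parameter of this sketch.
    """
    standardized = zscore_columns(points)
    dists = [math.dist(a, b) for a, b in combinations(standardized, 2)]
    lo, hi = min(dists), max(dists)
    if hi == lo:
        return 0.0  # all distances equal: the histogram is a single peak

    # Histogram of distances over equal-width bins between lo and hi.
    width = (hi - lo) / bins
    counts = [0] * bins
    for d in dists:
        counts[min(int((d - lo) / width), bins - 1)] += 1

    # Shannon entropy of the histogram, divided by its maximum log(bins).
    total = len(dists)
    entropy = -sum((c / total) * math.log(c / total) for c in counts if c)
    return entropy / math.log(bins)
```

Under this reading, a peaked distance histogram gives a value closer to 0 and a flatter one gives a value closer to 1. Because the measure depends only on standardized pairwise distances, it is unaffected by translating the data, reordering the points, or rescaling individual dimensions, up to floating-point effects at bin edges.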

artificial intelligence, edd, machine learning, (15 more...)

2311.16621

Country: Europe > Germany (0.05)

Genre: Research Report > New Finding (0.48)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.31)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)