AITopics

2308.14105

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)
Asia > China > Anhui Province > Hefei (0.04)

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

arXiv.org Machine Learning, Aug-28-2023

Some issues in robust clustering

Hennig, Christian

Cluster analysis is about finding groups in data. Robust statistics is about methods that are not affected strongly by deviations from the statistical model assumptions or moderate changes in a data set. Particular attention has been paid in the robustness literature to the effect of outliers. Outliers and other model deviations can have a strong effect on cluster analysis methods as well. There is now much work on robust cluster analysis, see [1, 19, 9] for overviews.
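The overviews cited above cover many robust clustering methods. As one concrete illustration, the sketch below implements trimmed k-means in Python with NumPy. At every iteration, the fraction `alpha` of points that fit worst is excluded before the cluster centers are recomputed, so a few gross outliers cannot drag a center away. This is a classical approach chosen for illustration, not a method taken from this paper. The function names, the random-restart scheme, and the default parameter values are all assumptions.

```python
import numpy as np


def _assign(X, centers, n_trim):
    """Label points by nearest center and mark the n_trim worst-fitting points as trimmed."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)  # (n, k) squared distances
    labels = d2.argmin(axis=1)
    nearest = d2.min(axis=1)
    keep = np.ones(len(X), dtype=bool)
    if n_trim > 0:
        keep[np.argsort(nearest)[len(X) - n_trim:]] = False  # drop the largest distances
    return labels, keep, nearest[keep].sum()  # trimmed within-cluster sum of squares


def trimmed_kmeans(X, k, alpha=0.1, n_init=10, n_iter=100, seed=0):
    """Trimmed k-means with random restarts; returns (centers, labels, keep_mask)."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    n_trim = int(np.floor(alpha * n))
    rng = np.random.default_rng(seed)
    best = None
    for _restart in range(n_init):
        centers = X[rng.choice(n, size=k, replace=False)].copy()
        for _step in range(n_iter):
            labels, keep, _cost = _assign(X, centers, n_trim)
            new_centers = centers.copy()
            for j in range(k):
                members = keep & (labels == j)
                if members.any():  # leave an empty center where it is
                    new_centers[j] = X[members].mean(axis=0)
            converged = np.allclose(new_centers, centers)
            centers = new_centers
            if converged:
                break
        labels, keep, cost = _assign(X, centers, n_trim)
        if best is None or cost < best[3]:  # keep the restart with the lowest trimmed cost
            best = (centers, labels, keep, cost)
    return best[0], best[1], best[2]
```

With `alpha=0`, the procedure reduces to ordinary k-means with random restarts. In that case, a single far-away point (such as `[100, -100]` added to two tight groups) can claim a cluster of its own.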

artificial intelligence, machine learning, outlier

arXiv.org Machine Learning

2308.14478

Country:

North America > United States > Florida > Palm Beach County > Boca Raton (0.05)
Europe > Italy > Emilia-Romagna > Metropolitan City of Bologna > Bologna (0.04)
North America > United States > Oregon > Multnomah County > Portland (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.47)

Mousavi-Sadr, Mahdiyar, Jassur, Davood M., Gozaliasl, Ghassem

Revisiting mass-radius relationships for exoplanet populations: a machine learning insight

arXiv.org Artificial Intelligence, Aug-28-2023

The growing number of exoplanet discoveries and advances in machine learning techniques have opened new avenues for exploring and understanding the characteristics of worlds beyond our Solar System. In this study, we employ efficient machine learning approaches to analyze a dataset comprising 762 confirmed exoplanets and eight Solar System planets, aiming to characterize their fundamental quantities. By applying different unsupervised clustering algorithms, we classify the data into two main classes, 'small' and 'giant' planets, with cut-off values at $R_{p}=8.13R_{\oplus}$ and $M_{p}=52.48M_{\oplus}$. This classification reveals an intriguing distinction: giant planets have lower densities, suggesting higher H-He mass fractions, while small planets are denser and composed mainly of heavier elements. We apply various regression models to uncover correlations between physical parameters and to assess their predictive power for exoplanet radius. Our analysis highlights that planetary mass, orbital period, and stellar mass play crucial roles in predicting exoplanet radius. Among the models evaluated, Support Vector Regression consistently outperforms the others, demonstrating its promise for obtaining accurate planetary radius estimates. Furthermore, we derive parametric equations using the M5P and Markov Chain Monte Carlo methods. Notably, small planets exhibit a positive linear mass-radius relation, in line with previous findings. Conversely, for giant planets, we observe a strong correlation between planetary radius and the mass of the host star, which might provide insights into the relationship between giant planet formation and stellar characteristics.
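The density contrast between the two classes follows from the bulk-density relation $\rho/\rho_{\oplus} = (M_p/M_{\oplus})/(R_p/R_{\oplus})^3$. The Python sketch below evaluates it, using Earth's mean density of about 5.51 g/cm³. It also applies the two cut-offs quoted above. The abstract does not say how the radius and mass cut-offs are combined, so the rule "giant if either cut-off is reached" is an assumption, as are the function names.

```python
EARTH_DENSITY_G_CM3 = 5.51  # approximate mean density of Earth

R_CUT = 8.13   # radius cut-off in Earth radii, as quoted in the abstract
M_CUT = 52.48  # mass cut-off in Earth masses, as quoted in the abstract


def bulk_density(mass_earth, radius_earth):
    """Mean density in g/cm^3 from mass and radius given in Earth units."""
    return EARTH_DENSITY_G_CM3 * mass_earth / radius_earth ** 3


def size_class(mass_earth, radius_earth):
    """Hypothetical combination rule: 'giant' if either cut-off is reached."""
    if radius_earth >= R_CUT or mass_earth >= M_CUT:
        return "giant"
    return "small"
```

For example, Jupiter (about 317.8 $M_{\oplus}$ and 11.21 $R_{\oplus}$) comes out at roughly 1.2 g/cm³, compared with 5.51 g/cm³ for Earth.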

artificial intelligence, machine learning, planet

2301.07143

Country:

Europe > Finland > Uusimaa > Helsinki (0.04)
South America > Uruguay > Maldonado > Maldonado (0.04)
Asia > Middle East > Iran > East Azerbaijan Province > Tabriz (0.04)

Genre: Research Report (0.84)

Industry: Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.88)

arXiv.org Artificial Intelligence, Aug-26-2023

Reinforcement Learning Based Multi-modal Feature Fusion Network for Novel Class Discovery

Li, Qiang, Ma, Qiuyang, Nie, Weizhi, Liu, Anan

With the development of deep learning techniques, supervised learning has achieved performance surpassing that of humans. Researchers have designed numerous models for different data modalities, achieving excellent results in supervised tasks. However, with the exponential growth of data in many fields, the recognition and classification of unlabeled data has gradually become a hot topic. In this paper, we employ a Reinforcement Learning framework that simulates human cognitive processes to address novel class discovery in the open-set domain. We deploy a Member-to-Leader Multi-Agent framework to extract and fuse features from multi-modal information, aiming to acquire a more comprehensive understanding of the feature space. This approach also facilitates the incorporation of self-supervised learning to enhance model training. We employ a clustering method whose constraint conditions range from strict to loose, which allows dependable labels to be generated for a subset of the unlabeled data during training. This iterative process is similar to human exploratory learning of unknown data. Together, these mechanisms update the network parameters based on rewards received from environmental feedback. This enables effective control over the extent of exploration and helps ensure accurate learning of unknown data categories. We demonstrate the performance of our approach in both the 3D and 2D domains using the OS-MN40, OS-MN40-Miss, and CIFAR-10 datasets. Our approach achieves competitive results.
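The abstract does not specify the clustering constraints, so the Python sketch below shows only the generic idea of strict-to-loose pseudo-labeling. It is not the authors' method. A point receives the label of its nearest centroid only if the ratio of its nearest to second-nearest centroid distance is at most `tau`. The schedule then raises `tau` from strict to loose over training rounds. The margin-ratio criterion, the function names, and the default schedule values are all hypothetical.

```python
import numpy as np


def pseudo_labels(features, centroids, tau):
    """Hypothetical rule: label = nearest centroid if d1/d2 <= tau, else -1 (unlabeled).

    A smaller tau is stricter, since only points clearly closer to one centroid qualify.
    """
    d = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=2)  # (n, k)
    order = np.argsort(d, axis=1)
    rows = np.arange(len(features))
    d1 = d[rows, order[:, 0]]  # nearest-centroid distance
    d2 = d[rows, order[:, 1]]  # second-nearest-centroid distance
    ratio = d1 / np.maximum(d2, 1e-12)  # always in [0, 1]
    return np.where(ratio <= tau, order[:, 0], -1)


def loosening_schedule(n_rounds, tau_start=0.3, tau_end=0.9):
    """Hypothetical strict-to-loose thresholds, one per training round."""
    return np.linspace(tau_start, tau_end, n_rounds)
```

Because the threshold only grows, each round labels at least as many points as the previous one. A threshold of 1.0 labels every point.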

artificial intelligence, machine learning, reinforcement learning

2308.13801

Country: Asia > China > Tianjin Province > Tianjin (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Characteristics of networks generated by kernel growing neural gas

Fujita, Kazuhisa

This research aims to develop kernel GNG, a kernelized version of the growing neural gas (GNG) algorithm, and to investigate the features of the networks generated by kernel GNG. GNG is an unsupervised artificial neural network that transforms a dataset into an undirected graph, thereby extracting the features of the dataset as a graph. GNG is widely used in vector quantization, clustering, and 3D graphics. Kernel methods are often used to map a dataset to a feature space, with support vector machines being the most prominent application. This paper introduces the kernel GNG approach and explores the characteristics of the networks it generates. Five kernels are used in this study: Gaussian, Laplacian, Cauchy, inverse multiquadric (IMQ), and log kernels. The results show that the average degree and the average clustering coefficient decrease as the kernel parameter increases for the Gaussian, Laplacian, Cauchy, and IMQ kernels. Therefore, if fewer edges and a lower clustering coefficient (i.e., fewer triangles) are preferred, kernel GNG with a larger parameter value is more appropriate.
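The sketch below gives common textbook forms of the five kernels in Python, together with the kernel-trick distance $\lVert\phi(x)-\phi(y)\rVert^2 = k(x,x) - 2k(x,y) + k(y,y)$. A kernelized algorithm can use this distance in place of the Euclidean one. The paper's exact parameterizations may differ, and the parameter names `sigma`, `c`, and `d` are assumptions. The sketch covers only the distance between two input points; it does not show how kernel GNG represents its node positions in feature space.

```python
import numpy as np


def sq_dist(x, y):
    """Squared Euclidean distance between two input vectors."""
    diff = np.asarray(x, float) - np.asarray(y, float)
    return float(diff @ diff)


def gaussian(x, y, sigma):
    return np.exp(-sq_dist(x, y) / (2 * sigma ** 2))


def laplacian(x, y, sigma):
    return np.exp(-np.sqrt(sq_dist(x, y)) / sigma)


def cauchy(x, y, sigma):
    return 1.0 / (1.0 + sq_dist(x, y) / sigma ** 2)


def inverse_multiquadric(x, y, c):
    return 1.0 / np.sqrt(sq_dist(x, y) + c ** 2)


def log_kernel(x, y, d):
    # Only conditionally positive definite, unlike the four kernels above.
    return -np.log(np.sqrt(sq_dist(x, y)) ** d + 1.0)


def feature_space_sq_dist(kernel, x, y, param):
    """||phi(x) - phi(y)||^2 computed from kernel evaluations only."""
    return kernel(x, x, param) - 2 * kernel(x, y, param) + kernel(y, y, param)
```

For the Gaussian kernel, this distance equals $2 - 2\exp(-\lVert x-y\rVert^2/(2\sigma^2))$, so it is bounded above by 2 and shrinks toward 0 as `sigma` grows.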

dataset, kernel, kernel gng

doi: 10.5121/ijaia.2023.14503

2308.08163

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > Belgium > Flanders > West Flanders > Bruges (0.04)
Asia > Japan (0.04)

Genre: Research Report > New Finding (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Network Embedding Using Sparse Approximations of Random Walks

Mercurio, Paula, Liu, Di

In this paper, we propose an efficient numerical implementation of network embedding based on commute times, using a sparse approximation of a diffusion process on the network obtained by a modified version of the diffusion wavelet algorithm. The node embeddings are computed by optimizing the cross-entropy loss with stochastic gradient descent, sampling low-dimensional representations of Green's functions. We demonstrate the efficacy of this method for data clustering and multi-label classification through several examples, and compare its efficiency and accuracy with those of existing methods. Theoretical issues justifying the scheme are also discussed.
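For reference, the quantity being approximated can be computed exactly on small graphs. The commute time between nodes $i$ and $j$ is $C_{ij} = \mathrm{vol}(G)\,(L^+_{ii} + L^+_{jj} - 2L^+_{ij})$, where $L^+$ is the pseudoinverse of the graph Laplacian and $\mathrm{vol}(G)$ is the sum of degrees. The dense Python sketch below computes this, plus a spectral embedding whose squared Euclidean distances reproduce it. It assumes a connected undirected graph. It costs $O(n^3)$ and is therefore not the paper's sparse diffusion-wavelet scheme or its SGD-based training; the function names are hypothetical.

```python
import numpy as np


def commute_times(A):
    """Exact commute-time matrix for a connected undirected graph with adjacency A."""
    A = np.asarray(A, float)
    deg = A.sum(axis=1)
    L = np.diag(deg) - A           # combinatorial graph Laplacian
    Lp = np.linalg.pinv(L)         # Moore-Penrose pseudoinverse L+
    diag = np.diag(Lp)
    return deg.sum() * (diag[:, None] + diag[None, :] - 2 * Lp)


def commute_time_embedding(A, dim=None):
    """Rows are node coordinates whose squared distances equal the commute times.

    Keeping only the first `dim` columns retains the smallest nonzero eigenvalues,
    which contribute most to the distances.
    """
    A = np.asarray(A, float)
    deg = A.sum(axis=1)
    L = np.diag(deg) - A
    vals, vecs = np.linalg.eigh(L)  # ascending eigenvalues
    nz = vals > 1e-10               # drop the zero eigenvalue of the connected graph
    vals, vecs = vals[nz], vecs[:, nz]
    X = np.sqrt(deg.sum()) * vecs / np.sqrt(vals)
    return X if dim is None else X[:, :dim]
```

On the three-node path 0–1–2, the volume is 4 and the effective resistance between the end nodes is 2, so their commute time is 8.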

approximation, artificial intelligence, machine learning

2308.13663

Country:

North America > United States > New York > New York County > New York City (0.14)
North America > United States > Minnesota > Rice County > Northfield (0.04)
North America > United States > Michigan > Ingham County > Lansing (0.04)

Genre: Research Report (0.50)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.48)

Habchi, Yassine, Himeur, Yassine, Kheddar, Hamza, Boukabou, Abdelkrim, Atalla, Shadi, Chouchane, Ammar, Ouamane, Abdelmalik, Mansoor, Wathiq

AI in Thyroid Cancer Diagnosis: Techniques, Trends, and Future Directions

There has been growing interest in creating intelligent diagnostic systems that help medical professionals analyze and process big data for the treatment of incurable diseases. One of the key challenges in this field is detecting thyroid cancer, where advances have been made using machine learning (ML) and big data analytics to evaluate thyroid cancer prognosis and determine a patient's risk of malignancy. This review summarizes a large collection of articles on artificial intelligence (AI)-based techniques used in the diagnosis of thyroid cancer. It introduces a new taxonomy that classifies these techniques by the AI algorithms used, the purpose of the framework, and the computing platforms used. Additionally, this study compares existing thyroid cancer datasets based on their features. The study focuses on how AI-based tools can support the diagnosis and treatment of thyroid cancer through supervised, unsupervised, or hybrid techniques. It also highlights the progress made and the unresolved challenges in this field. Finally, future trends and areas of focus in this field are discussed.

classification, data mining, machine learning

2308.13592

Country:

Asia > China (0.04)
Oceania > New Zealand > North Island > Waikato (0.04)
North America > United States > Wisconsin (0.04)

Genre:

Research Report > New Finding (1.00)
Overview (1.00)
Research Report > Experimental Study (0.92)

Industry:

Health & Medicine > Therapeutic Area > Oncology > Thyroid Cancer (1.00)
Health & Medicine > Therapeutic Area > Endocrinology (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Data Science > Data Quality (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (1.00)

Discovering Mental Health Research Topics with Topic Modeling

Gao, Xin, Sazara, Cem

Mental health significantly influences various aspects of our daily lives, and its importance has been increasingly recognized by the research community and the general public, particularly in the wake of the COVID-19 pandemic. This heightened interest is evident in the growing number of publications dedicated to mental health in the past decade. In this study, our goal is to identify general trends in the field and pinpoint high-impact research topics by analyzing a large dataset of mental health research papers. To accomplish this, we collected abstracts from various databases and trained a customized Sentence-BERT based embedding model leveraging the BERTopic framework. Our dataset comprises 96,676 research papers pertaining to mental health, enabling us to examine the relationships between different topics using their abstracts. To evaluate the effectiveness of the model, we compared it against two other state-of-the-art methods: Top2Vec model and LDA-BERT model. The model demonstrated superior performance in metrics that measure topic diversity and coherence. To enhance our analysis, we also generated word clouds to provide a comprehensive overview of the machine learning models applied in mental health research, shedding light on commonly utilized techniques and emerging trends. Furthermore, we provide a GitHub link* to the dataset used in this paper, ensuring its accessibility for further research endeavors.
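The abstract does not define its topic-diversity metric. One common definition is the fraction of unique words among the top-k words of all topics: a value of 1.0 means no word is shared between topics. The Python sketch below assumes that definition and a default of `top_k=25`. It is offered as an illustration, not as the evaluation code used in the paper.

```python
def topic_diversity(topics, top_k=25):
    """Fraction of unique words among the top_k words of every topic.

    `topics` is a list of word lists, each ordered from most to least probable.
    """
    words = [w for topic in topics for w in topic[:top_k]]
    if not words:
        return 0.0
    return len(set(words)) / len(words)
```

For example, two topics `["a", "b"]` and `["a", "c"]` share one word, giving a diversity of 3/4.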

discovering mental health research topic, machine learning, natural language

2308.13569

Country:

Asia > Middle East > Republic of Türkiye (0.04)
North America > United States > Washington > King County > Seattle (0.04)
North America > United States > Hawaii > Honolulu County > Honolulu (0.04)

Genre: Research Report > New Finding (0.88)

Industry: Health & Medicine > Therapeutic Area > Psychiatry/Psychology (1.00)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.96)
Information Technology > Communications > Social Media (0.95)

arXiv.org Artificial Intelligence, Aug-24-2023

Unsupervised Manifold Linearizing and Clustering

Ding, Tianjiao, Tong, Shengbang, Chan, Kwan Ho Ryan, Dai, Xili, Ma, Yi, Haeffele, Benjamin D.

We consider the problem of simultaneously clustering and learning a linear representation of data lying close to a union of low-dimensional manifolds, a fundamental task in machine learning and computer vision. When the manifolds are assumed to be linear subspaces, this reduces to the classical problem of subspace clustering, which has been studied extensively over the past two decades. Unfortunately, many real-world datasets, such as natural images, cannot be well approximated by linear subspaces. On the other hand, numerous works have attempted to learn an appropriate transformation of the data, such that data is mapped from a union of general non-linear manifolds to a union of linear subspaces (with points from the same manifold being mapped to the same subspace). However, many existing works have limitations such as assuming knowledge of the membership of samples to clusters, requiring high sampling density, or being shown theoretically to learn trivial representations. In this paper, we propose to optimize the Maximal Coding Rate Reduction metric with respect to both the data representation and a novel doubly stochastic cluster membership, inspired by state-of-the-art subspace clustering results. We give a parameterization of such a representation and membership, allowing efficient mini-batching and one-shot initialization. Experiments on CIFAR-10, -20, -100, and TinyImageNet-200 datasets show that the proposed method is much more accurate and scalable than state-of-the-art deep clustering methods, and further learns a latent linear representation of the data.
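For readers unfamiliar with the metric, the Python sketch below evaluates the standard Maximal Coding Rate Reduction objective for features $Z \in \mathbb{R}^{d \times n}$:

$$\Delta R = \tfrac{1}{2}\log\det\!\Big(I + \tfrac{d}{n\epsilon^2} Z Z^\top\Big) - \sum_j \tfrac{\mathrm{tr}(\Pi_j)}{2n}\log\det\!\Big(I + \tfrac{d}{\mathrm{tr}(\Pi_j)\epsilon^2} Z \Pi_j Z^\top\Big)$$

Each $\Pi_j$ is a diagonal soft-membership matrix. The sketch takes a generic $n \times k$ soft assignment rather than the paper's doubly stochastic membership and its parameterization, which are not reproduced here. The function names are hypothetical.

```python
import numpy as np


def coding_rate(Z, eps):
    """R(Z, eps) = 1/2 logdet(I + d/(n eps^2) Z Z^T) for a d x n feature matrix Z."""
    d, n = Z.shape
    _, logdet = np.linalg.slogdet(np.eye(d) + d / (n * eps ** 2) * Z @ Z.T)
    return 0.5 * logdet


def rate_reduction(Z, P, eps):
    """Delta R for an n x k soft membership P (column j holds the diagonal of Pi_j)."""
    d, n = Z.shape
    compressed = 0.0
    for j in range(P.shape[1]):
        w = P[:, j]
        nj = w.sum()  # tr(Pi_j), the soft size of cluster j
        if nj <= 0:
            continue
        # (Z * w) @ Z.T equals Z diag(w) Z^T
        _, logdet = np.linalg.slogdet(np.eye(d) + d / (nj * eps ** 2) * (Z * w) @ Z.T)
        compressed += nj / (2 * n) * logdet
    return coding_rate(Z, eps) - compressed
```

When each row of `P` sums to one, concavity of $\log\det$ guarantees $\Delta R \ge 0$. Putting all samples in a single cluster gives exactly zero.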

artificial intelligence, machine learning, subspace

2301.01805

Country:

North America > United States > California > Alameda County > Berkeley (0.04)
Asia > China > Hong Kong (0.04)
Asia > China > Guangdong Province > Guangzhou (0.04)

Genre:

Research Report (0.64)
Overview (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.49)

Chaudhuri, Arghya Roy, Jawanpuria, Pratik, Mishra, Bamdev

ProtoBandit: Efficient Prototype Selection via Multi-Armed Bandits

arXiv.org Artificial Intelligence, Aug-23-2023

In this work, we propose a multi-armed bandit-based framework for identifying a compact set of informative data instances (i.e., the prototypes) from a source dataset $S$ that best represents a given target set $T$. Prototypical examples of a given dataset offer interpretable insights into the underlying data distribution and assist in example-based reasoning, thereby influencing every sphere of human decision-making. Current state-of-the-art prototype selection approaches require $O(|S||T|)$ similarity comparisons between source and target data points, which becomes prohibitively expensive in large-scale settings. We propose to mitigate this limitation by employing stochastic greedy search in the space of prototypical examples and multi-armed bandits to reduce the number of similarity comparisons. Our randomized algorithm, ProtoBandit, identifies a set of $k$ prototypes while incurring $O(k^3|S|)$ similarity comparisons, which is independent of the size of the target set. An interesting outcome of our analysis concerns the $k$-medoids clustering problem (the $T = S$ setting), for which we show that ProtoBandit approximates the BUILD step solution of the partitioning around medoids (PAM) method in $O(k^3|S|)$ complexity. Empirically, we observe that ProtoBandit reduces the number of similarity computation calls by several orders of magnitude (100 to 1000 times) while obtaining solutions similar in quality to those from state-of-the-art approaches.
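For comparison, the exact greedy BUILD step of PAM, which ProtoBandit is reported to approximate, can be sketched in Python from a precomputed dissimilarity matrix. The first medoid minimizes the total dissimilarity to all points. Each later medoid is the candidate that most reduces the total distance of points to their closest medoid. This follows the textbook description of BUILD. It evaluates every candidate against every point at each step, which is exactly the cost that bandit sampling is meant to avoid. The function name is hypothetical.

```python
import numpy as np


def pam_build(D, k):
    """Exact greedy BUILD step of PAM on an n x n dissimilarity matrix D.

    Returns the indices of k medoids.
    """
    D = np.asarray(D, float)
    # First medoid: the point with the smallest total dissimilarity to all others.
    medoids = [int(D.sum(axis=1).argmin())]
    nearest = D[medoids[0]].copy()  # each point's distance to its closest medoid so far
    for _ in range(1, k):
        # Cost if candidate c were added: sum_i min(nearest_i, D[c, i]).
        costs = np.minimum(nearest[None, :], D).sum(axis=1)
        costs[medoids] = np.inf  # never re-pick an existing medoid
        best = int(costs.argmin())
        medoids.append(best)
        nearest = np.minimum(nearest, D[best])
    return medoids
```

On the one-dimensional points 0, 1, 2, 10, 11, 12 with absolute-difference dissimilarity, `pam_build(D, 2)` selects one medoid from each group.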

artificial intelligence, data mining, machine learning

2210.0186

Country:

North America > Canada > Ontario > Toronto (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > India (0.04)

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Data Science > Data Mining > Big Data (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)