AITopics

2306.13794

Country:

North America > Canada > Quebec > Montreal (0.14)
Africa > Senegal > Kolda Region > Kolda (0.04)
South America > Paraguay > Asunción > Asunción (0.04)
(7 more...)

Genre: Research Report (0.50)

Industry: Transportation > Passenger (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
(3 more...)

Hozumi, Yuta, Wei, Gu-Wei

Analyzing scRNA-seq data by CCP-assisted UMAP and t-SNE

arXiv.org Artificial IntelligenceJun-23-2023

Single-cell RNA sequencing (scRNA-seq) is a relatively new technology that profiles the transcriptome of individual cells within a tissue or organ, aiming to gain understanding of gene expression, gene regulation, cell-cell interaction, spatial transcriptomics, signal transduction pathways, and more [1]. The typical workflow of scRNA-seq involves cell isolation, RNA extraction, library preparation, sequencing, and data analysis. One of the key challenges in scRNA-seq analysis is the enormous amount of data generated, which can be complex, nonuniform, noisy, unlabeled, and of excessively high dimensions. A typical data analysis pipeline involves data preprocessing, gene expression quantification, normalization and batch correction, dimensionality reduction, cell type identification, differential gene expression analysis, and pathway and functional analysis [2-7]. Principal component analysis (PCA), t-distributed stochastic neighbor embedding (t-SNE), and uniform manifold approximation and projection (UMAP) are some of the most popular approaches for dimensionality reduction, clustering, and visualization of scRNA-seq data. PCA is a classical technique used for dimensionality reduction and visualization. It identifies the most important patterns and correlations in a high-dimensional dataset and expresses them as a linear combination of new and orthogonal components. Among them, the first few components are often regarded as principal components, which can be used to visualize the scRNA-seq data in a lower-dimensional space or to identify important gene expression patterns.

artificial intelligence, data mining, machine learning, (16 more...)

2306.1375

Country:

North America > United States > Michigan (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.68)

arXiv.org Artificial IntelligenceJun-23-2023

A New Paradigm for Generative Adversarial Networks based on Randomized Decision Rules

Kim, Sehwan, Song, Qifan, Liang, Faming

The Generative Adversarial Network (GAN) was recently introduced in the literature as a novel machine learning method for training generative models. It has many applications in statistics such as nonparametric clustering and nonparametric conditional independence tests. However, training the GAN is notoriously difficult due to the issue of mode collapse, which refers to the lack of diversity among generated data. In this paper, we identify the reasons why the GAN suffers from this issue, and to address it, we propose a new formulation for the GAN based on randomized decision rules. In the new formulation, the discriminator converges to a fixed point while the generator converges to a distribution at the Nash equilibrium. We propose to train the GAN by an empirical Bayes-like method by treating the discriminator as a hyper-parameter of the posterior distribution of the generator. Specifically, we simulate generators from its posterior distribution conditioned on the discriminator using a stochastic gradient Markov chain Monte Carlo (MCMC) algorithm, and update the discriminator using stochastic gradient descent along with simulations of the generators. We establish convergence of the proposed method to the Nash equilibrium. Apart from image generation, we apply the proposed method to nonparametric clustering and nonparametric conditional independence tests. A portion of the numerical results is presented in the supplementary material.

artificial intelligence, generator, machine learning, (14 more...)

2306.13641

Country:

North America > United States > Virginia > Arlington County > Arlington (0.04)
Oceania > Australia > New South Wales > Sydney (0.04)
North America > United States > New York (0.04)
(5 more...)

Genre: Research Report > New Finding (0.67)

Industry: Health & Medicine > Therapeutic Area > Oncology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.89)

Semi-automated extraction of research topics and trends from NCI funding in radiological sciences from 2000-2020

Nguyen, Mark, Beidler, Peter, Tsai, Joseph, Anderson, August, Chen, Daniel, Kinahan, Paul, Kang, John

Investigators, funders, and the public desire knowledge on topics and trends in publicly funded research but current efforts in manual categorization are limited in scale and understanding. We developed a semi-automated approach to extract and name research topics, and applied this to \$1.9B of NCI funding over 21 years in the radiological sciences to determine micro- and macro-scale research topics and funding trends. Our method relies on sequential clustering of existing biomedical-based word embeddings, naming using subject matter experts, and visualization to discover trends at a macroscopic scale above individual topics. We present results using 15 and 60 cluster topics, where we found that 2D projection of grant embeddings reveals two dominant axes: physics-biology and therapeutic-diagnostic. For our dataset, we found that funding for therapeutics- and physics-based research have outpaced diagnostics- and biology-based research, respectively. We hope these results may (1) give insight to funders on the appropriateness of their funding allocation, (2) assist investigators in contextualizing their work and explore neighboring research domains, and (3) allow the public to review where their tax dollars are being allocated.

data mining, machine learning, research topic, (21 more...)

2306.13075

Country: North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)

Genre:

Research Report (1.00)
Overview (0.68)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Nuclear Medicine (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.69)

Troisemaine, Colin, Flocon-Cholet, Joachim, Gosselin, Stéphane, Reiffers-Masson, Alexandre, Vaton, Sandrine, Lemaire, Vincent

An Interactive Interface for Novel Class Discovery in Tabular Data

Novel Class Discovery (NCD) is the problem of trying to discover novel classes in an unlabeled set, given a labeled set of different but related classes. The majority of NCD methods proposed so far only deal with image data, despite tabular data being among the most widely used type of data in practical applications. To interpret the results of clustering or NCD algorithms, data scientists need to understand the domain- and application-specific attributes of tabular data. This task is difficult and can often only be performed by a domain expert. Therefore, this interface allows a domain expert to easily run state-of-the-art algorithms for NCD in tabular data. With minimal knowledge in data science, interpretable results can be generated.

artificial intelligence, machine learning, novel class discovery, (12 more...)

2306.12919

Country:

Europe > France > Brittany > Finistère > Brest (0.05)
Asia > Middle East > Jordan (0.05)

Genre: Research Report (0.40)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.50)

Hotegni, Sèdjro S., Mahabadi, Sepideh, Vakilian, Ali

Approximation Algorithms for Fair Range Clustering

This paper studies the fair range clustering problem in which the data points are from different demographic groups and the goal is to pick $k$ centers with the minimum clustering cost such that each group is at least minimally represented in the centers set and no group dominates the centers set. More precisely, given a set of $n$ points in a metric space $(P,d)$ where each point belongs to one of the $\ell$ different demographics (i.e., $P = P_1 \uplus P_2 \uplus \cdots \uplus P_\ell$) and a set of $\ell$ intervals $[\alpha_1, \beta_1], \cdots, [\alpha_\ell, \beta_\ell]$ on desired number of centers from each group, the goal is to pick a set of $k$ centers $C$ with minimum $\ell_p$-clustering cost (i.e., $(\sum_{v\in P} d(v,C)^p)^{1/p}$) such that for each group $i\in \ell$, $|C\cap P_i| \in [\alpha_i, \beta_i]$. In particular, the fair range $\ell_p$-clustering captures fair range $k$-center, $k$-median and $k$-means as its special cases. In this work, we provide efficient constant factor approximation algorithms for fair range $\ell_p$-clustering for all values of $p\in [1,\infty)$.

artificial intelligence, fairrangelp, machine learning, (18 more...)

2306.06778

Country:

Asia > Afghanistan > Parwan Province > Charikar (0.05)
North America > United States > Illinois > Cook County > Chicago (0.04)
Africa > Rwanda (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.34)

Graph Pooling for Graph Neural Networks: Progress, Challenges, and Opportunities

Liu, Chuang, Zhan, Yibing, Wu, Jia, Li, Chang, Du, Bo, Hu, Wenbin, Liu, Tongliang, Tao, Dacheng

Graph neural networks have emerged as a leading architecture for many graph-level tasks, such as graph classification and graph generation. As an essential component of the architecture, graph pooling is indispensable for obtaining a holistic graph-level representation of the whole graph. Although a great variety of methods have been proposed in this promising and fast-developing research field, to the best of our knowledge, little effort has been made to systematically summarize these works. To set the stage for the development of future works, in this paper, we attempt to fill this gap by providing a broad review of recent methods for graph pooling. Specifically, 1) we first propose a taxonomy of existing graph pooling methods with a mathematical summary for each category; 2) then, we provide an overview of the libraries related to graph pooling, including the commonly used datasets, model architectures for downstream tasks, and open-source implementations; 3) next, we further outline the applications that incorporate the idea of graph pooling in a variety of domains; 4) finally, we discuss certain critical challenges facing current studies and share our insights on future potential directions for research on the improvement of graph pooling.

artificial intelligence, data mining, machine learning, (18 more...)

2204.07321

Country:

Asia > China > Hubei Province > Wuhan (0.05)
Oceania > Australia > New South Wales > Sydney (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)

Genre: Overview (1.00)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (0.93)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.47)
Health & Medicine > Therapeutic Area > Immunology (0.47)
Health & Medicine > Therapeutic Area > Oncology (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.68)

Xian, R. Patrick, Morelock, Ryan J., Hadar, Ido, Musgrave, Charles B., Sutton, Christopher

From structure mining to unsupervised exploration of atomic octahedral networks

arXiv.org Artificial IntelligenceJun-21-2023

Networks of atom-centered coordination octahedra commonly occur in inorganic and hybrid solid-state materials. Characterizing their spatial arrangements and characteristics is crucial for relating structures to properties for many materials families. The traditional method using case-by-case inspection becomes prohibitive for discovering trends and similarities in large datasets. Here, we operationalize chemical intuition to automate the geometric parsing, quantification, and classification of coordination octahedral networks. We find axis-resolved tilting trends in ABO$_{3}$ perovskite polymorphs, which assist in detecting oxidation state changes. Moreover, we develop a scale-invariant encoding scheme to represent these networks, which, combined with human-assisted unsupervised machine learning, allows us to taxonomize the inorganic framework polytypes in hybrid iodoplumbates (A$_x$Pb$_y$I$_z$). Consequently, we uncover a violation of Pauling's third rule and the design principles underpinning their topological diversity. Our results offer a glimpse into the vast design space of atomic octahedral networks and inform high-throughput, targeted screening of specific structure types.

artificial intelligence, connectivity, machine learning, (16 more...)

2306.12272

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.28)
North America > United States > South Carolina > Richland County > Columbia (0.14)
North America > United States > Colorado > Boulder County > Boulder (0.14)
(5 more...)

Genre: Research Report > New Finding (0.34)

Industry:

Energy > Renewable (1.00)
Government > Regional Government > North America Government > United States Government (0.92)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.67)

Chauhan, Harish, Gupta, Nikunj, Haskell-Craig, Zoe

Understanding human mobility patterns in Chicago: an analysis of taxi data using clustering techniques

arXiv.org Artificial IntelligenceJun-21-2023

Understanding human mobility patterns is important in applications as diverse as urban planning, public health, and political organizing. One rich source of data on human mobility is taxi ride data. Using the city of Chicago as a case study, we examine data from taxi rides in 2016 with the goal of understanding how neighborhoods are interconnected. This analysis will provide a sense of which neighborhoods individuals are using taxis to travel between, suggesting regions to focus new public transit development efforts. Additionally, this analysis will map traffic circulation patterns and provide an understanding of where in the city people are traveling from and where they are heading to - perhaps informing traffic or road pollution mitigation efforts. For the first application, representing the data as an undirected graph will suffice. Transit lines run in both directions so simply a knowledge of which neighborhoods have high rates of taxi travel between them provides an argument for placing public transit along those routes. However, in order to understand the flow of people throughout a city, we must make a distinction between the neighborhood from which people are departing and the areas to which they are arriving - this requires methods that can deal with directed graphs. All developed codes can be found at https://github.com/Nikunj-Gupta/Spectral-Clustering-Directed-Graphs.

artificial intelligence, graph, machine learning, (17 more...)

2306.12094

Country:

North America > United States > Illinois > Cook County > Chicago (0.63)
Europe > Netherlands > South Holland > Leiden (0.07)
North America > United States > New York (0.04)

Genre: Research Report (0.40)

Industry:

Transportation > Passenger (0.55)
Transportation > Ground > Road (0.55)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.70)

Znaleźniak, Michał, Rola, Przemysław, Kaszuba, Patryk, Tabor, Jacek, Śmieja, Marek

Contrastive Hierarchical Clustering

arXiv.org Artificial IntelligenceJun-21-2023

Deep clustering has been dominated by flat models, which split a dataset into a predefined number of groups. Although recent methods achieve an extremely high similarity with the ground truth on popular benchmarks, the information contained in the flat partition is limited. In this paper, we introduce CoHiClust, a Contrastive Hierarchical Clustering model based on deep neural networks, which can be applied to typical image data. By employing a self-supervised learning approach, CoHiClust distills the base network into a binary tree without access to any labeled data. The hierarchical clustering structure can be used to analyze the relationship between clusters, as well as to measure the similarity between data points. Experiments demonstrate that CoHiClust generates a reasonable structure of clusters, which is consistent with our intuition and image semantics. Moreover, it obtains superior clustering accuracy on most of the image datasets compared to the state-of-the-art flat clustering models. Our implementation is available at https://github.

artificial intelligence, cohiclust, machine learning, (16 more...)

2303.03389

Country: Europe > Poland > Lesser Poland Province > Kraków (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)