Collaborating Authors

 Zhang, Xuekui


HiCat: A Semi-Supervised Approach for Cell Type Annotation

arXiv.org Artificial Intelligence

We introduce HiCat (Hybrid Cell Annotation using Transformative embeddings), a novel semi-supervised pipeline for annotating cell types from single-cell RNA sequencing data. HiCat fuses the strengths of supervised learning for known cell types with unsupervised learning to identify novel types. This hybrid approach incorporates both reference and query genomic data for feature engineering, enhancing the embedding learning process, increasing the effective sample size for unsupervised techniques, and improving the transferability of the supervised model trained on reference data when applied to query datasets. The pipeline follows six key steps: (1) removing batch effects using Harmony to generate a 50-dimensional principal component embedding; (2) applying UMAP for dimensionality reduction to two dimensions to capture crucial data patterns; (3) conducting unsupervised clustering of cells with DBSCAN, yielding a one-dimensional cluster membership vector; (4) merging the multi-resolution results of the previous steps into a 53-dimensional feature space that encompasses both reference and query data; (5) training a CatBoost model on the reference dataset to predict cell types in the query dataset; and (6) resolving inconsistencies between the supervised predictions and unsupervised cluster labels. When benchmarked on 10 publicly available genomic datasets, HiCat surpasses other methods, particularly in differentiating and identifying multiple new cell types. Its capacity to accurately classify novel cell types showcases its robustness and adaptability within intricate biological datasets.
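For concreteness, below is a minimal Python sketch of the six steps, assuming log-normalized expression matrices `ref` and `query` (cells x genes) and reference labels `ref_labels`. The harmonypy, umap-learn, scikit-learn, and catboost packages stand in for the authors' implementation, and all parameter values, as well as the step-6 reconciliation heuristic, are illustrative rather than HiCat's actual rules.

```python
import numpy as np
import pandas as pd
import harmonypy
import umap
from sklearn.decomposition import PCA
from sklearn.cluster import DBSCAN
from catboost import CatBoostClassifier

X = np.vstack([ref, query])                        # pool reference and query cells
batch = pd.DataFrame({"batch": ["ref"] * len(ref) + ["query"] * len(query)})

# (1) 50-dimensional principal component embedding, batch-corrected with Harmony
pcs = PCA(n_components=50).fit_transform(X)
Z = np.asarray(harmonypy.run_harmony(pcs, batch, ["batch"]).Z_corr).T

# (2) UMAP reduction to two dimensions
u2 = umap.UMAP(n_components=2).fit_transform(Z)

# (3) unsupervised DBSCAN clustering (one-dimensional membership vector)
clusters = DBSCAN(eps=0.5, min_samples=10).fit_predict(u2)

# (4) merge the multi-resolution results into a 53-dimensional feature space
features = np.column_stack([Z, u2, clusters])

# (5) train CatBoost on the reference portion, predict the query portion
n_ref = len(ref)
model = CatBoostClassifier(verbose=False).fit(features[:n_ref], ref_labels)
pred = model.predict(features[n_ref:]).ravel().astype(object)

# (6) reconcile supervised predictions with unsupervised clusters; this
# majority-vote heuristic is our simplification of HiCat's resolution step
for c in np.unique(clusters[n_ref:]):
    mask = clusters[n_ref:] == c
    top = pd.Series(pred[mask]).mode()[0]
    if (pred[mask] == top).mean() < 0.5:           # no dominant known type
        pred[mask] = "potential_novel_type"
```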


Essential Number of Principal Components and Nearly Training-Free Model for Spectral Analysis

arXiv.org Artificial Intelligence

Through a study of multi-gas mixture datasets, we show that in multi-component spectral analysis, the number of functional or non-functional principal components required to retain the essential information equals the number of independent constituents in the mixture set. Because different gas molecules are mutually independent, a near one-to-one projection from principal components to mixture constituents can be established, leading to a significant simplification of spectral quantification. Further, with knowledge of the molar extinction coefficients of each constituent, a complete principal component set can be extracted directly from the coefficients, and few or even no training samples are required for the learning model. Compared to other approaches, the proposed methods provide fast and accurate spectral quantification while requiring only a small memory footprint.
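The central claims lend themselves to a short numerical illustration. The sketch below builds noise-free synthetic mixture spectra under the Beer-Lambert relation (A = E C, with E the molar extinction coefficients and C the concentrations; all names are ours) and verifies both that the spectra matrix has rank equal to the number of constituents and that, with E known, a single least-squares projection quantifies a new mixture with no training samples.

```python
import numpy as np

rng = np.random.default_rng(0)
n_wavelengths, n_constituents, n_mixtures = 200, 3, 100

E = rng.random((n_wavelengths, n_constituents))   # molar extinction coefficients
C = rng.random((n_constituents, n_mixtures))      # independent concentrations
A = E @ C                                         # mixture spectra (noise-free)

# The essential number of principal components equals the number of
# independent constituents: the spectra matrix has rank n_constituents.
print(np.linalg.matrix_rank(A))                   # -> 3

# "Nearly training-free" quantification: with E known, concentrations of a
# new mixture follow from one least-squares projection, no training samples.
a_new = E @ np.array([0.2, 0.5, 0.3])
c_hat, *_ = np.linalg.lstsq(E, a_new, rcond=None)
print(np.round(c_hat, 3))                         # -> [0.2 0.5 0.3]
```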


Novel Modelling Strategies for High-frequency Stock Trading Data

arXiv.org Artificial Intelligence

Full electronic automation in stock exchanges has recently become popular, generating high-frequency intraday data and motivating the development of near real-time price forecasting methods. Machine learning algorithms are widely applied to mid-price stock prediction. How raw data are processed into model inputs (e.g., data thinning and feature engineering) can substantially affect the performance of prediction methods, yet researchers rarely discuss this topic. This motivated us to propose three novel modelling strategies for processing raw data. We illustrate how these strategies improve forecasting performance by analyzing high-frequency data of the Dow Jones 30 component stocks. In these experiments, our strategies often lead to statistically significant improvements in prediction, raising the F1 scores of the SVM models by 0.056, 0.087, and 0.016, respectively.
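The paper's three strategies are not detailed in this abstract, so the sketch below only illustrates the general kind of raw-data processing it studies: thinning synthetic tick data, engineering lagged-return features, and scoring an SVM direction classifier by F1. Every choice here is a stand-in, not the authors' method.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
mid = 100 + np.cumsum(rng.normal(0, 0.01, 50_000))   # synthetic mid-prices

thinned = mid[::10]                                  # data thinning: keep every 10th tick
rets = np.diff(np.log(thinned))                      # feature engineering: lagged log-returns
X = np.column_stack([rets[i:len(rets) - 5 + i] for i in range(5)])
y = (rets[5:] > 0).astype(int)                       # next-move direction label

X_tr, X_te, y_tr, y_te = train_test_split(X, y, shuffle=False)
clf = SVC(kernel="rbf").fit(X_tr, y_tr)
print("F1:", f1_score(y_te, clf.predict(X_te)))
```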


A systematic evaluation of methods for cell phenotype classification using single-cell RNA sequencing data

arXiv.org Artificial Intelligence

Background: Single-cell RNA sequencing (scRNA-seq) yields valuable insights about gene expression and gives critical information about the cellular composition of complex tissues. In scRNA-seq analysis, the annotation of cell subtypes is often done manually, which is time-consuming and irreproducible. Garnett is a cell-type annotation tool based on the elastic net method. Besides cell-type annotation, supervised machine learning methods can also be applied to predict other cell phenotypes from genomic data. Despite the popularity of such applications, no existing study has systematically investigated the performance of these supervised algorithms on scRNA-seq data sets of various sizes. Methods and Results: This study evaluates 13 popular supervised machine learning algorithms for classifying cell phenotypes, using published real and simulated data sets with diverse numbers of cells. The benchmark contained two parts. In the first part, we used real data sets to assess the algorithms' computing speed and cell phenotype classification performance, evaluated using AUC statistics, F1-score, precision, recall, and false-positive rate. In the second part, we evaluated gene selection performance using published simulated data sets with a known list of true feature genes. Conclusion: ElasticNet with interactions performed best on small and medium data sets; NB (naive Bayes) was another appropriate method for medium data sets; and XGB performed best on large data sets. Ensemble algorithms were not significantly superior to individual machine learning methods. Adding interactions to ElasticNet can help, and the improvement was significant in small data sets.
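As a flavor of the benchmarking loop, the hedged sketch below times and scores three stand-ins for the 13 methods (elastic-net logistic regression, Gaussian naive Bayes, and scikit-learn gradient boosting in place of XGB) on synthetic data, reporting several of the metrics the abstract lists.

```python
import time
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score, f1_score, precision_score, recall_score

# stand-in data: 1000 cells, 200 genes, binary phenotype
X, y = make_classification(n_samples=1000, n_features=200, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

models = {
    "ElasticNet": LogisticRegression(penalty="elasticnet", solver="saga",
                                     l1_ratio=0.5, max_iter=5000),
    "NB": GaussianNB(),
    "XGB-like": GradientBoostingClassifier(),
}
for name, m in models.items():
    t0 = time.time()
    m.fit(X_tr, y_tr)                              # computing speed
    p = m.predict(X_te)                            # classification performance
    print(name, f"time={time.time() - t0:.2f}s",
          f"AUC={roc_auc_score(y_te, m.predict_proba(X_te)[:, 1]):.3f}",
          f"F1={f1_score(y_te, p):.3f}",
          f"precision={precision_score(y_te, p):.3f}",
          f"recall={recall_score(y_te, p):.3f}")
```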


Multi-Stage Graph Peeling Algorithm for Probabilistic Core Decomposition

arXiv.org Machine Learning

Mining dense subgraphs whose vertices connect closely with each other is a common task in graph analysis. A very popular notion in subgraph analysis is core decomposition. Recently, Esfahani et al. presented a probabilistic core decomposition algorithm based on graph peeling and the Central Limit Theorem (CLT) that is capable of handling very large graphs. Their peeling algorithm (PA) starts from the lowest-degree vertices and iteratively deletes them, assigning core numbers and updating the degrees of neighbouring vertices, until it reaches the maximum core. However, in many applications, particularly in biology, more valuable information comes from dense sub-communities, and we are not interested in small cores whose vertices interact little with others. To make the previous PA focus more on dense subgraphs, we propose a multi-stage graph peeling algorithm (M-PA) that adds a two-stage data screening procedure before the previous PA. By removing vertices from the graph based on user-defined thresholds, we can greatly reduce graph complexity without affecting the vertices in the subgraphs we are interested in. We show that M-PA is more efficient than the previous PA and, with a properly set filtering threshold, produces dense subgraphs that are very similar, if not identical, to those of the previous PA (in terms of graph density and clustering coefficient).
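A deterministic sketch of the two M-PA ideas follows: screen out vertices below a user-defined degree threshold, then run classic peeling to assign core numbers. The probabilistic (CLT-based) degree handling of the original is deliberately omitted, so this illustrates the structure, not the algorithm itself.

```python
from collections import defaultdict

def peel(edges, min_degree=0):
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    # stage 1: data screening - drop vertices below the user-defined threshold
    screened = {v for v in adj if len(adj[v]) < min_degree}
    for v in screened:
        del adj[v]
    for v in adj:
        adj[v] -= screened

    # stage 2: classic peeling - repeatedly remove a minimum-degree vertex,
    # assigning core numbers as the running maximum of removal degrees
    core, k = {}, 0
    while adj:
        v = min(adj, key=lambda x: len(adj[x]))
        k = max(k, len(adj[v]))
        core[v] = k
        for u in adj[v]:
            adj[u].discard(v)
        del adj[v]
    return core

edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
print(peel(edges, min_degree=2))   # pendant vertex 3 is screened out: {0: 2, 1: 2, 2: 2}
```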


Handling high correlations in the feature gene selection using Single-Cell RNA sequencing data

arXiv.org Machine Learning

Motivation: Selecting feature genes and predicting cell phenotypes are typical tasks in the analysis of scRNA-seq data. Many algorithms have been developed for these tasks, but the high correlations among genes create challenges specific to scRNA-seq analysis that are not well addressed. Highly correlated genes lead to unreliable prediction models due to technical problems such as multi-collinearity. Most importantly, when a causal gene (whose variants have a true biological effect on the phenotype) is highly correlated with other genes, most algorithms select one of them in a data-driven manner. Because the correlation structure among genes can change substantially between data sets, a model built on such a surrogate gene may not transfer; hence, it is critical to build prediction models on causal genes. Results: To address these issues, we propose a grouping algorithm that can be integrated into prediction models. Using real benchmark scRNA-seq data and simulated cell phenotypes, we show that our novel method significantly outperforms standard models in both prediction and feature selection. Our algorithm reports the whole group of correlated genes, allowing researchers to either use their common pattern as a more robust predictor or conduct follow-up studies to identify the causal genes within the group. Availability: An R package is being developed and will be available on the Comprehensive R Archive Network (CRAN) when the paper is published.
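The grouping idea can be illustrated with a plausible stand-in (not the authors' exact algorithm): hierarchically cluster genes on the distance 1 − |correlation| and report whole groups, whose mean expression then serves as a single, more robust predictor.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def group_genes(X, cor_cutoff=0.8):
    """Group columns of X (cells x genes) whose average |correlation| exceeds cor_cutoff."""
    corr = np.corrcoef(X, rowvar=False)            # gene-gene correlation matrix
    dist = 1 - np.abs(corr)
    np.fill_diagonal(dist, 0)
    Z = linkage(squareform(dist, checks=False), method="average")
    return fcluster(Z, t=1 - cor_cutoff, criterion="distance")

rng = np.random.default_rng(0)
causal = rng.normal(size=(100, 1))
X = np.hstack([causal + 0.1 * rng.normal(size=(100, 5)),   # block correlated with a causal signal
               rng.normal(size=(100, 5))])                 # independent genes
groups = group_genes(X)
print(groups)                                    # the first five genes share one label

# each group's mean expression becomes one robust predictor
group_means = np.column_stack(
    [X[:, groups == g].mean(axis=1) for g in np.unique(groups)])
```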


Simultaneous prediction of multiple outcomes using revised stacking algorithms

arXiv.org Machine Learning

Motivation: HIV is difficult to treat because the virus mutates at a high rate and mutated viruses easily develop resistance to existing drugs. If the relationships between mutations and drug resistance can be determined from historical data, patients can be given personalized treatment according to their own mutation profiles. The HIV Drug Resistance Database was built to investigate these relationships. Our goal is to build a model using data from this database that simultaneously predicts resistance to multiple drugs from the viral sequence mutations of any new patient. Results: We propose two variations of a stacking algorithm that borrow information across multiple prediction tasks to improve multivariate prediction performance. The most attractive feature of our proposed methods is the flexibility with which complex multivariate prediction models can be constructed from any univariate prediction models. Using cross-validation studies, we show that our proposed methods outperform other popular multivariate prediction methods. Availability: An R package will be made available.
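A minimal sketch of stacking in the spirit described above: stage one fits a univariate model per drug and produces out-of-fold predictions; stage two lets each drug's model borrow all outcomes' stage-one predictions as extra features. The models and synthetic data here are illustrative, not the paper's two variations.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(300, 50)).astype(float)   # mutation indicators
B = rng.normal(size=(50, 5))
Y = X @ B + rng.normal(size=(300, 5))                  # resistances to 5 drugs

# stage 1: out-of-fold univariate predictions for every outcome (avoids leakage)
stage1 = np.column_stack([
    cross_val_predict(RandomForestRegressor(n_estimators=100, random_state=0),
                      X, Y[:, j], cv=5)
    for j in range(Y.shape[1])
])

# stage 2: each outcome's model sees the mutations plus all stage-1 predictions,
# borrowing information across the five prediction tasks
X_aug = np.hstack([X, stage1])
stage2 = [RandomForestRegressor(n_estimators=100, random_state=0)
          .fit(X_aug, Y[:, j]) for j in range(Y.shape[1])]
```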


Model-based clustering for identifying disease-associated SNPs in case-control genome-wide association studies

arXiv.org Machine Learning

Genome-wide association studies (GWASs) aim to detect genetic risk factors for complex human diseases by identifying disease-associated single-nucleotide polymorphisms (SNPs). The traditional SNP-wise approach with multiple-testing adjustment is overly conservative and lacks power in many GWASs. In this article, we propose a model-based clustering method that transforms the challenging high-dimension, small-sample-size problem into a low-dimension, large-sample-size problem and borrows information across SNPs by grouping them into three clusters. We pre-specify the cluster patterns by the minor allele frequencies of SNPs in cases versus controls and enforce these patterns with prior distributions. In simulation studies, our proposed model outperforms the traditional SNP-wise approach, showing better control of the false discovery rate (FDR) and higher sensitivity. We re-analyzed two real studies to identify SNPs associated with severe bortezomib-induced peripheral neuropathy (BiPN) in patients with multiple myeloma (MM). The original analysis in the literature failed to identify SNPs after FDR adjustment. Our proposed method not only detected the reported SNPs after FDR adjustment but also discovered a novel BiPN-associated SNP, rs4351714, that has been reported to be related to MM in another study.
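As a simplified, frequentist stand-in for the Bayesian three-cluster model (which enforces cluster patterns with priors), the sketch below summarizes each SNP by a case-control allele-frequency difference statistic and fits a three-component mixture interpreted as protective / null / risk clusters.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
n_snps, n_cases, n_controls = 5000, 500, 500
p0 = rng.uniform(0.1, 0.4, n_snps)                        # control MAFs
effect = np.where(rng.random(n_snps) < 0.02, 0.08, 0.0)   # a few risk SNPs
p1 = p0 + effect                                          # case MAFs

# observed minor allele frequencies and a standardized difference statistic
maf_case = rng.binomial(2 * n_cases, p1) / (2 * n_cases)
maf_ctrl = rng.binomial(2 * n_controls, p0) / (2 * n_controls)
z = (maf_case - maf_ctrl) / np.sqrt(
    maf_ctrl * (1 - maf_ctrl) * (1 / n_cases + 1 / n_controls))

# three clusters pre-specified by MAF patterns: lower in cases (protective),
# equal (null), higher in cases (risk); initialization stands in for priors
gmm = GaussianMixture(n_components=3,
                      means_init=[[-3.0], [0.0], [3.0]]).fit(z.reshape(-1, 1))
cluster = gmm.predict(z.reshape(-1, 1))
print("SNPs flagged as risk-associated:", np.sum(cluster == 2))
```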