ADRS-CNet: An adaptive model of dimensionality reduction methods for DNA storage clustering algorithms

Aug-22-2024–arXiv.org Artificial Intelligence

In the downstream information retrieval process of DNA storage, specific hybridization techniques such as Polymerase Chain Reaction (PCR) or magnetic bead separation are commonly used to access data [1]. However, the technology faces several challenges, including high rates of base errors (such as insertions, deletions, and substitutions) and the loss of stored sequences, both of which threaten the reliability of the stored data [2]. Clustering and alignment of the sequencing data can be employed to address these issues. A commonly used feature extraction method is based on k-mer frequency matrices, whose dimensionality grows exponentially with k [3] [4] [5]: over the four-letter DNA alphabet there are 4^k possible k-mers. Selecting an appropriate dimensionality reduction technique is therefore a critical challenge. Among the numerous available algorithms, Principal Component Analysis (PCA) [6], t-distributed Stochastic Neighbor Embedding (t-SNE) [7], and Uniform Manifold Approximation and Projection (UMAP) [8] are particularly prominent in cell biology, bioinformatics, and data visualization [9]. This study aims to develop an adaptive classification model that identifies the optimal dimensionality reduction method, thereby mitigating the curse of dimensionality caused by k-mer feature extraction and enhancing the effectiveness of K-means clustering in restoring the original sequence information.
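To make the dimensionality growth concrete, the following is a minimal sketch of k-mer frequency feature extraction, not the authors' implementation. The function names (`kmer_index`, `kmer_frequencies`), the normalization by window count, and the toy reads are all assumptions chosen for illustration; the abstract does not specify these details.

```python
from itertools import product

ALPHABET = "ACGT"  # assumption: reads use only the four standard bases


def kmer_index(k):
    """Map every possible k-mer over ACGT to a column index (4**k entries)."""
    return {"".join(p): i for i, p in enumerate(product(ALPHABET, repeat=k))}


def kmer_frequencies(seq, k):
    """Return the normalized k-mer frequency vector of seq (length 4**k)."""
    index = kmer_index(k)
    counts = [0] * len(index)
    total = 0
    for i in range(len(seq) - k + 1):
        col = index.get(seq[i:i + k])
        if col is not None:  # skip windows containing symbols outside ACGT
            counts[col] += 1
            total += 1
    if total == 0:  # sequence shorter than k, or no valid window
        return [0.0] * len(counts)
    return [c / total for c in counts]


if __name__ == "__main__":
    # Hypothetical toy reads, for illustration only (not data from the paper).
    reads = ["ACGTACGTAC", "ACGTACGAAC", "TTGGCCAATT"]
    for k in (2, 3, 4):
        matrix = [kmer_frequencies(r, k) for r in reads]
        # The feature dimension is 4**k regardless of read length.
        print(k, len(matrix[0]))
```

Each read becomes one row of the frequency matrix, and the number of columns quadruples with every increment of k. A matrix like this is the kind of input that a dimensionality reduction step such as PCA, t-SNE, or UMAP would compress before K-means clustering.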

accuracy, algorithm, dimensionality reduction method

Country:
- South America > Peru
  - Cusco Department > Cusco Province > Cusco (0.04)
- Oceania > Australia
  - Victoria > Melbourne (0.04)
- Europe > United Kingdom
  - England
    - Oxfordshire > Oxford (0.14)
    - Hampshire > Southampton (0.04)
- Asia > China
  - Shanghai > Shanghai (0.04)

Genre:
- Research Report (1.00)

Industry:
- Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:
- Information Technology
  - Data Science > Data Mining (1.00)
  - Artificial Intelligence > Machine Learning
    - Statistical Learning
      - Dimensionality Reduction (1.00)
      - Clustering (1.00)
