NEURODNAAI: Neural pipeline approaches for the advancing dna-based information storage as a sustainable digital medium using deep learning framework

Thakur, Rakesh; Singh, Lavanya; Yashika; Bundawala, Manomay; Kumar, Aruna

arXiv.org Artificial Intelligence 

DNA is a promising medium for digital information storage owing to its exceptional density and durability. While prior studies have advanced coding theory, workflow design, and simulation tools, challenges such as synthesis costs, sequencing errors, and biological constraints (GC-content imbalance, homopolymers) limit practical deployment. To address these challenges, our framework draws on concepts from quantum parallelism to increase encoding diversity and resilience, and integrates biologically informed constraints with deep learning to improve error mitigation in DNA storage. Our results show that traditional prompting or rule-based schemes fail to adapt effectively to realistic noise, whereas NeuroDNAAI achieves superior accuracy. Experiments on benchmark datasets demonstrate low bit error rates for both text and images. By unifying theory, workflow, and simulation into a single pipeline, NeuroDNAAI enables scalable, biologically valid archival DNA storage.
The rapid increase in global data generation has placed unprecedented pressure on traditional storage media, including magnetic tapes, hard disks, and solid-state drives. These technologies are limited in density, durability, and sustainability; they often degrade within decades and require frequent data migration. At the same time, forecasts indicate that the volume of digital data will soon exceed the capacity of existing storage infrastructure, creating an urgent demand for alternative paradigms. DNA has emerged as a promising storage medium because of its extremely high density, long-term stability, and universal biological accessibility. Despite this extraordinary theoretical potential, practical adoption remains hindered by challenges in synthesis, sequencing, and error correction. Errors such as substitutions, insertions, and deletions complicate reliable retrieval, motivating methods that can tolerate or correct these distortions.
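The two biological constraints named in the abstract, GC-content balance and homopolymer length, can be checked mechanically on any candidate strand. The sketch below is illustrative only: the function names and the thresholds (GC fraction between 0.4 and 0.6, homopolymer runs of at most 3 bases) are assumptions chosen for the example, not values reported for NeuroDNAAI.

```python
def gc_content(seq: str) -> float:
    """Fraction of nucleotides in seq that are G or C (0.0 for an empty strand)."""
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def max_homopolymer(seq: str) -> int:
    """Length of the longest run of a single repeated nucleotide."""
    longest = run = 0
    prev = None
    for base in seq:
        # Extend the current run if the base repeats, otherwise start a new run.
        run = run + 1 if base == prev else 1
        prev = base
        longest = max(longest, run)
    return longest


def satisfies_constraints(seq: str, gc_low: float = 0.4, gc_high: float = 0.6,
                          max_run: int = 3) -> bool:
    """Hypothetical filter: balanced GC content and no long homopolymer runs.

    The default thresholds are illustrative assumptions, not the paper's settings.
    """
    return gc_low <= gc_content(seq) <= gc_high and max_homopolymer(seq) <= max_run
```

For example, `satisfies_constraints("ACGTACGT")` returns `True`, while `"AAAAGGCC"` is rejected: its GC fraction is a balanced 0.5, but it contains a run of four `A`s.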
In response to these challenges, the present work proposes a modular end-to-end framework that simulates the DNA storage pipeline and introduces a Transformer-based neural decoder for robust data reconstruction. Within this system, digital information (in this case, MNIST images) is encoded into DNA sequences, passed through a configurable noise model that simulates synthesis and sequencing errors, and subsequently reconstructed using an encoder-decoder architecture.
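The encode, noise, and reconstruct stages described above can be illustrated with a minimal, standard-library-only sketch. Everything in it is an assumption for illustration: the naive 2-bits-per-base mapping (which does not enforce GC or homopolymer constraints), the hypothetical function names, the default error rates, and the random bit string standing in for a binarized 28x28 MNIST image. The Transformer decoder itself is not reproduced; `decode` is simply the inverse mapping, which shows the baseline a learned decoder would have to improve on.

```python
import random

# Hypothetical 2-bit-per-nucleotide mapping (assumed; not the paper's codec).
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(bits: str) -> str:
    """Map a bit string to a DNA strand, two bits per nucleotide."""
    if len(bits) % 2:
        bits += "0"  # pad odd-length input with a trailing zero bit
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(strand: str) -> str:
    """Naive inverse of encode: map each nucleotide back to two bits."""
    return "".join(BASE_TO_BITS[base] for base in strand)


def noisy_channel(strand: str, p_sub: float = 0.01, p_ins: float = 0.005,
                  p_del: float = 0.005, rng: random.Random = None) -> str:
    """Apply independent per-base deletion, substitution, and insertion errors."""
    rng = rng or random.Random()
    out = []
    for base in strand:
        r = rng.random()
        if r < p_del:
            continue  # deletion: drop this base
        if r < p_del + p_sub:
            # substitution: replace with one of the three other bases
            base = rng.choice([b for b in "ACGT" if b != base])
        out.append(base)
        if rng.random() < p_ins:
            out.append(rng.choice("ACGT"))  # insertion of a random base after this one
    return "".join(out)


def bit_error_rate(sent: str, received: str) -> float:
    """Positional BER; any length mismatch caused by indels counts as errors."""
    n = max(len(sent), len(received))
    if n == 0:
        return 0.0
    mismatches = sum(a != b for a, b in zip(sent, received))
    return (mismatches + abs(len(sent) - len(received))) / n


if __name__ == "__main__":
    rng = random.Random(0)
    # Random bits as a stand-in for a binarized 28x28 image (784 pixels).
    bits = "".join(rng.choice("01") for _ in range(784))
    received = decode(noisy_channel(encode(bits), rng=rng))
    print(f"naive positional BER: {bit_error_rate(bits, received):.3f}")
```

Because a single insertion or deletion shifts every downstream base, the positional BER of this naive decoder can rise sharply even at low indel rates; this loss of synchronization is one reason to use a learned, sequence-aware decoder rather than a fixed inverse mapping.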