Randomized DNA Base Sequence Design by Using Degenerate Bases for DNA Data Storage
  • Seo, Seongjun
  • Tandon, Anshula
  • Ngoc Nguyen, Thi Bich
  • Nhung Vu, Thi Hong
  • Lee, Keun Woo
  • Park, Sung Ha

Abstract

Efficient and scalable DNA-based data storage requires encoding strategies that balance sequence compactness, stability, and fidelity. In this study, we present a randomized DNA base sequence design framework, designated RN-B#, which incorporates degenerate bases to increase information density and minimize sequence redundancy. By implementing rule-based encoding systems with varying constraints on homopolymer length and degenerate base positioning (e.g., R∞-B32, R2-B52, and R0-B16), we demonstrate the tunability of encoding properties such as GC balance, homopolymer suppression, and sequencing fidelity. Experimental validation using black-and-white binary image data encoded with RN-B# rules confirmed successful image recovery via Sanger sequencing, with an average sequence identity of up to 75%. Furthermore, we developed probabilistic models that quantify sequencing accuracy as a function of sequencing depth and degenerate base complexity, and corroborated them by in silico analysis. Our approach achieved a maximum theoretical information density of 3.91 bits/nt, offering a versatile platform for robust, high-capacity DNA data storage by leveraging the combinatorial space of degenerate nucleotide codes.
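
The abstract does not say how the 3.91 bits/nt figure is derived. One reading consistent with it, offered here only as an assumption, is the upper bound log2(15) for an unconstrained alphabet of the 15 standard IUPAC nucleotide symbols (4 plain bases plus 11 degenerate codes). The Python sketch below checks that arithmetic; the names `IUPAC` and `bits_per_nt` are illustrative and are not taken from the paper.

```python
import math

# Standard IUPAC nucleotide codes: 4 plain bases plus 11 degenerate symbols.
# Each symbol maps to the set of plain bases it can stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def bits_per_nt(alphabet_size: int) -> float:
    """Upper bound on information per position for an unconstrained alphabet.

    Assumption: every symbol is equally likely and perfectly distinguishable
    on readout, which ignores the homopolymer and GC constraints the
    abstract mentions.
    """
    return math.log2(alphabet_size)


if __name__ == "__main__":
    print(f"4 plain bases: {bits_per_nt(4):.2f} bits/nt")
    print(f"{len(IUPAC)} IUPAC symbols: {bits_per_nt(len(IUPAC)):.2f} bits/nt")
```

Under this assumption the plain four-base alphabet caps at 2.00 bits/nt, while the 15-symbol alphabet gives about 3.91 bits/nt. Any real encoding rule with homopolymer or positional constraints would fall below that bound.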

Keywords

DNA; randomization; sequence design; degenerate base; data storage; INFORMATION-STORAGE; DIGITAL INFORMATION; ROBUST
Title
Randomized DNA Base Sequence Design by Using Degenerate Bases for DNA Data Storage
Authors
Seo, Seongjun; Tandon, Anshula; Ngoc Nguyen, Thi Bich; Nhung Vu, Thi Hong; Lee, Keun Woo; Park, Sung Ha
DOI
10.1021/acsabm.5c01680
Publication date
2025-11-17
Type
Article
Journal
ACS Applied Bio Materials, vol. 8, no. 11
Pages
10360-10370