- Seo, Seongjun;
- Tandon, Anshula;
- Ngoc Nguyen, Thi Bich;
- Nhung Vu, Thi Hong;
- Lee, Keun Woo;
- ... Park, Sung Ha
Abstract
Efficient and scalable DNA-based data storage requires encoding strategies that balance sequence compactness, stability, and fidelity. In this study, we present a randomized DNA base sequence design framework, designated RN-B#, which incorporates degenerate bases to significantly enhance information density and minimize sequence redundancy. By implementing rule-based encoding systems with varying constraints on homopolymer length and degenerate base positioning (e.g., R∞-B32, R2-B52, and R0-B16), we demonstrate that encoding properties such as GC balance, homopolymer suppression, and sequencing fidelity can be tuned. Experimental validation using black-and-white binary image data encoded with RN-B# rules confirmed successful image recovery via Sanger sequencing, with an average sequence identity of up to 75%. Furthermore, we developed probabilistic models that quantify sequencing accuracy as a function of sequencing depth and degenerate base complexity, and corroborated them through in silico analysis. Our approach achieved a maximum theoretical information density of 3.91 bits/nt, offering a versatile platform for robust, high-capacity DNA data storage that leverages the combinatorial space of degenerate nucleotide codes.
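The 3.91 bits/nt figure is consistent with log2(15), the information content of a position that may carry any of the 15 IUPAC nucleotide codes (A, C, G, T plus 11 degenerate codes). The Python sketch below checks that arithmetic and, separately, illustrates one simple way accuracy could depend on sequencing depth and degeneracy. Both are our assumptions, not the authors' published models. The density bound assumes all 15 codes are equally usable with no homopolymer or GC constraints, so any added constraint can only lower it. `prob_all_variants_seen` is a generic coupon-collector model in which each read shows one of the k bases of a degenerate position with equal probability. The names `IUPAC`, `max_density_bits_per_nt`, and `prob_all_variants_seen` are hypothetical.

```python
import math

# Hypothetical table: the 15 IUPAC nucleotide codes and the bases each one
# stands for (4 standard bases + 11 degenerate codes).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def max_density_bits_per_nt(alphabet_size: int) -> float:
    """Unconstrained upper bound: log2 of the number of usable symbols."""
    return math.log2(alphabet_size)


def prob_all_variants_seen(k: int, depth: int) -> float:
    """Assumed coupon-collector model: probability that `depth` reads, each
    showing one of k equally likely bases at a degenerate position, reveal
    all k bases (computed by inclusion-exclusion)."""
    return sum(
        (-1) ** j * math.comb(k, j) * ((k - j) / k) ** depth
        for j in range(k + 1)
    )


if __name__ == "__main__":
    print(f"{max_density_bits_per_nt(len(IUPAC)):.2f} bits/nt")  # 3.91 bits/nt
    print(f"{max_density_bits_per_nt(4):.2f} bits/nt")  # 2.00 bits/nt
    # Assumed model: chance of resolving a 2-, 3-, and 4-base code at depth 10.
    for code in ("R", "B", "N"):
        k = len(IUPAC[code])
        print(code, k, round(prob_all_variants_seen(k, depth=10), 4))
```

Under this toy model, resolving an N (k = 4) needs noticeably more reads than resolving a two-base code such as R. This illustrates why accuracy can depend on both sequencing depth and degenerate base complexity.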
- Title: Randomized DNA Base Sequence Design by Using Degenerate Bases for DNA Data Storage
- Authors: Seo, Seongjun; Tandon, Anshula; Ngoc Nguyen, Thi Bich; Nhung Vu, Thi Hong; Lee, Keun Woo; Park, Sung Ha
- Publication date: 2025-11-17
- Type: Article
- Volume: 8
- Issue: 11
- Pages: 10360–10370