A Tri-Character Guided Exact String Matching Algorithm for Efficient STR Detection in Forensic DNA Analysis
Abstract
The importance of string matching algorithms in the world of modern DNA forensic technology cannot be over-stated. Short Tandem Repeats (STRs) play an important role in forensic DNA analysis due to their high variability among individuals. Fast and accurate detection of STRs in large-scale genomic data is most important for criminal investigations, identity verification, and population studies. This study introduces a novel exact string matching algorithm, the Tri-Scan for Left, Right, and Middle Character (TSLRMC) approach, tailored for efficient pattern detection in forensic DNA sequences. This study focusses on the limitations found in some famous and widely used exact string matching algorithms. Proposed algorithm improves the running time for scanning pattern string in a long text string. The novelty of proposed algorithm is to optimize the scanning by the scanning of pattern string left, right and middle characters in the long DNA sequence string and then scan the remaining character of the pattern string in that partial text window where the pattern string’s left, right and middle character found. The proposed algorithm shows significant improvement compared with the most popular exact string matching algorithms, based on running time as well as number of characters compared. Time complexity of this proposed novel TSLRMC algorithm is O (n-m) in worst case, O (mn) in average case and O (1) in best case.