Gene Finding via TATA box search

Problem

• Within a long region of genomic sequence, genes are also characterised by having the sequence “TATA” somewhere near the beginning of the string.
• Write a program to prompt the user for a string of DNA bases (ACTG)
• Search the string for the substring “TATA”
• If found, report the 0-based position of the first match. Otherwise report not found.
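
As a compact point of comparison for the search steps above, here is a minimal sketch that relies on the standard library's std::string::find instead of a hand-written loop. The helper name find_tata and the choice of -1 as the "not found" value are assumptions made for illustration; they are not part of the exercise.

```cpp
#include <string>

// Hypothetical helper (not part of the original exercise): returns the
// 0-based position of the first "TATA" in dna_seq, or -1 if it is absent.
int find_tata(const std::string& dna_seq)
{
    // std::string::find returns std::string::npos when there is no match
    const std::string::size_type pos = dna_seq.find("TATA");
    if (pos == std::string::npos)
    {
        return -1;
    }
    return static_cast<int>(pos);
}
```

A caller would read the sequence from the user, pass it to find_tata, and print either the returned position or a not-found message when the result is -1.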

A contextually-related problem is Gene Finding via GC content.

Solution

/*
Test Case 1:
Input: TATA (example of string that is exactly what we're looking for)
Expected Output: 0
Actual Output: 0 (was 1 before adjusted i in cout statement)

Test Case 2:
Input: GATATA (example of string that contains what we're looking for)
Expected Output: 2
Actual Output: 2 (was 3 before adjusted i in cout statement)

Test Case 3:
Input: A (example of string that doesn't contain subsequence TATA)
Expected Output: Not Found

Other example test case inputs: GTATAG (TATA is in the middle), TAT (almost TATA), TATAGCTATA (TATA appears twice), etc.
*/

#include <iostream>
#include <string>

using namespace std;

int main()
{
    // Inputs: DNA sequence
    // Outputs: First 0-based position of "TATA" found in input or "Not Found"

    // Define subsequence to find
    string subseq_to_find = "TATA";

    // Prompt user for sequence
    cout << "DNA seq please: ";
    string dna_seq;
    cin >> dna_seq;

    // Use a variable that will keep track of whether or not we have found TATA yet
    bool found = false;

    // Current position in the input sequence (unsigned, to match the type of length())
    string::size_type i = 0;

    // As long as we haven't found TATA and as long as we haven't checked every position in the input sequence
    while (!found && i < dna_seq.length())
    {
        // Check whether the substring starting at the current position is the same as the subsequence we're looking for
        if (subseq_to_find == dna_seq.substr(i, subseq_to_find.length()))
        {
            // If it is, then we say we've found it
            found = true; // how exciting!
        }
        // Be sure to increment the current position for the next time through the loop.
        // Note that this also runs on the pass that finds the match.
        i++;
    }

    if (found)
    {
        // i was incremented once after the match, so the match starts at i - 1
        cout << subseq_to_find << " was found at position " << i - 1 << endl;
    }
    else
    {
        cout << "Not Found" << endl;
    }

    return 0;
}