Nt1310 Unit 2 Lab Report


In the given sequences, positions 4 and 5 contain only the nucleotides G and T, respectively. In rare cases, however, other nucleotides may also occur at these positions. To account for such unobserved cases, we apply additive smoothing, also called Laplace smoothing, to the categorical data [9]. In this case, we add 4 sequences: AAAAAAAAA, CCCCCCCCC, GGGGGGGGG, TTTTTTTTT. These sequences give every nucleotide a pseudocount of 1 at each position, called the Laplace pseudocount. With 10 original sequences plus the 4 added ones, the frequencies at position 1 become:

f_{A,1} = (3+1)/(10+4) ≈ 0.286
f_{C,1} = (4+1)/(10+4) ≈ 0.357
f_{G,1} = (1+1)/(10+4) ≈ 0.143
f_{T,1} = (2+1)/(10+4) ≈ 0.214

Updating the matrix given in Figure 4 in this way, we obtain the new position weight matrix (PWM) with Laplace pseudocounts, given in Table 1 below.
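The smoothing step can be sketched in a few lines of Python. This is only an illustrative sketch, not the report's own code: the function name `smoothed_pwm` and the single-column toy input are assumptions, and the toy counts (A=3, C=4, G=1, T=2) are the ones implied by the position-1 column of Table 1, since the original ten sequences are not reproduced here.

```python
from collections import Counter

ALPHABET = "ACGT"


def smoothed_pwm(sequences, pseudocount=1):
    """Return {nucleotide: [frequency at each position]} with additive smoothing.

    Adding `pseudocount` to every nucleotide count is equivalent to appending
    one all-A, one all-C, one all-G and one all-T sequence per pseudocount unit.
    """
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("all sequences must have the same length")

    # Denominator: observed sequences plus one pseudo-observation per nucleotide.
    total = len(sequences) + pseudocount * len(ALPHABET)

    pwm = {base: [] for base in ALPHABET}
    for i in range(length):
        counts = Counter(s[i] for s in sequences)  # missing bases count as 0
        for base in ALPHABET:
            pwm[base].append((counts[base] + pseudocount) / total)
    return pwm


# Hypothetical one-column input reproducing the position-1 counts (assumption).
toy_column = ["A"] * 3 + ["C"] * 4 + ["G"] * 1 + ["T"] * 2
pwm = smoothed_pwm(toy_column)
print({base: round(pwm[base][0], 3) for base in ALPHABET})
```

Because every nucleotide receives the same pseudocount, no entry of the matrix can be zero, and each column still sums to 1.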

Table 1: Corresponding PWM for given sequences with Laplace pseudocounts

Nucleotide  1      2      3      4      5      6      7      8      9
A           0.286  0.357  0.143  0.071  0.071  0.500  0.643  0.071  0.143
C           0.357  0.143  0.214  0.071  0.071  0.071  0.071  0.143  0.143
G           0.143  0.214  0.500  0.786  0.071  0.143  0.214  0.714  0.214
T           0.214  0.286  0.143  0.071  0.786  0.071  0.071  0.071  0.500

3.3 Log-odds
