Definitions

The definition of a variant is based on the definition of each allele with respect to the reference sequence. We consider five major types, loosely described as follows.

1. SNP
The reference and alternate sequences both have length 1, and their bases differ.
2. MNP
The reference and alternate sequences have the same length, which must be greater than 1, and every aligned nucleotide differs between them.
OR
All reference and alternate sequences have the same length (this criterion is applied across all alleles of the variant).
3. INDEL
The reference and alternate sequences are not of the same length.
4. CLUMPED
A cluster of nearby SNPs, MNPs or INDELs.
5. SV
The alternate sequence is represented by a symbolic tag in angle brackets (e.g. <DEL>).
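The five definitions above can be sketched as a single Python function that labels one ref/alt pair. This is a minimal illustration, not the wiki's own implementation. It assumes the alleles are non-empty uppercase A/C/G/T strings that have already been trimmed, treats any allele written as `<...>` as an SV, and uses the hypothetical label `REF` for an allele identical to the reference. CLUMPED is assigned only to equal-length pairs in which some, but not all, aligned bases differ, mirroring step 2.1.2.2 of the Classification Procedure. Clumps that span indels are not detected here.

```python
def is_symbolic(alt: str) -> bool:
    """True if the allele is an angle-bracket tag such as "<DEL>" (SV)."""
    return alt.startswith("<") and alt.endswith(">")


def allele_type(ref: str, alt: str) -> str:
    """Label one trimmed ref/alt pair using the five definitions.

    Assumes uppercase A/C/G/T strings. CLUMPED is only assigned to
    equal-length pairs where some, but not all, aligned bases differ.
    """
    if is_symbolic(alt):
        return "SV"
    if len(ref) != len(alt):
        return "INDEL"
    n_diff = sum(r != a for r, a in zip(ref, alt))  # aligned mismatches
    if n_diff == 0:
        return "REF"  # hypothetical label: allele identical to the reference
    if len(ref) == 1:
        return "SNP"
    if n_diff == len(ref):
        return "MNP"
    return "CLUMPED"
```

For example, `allele_type("ACT", "GCA")` gives `"CLUMPED"` because the middle C is shared, while `allele_type("AC", "GT")` gives `"MNP"`.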

Classification Procedure

  1. Trim each allele individually with respect to the reference sequence.
  2. Inspect the length, defined as the length of the alternate allele minus the length of the reference allele.
    1. if length = 0
      1. if length(ref) = 1 and the nucleotides differ, classify as SNP (also count transitions (ts) and transversions (tv))
      2. if length(ref) > 1
        1. if all nucleotides differ, classify as MNP (count ts and tv too)
        2. if not all nucleotides differ, classify as CLUMPED (count ts and tv too)
    2. if length ≠ 0, classify as INDEL
      1. if the shorter allele has length 1
        1. if the shorter allele matches neither end nucleotide of the longer allele, add SNP classification
      2. if the shorter allele has length > 1
        1. compare the shorter allele with the subsequence of equal length at the 5' end of the longer allele (count ts and tv too)
          1. if all nucleotides differ, add MNP classification
          2. if not all nucleotides differ, add CLUMPED classification
  3. Variant classification is the union of the classifications of each allele present in the variant.
  4. If all alleles are the same length, add MNP classification.
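The procedure above can be sketched in Python roughly as follows. This is a minimal illustration under several assumptions that the original does not state. Trimming removes shared trailing bases and then shared leading bases, but always keeps at least one base per allele (VCF-style), so that the end-nucleotide check in step 2.2.1.1 stays meaningful. Symbolic alleles such as `<DEL>` are labeled SV and skipped. Transitions are A↔G and C↔T substitutions, and every other substitution is counted as a transversion. Step 4 is applied only when the alleles are longer than one base and already carry a substitution, so that an ordinary single-base SNP is not also tagged MNP. The names `trim`, `count_ts_tv` and `classify_variant` are hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_symbolic(alt: str) -> bool:
    """Angle-bracket tags such as "<DEL>" denote SVs."""
    return alt.startswith("<") and alt.endswith(">")


def trim(ref: str, alt: str) -> tuple[str, str]:
    """Drop shared trailing, then leading, bases; keep >= 1 base each (assumed)."""
    while len(ref) > 1 and len(alt) > 1 and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
    return ref, alt


def count_ts_tv(a: str, b: str) -> tuple[int, int]:
    """Count transitions (A<->G, C<->T) and transversions at differing positions."""
    ts = tv = 0
    for x, y in zip(a, b):
        if x == y:
            continue
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            ts += 1
        else:
            tv += 1
    return ts, tv


def classify_variant(ref: str, alts: list[str]) -> tuple[set[str], int, int]:
    """Return (labels, ts, tv) for a variant with one ref and one or more alts."""
    labels: set[str] = set()  # step 3: union of per-allele classifications
    ts = tv = 0
    for alt in alts:
        if is_symbolic(alt):
            labels.add("SV")  # not covered by steps 1-4; assumed handling
            continue
        r, a = trim(ref, alt)                      # step 1
        if len(a) - len(r) == 0:                   # step 2.1
            n_diff = sum(x != y for x, y in zip(r, a))
            if n_diff == 0:
                continue                           # identical to the reference
            if len(r) == 1:
                labels.add("SNP")
            else:
                labels.add("MNP" if n_diff == len(r) else "CLUMPED")
            t, v = count_ts_tv(r, a)
            ts, tv = ts + t, tv + v
        else:                                      # step 2.2
            labels.add("INDEL")
            short, long_ = (r, a) if len(r) < len(a) else (a, r)
            if len(short) == 1:
                if short not in (long_[0], long_[-1]):
                    labels.add("SNP")
            else:
                head = long_[: len(short)]         # 5' end of the longer allele
                t, v = count_ts_tv(short, head)
                ts, tv = ts + t, tv + v
                n_diff = sum(x != y for x, y in zip(short, head))
                labels.add("MNP" if n_diff == len(short) else "CLUMPED")
    # step 4 (assumed: only for multi-base alleles that already carry a substitution)
    plain = [a for a in alts if not is_symbolic(a)]
    same_len = bool(plain) and all(len(a) == len(ref) for a in plain)
    if same_len and len(ref) > 1 and labels & {"SNP", "MNP", "CLUMPED"}:
        labels.add("MNP")
    return labels, ts, tv
```

For a multi-allelic site, `classify_variant("A", ["G", "AT"])` yields the labels `{"SNP", "INDEL"}` with one transition and no transversions: the G allele is a transition SNP, and the inserted T leaves the shorter allele "A" matching the first base of "AT", so no extra SNP is added.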


Source: http://genome.sph.umich.edu/wiki/Variant_classification
