Definitions
The definition of a variant is based on the definition of each allele with respect to the reference sequence. We consider five major types, loosely described as follows.
- 1. SNP
  - The reference and alternate sequences are both of length 1 and the two nucleotides differ.
- 2. MNP
  - The reference and alternate sequences have the same length, which must be greater than 1, and every nucleotide differs from the one at the same position in the other sequence.
  - OR
  - All reference and alternate sequences have the same length (this condition is checked across all alleles of the variant).
- 3. INDEL
  - The reference and alternate sequences are not of the same length.
- 4. CLUMPED
  - A clumping of nearby SNPs, MNPs or Indels.
- 5. SV
  - The alternate sequence is represented by an angled bracket tag, for example <DEL>.
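The definitions can be read as a small decision function. The sketch below is only an illustration: `basic_type` is a hypothetical name, it assumes the alleles are plain A/C/G/T strings (or an angled bracket tag for the alternate), and it does not attempt the CLUMPED case, reporting anything that fits none of the simple definitions as `OTHER`.

```python
def basic_type(ref: str, alt: str) -> str:
    """Map one REF/ALT pair onto the simple definitions (hypothetical helper).

    No trimming is done here and CLUMPED is not detected.
    """
    # SV: the alternate allele is an angled bracket tag such as <DEL>.
    if alt.startswith("<") and alt.endswith(">"):
        return "SV"
    # INDEL: the two sequences have different lengths.
    if len(ref) != len(alt):
        return "INDEL"
    # Same length: count positions where the nucleotides differ.
    mismatches = sum(r != a for r, a in zip(ref.upper(), alt.upper()))
    if len(ref) == 1 and mismatches == 1:
        return "SNP"
    if len(ref) > 1 and mismatches == len(ref):
        return "MNP"
    # Identical alleles or a partial mismatch; not covered by this sketch.
    return "OTHER"


print(basic_type("A", "G"))      # SNP
print(basic_type("AT", "GC"))    # MNP
print(basic_type("A", "AT"))     # INDEL
print(basic_type("A", "<DEL>"))  # SV
```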
Classification Procedure
- Trim each allele with respect to the reference sequence individually.
- Inspect length, defined as the length of the alternate allele minus the length of the reference allele.
  - if length = 0
    - if length(ref) = 1 and the nucleotides differ, classify as SNP (count transitions, ts, and transversions, tv, too)
    - if length(ref) > 1
      - if all nucleotides differ, classify as MNP (count ts and tv too)
      - if not all nucleotides differ, classify as CLUMPED (count ts and tv too)
  - if length ≠ 0, classify as INDEL
    - if the shorter allele is of length 1
      - if the shorter allele does not match either of the end nucleotides of the longer allele, add SNP classification
    - if the shorter allele is of length > 1
      - compare the shorter allele sequence with the subsequence at the 5' end of the longer allele (count ts and tv too)
        - if all nucleotides differ, add MNP classification
        - if not all nucleotides differ, add CLUMPED classification
- Variant classification is the union of the classifications of each allele present in the variant.
- If all alleles are the same length, add MNP classification.
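The procedure can be sketched in Python as follows. This is one possible reading, not the reference implementation, and all function names are hypothetical. Two points are assumptions that the text does not spell out: trimming removes shared trailing and then leading bases while each allele keeps at least one base, and the final MNP rule is applied only when the untrimmed alleles share a length greater than 1 (following the "greater than 1" wording in the MNP definition). Alleles are assumed to contain only A, C, G and T.

```python
# Transitions are purine<->purine (A<->G) or pyrimidine<->pyrimidine (C<->T).
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def trim(ref, alt):
    """Assumed trimming rule: drop shared trailing, then leading, bases
    while both alleles still keep at least one base."""
    while len(ref) > 1 and len(alt) > 1 and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
    return ref, alt


def count_ts_tv(a, b):
    """Count transitions and transversions over mismatching positions."""
    ts = tv = 0
    for x, y in zip(a, b):
        if x != y:
            if (x, y) in TRANSITIONS:
                ts += 1
            else:
                tv += 1
    return ts, tv


def classify_allele(ref, alt):
    """Return (set of types, ts, tv) for one alternate allele."""
    if alt.startswith("<") and alt.endswith(">"):
        return {"SV"}, 0, 0  # angled bracket tag
    ref, alt = trim(ref.upper(), alt.upper())
    types, ts, tv = set(), 0, 0
    if len(alt) - len(ref) == 0:
        ts, tv = count_ts_tv(ref, alt)
        n_diff = ts + tv
        if len(ref) == 1:
            if n_diff == 1:
                types.add("SNP")
        elif n_diff == len(ref):
            types.add("MNP")
        else:
            types.add("CLUMPED")
    else:
        types.add("INDEL")
        short, long_ = sorted((ref, alt), key=len)
        if len(short) == 1:
            # Matches neither end nucleotide of the longer allele.
            if short != long_[0] and short != long_[-1]:
                types.add("SNP")
        else:
            # Compare with the 5' end of the longer allele.
            ts, tv = count_ts_tv(short, long_[:len(short)])
            types.add("MNP" if ts + tv == len(short) else "CLUMPED")
    return types, ts, tv


def classify_variant(ref, alts):
    """Union of the per-allele types, plus the final same-length MNP rule."""
    types = set()
    for alt in alts:
        types |= classify_allele(ref, alt)[0]
    # Assumption: rule applies to untrimmed alleles sharing a length > 1.
    lengths = {len(a) for a in [ref, *alts]}
    if "SV" not in types and len(lengths) == 1 and len(ref) > 1:
        types.add("MNP")
    return types
```

For example, `classify_allele("ATG", "ACG")` trims the pair down to T and C and reports a SNP with one transition, while `classify_variant("A", ["G", "AT"])` returns the union of SNP and INDEL.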
Source: http://genome.sph.umich.edu/wiki/Variant_classification