Method Brief.

Patient
  • Has a family history of vascular disease and early sudden death.
  • Underwent electrocardiography, echocardiography, and a cardiopulmonary exercise test.

 

Genome analysis
  • Genomic DNA was purified from 2 mL of whole blood.
  • Sequencing was performed on a HeliScope genome sequencer.

    • Reads were mapped to reference human genome build 36 using IndexDP; base calling used the UMKA algorithm.
    • A subset of SNP calls was validated with the Illumina BeadArray.

 

Disease and risk analysis
  • Focus on

    • variants associated with genes for mendelian disease
    • novel mutations
    • variants known to modulate response to pharmacotherapy
    • single nucleotide polymorphisms previously associated with complex disease.
  • Definitions of known rare variants and novel variants

    • Known rare variants: variants with a frequency below 5% in GVS
    • Novel variants: variants with no RefSNP (rs) number.
  • Searching for coding variants using analysis tools.

    • Developing algorithms to index variants affecting or creating start sites, stop sites, splice sites, and microRNAs

      • SIFT
      • PolyPhen
      • UniProt
      • PolyDoms
    • Manual Curation was performed for each novel non-synonymous coding variant associated with a known or suspected disease gene, as found in Online Mendelian Inheritance in Man.

      • Manual search of individual international LSMD, curated dbSNP and OMIM entries, HGMD, PubMed
      • Evaluate the potential effect of novel changes on predicted protein structure: Using SIFT, PolyPhen
    • miRNA: Searching for rare variants by queries matching splice site and miRNA databases to patient variants by chromosome location.
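The definitions of known rare and novel variants given above can be sketched as a small classifier. This is only an illustration under stated assumptions: the function name `classify_variant`, its arguments, and the extra labels for variants that are neither rare nor novel are hypothetical and do not come from the paper. Only the 5% cutoff and the "no RefSNP number" rule are taken from the notes.

```python
from typing import Optional

# Cutoff from the notes: a known variant is "rare" below 5% frequency in GVS.
RARE_FREQ_THRESHOLD = 0.05


def classify_variant(rs_id: Optional[str], frequency: Optional[float]) -> str:
    """Label a variant using the definitions in the notes.

    rs_id     -- RefSNP identifier, or None/empty if the variant has none
    frequency -- population frequency as a fraction (0..1), or None if unknown
    """
    if not rs_id:
        # No RefSNP number: the notes call this a novel variant.
        return "novel"
    if frequency is None:
        # Hypothetical label: the notes do not cover this case.
        return "known, frequency unknown"
    if frequency < RARE_FREQ_THRESHOLD:
        return "known rare"
    # Hypothetical label for everything at or above the cutoff.
    return "known common"


if __name__ == "__main__":
    print(classify_variant(None, None))       # novel
    print(classify_variant("rs0000", 0.01))   # known rare ("rs0000" is a placeholder ID)
```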

 

Pharmacogenomics
  • The PharmGKB curators reviewed the drug-related variant annotations for their clinical relevance to this patient.
  • A level of evidence was assigned to each variant annotation based on a clinician's appraisal of the impact of the variant.

 

Disease risk
  • Disease-associated SNP database

    • To analyse the disease risk across the spectrum of human disease, a high-quality disease-associated SNP database was built.

      • Starting with a list of all SNPs in dbSNP that were measured in the HapMap 2&3 projects.

        • Searching for rsIDs within the abstracts of all papers in MEDLINE (records unrelated to humans were removed)
        • 2,671 papers were manually curated and a database was generated, recording from the full-text paper:

          • disease name
          • specific phenotype (e.g. acute coronary syndrome in coronary artery disease)
          • study population (e.g. Finnish)
          • case & control population (e.g. 2508 patients with angiographically proven coronary artery disease)
          • genotyping technology
          • major/minor/risk alleles
          • odds ratio
          • 95% confidence interval of the odds ratio
          • published p-value
          • genetic model for each statistically significant genotype comparison included, including comparisons involving any non-HapMap SNPs within these papers
        • If not already present in the search-based sample

          • disease associations were recorded from the full text of all papers referenced in HGMD (Professional version) and the NHGRI GWAS catalog
      • Separating into two categories: case/control studies, cohort studies
  • Categorizing studies on similar disease and phenotypes

    • To enable the integration of multiple studies on similar diseases and phenotypes, the disease/phenotype names in our association database were mapped to the UMLS.
    • 55,258 identified associations were categorized into 813 different diseases and phenotypes for 9,649 specific dbSNP IDs.
  • Identifying strand direction in association studies

    • An algorithm was developed to identify strand direction automatically by comparing the study-reported major/minor alleles with those found in the HapMap population most similar to the patient.
    • 6,196 records were identified that reported genotypes on the negative strand
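The notes do not describe how the strand-detection algorithm works internally, so the following is only a minimal sketch of one common way to do such a check, not the authors' method. It assumes that a study's allele pair is on the negative strand when its base-wise complement matches the HapMap allele pair. The function name `infer_strand` and the returned labels are hypothetical.

```python
from typing import Tuple

# Watson-Crick complements, used to flip alleles between strands.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def infer_strand(reported: Tuple[str, str], hapmap: Tuple[str, str]) -> str:
    """Compare a study's allele pair with the HapMap allele pair for one SNP.

    Returns "positive", "negative", "ambiguous" (A/T or C/G SNPs, whose
    complement is the same pair), or "mismatch".
    """
    reported_set = {allele.upper() for allele in reported}
    hapmap_set = {allele.upper() for allele in hapmap}
    flipped_set = {COMPLEMENT[allele] for allele in reported_set}

    if reported_set == flipped_set:
        # Palindromic SNP: the alleles alone cannot reveal the strand.
        return "ambiguous"
    if reported_set == hapmap_set:
        return "positive"
    if flipped_set == hapmap_set:
        return "negative"
    return "mismatch"


if __name__ == "__main__":
    print(infer_strand(("T", "C"), ("A", "G")))  # negative
```

Palindromic SNPs are reported as ambiguous here. Resolving them would need allele frequencies (for example, which allele is the major one in the reference population), which this sketch does not attempt.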
  • Calculating likelihood ratio of disease risk for SNP genotypes

    • limited to the set of case/control studies
    • For every disease SNP, we calculated the LR for each genotype using the following equation.

      • LR = probability of the genotype in the case population / probability of the genotype in the control population
      • retrieved 4,137 LR for 735 SNPs on 141 diseases, curated from a total of 480 publications.
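The LR equation above can be worked through with genotype counts from a single case/control study. The sketch below is illustrative: the function name `genotype_likelihood_ratios`, the dictionary layout, and the example counts are all made up for demonstration and are not taken from the paper.

```python
from typing import Dict


def genotype_likelihood_ratios(
    case_counts: Dict[str, int], control_counts: Dict[str, int]
) -> Dict[str, float]:
    """LR per genotype = P(genotype | case) / P(genotype | control)."""
    case_total = sum(case_counts.values())
    control_total = sum(control_counts.values())
    if case_total == 0 or control_total == 0:
        raise ValueError("case and control populations must be non-empty")

    ratios: Dict[str, float] = {}
    for genotype, n_case in case_counts.items():
        p_case = n_case / case_total
        p_control = control_counts.get(genotype, 0) / control_total
        if p_control == 0:
            # LR is undefined when the genotype is absent in controls; skip it.
            continue
        ratios[genotype] = p_case / p_control
    return ratios


if __name__ == "__main__":
    # Made-up counts for one SNP with three genotypes.
    cases = {"AA": 50, "AG": 30, "GG": 20}
    controls = {"AA": 25, "AG": 50, "GG": 25}
    print(genotype_likelihood_ratios(cases, controls))
```

With these made-up counts, genotype AA is twice as frequent among cases as among controls (LR of 2), while AG and GG are less frequent among cases (LR below 1).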
  • Pre-test probability for diseases

    • Pre-test probabilities of lifetime risk of disease were calculated for a wide range of conditions for a person matching the patient's characteristics using a combination of sources.
  • Calculating post-test probability of disease risk for the patient

    • For each of 121 diseases, the post-test probability of developing disease for the patient was calculated as follows.

      • For SNPs with multiple LRs from multiple studies, the mean LR was calculated, weighted by the square root of the sample sizes (Figure 4).
      • Within each haplotype block, the SNP with the highest LR was used.
      • LR from all SNPs were multiplied to report the cumulative LR for the patient.
      • Post-test probabilities were calculable for 55 diseases.
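The steps above can be sketched end to end. Two assumptions are worth flagging: the weighted mean is taken here as an arithmetic mean (the notes only say "mean"), and the pre-test probability is combined with the cumulative LR through the standard odds form of Bayes' rule (post-test odds = pre-test odds × LR). All function names and example numbers are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def weighted_mean_lr(studies: List[Tuple[float, int]]) -> float:
    """Mean LR for one SNP over (lr, sample_size) pairs, weighted by sqrt(n).

    Assumes an arithmetic mean; the notes do not specify the type of mean.
    """
    weights = [math.sqrt(n) for _, n in studies]
    weighted_sum = sum(lr * w for (lr, _), w in zip(studies, weights))
    return weighted_sum / sum(weights)


def cumulative_lr(block_lrs: Dict[str, List[float]]) -> float:
    """Multiply the highest LR from each haplotype block."""
    result = 1.0
    for lrs in block_lrs.values():
        result *= max(lrs)
    return result


def post_test_probability(pre_test_probability: float, lr: float) -> float:
    """Convert probability to odds, apply the LR, convert back."""
    if not 0.0 < pre_test_probability < 1.0:
        raise ValueError("pre-test probability must be strictly between 0 and 1")
    pre_odds = pre_test_probability / (1.0 - pre_test_probability)
    post_odds = pre_odds * lr
    return post_odds / (1.0 + post_odds)


if __name__ == "__main__":
    # Made-up numbers: one SNP reported by two studies, plus two other SNPs.
    snp_lr = weighted_mean_lr([(2.0, 100), (1.0, 400)])
    total_lr = cumulative_lr({"block1": [snp_lr, 1.1], "block2": [0.8]})
    print(post_test_probability(0.10, total_lr))
```

Working in odds rather than probabilities keeps the update a simple multiplication, which is why the cumulative LR can be formed as a product before it is applied once to the pre-test value.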
  • Calculating the odds ratios of disease risk

    • Calculation of the LR of disease risk requires the frequency of three genotypes in the case and control populations.
    • The disease risks of the patient were calculated using the odds ratio as extracted from the literature.
    • Study authors were contacted when reported associations and genotype frequencies were discordant from population frequencies.
  • Gene-environment interaction and conditionally dependent risk

    • Using Etiome

      • Links between diseases and known aetiological factors were obtained as previously described.
