Patient
- Has a family history of vascular disease and early sudden death.
- Underwent electrocardiography, echocardiography, and a cardiopulmonary exercise test.
Genome analysis
- Genomic DNA was purified from 2 mL of whole blood.
- Sequencing was performed on a Heliscope genome sequencer.
- Reads were mapped to reference human genome build 36 using IndexDP; base calling used the UMKA algorithm.
- A subset of SNP calls was validated with the Illumina BeadArray.
Disease and risk analysis
- Focus on:
  - variants associated with genes for Mendelian disease
  - novel mutations
  - variants known to modulate response to pharmacotherapy
  - single nucleotide polymorphisms previously associated with complex disease
- Definitions of known rare variants and novel variants:
  - Known rare variants: variants with a frequency below 5% in GVS
  - Novel variants: variants that have no RefSNP number
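The two definitions above amount to a simple decision rule, which can be sketched as follows. This is only an illustration: the `Variant` record, its field names, and the extra labels for variants outside both definitions are assumptions made here, not part of the original analysis.

```python
from dataclasses import dataclass
from typing import Optional

RARE_FREQUENCY_CUTOFF = 0.05  # the 5% threshold from the definition above


@dataclass
class Variant:
    """Hypothetical variant record; all field names are illustrative."""
    chrom: str
    pos: int
    rsid: Optional[str] = None             # RefSNP number, None if the variant has none
    gvs_frequency: Optional[float] = None  # frequency reported by GVS, if available


def classify(variant: Variant) -> str:
    """Label a variant using the known-rare / novel definitions."""
    if variant.rsid is None:
        return "novel"  # no RefSNP number
    if variant.gvs_frequency is None:
        # Assumption: known variants without a frequency are flagged, not guessed.
        return "known, frequency unavailable"
    if variant.gvs_frequency < RARE_FREQUENCY_CUTOFF:
        return "known rare"
    return "known common"
```

For example, `classify(Variant("chr1", 12345))` returns `"novel"` because no RefSNP number is attached to the record.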
- Searching for coding variants using analysis tools:
  - SIFT
  - PolyPhen
  - UniProt
  - PolyDoms
- Developing algorithms to index variants affecting or creating start sites, stop sites, splice sites, and microRNAs.
- Manual curation was performed for each novel non-synonymous coding variant associated with a known or suspected disease gene, as found in Online Mendelian Inheritance in Man (OMIM).
  - Manual search of individual international LSMD, curated dbSNP and OMIM entries, HGMD, and PubMed
  - Potential effect of novel changes on predicted protein structure: evaluated using SIFT and PolyPhen
  - miRNA: rare variants were searched for by matching splice-site and miRNA databases to patient variants by chromosome location
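Matching database entries to patient variants by chromosome location is essentially an interval lookup. The sketch below shows one simple way to do it; the `Region` type, the 1-based inclusive coordinates, and the labels are assumptions for illustration and do not describe the databases actually queried.

```python
from collections import defaultdict
from typing import NamedTuple


class Region(NamedTuple):
    """Hypothetical annotated region (e.g. a splice site or miRNA locus)."""
    chrom: str
    start: int  # 1-based, inclusive (assumed convention)
    end: int    # 1-based, inclusive (assumed convention)
    label: str


def index_regions(regions):
    """Group regions by chromosome so each variant only scans its own chromosome."""
    by_chrom = defaultdict(list)
    for region in regions:
        by_chrom[region.chrom].append(region)
    return by_chrom


def annotate(variants, regions):
    """Return {(chrom, pos): [labels]} for variants that fall inside any region."""
    by_chrom = index_regions(regions)
    hits = {}
    for chrom, pos in variants:
        labels = [r.label for r in by_chrom.get(chrom, []) if r.start <= pos <= r.end]
        if labels:
            hits[(chrom, pos)] = labels
    return hits
```

A linear scan per chromosome is enough for a sketch; a sorted index or interval tree would be the usual choice for genome-scale inputs.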
- Pharmacogenomics:
  - The PharmGKB curators reviewed the drug-related variant annotations for their clinical relevance to this patient.
  - A level of evidence was assigned to each variant annotation based on a clinician's appraisal of the impact of the variant.
Disease risk
- Disease-associated SNP database
  - To analyse disease risk across the spectrum of human disease, a high-quality disease-associated SNP database was built.
  - Starting point: a list of all SNPs in dbSNP that were measured in the HapMap 2 and 3 projects.
    - The abstracts of all papers in MEDLINE were searched for these rsIDs (papers unrelated to humans were removed).
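The abstract search for rsIDs can be approximated with a regular expression. The sketch below is a hedged illustration only: the function names, the input dictionaries, and the case-insensitive matching are assumptions, and the filtering of non-human papers is not modelled.

```python
import re
from typing import Dict, Set

# rsIDs are written as "rs" followed by digits; \b avoids matches inside longer words.
RSID_PATTERN = re.compile(r"\brs\d+\b", re.IGNORECASE)


def extract_rsids(abstract: str) -> Set[str]:
    """Return the lower-cased rsIDs mentioned in one abstract."""
    return {match.group(0).lower() for match in RSID_PATTERN.finditer(abstract)}


def candidate_papers(abstracts: Dict[str, str], hapmap_rsids: Set[str]) -> Dict[str, Set[str]]:
    """Map each paper ID to the HapMap-measured rsIDs found in its abstract.

    `abstracts` maps a hypothetical paper ID to its abstract text.
    Papers that mention none of the rsIDs are left out.
    """
    matches = {}
    for paper_id, text in abstracts.items():
        found = extract_rsids(text) & hapmap_rsids
        if found:
            matches[paper_id] = found
    return matches
```

Papers selected this way would still need the manual curation step, since an rsID mention alone does not establish a disease association.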
  - 2,671 papers were manually curated and a database was generated, recording from each full-text paper:
    - disease name
    - specific phenotype (e.g. acute coronary syndrome in coronary artery disease)
    - study population (e.g. Finnish)
    - case and control populations (e.g. 2508 patients with angiographically proven coronary artery disease)
    - genotyping technology
    - major/minor/risk alleles
    - odds ratio
    - 95% confidence interval of the odds ratio
    - published p-value
    - genetic model for each included statistically significant genotype comparison, including those involving any non-HapMap SNPs within these papers
  - If not present in the search-based sample:
    - Disease associations were recorded from the full text of all papers referenced in HGMD (Professional version) and the NHGRI GWAS catalog.
  - Studies were separated into two categories: case/control studies and cohort studies.
- Categorizing studies on similar diseases and phenotypes
  - To enable the integration of multiple studies on similar diseases and phenotypes, the disease/phenotype names in the association database were mapped to the UMLS.
  - 55,258 identified associations were categorized into 813 different diseases and phenotypes for 9,649 specific dbSNP IDs.
- Identifying strand direction in association studies
  - An algorithm was developed to identify the strand direction automatically by comparing the study-reported major/minor alleles with those found in the HapMap population most similar to the patient.
  - 6,196 records were identified that reported genotypes on the negative strand.
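The notes do not spell out the strand-detection algorithm, so the following is only a minimal sketch of the general idea: compare the reported allele pair with the reference pair and with its complement. The function name, the return labels, and the handling of A/T and C/G SNPs (where it is assumed that the major allele is the same in both populations) are illustrative assumptions, not the authors' method.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def infer_strand(reported_major, reported_minor, hapmap_major, hapmap_minor):
    """Guess whether a study reported alleles on the positive or negative strand."""
    reported = {reported_major, reported_minor}
    reference = {hapmap_major, hapmap_minor}
    flipped = {COMPLEMENT[allele] for allele in reference}

    if reference == flipped:
        # A/T or C/G SNP: the allele pair looks identical on both strands.
        # Assumption: the major allele is the same in both populations,
        # so a swapped major allele is read as a strand flip.
        if reported != reference:
            return "unresolved"
        return "positive" if reported_major == hapmap_major else "negative"
    if reported == reference:
        return "positive"
    if reported == flipped:
        return "negative"
    return "unresolved"  # alleles match neither strand
```

The A/T and C/G case is the fragile part: when allele frequencies are close to 50%, the major allele can genuinely differ between populations, so such records would need manual review.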
- Calculating the likelihood ratio (LR) of disease risk for SNP genotypes
  - Limited to the set of case/control studies.
  - For every disease SNP, the LR for each genotype was calculated using the following equation:
    - LR = probability of the genotype in the case population / probability of the genotype in the control population
    - 4,137 LRs were retrieved for 735 SNPs on 141 diseases, curated from a total of 480 publications.
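The LR equation above can be applied directly to genotype counts from a case/control study. The sketch below assumes the counts are available as dictionaries keyed by genotype; the function name and the choice to return `None` when a genotype is absent from controls are illustrative assumptions.

```python
def genotype_likelihood_ratios(case_counts, control_counts):
    """Compute LR = P(genotype | case) / P(genotype | control) for each genotype.

    Both arguments map a genotype label (e.g. "AA") to the number of
    individuals carrying it in the respective population.
    """
    case_total = sum(case_counts.values())
    control_total = sum(control_counts.values())
    if case_total == 0 or control_total == 0:
        raise ValueError("case and control populations must be non-empty")

    ratios = {}
    for genotype, n_case in case_counts.items():
        n_control = control_counts.get(genotype, 0)
        if n_control == 0:
            ratios[genotype] = None  # LR is undefined without control carriers
            continue
        ratios[genotype] = (n_case / case_total) / (n_control / control_total)
    return ratios
```

With made-up counts of 50/30/20 cases and 25/50/25 controls for genotypes AA/AG/GG, the genotype AA has probability 0.5 in cases and 0.25 in controls, giving an LR of 2.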
- Pre-test probability for diseases
  - Pre-test probabilities of lifetime risk of disease were calculated for a wide range of conditions, for a person matching the patient's characteristics, using a combination of sources.
- Calculating the post-test probability of disease risk for the patient
  - For each of 121 diseases, the post-test probability of developing disease for the patient was calculated as follows:
    - For SNPs with multiple LRs from multiple studies, the mean LR was calculated, weighted by the square root of the sample sizes (Figure 4).
    - Within each haplotype block, the SNP with the highest LR was used.
    - The LRs from all SNPs were multiplied to report the cumulative LR for the patient.
    - Post-test probabilities were calculable for 55 diseases.
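The steps above can be sketched as a small pipeline. Several details are assumptions made for illustration: the weighted mean is taken as an arithmetic mean, the input layout (`block`, `studies`) is hypothetical, and the final step uses the standard relation post-test odds = pre-test odds × LR, which the notes do not state explicitly.

```python
import math


def weighted_mean_lr(studies):
    """Mean LR over studies, weighted by the square root of each sample size.

    `studies` is a list of (lr, sample_size) pairs for one SNP genotype.
    """
    weights = [math.sqrt(size) for _, size in studies]
    return sum(lr * w for (lr, _), w in zip(studies, weights)) / sum(weights)


def cumulative_lr(snps):
    """Multiply LRs across haplotype blocks, keeping the highest LR per block.

    `snps` is a list of dicts with hypothetical keys "block" and "studies".
    """
    best_per_block = {}
    for snp in snps:
        lr = weighted_mean_lr(snp["studies"])
        block = snp["block"]
        best_per_block[block] = max(lr, best_per_block.get(block, lr))
    return math.prod(best_per_block.values())


def post_test_probability(pre_test_probability, lr):
    """Convert a pre-test probability and a cumulative LR to a post-test probability."""
    pre_odds = pre_test_probability / (1 - pre_test_probability)
    post_odds = pre_odds * lr
    return post_odds / (1 + post_odds)
```

As a worked example, a pre-test probability of 0.5 (odds of 1) combined with a cumulative LR of 3 gives post-test odds of 3, that is a post-test probability of 0.75.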
- Calculating the odds ratios of disease risk
  - Calculation of the LR of disease risk requires the frequencies of the three genotypes in the case and control populations.
  - The disease risks of the patient were calculated using the odds ratios extracted from the literature.
  - Study authors were contacted when reported associations and genotype frequencies were discordant with population frequencies.
- Gene-environment interaction and conditionally dependent risk
  - Using Etiome:
    - Links between diseases and known aetiological factors were obtained as previously described.