This page uses content from Wikipedia and is licensed under CC BY-SA.
The upper DNA molecule differs from the lower DNA molecule at a single base-pair location (a C/A polymorphism)
A single-nucleotide polymorphism (SNP; /snɪp/; plural /snɪps/) is a substitution of a single nucleotide that occurs at a specific position in the genome, where each variation is present at a level of more than 1% in the population.
For example, at a specific base position in the human genome, the C nucleotide may appear in most individuals, but in a minority of individuals, the position is occupied by an A. This means that there is a SNP at this specific position, and the two possible nucleotide variations – C or A – are said to be alleles for this position.
A single-nucleotide variant (SNV) is a variation in a single nucleotide without any limitations of frequency and may arise in somatic cells. A somatic single-nucleotide variation (e.g., caused by cancer) may also be called a single-nucleotide alteration.
SNPs in the coding region are of two types: synonymous and nonsynonymous. Synonymous SNPs do not affect the protein sequence, while nonsynonymous SNPs change the amino acid sequence of the protein. Nonsynonymous SNPs are in turn of two types: missense and nonsense.
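The three categories can be told apart by translating the codon before and after the substitution. The following is a minimal sketch, assuming the standard genetic code and DNA codons written on the coding strand; the function name `classify_coding_snp` and the compact table encoding are choices made for this illustration and do not come from any particular tool.

```python
from itertools import product

# Standard genetic code, with codons ordered by base T, C, A, G at each position.
# "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def classify_coding_snp(ref_codon, alt_codon):
    """Classify a single-codon change as synonymous, missense or nonsense.

    Illustrative helper: a change that removes a stop codon is reported
    as "missense" here, since it is not handled as a separate class.
    """
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "synonymous"   # same amino acid, protein sequence unchanged
    if alt_aa == "*":
        return "nonsense"     # premature stop codon
    return "missense"         # different amino acid

print(classify_coding_snp("GGC", "GGT"))  # synonymous (both glycine)
print(classify_coding_snp("CGT", "CTT"))  # missense (arginine to leucine)
print(classify_coding_snp("TGG", "TGA"))  # nonsense (tryptophan to stop)
```

A real annotation pipeline would also need the transcript, strand and reading frame to find the affected codon; the sketch assumes that codon is already known.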
SNPs that are not in protein-coding regions may still affect gene splicing, transcription factor binding, messenger RNA degradation, or the sequence of noncoding RNA. A SNP of this type that affects gene expression is referred to as an eSNP (expression SNP) and may be upstream or downstream from the gene.
A tag SNP is a representative single-nucleotide polymorphism in a region of the genome with high linkage disequilibrium (the non-random association of alleles at two or more loci). Tag SNPs are useful in whole-genome SNP association studies, in which hundreds of thousands of SNPs across the entire genome are genotyped.
Haplotype mapping: sets of alleles or DNA sequences can be clustered so that a single SNP can identify many linked SNPs.
Linkage disequilibrium (LD), a term used in population genetics, indicates non-random association of alleles at two or more loci, not necessarily on the same chromosome. It refers to the phenomenon that SNP alleles or DNA sequences that are close together in the genome tend to be inherited together. LD is affected by two parameters: (1) the distance between the SNPs (the larger the distance, the lower the LD) and (2) the recombination rate (the lower the recombination rate, the higher the LD).
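Non-random association between two loci is commonly quantified with the coefficient D = p(AB) - p(A)p(B) and the derived statistic r² = D² / (p(A)(1 - p(A))p(B)(1 - p(B))). The sketch below computes both from a list of phased haplotypes. The input format (one tuple of two alleles per haplotype) and the function name are assumptions made for this illustration.

```python
def linkage_disequilibrium(haplotypes, allele_a, allele_b):
    """Return (D, r_squared) for allele_a at locus 1 and allele_b at locus 2.

    haplotypes: list of (allele_at_locus_1, allele_at_locus_2) tuples,
    one per phased chromosome (hypothetical input format).
    """
    n = len(haplotypes)
    p_a = sum(1 for h in haplotypes if h[0] == allele_a) / n
    p_b = sum(1 for h in haplotypes if h[1] == allele_b) / n
    p_ab = sum(1 for h in haplotypes if h == (allele_a, allele_b)) / n

    # D is the excess of the observed haplotype frequency over the
    # frequency expected if the two loci were independent.
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r_squared = d * d / denom if denom > 0 else float("nan")
    return d, r_squared

# Alleles always travel together: complete LD.
linked = [("C", "G")] * 4 + [("A", "T")] * 4
print(linkage_disequilibrium(linked, "C", "G"))    # (0.25, 1.0)

# Every allele combination equally common: no LD.
unlinked = [("C", "G"), ("C", "T"), ("A", "G"), ("A", "T")] * 2
print(linkage_disequilibrium(unlinked, "C", "G"))  # (0.0, 0.0)
```

An r² close to 1 means that genotyping one SNP predicts the other almost perfectly, which is the property that tag SNPs rely on.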
More than 335 million SNPs have been found across humans from multiple populations. A typical genome differs from the reference human genome at 4 to 5 million sites, most of which (more than 99.9%) consist of SNPs and short indels.
Within a genome
The genomic distribution of SNPs is not homogeneous; SNPs occur in non-coding regions more frequently than in coding regions or, in general, where natural selection is acting and "fixing" the allele (eliminating other variants) of the SNP that constitutes the most favorable genetic adaptation. Other factors, like genetic recombination and mutation rate, can also determine SNP density.
SNP density can be predicted by the presence of microsatellites: AT microsatellites in particular are potent predictors of SNP density, with long (AT)n repeat tracts tending to be found in regions of significantly reduced SNP density and low GC content.
Within a population
Allele frequencies vary between human populations, so a SNP allele that is common in one geographical or ethnic group may be much rarer in another. Within a population, SNPs can be assigned a minor allele frequency (MAF): the lowest allele frequency at a locus that is observed in a particular population. For a single-nucleotide polymorphism with two alleles, this is simply the lesser of the two allele frequencies.
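As a concrete illustration, the minor allele frequency of a biallelic site can be computed by counting alleles across a sample of diploid genotypes. This is a minimal sketch: the two-letter genotype strings and the function name are assumptions made for the example, not a standard file format or API.

```python
from collections import Counter

def minor_allele_frequency(genotypes):
    """Return the minor allele frequency for one biallelic SNP.

    genotypes: iterable of two-letter diploid genotypes such as "CA"
    (a hypothetical input format chosen for this sketch).
    """
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    if len(counts) > 2:
        raise ValueError("this sketch only handles biallelic sites")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no genotypes given")
    if len(counts) < 2:
        return 0.0  # only one allele observed, so the site is monomorphic here
    return min(counts.values()) / total

sample = ["CC", "CA", "CC", "AA", "CC"]  # 7 C alleles and 3 A alleles
maf = minor_allele_frequency(sample)
print(maf)         # 0.3
print(maf > 0.01)  # True: above the 1% level used in the definition of a SNP
```

Because the same site can have very different frequencies in different groups, the value is only meaningful together with the population it was measured in.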
SNPs' greatest importance in clinical research is for comparing regions of the genome between cohorts (such as matched cohorts with and without a disease) in genome-wide association studies, where they have been used as high-resolution markers in gene mapping related to diseases or normal traits. SNPs without an observable impact on the phenotype (so-called silent mutations) are still useful as genetic markers in genome-wide association studies, because of their quantity and their stable inheritance over generations.
SNPs were used initially for matching a forensic DNA sample to a suspect, but this use has been phased out with the development of STR-based DNA fingerprinting techniques. Current next-generation sequencing (NGS) techniques may allow for better use of SNP genotyping in forensic applications, so long as problematic loci are avoided. In the future, SNPs may be used in forensics for phenotypic clues such as eye color, hair color, and ethnicity. Kidd et al. have demonstrated that a panel of 19 SNPs can identify the ethnic group with good probability of match (Pm = 10^-7) in 40 population groups studied. One example of how this might be useful is the artistic reconstruction of possible premortem appearances of skeletonized remains of unknown individuals. Although a facial reconstruction can be fairly accurate based strictly upon anthropological features, other data that might allow a more accurate representation include eye color, skin color, and hair color.
When the forensic sample is small or degraded, SNP methods can be a good alternative to STR methods because of the abundance of potential markers, amenability to automation, and potential reduction of the required fragment length to only 60-80 bp. In the absence of an STR match in a DNA profile database, different SNPs can be used to get clues regarding ethnicity, phenotype, lineage, and even identity.
A single SNP may cause a Mendelian disease. For complex diseases, however, SNPs do not usually function individually; rather, they work in coordination with other SNPs to manifest a disease condition, as has been seen in osteoporosis. One of the earliest successes in this field was finding a single base mutation in the non-coding region of APOC3 (the apolipoprotein C3 gene) that is associated with higher risks of hypertriglyceridemia and atherosclerosis.
All types of SNPs can have an observable phenotype or can result in disease:
SNPs in non-coding regions can manifest in a higher risk of cancer, and may affect mRNA structure and disease susceptibility. Non-coding SNPs can also alter the level of expression of a gene, as an eQTL (expression quantitative trait locus).
Synonymous substitutions by definition do not result in a change of amino acid in the protein, but can still affect its function in other ways. An example is a seemingly silent mutation in the multidrug resistance gene 1 (MDR1), which codes for a cellular membrane pump that expels drugs from the cell: the mutation can slow down translation and allow the peptide chain to fold into an unusual conformation, causing the mutant pump to be less functional. In the MDR1 protein, for example, the C1236T polymorphism changes a GGC codon to GGT at amino acid position 412 of the polypeptide (both encode glycine), and the C3435T polymorphism changes ATC to ATT at position 1145 (both encode isoleucine).
Missense – a single base change results in a change of amino acid in the protein and in its malfunction, which leads to disease. For example, the c.1580G>T SNP in the LMNA gene replaces guanine with thymine at nucleotide position 1580 of the DNA sequence, turning a CGT codon into CTT. At the protein level this replaces arginine with leucine at position 527; at the phenotype level it manifests as overlapping mandibuloacral dysplasia and progeria syndrome.
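The nucleotide and amino acid positions quoted in these examples are linked by simple arithmetic: each codon spans three consecutive coding positions. The sketch below shows that mapping and applies the substitution to the codon. It assumes that positions are 1-based and counted from the first base of the start codon; both function names are hypothetical.

```python
def locate_in_codon(cds_position):
    """Map a 1-based coding-sequence position to (codon number, offset in codon).

    Both returned values are 1-based. Assumes position 1 is the first
    base of the start codon.
    """
    if cds_position < 1:
        raise ValueError("coding positions start at 1")
    codon_number = (cds_position - 1) // 3 + 1
    offset = (cds_position - 1) % 3 + 1
    return codon_number, offset

def apply_substitution(codon, offset, ref, alt):
    """Return the codon with the base at the 1-based offset changed from ref to alt."""
    if codon[offset - 1] != ref:
        raise ValueError("reference base does not match the codon")
    return codon[:offset - 1] + alt + codon[offset:]

# LMNA c.1580G>T: second base of codon 527, CGT becomes CTT.
print(locate_in_codon(1580))                   # (527, 2)
print(apply_substitution("CGT", 2, "G", "T"))  # CTT

# MDR1 C1236T and C3435T: third base of codons 412 and 1145.
print(locate_in_codon(1236))                   # (412, 3)
print(locate_in_codon(3435))                   # (1145, 3)
print(apply_substitution("GGC", 3, "C", "T"))  # GGT
```

The two MDR1 substitutions fall on the third base of their codons, where changes often leave the amino acid unchanged, while the LMNA substitution hits the second base and changes it.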
The International SNP Map working group mapped the sequence flanking each SNP by alignment to the genomic sequence of large-insert clones in GenBank. These alignments were converted to chromosomal coordinates, which are shown in Table 1. This list has greatly increased since; for instance, the Kaviar database now lists 162 million single-nucleotide variants (SNVs).
The nomenclature for SNPs can be confusing: several variations can exist for an individual SNP, and consensus has not yet been achieved.
The rs### standard is that which has been adopted by dbSNP and uses the prefix "rs", for "reference SNP", followed by a unique and arbitrary number. SNPs are frequently referred to by their dbSNP rs number, as in the examples above.
The Human Genome Variation Society (HGVS) uses a standard which conveys more information about the SNP. Examples are:
c.76A>T: "c." for coding region, followed by a number for the position of the nucleotide, followed by a one-letter abbreviation for the nucleotide (A, C, G, T or U), followed by a greater than sign (">") to indicate substitution, followed by the abbreviation of the nucleotide which replaces the former.
p.Ser123Arg: "p." for protein, followed by a three-letter abbreviation for the amino acid, followed by a number for the position of the amino acid, followed by the abbreviation of the amino acid which replaces the former.
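The two example forms can be taken apart with regular expressions. The sketch below handles only simple substitutions written exactly like the examples (a "c." description with DNA bases and a "p." description with three-letter amino acid codes); the full HGVS nomenclature covers many more variant types, and the function name and returned dictionary layout are assumptions made for illustration.

```python
import re

# Simple coding-level substitution, e.g. c.76A>T (DNA bases only in this sketch).
CODING_SUB = re.compile(r"^c\.(\d+)([ACGT])>([ACGT])$")
# Simple protein-level substitution, e.g. p.Ser123Arg.
PROTEIN_SUB = re.compile(r"^p\.([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})$")

def parse_simple_hgvs(description):
    """Parse a simple substitution description; return None if unrecognized."""
    match = CODING_SUB.match(description)
    if match:
        position, ref, alt = match.groups()
        return {"level": "coding", "position": int(position), "ref": ref, "alt": alt}
    match = PROTEIN_SUB.match(description)
    if match:
        ref, position, alt = match.groups()
        return {"level": "protein", "position": int(position), "ref": ref, "alt": alt}
    return None

print(parse_simple_hgvs("c.76A>T"))
# {'level': 'coding', 'position': 76, 'ref': 'A', 'alt': 'T'}
print(parse_simple_hgvs("p.Ser123Arg"))
# {'level': 'protein', 'position': 123, 'ref': 'Ser', 'alt': 'Arg'}
print(parse_simple_hgvs("rs12345"))
# None (an rs number carries no position or allele information by itself)
```

The protein pattern does not check that the three-letter codes are real amino acids, so production code should validate them against a known list.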
SNPs are usually biallelic and thus easily assayed. Analytical methods to discover novel SNPs and detect known SNPs include:
An important group of SNPs are those that correspond to missense mutations causing an amino acid change at the protein level. A point mutation of a particular residue can have different effects on protein function, from no effect to complete disruption of function. Usually, a change between amino acids of similar size and physico-chemical properties (e.g. a substitution from leucine to valine) has a mild effect, and a change between dissimilar amino acids has a stronger one. Similarly, if a SNP disrupts secondary structure elements (e.g. a substitution to proline in an alpha helix region), the mutation may affect the whole protein's structure and function. Using these simple rules and many others derived by machine learning, a group of programs for predicting SNP effects has been developed:
SIFT: This program provides insight into how a laboratory-induced missense or nonsynonymous mutation will affect protein function, based on the physical properties of the amino acid and on sequence homology.
LIST (Local Identity and Shared Taxa) estimates the potential deleteriousness of mutations that results from altering protein function. It is based on the assumption that variations observed in species closely related to humans are more significant when assessing conservation than those in distantly related species.