This page uses content from Wikipedia and is licensed under CC BY-SA.
Retrotransposons (also called transposons via RNA intermediates) are genetic elements that can amplify themselves in a genome and are ubiquitous components of the DNA of many eukaryotic organisms. These DNA sequences use a "copy-and-paste" mechanism, whereby they are first transcribed into RNA, then converted back into identical DNA sequences using reverse transcription, and these sequences are then inserted into the genome at target sites.
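The copy-and-paste idea can be illustrated with a toy model on plain strings. This is only a sketch under simplifying assumptions: the function names (`transcribe`, `reverse_transcribe`, `retrotranspose`) and the example sequence are hypothetical, only one DNA strand is modelled, and the sketch ignores the enzymes involved.

```python
# Toy model of "copy-and-paste" retrotransposition (illustrative only).
# Assumption: the genome is a single-stranded string over A, C, G, T.

def transcribe(dna: str) -> str:
    """DNA coding strand -> RNA copy (T becomes U)."""
    return dna.replace("T", "U")

def reverse_transcribe(rna: str) -> str:
    """RNA -> DNA copy with the same sequence as the original coding strand."""
    return rna.replace("U", "T")

def retrotranspose(genome: str, start: int, end: int, target: int) -> str:
    """Copy genome[start:end] via an RNA intermediate and insert it at `target`.

    The donor element stays where it was, so the genome grows by (end - start).
    """
    element = genome[start:end]
    rna = transcribe(element)
    new_copy = reverse_transcribe(rna)
    return genome[:target] + new_copy + genome[target:]

genome = "GGGG" + "ATGCAT" + "CCCCCCCC"   # hypothetical element: ATGCAT
new_genome = retrotranspose(genome, start=4, end=10, target=14)
print(new_genome)                          # GGGGATGCATCCCCATGCATCCCC
print(len(genome), "->", len(new_genome))  # 18 -> 24
```

Because the original copy is never excised, each round of transposition adds one more copy, which is why this mode of transposition tends to enlarge the genome.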
Retrotransposons are particularly abundant in plants, where they are often a principal component of nuclear DNA. In maize, 49–78% of the genome is made up of retrotransposons. In wheat, about 90% of the genome consists of repeated sequences, and 68% consists of transposable elements. In mammals, almost half the genome (45% to 48%) is transposons or remnants of transposons. Around 42% of the human genome is made up of retrotransposons, while DNA transposons account for about 2–3%.
The replicative mode of retrotransposon transposition, by means of an RNA intermediate, rapidly increases the copy number of elements and can thereby increase genome size. Like DNA transposable elements (class II transposons), retrotransposons can induce mutations by inserting near or within genes. Furthermore, retrotransposon-induced mutations are relatively stable, because the sequence at the insertion site is retained: the elements transpose by replication rather than by excision.
Retrotransposons copy themselves to RNA and then back to DNA, which may integrate back into the genome. The second step, forming DNA, may be carried out by a reverse transcriptase that the retrotransposon encodes. Transposition and survival of retrotransposons within the host genome are possibly regulated by both retrotransposon-encoded and host-encoded factors, to avoid deleterious effects on both host and retrotransposon. The understanding of how retrotransposons and their hosts' genomes have co-evolved mechanisms to regulate transposition, insertion specificities, and mutational outcomes in order to optimize each other's survival is still in its infancy.
Because of accumulated mutations, most retrotransposons are no longer able to retrotranspose.
Retrotransposons, also known as class I transposable elements, consist of two subclasses: the long terminal repeat (LTR) retrotransposons and the non-LTR retrotransposons. Classification into these subclasses is based on the phylogeny of the reverse transcriptase, which is consistent with structural differences such as the presence or absence of long terminal repeats, the number and types of open reading frames, the encoded domains, and target site duplication lengths.
LTR retrotransposons have direct LTRs that range from ~100 bp to over 5 kb in size. LTR retrotransposons are further sub-classified into the Ty1-copia-like (Pseudoviridae), Ty3-gypsy-like (Metaviridae), and BEL-Pao-like groups based on both their degree of sequence similarity and the order of encoded gene products. The Ty1-copia and Ty3-gypsy groups of retrotransposons are commonly found in high copy number (up to a few million copies per haploid nucleus) in animal, fungal, protist, and plant genomes. BEL-Pao-like elements have so far been found only in animals.
Although retroviruses are often classified separately, they share many features with LTR retrotransposons. A major difference from Ty1-copia and Ty3-gypsy retrotransposons is that retroviruses have an envelope protein (ENV). A retrovirus can be transformed into an LTR retrotransposon through inactivation or deletion of the domains that enable extracellular mobility. If such a retrovirus infects germ line cells and subsequently inserts itself into their genome, it may be transmitted vertically and become an endogenous retrovirus (ERV). Endogenous retroviruses make up about 8% of the human genome and approximately 10% of the mouse genome.
In plant genomes, LTR retrotransposons are the major repetitive sequence class; in maize, for example, they can constitute more than 75% of the genome.
Endogenous retroviruses are an important type of LTR retrotransposon in mammals, including humans, where human ERVs make up 8% of the genome.
Non-LTR retrotransposons consist of two sub-types, long interspersed elements (LINEs) and short interspersed elements (SINEs). They are widespread in eukaryotic genomes and can also reach high copy numbers, as shown in plant species. LINEs possess two ORFs, which encode all the functions needed for retrotransposition. These functions include reverse transcriptase and endonuclease activities, in addition to a nucleic acid-binding property needed to form a ribonucleoprotein particle. SINEs, on the other hand, co-opt the LINE machinery and function as nonautonomous retroelements. Although LINEs and SINEs were historically viewed as "junk DNA", recent research suggests that, in some rare cases, both have been incorporated into novel genes and so evolved new functionality.
Long interspersed elements (LINEs) are a group of genetic elements found in large numbers in eukaryotic genomes, comprising 17% of the human genome (99.9% of which is no longer capable of retrotransposition and is therefore considered "dead" or inactive). LINEs fall into several subgroups, such as L1, L2 and L3. Human coding L1 elements begin with an untranslated region (UTR) that includes an RNA polymerase II promoter, followed by two non-overlapping open reading frames (ORF1 and ORF2), and end with another UTR. Recently, a new open reading frame was identified on the reverse strand at the 5' end of LINE elements. It has been shown to be transcribed, and endogenous proteins have been observed. It was named ORF0 because of its position relative to ORF1 and ORF2. ORF1 encodes an RNA-binding protein, and ORF2 encodes a protein with an endonuclease (e.g. RNase H) as well as a reverse transcriptase. The reverse transcriptase has a higher specificity for LINE RNA than for other RNA, and makes a DNA copy of the RNA that can be integrated into the genome at a new site. The endonuclease encoded by non-LTR retroposons may be of the AP (apurinic/apyrimidinic) type or the REL (restriction endonuclease-like) type. Elements in the R2 group have a REL-type endonuclease, which shows site specificity in insertion.
The 5' UTR contains the promoter sequence, while the 3' UTR contains a polyadenylation signal (AATAAA) and a poly-A tail. Because LINEs (and other class I transposons, e.g. LTR retrotransposons and SINEs) move by copying themselves rather than by a cut-and-paste-like mechanism, as class II transposons do, they enlarge the genome. The human genome, for example, contains about 500,000 LINEs, which make up roughly 17% of the genome. Of these, approximately 7,000 are full-length, a small subset of which are capable of retrotransposition.
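As a rough illustration of the 3' UTR features just described, the following sketch scans a sequence for the AATAAA signal and measures the length of a terminal poly-A run. The function names and the example sequence are hypothetical, and real annotation tools use far more context than an exact string match.

```python
import re

POLYA_SIGNAL = "AATAAA"  # polyadenylation signal named in the text

def find_polya_signals(seq: str) -> list[int]:
    """Return 0-based start positions of every AATAAA in seq (overlaps allowed)."""
    # A zero-width lookahead lets overlapping matches be reported too.
    return [m.start() for m in re.finditer(f"(?={POLYA_SIGNAL})", seq.upper())]

def polya_tail_length(seq: str) -> int:
    """Length of the uninterrupted run of A's at the 3' end of seq."""
    s = seq.upper()
    return len(s) - len(s.rstrip("A"))

# Hypothetical 3' end: 4 nt, the signal, 5 nt of spacer, then ten A's.
utr3 = "GCTT" + "AATAAA" + "GGCTC" + "A" * 10
print(find_polya_signals(utr3))  # [4]
print(polya_tail_length(utr3))   # 10
```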
SINEs are the only TEs that are non-autonomous by nature, meaning that they did not evolve from autonomous elements. They are small (80–500 bases) and rely in trans on functional LINEs for their replication, but their evolutionary origin is very distinct. SINEs can be found in very diverse eukaryotes, but they have accumulated to impressive amounts only in mammals, where they represent between 5 and 15% of the genome with millions of copies.
SINEs typically possess a “head” with an RNA pol III promoter that enables autonomous transcription, and a body of variable composition. SINEs are postulated to originate from the accidental retrotransposition of various RNA pol III transcripts, and have appeared separately numerous times in evolutionary history. The type of RNA pol III promoter defines the different superfamilies and reveals their origin: tRNA, 5S ribosomal RNA or signal recognition particle 7SL RNA.
SINEs do not encode a functional reverse transcriptase protein and rely on other mobile elements, especially LINEs, for transposition. SINE RNAs form a complex with LINE ORF2 proteins and are inserted into the genome by target-primed reverse transcription, creating short target site duplications (TSDs) upon insertion. Some SINE families are thought to rely on specific LINEs for their replication, while others seem to be more generalist.
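The way an insertion ends up flanked by short target site duplications (TSDs) can be sketched on strings. This is a deliberately simplified picture, assuming a staggered break of `tsd_len` bases at the target whose single-stranded gaps are filled in on both sides of the new copy. The function name `insert_with_tsd` and the sequences are hypothetical.

```python
def insert_with_tsd(genome: str, element: str, pos: int, tsd_len: int) -> str:
    """Insert `element` so that genome[pos:pos + tsd_len] flanks it on both sides.

    Toy model only: the target stretch is kept upstream of the element and
    repeated downstream, mimicking fill-in repair of a staggered break.
    """
    if pos < 0 or pos + tsd_len > len(genome):
        raise ValueError("target site does not fit inside the genome")
    return genome[:pos + tsd_len] + element + genome[pos:]

genome = "AAAACCGGTTTT"   # hypothetical target region; the TSD will be CCGG
element = "GATTACA"       # hypothetical inserted element
result = insert_with_tsd(genome, element, pos=4, tsd_len=4)
print(result)  # AAAACCGGGATTACACCGGTTTT
```

In the output, the element is bracketed by two copies of `CCGG`, whereas the original genome contained that stretch only once.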
Alu and B1 elements, with their 1.1 million and 650,000 copies in the human and mouse genomes, respectively, harbor a 7SL promoter. The 350,000 copies of B2 SINEs in the mouse are on the other hand tRNA-related.
The most common SINE in primates is Alu. Alu elements are approximately 350 base pairs long, do not contain any coding sequences, and can be recognized by the restriction enzyme AluI (hence the name). The distribution of these elements has been implicated in some genetic diseases and cancers.
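To make the restriction-enzyme connection concrete, the sketch below locates AluI sites in a sequence and returns the resulting fragments. The recognition sequence AGCT, cut between G and C to leave blunt ends, is standard enzyme information rather than something stated in the text above. The function names and the example sequence are hypothetical.

```python
ALUI_SITE = "AGCT"  # AluI recognition sequence (palindromic, so one strand suffices)
CUT_OFFSET = 2      # the enzyme cuts between G and C: AG^CT

def alui_cut_positions(seq: str) -> list[int]:
    """Return the 0-based positions at which AluI would cut seq."""
    s = seq.upper()
    cuts = []
    i = s.find(ALUI_SITE)
    while i != -1:
        cuts.append(i + CUT_OFFSET)
        i = s.find(ALUI_SITE, i + 1)
    return cuts

def digest(seq: str) -> list[str]:
    """Split seq into the fragments produced by a complete AluI digest."""
    bounds = [0] + alui_cut_positions(seq) + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]

seq = "GGAGCTTTTAGCTCC"          # hypothetical sequence with two AluI sites
print(alui_cut_positions(seq))   # [4, 11]
print(digest(seq))               # ['GGAG', 'CTTTTAG', 'CTCC']
```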
Hominid genomes also contain distinctive elements termed SVA. They are composite transposons formed by the fusion of a SINE-R and an Alu, separated by a variable number of tandem repeats. Less than 3 kb in length and apparently mobilized by the LINE1 machinery, they number around 2,500–3,000 copies in the human and gorilla genomes and fewer than 1,000 in the orangutan genome. SVAs are among the youngest transposable elements in great ape genomes and among the most active and polymorphic in the human population.