General features and properties of insertion sequence elements


1. Organization
2. Control of neighboring gene expression
3. Control of transposition activity
4. Host factors
5. Reaction mechanisms
6. Transposition reactions and different types of gene rearrangement.
7. Transposition immunity
8. Target specificity
9. Population dynamics and horizontal transfer

   

Organization

In addition to being small, insertion sequences are genetically compact (Fig 1). They generally encode no functions other than those involved in their mobility, although individual members of several families which include additional genes are now being identified. IS-encoded functions include factors required in cis, in particular recombinationally active DNA sequences which define the ends of the element, together with an enzyme, the transposase (Tpase), which recognises and processes these ends. The Tpase is generally encoded by a single open reading frame, or perhaps two, which occupies nearly the entire length of the element.

1. Terminal inverted repeats
With several notable exceptions (the IS91, IS110 and IS200/605 families; Table), the majority of ISs exhibit short terminal inverted repeat sequences (IR) of between 10 and 40 bp. In those cases examined experimentally, the IRs can be divided into two functional domains (Fig). Domain A includes the two or three terminal base pairs, and is involved in the cleavages and strand transfer reactions leading to transposition of the element. Domain B is positioned within the IR and is involved in Tpase binding (113, 116, 211, 214, 229, 315, 529). A similar organisation has also been proposed for the transposon Tn3 (216). The simple single terminal Tpase binding sites of ISs are to be contrasted with the multiple and asymmetric protein binding sites observed in the case of bacteriophage Mu (96) and transposons Tn7 (94) and probably Tn552 (334). Multiple protein binding sites are also a characteristic of the complex En/Spm and Ac elements of maize (see 159, 265) (Fig 2). It is worth noting that members of the IS21 family also carry multiple repeated sequences at both ends which may also represent Tpase binding sites (33, 310). By accommodating different binding patterns at each end, such an arrangement can provide a functional distinction between the ends either in the assembly or in the activity of the synaptic complex. In addition, indigenous IS promoters are often located partially within the IR sequence upstream of the Tpase gene, by convention IRL. This arrangement may provide a mechanism for autoregulation of Tpase synthesis by Tpase binding. Binding sites for host-specified proteins are also often found within or close to the terminal IRs and these proteins may play a role in modulating transposition activity or Tpase expression.

 

2. Domain structure of transposases
A general pattern for the functional organisation of Tpases appears to be emerging from the limited number which have been analysed. Many can be divided into topologically distinct structural domains and, although several regions of the protein may contribute to a given function, the isolated domains themselves often exhibit a distinct function. The sequence-specific DNA binding activities of the proteins are generally located in the N-terminal region while the catalytic domain is often localised towards the C-terminal end (IS1: 303, 528; IS30: 452; Mu: see 276; Tn3: 307; IS50: 512; IS903: 469; IS911: 379; for review see 192) (Fig 3). One functional interpretation of this arrangement for prokaryotic elements is that it may permit interaction of a nascent protein molecule with its target sequences on the IS, thus coupling expression and activity. This notion is reinforced by the observation that the presence of the C-terminal region of both the IS50 and IS10 Tpases appears to mask the DNA binding domain and reduce binding activity (502, 225). This arrangement might favor activity of the protein in cis, a property shared by several Tpases (see Activity in cis below). Similar masking appears to occur with the IS1 (D. Zerbib and M.C., unpublished) and the IS911 (191, 192, 351) Tpases. In several cases these domains are assembled into a single protein from consecutive orfs by translational frameshifting (see Programmed translational frameshifting below).
An additional characteristic of some, if not all, Tpases is the capacity to generate multimeric forms essential for their activity (see 192). This is true of both prokaryotic elements such as bacteriophage Mu (see 73), IS50 (502) and IS911 (191, 192, 351) (but apparently not IS10 (47)) and of eukaryotic elements such as the retroviruses (see 237) and the mariner-like element, Mos1 (294). With the results of an increasing number of structural studies of these types of enzyme, it will be of great interest to compare the overall similarities of equivalent functional domains, as has recently been possible with the catalytic domains of retroviral integrases (Fig 4), Mu transposase and other polynucleotidyl transferases such as the Holliday junction resolvase RuvC and RNaseH (see 405 and 169).

 

3. Direct target repeats
Another general feature of IS elements is that, on insertion, most generate short directly repeated sequences (DR) of the target DNA flanking the IS. Attack of each DNA strand at the target site by one of the two transposon ends in a staggered way during insertion provides an explanation for this observation. The length of the DR, between 2 and 14 bp, is characteristic for a given element and a given element will generally generate a duplication of fixed length. However, certain ISs have been shown to generate DRs of atypical length at a low frequency, presumably reflecting small variations in the geometry of the transposition complex (see 149 and references therein for further discussion). Although some notable exceptions exist in which there is a systematic absence of DRs (either within a given family or in several independent transposition events of a given element), care should be taken in interpreting the absence of DRs in isolated cases. A lack of DRs can simply result from homologous inter- or intra-molecular recombination between two IS elements, each with a different DR sequence. This would result in a hybrid element carrying one DR of each parent. It can also arise from the formation of adjacent deletions resulting from duplicative intramolecular transposition. In this case, a single copy of the DR is located on each of the reciprocal deletion products (see for example 356, 482, 501). Three ISs, IS1549, IS1634 and IS1630, have been identified which appear to generate long DRs of quite variable length (65, 370, 494). Two, IS1549 and IS1634, are distantly related to the IS4 family and one, IS1630, belongs to the IS30 family. The mechanism involved in generating such long DRs is at present unknown.
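The way a staggered attack on the target gives rise to flanking direct repeats can be illustrated with a minimal sketch. It assumes a purely string-based model of one DNA strand; the function name `insert_with_stagger`, the toy target sequence and the `[IS]` placeholder are all hypothetical and chosen only for illustration.

```python
def insert_with_stagger(target: str, element: str, position: int, stagger: int) -> str:
    """Return the top strand of the product after a staggered insertion.

    The two element ends attack the two target strands `stagger` bp apart.
    Repair of the resulting single-strand gaps copies the bases lying
    between the two cuts, so they end up on both sides of the element.
    """
    if stagger < 0 or position < 0 or position + stagger > len(target):
        raise ValueError("staggered cut must lie within the target")
    left = target[:position + stagger]   # flank ending with the duplicated bases
    right = target[position:]            # flank starting with the same bases
    return left + element + right


# Toy example: a 5 bp stagger duplicates "CCCGG" on either side of the element.
target = "AAACCCGGGTTT"
product = insert_with_stagger(target, "[IS]", position=3, stagger=5)
print(product)
```

In this model the DR length equals the stagger, which is why a given element with a fixed transposition complex geometry would be expected to produce duplications of fixed length.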

Control of neighboring gene expression  

Many IS elements have been shown to activate the expression of neighboring genes. Initially this was observed for IS1, IS2, and IS5 (see 149), but other elements such as IS406 (426), IS1186 (371), IS481 (118), IS928B (295), ISSg1 (111), IS1490 (209), and ISVa1 (475) have also been shown to exhibit similar properties. Many other examples can be found in the literature. Activation can be due to formation of hybrid promoters when insertion of an IS places an outwardly directed -35 promoter hexamer located in the terminal IRs (see 149) at the correct distance from a resident -10 hexamer. Such -35 elements have been observed experimentally in many ISs (IS1: 381; IS2: 463; IS21: 398; IS30: 97; IS257: 281, 442; IS911: 477; IS982: 295). Activation can also occur by endogenous transcription "escaping" the IS and traversing the terminal IR (e.g. IS3: 82; IS10: 87; IS481: 118; IS982: 295).

In addition, an inward directed -10 hexamer has also been detected in the left terminal IR of several elements. When two ends of such an element are juxtaposed, by formation of head-to-tail dimers or of circular copies of the IS, the combination of the -10 hexamer with a -35 hexamer resident in the neighboring right end can generate relatively strong promoters (IS21, 398; IS30, 97 ; IS911, 477). This arrangement can lead to high Tpase expression and consequent increases in transposition activity (see IS3 family, IS21 family, IS30 family).

An additional type of control of neighboring genes is illustrated by the (normally cryptic) bgl operon of E. coli. Activation of the operon can be accomplished in several ways, one of which is by insertion of either IS1 or IS5 upstream or downstream of the promoter (399; 400). Although a detailed explanation of the effect is not available, it has been suggested that activation involves changes in DNA structure (e.g. changes in curvature or topology) since mutations in the cap, topI, and hns genes have a similar activation effect. For IS5, activation is abolished by internal deletions leaving only 25 bp of IRL and 32 bp of IRR, but is restored by providing an IS5-encoded gene product, Ins5A, necessary for transposition, in trans (424). The implication of these results is that interaction of Ins5A with the IS5 ends in some way changes the topology of the bgl promoter region. At present no other examples of such control mechanisms are available.

Control of transposition activity

Transposition activity is generally maintained at a low level. An often cited reason for this is that high activities and the accompanying mutagenic effect of genome rearrangements would be detrimental to the host cell (see 123). Endogenous transposase promoters, in contrast to those assembled by juxtaposition of -10 and -35 hexamers (see above), are generally weak and many are partially located in the terminal IRs. This would enable their autoregulation by Tpase binding.

1. Transposase expression and activity
While many of the classical mechanisms of controlling gene expression, such as the production of transcriptional repressors (IS1: 303, 528, 131; IS2: 202) or translational inhibitors (anti-sense RNA in the case of IS10; see 251), are known to operate in Tpase expression, several other mechanisms have also been uncovered.

(i) Impinging transcription

Many ISs have evolved mechanisms which attenuate their activation by impinging transcription following insertion into active host genes.

Sequestration of translation initiation signals
One such mechanism observed with IS10 and IS50, and potentially present in several other ISs, is the sequestering of translation initiation signals in an RNA secondary structure (Fig 5). These ISs carry inverted repeat sequences located close to the left end which include the ribosome binding site or translation initiation codon for the Tpase gene. Transcripts from the resident Tpase promoter include only the distal repeat unit, which is unable to form the secondary structure, while transcripts from neighboring DNA include both repeats and would generate secondary structures in the mRNA which would sequester translation initiation signals (103, 255). This has been demonstrated experimentally for IS10 and IS50 but several additional insertion sequences carry such potential structures and might be expected to exhibit a similar mechanism (see 403).

Disruption of Tpase-end complexes?
In several cases simple transcription across the end of the element has been observed to reduce transposition activity (162, see 149). This effect may be the result of disrupting complexes between Tpase and cognate ends. Transposition of both IS1 and IS50 was shown to be sensitive to transcription traversing their ends although other elements have not to our knowledge been examined (see 149). In the case of bacteriophage Mu, transcription originating from within the element and impinging on the left end has also been shown to reduce activity (162). It is possible that transcription disrupts the formation of intermediates including transposase and one or both Mu ends which lead to stable transposition complexes.

(ii) Programmed translational frameshifting
A second mechanism acts at the level of translation elongation and involves programmed translational frameshifting between two consecutive open reading frames (Fig). Typically a -1 frameshift is observed in which the translating ribosome slides one base upstream and resumes in the alternative phase. This generally occurs at the position of so-called slippery codons in a heptanucleotide sequence of the type Y YYX XXZ in phase 0 (where the bases paired with the anticodon are shown as triplets) which is read as YYY XXX Z in the shifted -1 phase (see e.g. 79, 133, 155, http://recode.genetics.utah.edu/). The sequence A AAA AAG is a common example of this type of heptanucleotide. Ribosomal shifting of this type is stimulated by structures in the mRNA which tend to impede the progression of the ribosome, such as potential ribosome binding sites upstream or secondary structures (stem-loop structures and potential pseudoknots) downstream of the slippery codons (134). Translational control of transposition by frameshifting has been demonstrated both for IS1 (431, 297, 131) and for members of the IS3 family (377; see also 79) but may also occur in several other IS elements (see for example IS5 family, below). For IS1 and members of the IS3 family, the upstream frame appears to carry a DNA recognition domain whereas the downstream frame encodes the catalytic site. While the product of the upstream frame alone acts as a modulator of activity, presumably by binding to the IR sequences, frameshifting assembles both domains into a single protein, the Tpase, which directs the cleavages and strand transfer necessary for mobility of the element. The frameshifting frequency is thus critical in determining overall transposition activity. Although it has yet to be explored in detail, frameshifting could be influenced by host physiology, thus coupling transposition activity to the state of the host cell.
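The Y YYX XXZ pattern described above lends itself to a simple sequence scan. The following is a minimal sketch, assuming a DNA or RNA string and a known phase 0 reading frame; the function name `find_slippery_heptamers` and the `frame_start` parameter are hypothetical, and the scan considers only the heptanucleotide itself, not the stimulatory upstream or downstream structures.

```python
import re

# Y YYX XXZ: three identical bases, three identical bases, then any base.
# The lookahead allows overlapping candidates to be reported.
SLIPPERY = re.compile(r"(?=(([ACGT])\2\2([ACGT])\3\3[ACGT]))")


def find_slippery_heptamers(seq: str, frame_start: int = 0):
    """Return (index, heptamer) pairs positioned as Y YYX XXZ in phase 0.

    `frame_start` is the index of the first base of any phase 0 codon
    (an assumption supplied by the caller). A heptamer qualifies only if
    its second base begins a phase 0 codon, so that YYX and XXZ are the
    codons being decoded when the ribosome slips back by one base.
    """
    dna = seq.upper().replace("U", "T")
    hits = []
    for match in SLIPPERY.finditer(dna):
        start = match.start()
        if (start + 1 - frame_start) % 3 == 0:
            hits.append((start, match.group(1)))
    return hits


# Toy open reading frame: ATG GCA AAA AAG CTT, containing A AAA AAG.
hits = find_slippery_heptamers("ATGGCAAAAAAGCTT")
print(hits)
```

Candidates found in this way are only potential frameshift sites; whether shifting actually occurs would need to be established experimentally.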

(iii) Translation termination
A third potential mechanism derives from the observation that the translation termination codon of Tpase genes of certain elements is located within their IR sequences. Although, to our knowledge, no extensive analysis of the significance of this arrangement has yet been undertaken, it seems possible that it may in some manner couple translation termination, transposase binding and transposition activity. The transposase gene of several elements does not possess a termination codon. These include IS240C, a member of the IS6 family (Y. Chen and J.M., unpublished), two members of the IS5 family, IS427 (108) and ISMk1 (316), and various members of the IS630 family including IS870 and ISRf1 (143). Instead, some of these elements insert into a relatively specific target sequence in which the target DR produced on insertion itself generates the Tpase termination codon (see: IS630 family). The relevance of this as a control mechanism has yet to be explored.

(iv) Transposase stability
Transposase stability can also contribute to control of transposition activity. The Tpase of IS903 is sensitive to the E. coli Lon protease (116). This sensitivity limits the activity of the Tpase both temporally and spatially and may provide an explanation for the observation that several Tpases function preferentially in cis (see below). Indeed mutant IS903 Tpase derivatives have been isolated which exhibit an increased capacity to function in trans. These are more refractory to Lon degradation than the wild-type protein (113). Some evidence that Lon may also be involved in regulating Tn5 (IS50) transposition has also been presented (254). An observation which might also reflect Tpase instability is the temperature-sensitive nature of IS1-mediated adjacent deletions in vivo (393), of Tn3 transposition (257) and of IS911 intramolecular recombination both in vivo and in vitro (189). For IS911, incubation of the Tpase at 42°C results in an irreversible loss in activity.

(v) Activity in cis
Early studies on several transposable elements indicated that transposition activity was more efficient if the transposase is provided by the element itself or by a transposase gene located close by on the same DNA molecule. This preferential activity in cis reduces the probability that transposase expression from a given element will activate transposition of related copies elsewhere in the genome. The effect can be of several orders of magnitude and has been observed for a variety of elements including IS1 (306, 380), IS10 (335), IS50 (217), and IS903 (167, 168). Its magnitude is characteristic for a given IS. This property presumably reflects a facility of the cognate transposases to bind to transposon ends close to their point of synthesis and is likely to be the product of several phenomena.

In the case of IS903, increased stability (116) and expression (113) have been shown to increase the capacity for transposase activity in trans. Likewise, for IS10, mutations which increase translation of the transposase also decrease the cis preference of the enzyme and it has been suggested, moreover, that cis preference is strongly dependent upon the half-life of the transposase message and the rate at which transcripts are released from their templates (225).

An additional consideration which may promote preferential activity in cis is reflected in the N-terminal location of the DNA binding domain in many Tpases. If the N-terminal domain is capable of folding independently of the catalytic domain, this arrangement would permit preferential binding of nascent Tpase polypeptides to neighboring binding sites (Fig 6) (see Domain structure of transposases above). For several Tpases, the N-terminal portion of the protein exhibits a higher affinity for the ends than does the entire Tpase molecule, suggesting that the C-terminal end may in some way mask the DNA binding activity of the N-terminal portion. It remains to be seen whether this is a general property of Tpases.

Host factors

Transposition activity is frequently modulated by various host factors. These effects are generally specific for each element. A non-exhaustive list of such factors includes the DNA chaperones (or histone-like proteins), IHF, HU, HNS, and FIS, the replication initiator DnaA, the protein chaperone/proteases ClpX, P, and A, the SOS control protein LexA, and the Dam DNA methylase. In addition, proteins which govern DNA supercoiling in the cell might also influence transposition.

The DNA chaperones may play roles in assuring the correct three-dimensional architecture in the assembly of the various nucleoprotein complexes necessary for productive transposition. They may also be involved in regulating Tpase expression. IHF, HU, HNS, and FIS have all been variously implicated in the case of bacteriophage Mu, in the control of Mu gene expression or directly in the transposition process (see 74 for review). Several elements carry specific binding sites for IHF within, or close to, their terminal IRs. These can lie within (e.g. IS1: 150; IS903: see 166) or close to (IS10: 251) the Tpase promoter. IHF appears to influence the nature of IS10 transposition products by binding to a site 43 bp from one end (441, 76, 418). It also stimulates Tpase binding to the ends of the Tn3 family member, Tn1000 or γδ (508). Ironically, although IS1 was the first element in which IHF sites were identified (one within each IR), conditions have not yet been found in which IHF shows a clear effect on transposition or gene expression (D. Zerbib and M.C., unpublished results). In the case of IS50, an element of the same family as IS10, both the protein Fis and the replication initiator protein DnaA have been reported to intervene in transposition (see 402). Finally another "histone-like" protein, HNS, has been reported to stimulate transposition of IS1 in certain circumstances (440).

Although their mode of action is at present unknown, several other host proteins with otherwise entirely different functions have been implicated in transposition. Acyl carrier protein (ACP) was independently shown to stimulate 3' end cleavage of Tn3 by its cognate Tpase (308a) and, together with ribosomal protein L29, to greatly increase binding of TnsD (a protein involved in Tn7 target selection) to the chromosomal insertion site, attTn7 (437). Moreover ACP and L29 moderately stimulate Tn7 transposition in vitro while L29 alone has a significant stimulatory effect in vivo (437). The mode of action of these proteins may be similar to that of the accessory proteins PepA and ArgR which modify the architecture of the synaptic complex in certain XerC/XerD-mediated site-specific recombination reactions (185).

Certain factors involved in protein "management" such as ClpX, ClpP, and Lon have been implicated in transposition. ClpX is essential for Mu growth (329) where it is required for disassembling the transposase-DNA complex or the transpososome strand transfer complex in preparation for the assembly of a replication complex (258, 285). Recognition of Mu transposase, pA, by ClpX requires the terminal 10 amino acids of pA (286). Together with ClpP, ClpX also plays a role in proteolysis of the Mu repressor (267, 506). As indicated above, the Lon protease is implicated in proteolysis of the IS903 transposase (113, 116). At present the involvement of these proteins in the transposition of other elements has not been well documented.

The third class of host factor includes host cell systems which act to limit DNA damage and maintain chromosome integrity. Studies with IS10 (see 251) and IS1 (274) have demonstrated that high levels of Tpase in the presence of suitable terminal IRs lead to the induction of the host SOS system. As discussed previously (310), some controversy still exists concerning the role of RecA in Tn5 (IS50) transposition (1, 259, 260, 503). Reznikoff and colleagues have provided genetic evidence that transposition is inhibited by induction of the SOS system in a manner which does not require the proteolytic activity of RecA (504). On the other hand, Tessman and collaborators (259, 260, 261) using a different transposition assay have found that constitutive SOS conditions actually enhance Tn5 transposition. Moreover, using yet another assay system, Ahmed (6) has concluded that intermolecular transposition of Tn5 is stimulated in the presence of RecA. Further investigation is clearly required to understand these apparently incompatible results.

Ahmed has also concluded that intermolecular transposition of the IS1-based transposon, Tn9, behaves in a similar way to that of Tn5 with respect to the recA allele (6). In contrast, however, the frequency of adjacent deletions mediated by IS1 was significantly increased in the absence of RecA. This has received some independent support using a physical assay where it was shown that deletion products accumulate in a recA but not in a wildtype host. Moreover, like IS1 induction of the SOS system, accumulation of such adjacent deletions was dependent on recBC (Zablweska et al., unpublished observations).

The recBC genes are also implicated in the behavior of transposons such as Tn10 and Tn5 (297, 298) where they affect precise and imprecise excision in a process independent of transposition per se. This is more pronounced with composite transposons in which the component insertion sequences IS10 and IS50 are present as inverted repeats, and is stimulated when the transposon is carried by a transfer-proficient conjugative plasmid. It seems probable that such excisions occur by a process involving replication fork slippage (see 149, 346, 392 for further discussion).

Both DNA polymerase I (419, 462) and DNA gyrase (218, 456) are implicated in the transposition of Tn5. While the effect of gyrase may reflect a requirement for optimal levels of supercoiling, the role of PolI remains a matter of speculation. It may be involved in DNA synthesis necessary to repair the single strand gaps resulting from staggered cleavage of the target and which gives rise to the DRs. DNA gyrase has also been shown to be important in transposition of bacteriophage Mu (362).

Another host function, the Dam DNA methylase, can be important in modulating both Tpase expression and activity. IS10, IS50 and IS903 all carry methylation sites (GATC) in the transposase promoter regions and in each case, promoter activity is increased in a dam- host (409, 521). Additional evidence has been presented that the methylation status of GATC sites within the terminal inverted repeats also modulates the activity of these ends (409). For IS50, this can now be understood in terms of steric interference in the transposase active site, as recently revealed by the determination of the crystal structure of a synaptic complex including its Tpase and a pair of precleaved transposon ends (102). Similar methylation sites have been previously observed in IS3, IS4, and IS5. A survey of the elements included in the data base has shown that most groups or families contain members which have GATC sites within the first 50 bp of one or both extremities. The IS3, IS5 and IS256 families include the most members carrying such sites. Except for IS3 itself, where strong stimulation of transposition has been observed in a dam- host (450), in most of these cases the biological relevance of these sites is unknown. Moreover, it should be pointed out that the probability that any 100 bp DNA sequence carries the GATC tetranucleotide is about 40%. The role of Dam methylation in IS10 and IS50 transposition is described in detail in the appropriate sections dealing with these elements.
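The chance occurrence of GATC in a short stretch of DNA can be estimated with a back-of-envelope calculation. The sketch below assumes equiprobable, independent bases and treats each 4 bp window as independent, which is an approximation; the function name `prob_site_present` is hypothetical. Under these assumptions the estimate for 100 bp comes out at roughly one third, and the exact value will depend on base composition and on how the windows are counted.

```python
def prob_site_present(length: int, site_len: int = 4) -> float:
    """Approximate probability that a random sequence of `length` bp
    contains at least one copy of a given `site_len`-bp site.

    Assumes each base is drawn independently with probability 1/4 and
    treats the overlapping windows as independent trials.
    """
    windows = length - site_len + 1      # number of possible start positions
    if windows <= 0:
        return 0.0
    p_window = 0.25 ** site_len          # chance that one window matches the site
    return 1.0 - (1.0 - p_window) ** windows


# GATC is its own reverse complement, so scanning one strand is sufficient.
p = prob_site_present(100)
print(round(p, 3))
```

The point of the calculation is simply that a GATC site near an IS end is not, by itself, strong evidence of Dam-dependent regulation.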

Reaction mechanisms

The reaction mechanisms involved in transposition have been treated in depth (e.g. 330, 331). The process can be divided into several defined steps generally comprising: binding of the recombinase to the ends; elaboration of a synaptic complex involving the recombinase, perhaps accessory proteins, and both transposon ends (this step involves either concomitant or subsequent recruitment of the target DNA, depending on the element); cleavage and strand transfer of the transposon ends into the target; and processing of the strand transfer complex to a final product. The protein-DNA complexes assembled during this process have been called transpososomes.

The DDE catalytic site
Over the last few years, it has become clear that many of the enzymes involved in transposition reactions are related and, moreover, are part of a larger family of phosphoryltransferases which also includes RNaseH and the RuvC "Holliday junction resolvase" (405, 406). These transposases catalyse cleavage at the 3' ends of the element by an attacking nucleophile (generally H2O) to expose a free 3'OH group (Fig 7). This hydroxyl in turn acts as a nucleophile in the attack of a 5' phosphate group in the target DNA (strand transfer) in a single-step transesterification reaction. Under certain conditions the enzyme is also capable of "disintegrating" the transposon end by catalysing the attack of the 3' target OH group on the new transposon-target junction (86, 379, 496). The reaction(s) do not require an external energy source, nor do they appear to involve a covalently linked enzyme-substrate intermediate as do certain site-specific recombination reactions (134). Furthermore, it is worth underlining that, since it is the donor strand itself which performs the cleavage-ligation step in the target DNA, no cleaved target molecule is detected in the absence of strand transfer. An acidic amino acid triad (DDE: Asp, Asp, Glu) present in these enzymes is intimately involved in catalysis and its role is presumably in co-ordinating divalent metal cations (in particular Mg2+) implicated in assisting the various nucleophilic attacking groups and leaving groups during the course of the reaction (Fig 7). The reaction is an in-line nucleophilic attack which results in chiral inversion of the target phosphate. Chiral inversion has been observed for retroviral integration (128a, 154a), bacteriophage Mu (331a) and Tn10 (245a) transposition and in a related reaction, V(D)J recombination (488). It was revealed by replacing a non-bridging oxygen with sulphur, which fixes the normally achiral phosphate group in one or other of its alternative chiral forms (Fig 7).

For many ISs (the IS3 and IS6 families) and the retroviral integrases (IN) this triad is known as the DD(35)E motif and is highly conserved (135, 238, 262). In addition, alignments of several Tpases (404) revealed several regions of amino acid conservation designated N1, N2, N3 and C1 (309), which encompass the D (N2), D (N3) and E (C1) regions of typical DDE motifs respectively. The C1 region is probably the most defined structural element. It appears to be part of an α-helix (see below) and carries additional conserved amino acids which include a K or R residue approximately 7 amino acids or two helical turns downstream from the E residue (120, 374, 228). Less well-conserved residues often occur at approximately one helical turn (3 or 4 residues) upstream and downstream of the E residue (DDE motifs). In the case of retroviruses this has been shown to interact with the terminal base pairs of the element, presumably contributing to correct positioning of the transposon end in the active site (132, 228). It is remarkable that such a motif can be found in many of the IS families defined here (DDE motifs). Although this conservation in the primary sequence is lower in certain of the other groups of elements and not all families have been explored in sufficient detail to assure that the alignments are biologically relevant, mutagenesis studies with some of these elements (e.g. the Mu, Tn7, IS10, and Tc1/3 Tpases and the retroviral integrases) clearly underline the importance of these residues. Moreover, structural analysis has shown that these acidic amino acids are arranged close to each other in a similar three-dimensional manner in other phosphoryltransferases such as RNaseH and RuvC (405, 406) which are otherwise unrelated to Tpases.
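As a rough illustration of what a DD(35)E-type signature looks like at the primary sequence level, the following sketch scans a protein string for candidate triads. It is not a published detection method: it assumes that "35" means 35 residues between the second D and the E, takes the nearest upstream D as a placeholder for the first D, and uses an arbitrary tolerance of 6 to 8 residues for the downstream K or R. The function name `find_dd35e_candidates` and all parameters are hypothetical.

```python
def find_dd35e_candidates(protein: str, spacer: int = 35, kr_offsets=(6, 7, 8)):
    """Return (d1, d2, e, has_kr) tuples for candidate DD(35)E triads.

    d2 and e are separated by exactly `spacer` residues; d1 is simply the
    nearest D upstream of d2 (a placeholder, since the D-to-D spacing
    varies between families). has_kr reports whether a K or R lies at one
    of `kr_offsets` positions downstream of the E residue.
    """
    seq = protein.upper()
    hits = []
    for e_pos, residue in enumerate(seq):
        if residue != "E":
            continue
        d2_pos = e_pos - spacer - 1
        if d2_pos < 1 or seq[d2_pos] != "D":
            continue
        d1_pos = seq.rfind("D", 0, d2_pos)
        if d1_pos == -1:
            continue
        has_kr = any(
            e_pos + k < len(seq) and seq[e_pos + k] in "KR" for k in kr_offsets
        )
        hits.append((d1_pos, d2_pos, e_pos, has_kr))
    return hits


# Synthetic sequence built to contain one triad with K at E+7.
toy = "MD" + "A" * 10 + "D" + "G" * 35 + "E" + "A" * 6 + "K" + "AA"
candidates = find_dd35e_candidates(toy)
print(candidates)
```

Such a scan would return many false positives on real proteins; as noted in the text, alignments and mutagenesis are needed to establish that a candidate triad is biologically relevant.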

These primary amino acid conservations are also reflected in conserved structural features. Major conserved features identified in the structures of the catalytic cores of retroviral integrases (Fig), and the Mu and IS50 Tpases include 2 β-sheets each harboring one of the D residues and the long α-helix including the E residue. The α-helix is designated α-4 in HIV and C1 in the Tpases (161, 404, see 192). It is one of the most conserved regions in the catalytic core. Mutagenesis and crosslinking studies with HIV IN suggest that it plays a role in positioning both the nucleophile and viral DNA (132, 154, 228). In particular K159 or K156, which cross-link to the terminal CA dinucleotide, are located on the same side of the α-helix and are strongly conserved. Q148 also lies on the same side and appears to interact with the terminal end of the nonprocessed strand. An amide or basic amino acid is highly conserved at this or the neighboring position and mutation results in severe impairment of catalysis (IN: 154; IS10: 47; IS50: 401; IS903: 469). Additional convergences in the structure-function relationships of the catalytic domains and their role in target sequestration and positioning will certainly be forthcoming. The structure of a post-cleavage synaptic complex of IS50 Tpase with its cognate ends generally supports the model of DNA-protein interactions in the active site (102).

Transposition reactions and different types of gene rearrangement.

If DDE transposases are capable of catalysing only single strand cleavage to generate a 3'OH at the end of the transposon, how do IS elements move from one place to another (Fig A)? While initiation of a transposition reaction catalysed by the DDE transposases proceeds via transfer of the 3' end of the transposon (transferred strand), the outcome of the reaction is governed by cleavage of the 5' (non-transferred) end of the element.

Cleavage of the transferred strand alone.

If 5' strand cleavage does not occur concomitantly with 3' strand transfer (Fig B), the donor and target molecules become covalently linked. Subsequent 5' strand cleavage will separate the element from the donor backbone and will also result in direct insertion (Fig). In the case of retroviruses, only the 3' cleavage occurs, removing 2 bp from the end of the double strand DNA viral copy. However, since no donor backbone is attached to the viral DNA, direct insertion can ensue. In cases where the 5' transposon end remains attached to the donor backbone DNA, the result of 3' strand transfer is to join transposon and target, leaving a 3'OH in the target DNA at the junction. This can act as a primer for replication of the element and generate cointegrates where donor and target molecules are separated by a single transposon copy at each junction. During its lytic cycle bacteriophage Mu similarly undergoes only 3' cleavage of the transferred strand: the donor backbone remains attached and cointegrate molecules result if replication occurs. Members of the Tn3 family appear to transpose in this way and certain mutants of Tn7 and IS903 can be induced to undergo cointegrate formation. It is, however, important to note that cointegrates identical to those produced by replicative transposition can also be produced by a non-replicative process, either from a plasmid dimer (Fig) (29, 289) or from tandemly repeated copies of an IS element (250, 356, 394, 395, 450, 484).

Cleavage of both transferred and non-transferred strands.

If cleavage of the 5' end occurs concurrently with cleavage at the 3' end, the transposon is physically separated from its donor molecule. Strand transfer to a target then results in direct insertion of the element ( Fig). 5' cleavage can vary from element to element (483). Both 3' and 5' cleavages occur for IS10, IS50 and Tn7, and the eukaryotic elements Tc1/3 and mariner. These elements all undergo simple insertion.
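The dependence of the final product on 5' strand cleavage, as described in the two cases above, can be summarised as a small decision function. This is a deliberately simplified sketch in Python: the function and argument names are hypothetical, and it ignores the non-replicative routes to cointegrates mentioned earlier.

```python
def transposition_outcome(five_prime_cleaved,
                          donor_backbone_attached=True,
                          replicated=False):
    """Return the expected product after 3' strand transfer.

    five_prime_cleaved:       the non-transferred strand has been cut.
    donor_backbone_attached:  False for a retroviral DNA copy, which
                              carries no donor backbone.
    replicated:               the 3'OH left in the target primed
                              replication across the element.
    """
    if not donor_backbone_attached or five_prime_cleaved:
        # Element is (or becomes) free of donor DNA: simple insertion.
        return "direct insertion"
    if replicated:
        # Donor and target joined with one element copy at each junction.
        return "cointegrate"
    # Donor and target remain covalently linked through the element.
    return "linked donor-target intermediate"


print(transposition_outcome(five_prime_cleaved=True))
print(transposition_outcome(five_prime_cleaved=False, replicated=True))
```

Collapsing the pathway into three boolean inputs is a modelling choice made for brevity; real elements differ in the order and mechanism of the cleavages, as the following sections describe.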

Two Enzymes: Tn7

Tn7 has adopted a second enzyme to achieve double strand cleavage at the ends (Fig C). One of the enzymes, TnsB, carries a DDE motif and is dedicated to cleavage at the 3' end of the transposon, while a second protein, TnsA, is dedicated to cleavage of the 5' (non-transferred) strand (94). Double strand cleavage leaves a three-base 5' overhang. Inactivation of the catalytic domain of TnsA prevents 5' strand cleavage and results in the formation of branched strand transfer intermediates in vitro and the production of cointegrates in vivo (322). Interestingly, while TnsB is presumed to be structurally similar to typical DDE Tpases, the structure of TnsA resembles that of a type II restriction endonuclease such as FokI (197). The use of two distinct strand-specific endonucleases is relatively rare and no IS is at present known to have adopted this type of transposition strategy.

Hairpin formation: IS10 and IS50

Double strand cleavage at the ends of IS10 and IS50 is flush and is promoted by the single Tpase protein ( Fig C). Here, the transferred transposon strand is cleaved first to liberate a 3'OH. However, instead of undergoing strand transfer to a target site, the exposed 3'OH group is directed to attack the opposite, complementary, DNA strand (trans-strand attack) entirely eliminating flanking donor DNA. This generates a double strand break at the donor ends and a hairpin structure at the transposon end (Fig C) releasing the element from its donor site. In a third step, the bridging phosphodiester bond undergoes hydrolysis to regenerate the terminal 3'OH which then completes strand transfer to a target DNA molecule. This type of pathway has been adopted in V(D)J recombination (488), as well as in IS10 (245) and IS50 (39). The geometry of precleaved DNA strands in the active site of the IS50 synaptic complex shows that the liberated 3' transposon end is positioned correctly for attacking the opposite strand (102).

Recessed 5’ ends: Tc1/3 and mariner

Double strand cleavage has also been demonstrated for the eukaryotic Tc1/3 and mariner elements (Fig). However, whereas cleavage occurs precisely at the 3' end of the transferred strand, cleavage at the 5' ends occurs two bases within the element for Tc1/3 (490), and preferentially 3 bases within the mariner elements Himar1 (272) and Mos1. It is interesting to note that for Tc1, Tc3, Mos1 and Himar1, 5' cleavage within the transposon occurs 3' to a relatively highly conserved CA dinucleotide (5'-CA-GTG for Tc1; 5'-CA-GTG for Tc3, see (385); 5'-CCA-GGT or 5'-TCA-GGT for Mos1; and 5'-ACA-GGT for Himar1). Moreover, early experiments with Tc1/3 indicated that cleavage of the 5' transposon end may occur before that of the 3' transferred strand (496). The cognate elements of many "classic" DD35E transposases are known to terminate with a 5'-CA-3' dinucleotide on the transferred strand and therefore cleave 3' to the terminal A. Rather than 5' cleavage of the non-transferred transposon strand, initial cleavage of the Tc and mariner elements could thus be viewed as 3' cleavage from the vector side. It is tempting to speculate that liberation of the 3' end of the transferred strand is effected by trans-strand attack by the 5'-CA-OH3'. This would generate a hairpin on the donor flanks, with 2 (Tc) or 3 (mariner) bridging nucleotides derived from the transposon, in a configuration similar to that observed in V(D)J recombination. It remains to be determined, however, whether the Tpase is also capable of correctly catalysing formation of the predicted hairpin structures. The similarities between the Tc/mariner elements and certain bacterial ISs of the IS630 family have not yet been investigated at the level of the cleavage reactions.

IS circle formation: IS911 and the IS3 family

An interesting variation has been observed in the case of the IS3 family members IS2, IS3 and IS911: a frequent product is a molecule in which only one transposon DNA strand is circularised (Fig D). This results from attack by a free 3'OH group, generated at one transposon end by the Tpase, using the opposite end as a target. These molecules appear to be processed into transposon circles by "resolving" the complementary strand and the circles can then undergo integration (Fig). This type of reaction intermediate, generated by sequence-specific recombination between two ends of an element, may be generated with several other ISs including members of the IS21 (35), IS30 (250), IS110 (363), IS256 (300, 383) and ISL3 (232) families.

The spectrum of possible DNA rearrangements is probably even larger. The suggestion that certain Tpases may be capable of generating synapses between two ends on different molecules was originally based on the results of a genetic analysis of Tn5 (289), and such synapsis has more recently been demonstrated for IS10 in vitro (75). Intermolecular reactions of this type can also occur during insertion of IS911 and provide one element in target site selection (Loot and Chandler, unpublished results, see Target specificity below). Similar behavior, as well as the capacity to act on directly repeated IS ends, has recently been suggested for the IS1 Tpase in vivo (274). These types of event obviously extend the spectrum of possible DNA rearrangements.

Other chemistries
Variations and exceptions to this unifying mechanism will certainly emerge. Not all ISs exhibit a well defined DDE triad. For example, the Tpases of one group of elements, the IS91 family, show significant similarities with enzymes associated with replicons which use a rolling circle replication mechanism (Fig). Indeed, present evidence (325) suggests that IS91 has adopted a rolling circle transposition mechanism similar to that proposed by Galas and Chandler (148) (Fig). In addition, members of the IS110 family have been thought to encode a novel type of site-specific recombinase (284), although recent analysis has suggested the presence of an atypical DDE motif (474). The IS1 transposase has been reported to show limited similarity to phage λ integrase (436), although this needs to be reevaluated in view of the discovery of additional IS1 variants (IS1 family). Active sites for the IS66, IS200/IS605, and ISAs1 families have yet to be defined.

 

Transposition immunity

One important property of some transposable elements is that of transposition immunity, in which a target molecule already carrying a copy of an element exhibits a significantly reduced affinity for insertion of a second copy. At present, this phenomenon appears to be limited to the more complex transposons, bacteriophage Mu and Tn7, as well as to members of the Tn3 family. To our knowledge no insertion sequence has yet been clearly demonstrated to adopt this strategy, although some evidence concerning IS21 suggests that this element may show immunity (35, 100; Haas, D., personal communication). It is interesting to note that elements which exhibit this capacity, including IS21, show a requirement for ATP. They also tend to carry multiple repeated sequences at each end (33, 35). A priori, this behavior would be inappropriate for elements involved in the formation of compound transposons in which two flanking copies of an IS element mobilise an interstitial DNA segment.

Although perhaps not immediately relevant to insertion sequences per se, immunity seems a sufficiently important phenomenon in the field of transposition to merit a short overview. For bacteriophage Mu, transposition immunity is displayed by target DNA carrying Mu end sequences and is transmitted by the MuB protein. MuB plays a key role in target capture and strand transfer by binding DNA in a non-specific manner, providing a preferential target for the MuA transposase complexed with Mu ends, and stimulating Tpase activity (see 276). MuB displays an ATPase activity which is stimulated both by DNA and MuA (5). ATP, but not ATP hydrolysis, is necessary for MuB binding and for strand transfer (3). Interaction of MuB with MuA (bound to the immune target) provokes ATP hydrolysis with subsequent release of MuB and consequent reduction in the attractiveness of the DNA molecule as a target (4). This mechanism serves to redistribute MuB preferentially to DNA molecules which do not contain a MuA binding site.

A similar mechanism has been proposed for transposon Tn7 (19) where the presence of the right end of Tn7 renders the target immune (14). Here the transposase is composed of two Tn7 proteins, TnsA and TnsB. It acts in conjunction with TnsC which, like MuB, is a non-specific DNA binding protein with ATPase activity (see 94).

Although transposition immunity of Tn3 and the related Tn1000 (γδ) is less well understood, it is known to require the presence of the 38 bp terminal IR on the immune target (17, 510). A major difference between Tn3 and the phage Mu and Tn7 systems is that only a single protein, the Tn3 transposase, TnpA, appears to be involved. As in these other two systems, immunity is mediated by Tpase binding to this end (10, 350, 510). Indeed, IHF, which stimulates Tpase binding to the IRs of Tn1000 (508), also increases immunity (509).

Target specificity

Where appropriate, insertion patterns of ISs are described in the sections dealing with the individual elements. Insertion specificity has also been treated in detail in a recent review (95). It is perhaps worthwhile, however, to summarise some of the more general issues concerning this aspect of transposition.

Target site selection differs significantly from element to element. Sequence-specific insertion is exhibited to some degree by several elements and varies considerably in stringency. It is strict in the case of one of the two Tn7 transposition pathways, where insertion occurs exclusively, and with high efficiency, into a unique chromosomal site (attTn7) (94), and for IS91, which requires a GAAC/CAAG target sequence (326). Insertion sites are less strict but nevertheless sequence-specific for members of the IS630 and mariner/Tc families, which both require a TA dinucleotide in the target; for IS10, which prefers (but is not restricted to) the symmetric 5'-NGCTNAGCN-3' heptanucleotide; for IS50, which transposes preferentially into 5'-AGNTYWRANCT-3' (164); for IS231, which shows a preference for 5'-GGG(N)5CCC-3' (184); and for bacteriophage Mu, which shows a preference for 5'-NYG/CRN-3' (332). In the case of both IS10 and the Tc1/3 elements, sequences immediately adjacent to the consensus have also been shown to influence target choice (26, 368). A demonstration that IS10 Tpase directly influences target choice has been obtained by isolation of specific Tpase mutants which exhibit distinct alterations in the target sequence (25).
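Consensus sequences of this kind are written with IUPAC ambiguity codes and can be searched for directly. The Python sketch below is an illustration only: the helper names and the test sequence are assumptions introduced here, the IS231 consensus GGG(N)5CCC has been written out as GGGNNNNNCCC, and a match indicates only a potential preferred site, since these elements are not restricted to their consensus.

```python
import re

# Standard IUPAC nucleotide ambiguity codes expressed as regex classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

# Preferred target sequences quoted in the text above.
CONSENSUS = {
    "IS10": "NGCTNAGCN",
    "IS50": "AGNTYWRANCT",
    "IS231": "GGGNNNNNCCC",
}


def consensus_to_regex(consensus):
    """Compile an IUPAC consensus into a regex that finds overlapping sites."""
    body = "".join(IUPAC[base] for base in consensus.upper())
    # A lookahead matches without consuming, so overlapping sites are found.
    return re.compile("(?=(" + body + "))")


def find_sites(consensus, dna):
    """Return 0-based start positions of consensus matches on the given strand."""
    pattern = consensus_to_regex(consensus)
    return [match.start() for match in pattern.finditer(dna.upper())]


# Hypothetical test sequence containing one IS10-like site.
print(find_sites(CONSENSUS["IS10"], "AAGCTTAGCAA"))
```

Only the strand supplied is scanned; for a non-palindromic consensus such as that of IS50, the reverse complement would need to be searched as well.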

The choice of a target sequence by a given IS can be the result of a subtle combination of several sequence determinants, as illustrated by IS903. IS903 generates a 9 bp DR. Although no significant target consensus is obvious to the eye, a statistical analysis suggested that it exhibits a two-fold symmetry. A synthetic site containing the consensus sequence was shown to be an extraordinarily good target, attracting a very high proportion of all insertions. Mutant target sites confirmed the importance of these features as well as the importance of the dinucleotide which would occur at the transposon-target junction on insertion. The local DNA context of the target site was also shown to influence its strength (207).
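The 9 bp DR can be pictured with a string-level model of insertion. The sketch below, in Python, assumes the usual explanation for direct repeats (staggered cuts in the target followed by repair of the resulting gaps); the function name, the toy target and the bracketed element marker are hypothetical and carry no sequence information about IS903 itself.

```python
def insert_with_duplication(target, element, position, dr_length=9):
    """Insert 'element' into 'target' so that it is flanked by a direct repeat.

    The dr_length bases of target starting at 'position' appear on both
    sides of the element in the product, mimicking the duplication that
    follows repair of staggered target cuts.
    Returns (product, duplicated_sequence).
    """
    if position < 0 or position + dr_length > len(target):
        raise ValueError("target site does not fit within the target sequence")
    duplicated = target[position:position + dr_length]
    # Left flank ends with the duplicated bases; right flank starts with them.
    product = target[:position + dr_length] + element + target[position:]
    return product, duplicated


# Hypothetical 16 bp target; "[IS903]" is a placeholder, not real sequence.
product, repeat = insert_with_duplication("AAACCCGGGTTTACGT", "[IS903]", 3)
print(product)
print(repeat)
```

Because the duplicated bases come from the target, the dinucleotides at the transposon-target junctions are set by the site chosen, which is why mutations there can alter target strength.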

Other elements exhibit regional preferences: for example GC or AT rich DNA segments (IS186: 434; IS1: 146, 328, 526). Such regional specificity could reflect more global parameters such as local DNA structure. Indeed bent DNA has been invoked as a factor for target choice in retroviral (340), IS231 (184), IS4 and IS10 integration (182, 382). Recent results with Tn7 have shown that triple helix DNA strongly targets transposon insertion to a specific end of the triplex-duplex junction (389). This has been interpreted as being the result of recognition of an asymmetric distortion of the target DNA. Other factors which have also been implicated are: the degree of supercoiling (IS50: 293), replication (Tn7: 515; IS102: 36), transcription (IS102: 37; Tn7: 106; Tn5/Tn10: 68), direction of conjugative transfer (IS903: 204) and protein-mediated targeting to, or exclusion from, transcriptional control regions (Mu: 498; yeast Ty1: 119; yeast Ty3: 518).

Although much information on target specificity has been obtained by analysing individual insertions, a more powerful approach is the use of population-based methods. Such methods provide a picture which is statistically more significant (340, 384). They have been applied in analysis of retroviral integration in vitro (340, 384), in analysing bacteriophage Mu insertion both in vitro (332) and in vivo (498), and in investigating IS1-mediated adjacent deletions (482). For retroviruses, this approach has revealed a preference for the exposed face of the nucleosome DNA helix and exclusion by DNA bound regulatory proteins. For phage Mu not only has it been used to establish the target consensus sequence in vitro (332) but it has also been exploited to investigate occupation of DNA binding sites in vitro (500) and in vivo (498). This technique, known as "Mu-printing", relies on the fact that the phage is excluded from insertion in regions of DNA to which regulatory proteins are bound (498).

Another phenomenon which may reflect insertion site specificity is the interdigitation of various intact or partial IS elements which has been noted repeatedly in the literature. Many of these observations are anecdotal and may reflect the scars of consecutive but isolated transposition events resulting from selection for acquisition (or loss) of accessory genes. Some indication of the statistical significance of this is expected to emerge from the many bacterial genome sequencing projects. On the other hand, several ISs have been shown to exhibit a true preference for insertion into other elements. A preferred target for IS231 is the terminal 38 bp of the transposon Tn4430 which includes both the sequence-specific and conformational components described above (184) while IS21 has been reported to show a preference for insertion close to the end of a second copy of the element located in the target plasmid (394) and similar results have been noted for IS30 (356) and for IS911 (Loot, C. and Chandler, M. unpublished results, 378). In these cases, the site-specific DNA binding properties of the Tpase are presumably implicated. At the mechanistic level, this phenomenon might be related to the capacity of IS10 Tpase to form synaptic complexes with ends located on separate DNA molecules (75).

Population dynamics and horizontal transfer

The distribution of many insertion sequences within and between various bacterial species has often been investigated as part of the initial characterisation of a new element, usually by simple Southern hybridisation. Although useful in "typing" strains, much of the data remains purely descriptive, and few systematic attempts have been made to determine the dynamics of insertion sequences within bacterial populations in a controlled manner.

Hartl and colleagues (see 421 and 180) have determined the distribution of IS1, 2, 3, 4, 5, 30, and 103 in a heterogeneous collection of Escherichia coli strains (ECOR collection). By fitting this data to a number of models, they concluded that these elements could be classified into three groups by the apparent "strength of regulation": IS1 and 5 (“weakly regulated”); IS2, 4, and 30 (“moderately regulated”); and IS3 (“strongly regulated”).

Based on an initial observation that bacteriophage P1 appeared to accumulate mutations due to insertion sequences when the host strain was stored in agar "stabs" (13), Arber and colleagues undertook a study of the changes in distribution of eight ISs (IS1, 2, 3, 4, 5, 30, 150, and 186) from cultures of 118 individual clones isolated from a single 30 year-old stab of the well characterised Escherichia coli K12 strain W3110 (344). The degree of variation in copy number was found to differ from element to element. When the number of each IS was counted, significant variation was noted in particular for IS5 but also for IS2, IS3, and IS30. Lower variation was observed for IS1, IS4, IS150, and IS186. These variations in copy number were roughly correlated with the number of different patterns of hybridization obtained by extensive Southern blot analysis. For IS30, the data showed that copy number diversity increased in those clones which had generated a particular restriction fragment carrying a tandem dimer of the element, a configuration which results in high transposition levels (IS30 family).

Although given elements common to both these studies appear to display differences in their copy number diversity, it seems inherently unlikely that this could reflect a real difference in behaviour of a specific IS in the two sets of studies. Rather, it may be due to the fact that the Escherichia coli W3110 strain used by Naas et al. (344) was initially homogeneous whereas members of the ECOR collection (421, 180) have presumably undergone very different selective pressures.

Horizontal transfer of ISs in nature would not be surprising in view of the number and variety of autonomous extrachromosomal elements such as bacteriophages and plasmids which can serve as vectors, particularly promiscuous plasmids with wide host ranges. Several serendipitous observations, for example the isolation of identical IS6 family members from Mycobacterium fortuitum and Flavobacterium (Arthrobacter) sp. (IS6100, 236), clearly support the idea that horizontal transfer occurs in nature.

Some information has been obtained concerning the evolution of certain insertion sequences within the enterobacteria. Analysis of the nucleotide sequences of IS1, IS3, and IS30 from the ECOR collection and from other related enteric bacteria showed that each type of IS was highly conserved within Escherichia coli (278). Since the degree of sequence divergence of several chromosomal genes within these clonal lineages was significantly larger, it was concluded that the ISs had a high turnover and rapid movement. Moreover, strains carrying one type of insertion element also tended to carry other types. This observation is consistent with the idea that multiple insertion sequences can be delivered by a single vector, for example a transmissible plasmid or phage (421). The homologues of these ISs carried by other species of enteric bacteria were divergent from the Escherichia coli elements. This suggested a lower rate of transmission between species. Finally, the presence of "mosaic" variants of both IS1 and of IS3 in certain enteric species led to the conclusion that horizontal transmission (accompanied by recombination) had indeed occurred. Other studies have also compared the differences in the degree of nucleotide sequence variation of ISs with that of chromosomal genes. For IS1 and IS200 elements in natural populations of Escherichia coli and Salmonella typhimurium the results suggested that IS200 has a significantly lower frequency of horizontal transfer than does IS1 (44).

Data consistent with horizontal transfer is also emerging from non-enteric bacteria. In one study, seventeen iso-forms of the ISS1 sequence, largely isolated from bacteria which occupy another complex ecological niche, milk and cheese, were compared. They were determined to fall into three defined subgroups. Not only were nearly identical copies of these IS6 family members isolated from distantly related Streptococcus thermophilus and Lactococcus lactis strains but mosaic copies were also detected (51). Moreover, nearly identical IS6 family members have also been found in Escherichia coli, Proteus vulgaris and Pasteurella piscicida (see 249).

In the case of members of the IS256 family, a phylogenetic tree of 8 members was found to differ significantly from that of their host bacteria (172). Another study indicated that 10 members of the family isolated from actinomycetes formed a distinct group. While they exhibited a similar phylogenetic tree to their hosts (based on 16S rRNA and superoxide dismutase (SOD) genes) and most showed divergence similar to that of the 16S rRNA and SOD genes, IS1512 and IS1511 isolated from Mycobacterium gordonae showed significantly higher divergence (suggesting a higher mobility) and were more related to an element isolated from Rhodococcaceae (366).


Last modification : December 20 2001