The chloroplast genome has become a major focus of studies of plant molecular evolution. This genome is small and generally evolves slowly (Clegg et al. 1991), making it an ideal system for assessing phylogenetic relationships among genera (Conti et al. 1993; Downie et al. 1996; Ben Mustapha et al. 2013), families (Olmstead et al. 1992) or higher levels (Chase et al. 1993; Davis 1995; Nickrent and Soltis 1995). The coding sequences of many genes are interrupted by stretches of noncoding DNA termed introns. Introns are widespread genetic elements found in all major groups of organisms. Since their discovery, their origin has been debated and different theories have been proposed (Darnell and Doolittle 1986; Bell-Pedersen et al. 1990). In the conserved trnT-trnF region of chloroplast genomes, from mosses to seed plants, a group I intron splitting the tRNALeu gene is a stable component of the trnT-trnL-trnF cistron and has been widely used for phylogenetic inference at the intra- and interspecific levels (Segraves et al. 1999; Yu 2007). The trnL (UAA) intron is regularly used and, insofar as it has been examined, displays a relatively low level of variation (e.g., Bayer and Starr 1998). Although coding regions may have phylogenetic potential, it is currently widely believed that within the chloroplast genome, noncoding regions provide the most practical source of data for phylogenetic inference at lower taxonomic levels (cf. Morton and Clegg 1993). In plants, this intron has been used both in systematic studies requiring high resolution and in population genetics (Ferris et al. 1995; Gielly and Taberlet 1994, 1996; Kita and Kadota 1995). It has, for example, been used to resolve phylogenetic relationships when rbcL gene sequences showed too little variation (Gielly and Taberlet 1994).
The ability of group I introns to catalyze their own splicing depends on their highly conserved secondary and tertiary structures. Different group I introns share relatively little sequence similarity, but all contain a series of short conserved elements, P, Q, R and S, that make up the catalytic core (Cech 1994).
In this study, we investigate the nucleotide sequence diversity and mode of evolution of the trnL (UAA) chloroplast intron in Tunisian fig accessions. The fig, Ficus carica L. (Moraceae), is a fruit tree of antiquity associated with the beginnings of horticulture in the Mediterranean basin (Zohary and Spiegel-Roy 1975). It is known to have been domesticated from a diverse group of spontaneous figs occurring in the south and east of the Mediterranean region sometime in the Early Neolithic (Zohary and Hopf 1993). In Tunisia, fig germplasm consists of numerous landraces, mainly selected by farmers for their fruit qualities and maintained in orchards. These landraces are distributed across the country's different eco-geographical areas and are threatened by genetic erosion. In recent years, several studies have focused on identifying and characterizing Tunisian fig cultivars in order to establish a national core collection and preserve these genetic resources (Salhi-Hannachi et al. 2005; Chatti et al. 2007; Saddoud et al. 2007; Baraket et al. 2009a, 2009b, 2010).
As part of our ongoing research, this study aimed to analyze sequence variation in the trnL intron, detect conserved motifs in trnL sequences, reconstruct the gene tree, detect footprints of selection and investigate the evolutionary model of this intron.
Material and methods
Forty-nine cultivars (41 female and 8 male trees) of Tunisian fig (Table 1), collected from five regions, were used in this study. Plant material consisted of young leaves sampled from adult trees.
Total genomic DNA was purified from frozen young leaves according to the procedure of Dellaporta et al. (1983). The DNA concentration was estimated spectrophotometrically and its integrity was checked by analytical [1% (w/v)] agarose minigel electrophoresis (Sambrook et al. 1989).
PCR amplification, purification and sequencing
The trnL (UAA) intron was amplified using the primers designed by Taberlet et al. (1991), following the procedure previously described in Baraket et al. (2010). Sequencing was performed by cycle sequencing with the Big Dye Terminator Ready Reaction Kit (Applied Biosystems, Foster City, CA, USA).
All sequences have been deposited in the GenBank database (accession nos. EU191005-EU191024 and GQ395387-GQ395415). Nucleotide sequences were aligned using ClustalW as implemented in DAMBE (Xia 2000) and analyzed with MEGA version 5.0 (Tamura et al. 2011). Length and GC content were determined for each sequence. The alignment was checked manually, and pairwise sequence divergence between cultivars in the trnL intron was calculated with the Maximum Composite Likelihood (MCL) method (Tamura et al. 2011). The resulting distance matrix was then used to construct a phylogenetic tree with the Neighbor-joining (NJ) method (Saitou and Nei 1987). Positions containing gaps and missing data were eliminated from the dataset (Complete Deletion option). Support for the internal branches of the NJ tree was evaluated by bootstrapping (1000 replicates) (Tamura et al. 2011). The consistency index (CI) and retention index (RI) were calculated (Kluge and Farris 1969). The transition/transversion ratio was estimated as R = [A*G*k1 + T*C*k2]/[(A+G)*(T+C)], where A, G, C and T are the frequencies of the four nucleotides and k1 and k2 are the transition/transversion rate ratios for purines and pyrimidines, respectively (Tamura et al. 2011). The number of nucleotide substitutions per site between sequences is reported. Aligned sequences in MEGA format were analyzed with DnaSP version 5.10.01 (Librado and Rozas 2009) to estimate polymorphism indices. Genetic diversity was quantified by haplotype diversity (Hd) (Nei and Tajima 1983) and nucleotide diversity (Pi) (Jukes and Cantor 1969). The average number of nucleotide differences (k) and the minimum number of recombination events (Rm) were also estimated. Selective neutrality was tested with Tajima's D (Tajima 1989) and Fu and Li's D* and F* (Fu and Li 1993).
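The polymorphism indices listed above can be illustrated with a short Python sketch. It is not DnaSP's implementation: it assumes a list of equal-length aligned strings, imitates the Complete Deletion option by dropping every column containing '-', '?' or 'N' (the choice of symbols is an assumption), computes Hd with Nei's unbiased formula, k as the mean number of pairwise differences, and an uncorrected π as k divided by the number of retained sites (no Jukes-Cantor correction). The toy alignment is hypothetical.

```python
from collections import Counter
from itertools import combinations

# Symbols treated as gaps or missing data (an assumption for this sketch).
GAP_OR_MISSING = set("-?N")


def complete_deletion(seqs):
    """Drop every alignment column that contains a gap or missing-data symbol."""
    keep = [i for i in range(len(seqs[0]))
            if not any(s[i] in GAP_OR_MISSING for s in seqs)]
    return ["".join(s[i] for i in keep) for s in seqs]


def haplotype_diversity(seqs):
    """Nei's unbiased haplotype diversity: Hd = n/(n-1) * (1 - sum of squared haplotype frequencies)."""
    n = len(seqs)
    freqs = [count / n for count in Counter(seqs).values()]
    return n / (n - 1) * (1 - sum(p * p for p in freqs))


def mean_pairwise_differences(seqs):
    """k: average number of nucleotide differences over all pairs of sequences."""
    diffs = [sum(a != b for a, b in zip(x, y)) for x, y in combinations(seqs, 2)]
    return sum(diffs) / len(diffs)


def nucleotide_diversity(seqs):
    """Uncorrected pi: k divided by the number of aligned sites."""
    return mean_pairwise_differences(seqs) / len(seqs[0])


# Hypothetical toy alignment, not the fig data.
aligned = complete_deletion(["ACGT-A", "ACGTTA", "ACCTTA", "TCGTTA"])
print(aligned)  # ['ACGTA', 'ACGTA', 'ACCTA', 'TCGTA']
print(round(haplotype_diversity(aligned), 3),
      mean_pairwise_differences(aligned),
      nucleotide_diversity(aligned))  # 0.833 1.0 0.2
```

Pairwise differences are computed directly rather than from a precomputed distance matrix, which keeps the sketch short at the cost of O(n²) comparisons.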
Secondary structures of the conserved motifs of the trnL intron and their free energies were predicted with the minimum free energy (MFE) algorithm (Zuker 2003), using mfold version 3.1 (Zuker and Turner 2003) (www.bioinfo.rpi.edu/applications/mfold). Demographic parameters were assessed from the distribution of pairwise sequence differences (mismatch distribution) of Rogers and Harpending (1992) and from site-frequency spectra (the distribution of allele frequencies across sites) of Tajima (1989), using DnaSP 5.10.01 (Librado and Rozas 2009). Genetic relationships among the inferred haplotypes were displayed graphically with the program NETWORK (Bandelt et al. 1999).
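The observed mismatch distribution of Rogers and Harpending (1992) is simply the proportion of sequence pairs that differ at 0, 1, 2, ... sites. The sketch below computes only that observed distribution for gap-free aligned sequences; it does not fit the expected curves under constant population size or sudden expansion that DnaSP provides. The function name and the toy data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def mismatch_distribution(seqs):
    """Observed mismatch distribution: proportion of sequence pairs differing at d sites."""
    counts = Counter(sum(a != b for a, b in zip(x, y))
                     for x, y in combinations(seqs, 2))
    total = sum(counts.values())
    # Report every d from 0 to the maximum, including classes with no pairs.
    return {d: counts[d] / total for d in range(max(counts) + 1)}


# Hypothetical toy alignment, not the fig data.
dist = mismatch_distribution(["ACGTA", "ACGTA", "ACCTA", "TCGTA"])
print(dist)
```

A unimodal, wave-like observed distribution is usually read as a signature of past expansion, whereas a ragged or multimodal one is more typical of populations near demographic equilibrium.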
Results and discussion
The amplified fragment, about 600 bp long, corresponds to the tRNALeu (UAA) intron of chloroplast DNA. BLAST searches confirmed the authenticity and identity of the sequences as the trnL intron of the genus Ficus.
Nucleotide composition and length variation of trnL sequences
The sequences obtained varied in both length and nucleotide composition across fig cultivars. The size of the trnL intron ranges from 546 bp in the pollinator 'Dhokkar 3' to 607 bp in the cultivar 'Khadhri', with an average of 576.3 bp (Table 2). The size of the trnL (UAA) intron of fig is consistent with the range recorded in several angiosperm taxa, from 324 bp in Calycanthus floridus to 615 bp in Orontium aquaticum (Borsch et al. 2003). In Dipterocarpaceae species, the trnL intron ranges from 458 bp to 509 bp (Kajita et al. 1998). Lengths of this cpDNA region have also been reported for Triticum aestivum (587 bp) and Hordeum vulgare (555 bp) (Gielly and Taberlet 1996), Panicum virgatum L. (557 bp) (Missaoui et al. 2006) and Prunus species (574 bp) (Ben Mustapha et al. 2013). Among gymnosperm lineages, the intron is 447 ± 2.26 bp in Taxus, 450 bp in Pseudotaxus, 447 bp in Austrotaxus, 477 bp in Amentotaxus, and 461-472 bp in Cephalotaxus (Hao et al. 2009). The GC content of the trnL intron varies from 32.7% ('Widlani') to 36.3% ('Besbessi'), with an average of 34.1% (Table 2). Similar values were obtained in Prunus spp. (33.8%) (Ben Mustapha et al. 2013) and in the gymnosperm lineages Torreya and Cephalotaxus, with GC contents of 35.12% and 35.19%, respectively (Hao et al. 2009). The AT content is correspondingly high, varying from 63.6% ('Besbessi') to 67.2% ('Widlani'), with an average of 65.9%. The same pattern was reported in switchgrass by Missaoui et al. (2006) and in Prunus spp. by Ben Mustapha et al. (2013).
Nucleotide composition variation and mutational events of trnL sequences
The nucleotide frequencies of the trnL (UAA) intron are 0.395 (A), 0.278 (T), 0.147 (C) and 0.18 (G). A similar composition has been reported for the cpDNA trnL intron in Dipterocarpaceae species, with frequencies of 0.40 (A), 0.27 (T), 0.15 (C) and 0.17 (G) (Kajita et al. 1998), and in Prunus spp., with frequencies of 0.34 (A), 0.323 (T), 0.163 (C) and 0.174 (G) (Ben Mustapha et al. 2013). The transition/transversion rate ratios were k1 = 1.248 for purines and k2 = 0.841 for pyrimidines, and the overall ratio for all bases was R = 0.312. The relatively high AT content (65.9%) may partly explain the high proportion of transversions identified. Similar values were obtained in Taxus (R = 0.496) and Amentotaxus (R = 0.384) (Hao et al. 2009) and in Prunus spp. (R = 0.293) (Ben Mustapha et al. 2013).
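The overall ratio R given in the Methods combines the nucleotide frequencies with the purine and pyrimidine rate ratios k1 and k2. The sketch below is a direct transcription of that formula; the function name is hypothetical. MEGA estimates these quantities from its own processed dataset, so a hand calculation from the pooled frequencies above is not guaranteed to reproduce the value the program reports.

```python
def overall_ti_tv_ratio(freqs, k1, k2):
    """R = [A*G*k1 + T*C*k2] / [(A+G)*(T+C)], the formula stated in the Methods.

    freqs: nucleotide frequencies keyed by 'A', 'T', 'C', 'G'.
    k1, k2: purine and pyrimidine transition/transversion rate ratios.
    """
    a, t, c, g = (freqs[base] for base in "ATCG")
    return (a * g * k1 + t * c * k2) / ((a + g) * (t + c))


# Pooled frequencies and rate ratios as reported in the text.
fig_freqs = {"A": 0.395, "T": 0.278, "C": 0.147, "G": 0.18}
r = overall_ti_tv_ratio(fig_freqs, k1=1.248, k2=0.841)
print(round(r, 3))
```

With equal base frequencies and k1 = k2 = 1, the formula returns 0.5, the value expected when every substitution type occurs at the same rate (two transition pathways versus four transversion pathways).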
The substitutions detected are listed in Table 3. In Tunisian fig, T→C and A→G transitions are more frequent than C→T and G→A transitions, and overall, transversions are more frequent than transitions in the trnL intron (Fig. 1).
Multiple alignment of the sequences yielded a matrix of 667 characters comprising 437 conserved sites and 213 variable sites, of which 130 are parsimony-informative and 79 are singletons. After gaps were eliminated, 380 conserved sites and 57 variable sites (32 parsimony-informative and 25 singletons) were detected. The trnL (UAA) intron of chloroplast DNA thus shows a high level of polymorphism: 37 haplotypes were detected among the 49 figs studied. Haplotype diversity (Hd) and nucleotide diversity (π) were estimated at 0.951 and 0.018, respectively. In addition, the average number of nucleotide differences (k) was 8.16, indicating high genetic diversity in this chloroplast intron (Table 4).
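The site categories used above follow the usual convention: a variable column is parsimony-informative when at least two character states each occur in at least two sequences, and is otherwise a singleton site. A minimal sketch, assuming gap-containing columns have already been removed; the function name and toy alignments are hypothetical.

```python
from collections import Counter


def classify_sites(seqs):
    """Count conserved, variable, parsimony-informative and singleton columns.

    A variable column is parsimony-informative when at least two character states
    each occur in at least two sequences; any other variable column is a singleton site.
    Gap-containing columns are assumed to have been removed beforehand.
    """
    counts = {"conserved": 0, "variable": 0, "informative": 0, "singleton": 0}
    for column in zip(*seqs):
        states = Counter(column)
        if len(states) == 1:
            counts["conserved"] += 1
            continue
        counts["variable"] += 1
        if sum(1 for n in states.values() if n >= 2) >= 2:
            counts["informative"] += 1
        else:
            counts["singleton"] += 1
    return counts


print(classify_sites(["ACGTA", "ACGTA", "ACCTA", "TCGTA"]))
# {'conserved': 3, 'variable': 2, 'informative': 0, 'singleton': 2}
print(classify_sites(["AAC", "AAC", "GAT", "GAT"]))
# {'conserved': 1, 'variable': 2, 'informative': 2, 'singleton': 0}
```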
Conserved group I intron sequence motifs (P, Q, R and S) and Repeat patterns
The alignment of the tRNALeu (UAA) intron reveals considerable sequence variability. When the alignment is compared with secondary structure predictions, variation is seen to be mostly confined to certain loops or hairpin structures. The trnL intron can therefore be viewed as a mosaic of conserved elements (the internal guide sequence and the P, Q, R and S elements) and common secondary structure elements that are essential for correct splicing (Cech 1990). In Ficus carica, the R motif (GTGCAGAGACTCAA) was detected in the trnL (UAA) intron of all cultivars studied, whereas the S motif (AAGATAGAGTCC) is found in most figs with a deletion of one T relative to the sequence reported by Quandt and Stech (2005) (S: AAGATTAGAGTCC). The P motif (AATTCAGAGAAA) was detected in all figs studied except four cultivars whose sequences have undergone mutations: 'Khartoumi' and 'Hemri 1' (AATTTAGAGAAA: C→T substitution) and 'Khadhouri' and 'Kahli 1' (AATTAGAGAAA: deletion of C). Quandt and Stech (2005) showed that the consensus sequence of the P motif is more conserved in mosses than in liverworts, the respective sequences being GATTCAGGGAAA and WATTCAGDGAAA. Fig. 2 shows the predicted secondary structures of the conserved motifs R, S and P of the tRNALeu (UAA) intron of Tunisian figs, including those of sequences altered by mutation. Quandt and Stech (2005) also reported the consensus sequence of the Q motif in bryophytes as RATCCTGAGC. In Ficus carica, the Q motif AATCCTGAGC was detected in all cultivars except the caprifig 'Dhokkar 1', whose sequence carries a mutation (AAACCTGAGC: T→A substitution).
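A scan for the P, Q, R and S motifs like the one described above can be sketched as a sliding-window comparison that tolerates a chosen number of substitutions. This is an illustrative assumption, not the procedure used in the study; the function name and toy sequence are hypothetical. Because it counts only mismatches, it would not report a single-base deletion such as the one in 'Khadhouri' and 'Kahli 1'; that would require an edit-distance comparison.

```python
def find_motif(seq, motif, max_mismatches=0):
    """Return (start, window, mismatches) for every window of len(motif) in seq
    that differs from motif at no more than max_mismatches positions (substitutions only)."""
    hits = []
    width = len(motif)
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        mismatches = sum(a != b for a, b in zip(window, motif))
        if mismatches <= max_mismatches:
            hits.append((start, window, mismatches))
    return hits


P_MOTIF = "AATTCAGAGAAA"
toy = "GGAATTTAGAGAAACC"  # hypothetical sequence carrying a C->T variant of P
print(find_motif(toy, P_MOTIF))                    # []
print(find_motif(toy, P_MOTIF, max_mismatches=1))  # [(2, 'AATTTAGAGAAA', 1)]
```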
Where single-nucleotide variation occurs, the secondary structure of the conserved intron motifs is often retained. This happens both when a change at one position is accompanied by a compensatory change on the pairing strand, so that different sequences carry either a G:C or an A:T/U pair at that position, and when G:T/U pairing allows a change on one strand without disrupting the base-pairing structure. As can be seen in Fig. 2, most mutated positions within base-paired regions are labelled to indicate that the structure is retained among the different sequences. There are two possible explanations for this: (1) base pairing may be required to retain the structure needed for autocatalysis of the intron, which would exert selection pressure for compensatory changes; and (2) the tendency toward higher mutation rates at unpaired or mispaired bases in structures formed by single-stranded DNA, for example during transcription, may also play an important role (Wright 2000).
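The compensatory-change argument above can be made concrete by checking, for a given stem, whether each paired position is still a Watson-Crick or G:U wobble pair after mutation. The sketch assumes the stem pairs are already known (in practice they would come from the mfold prediction); the hairpin, its variants and the function names are hypothetical.

```python
WATSON_CRICK = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A")}
WOBBLE = {("G", "U"), ("U", "G")}


def pair_type(x, y):
    """Classify a base pair as 'canonical', 'wobble' or 'mismatch' (DNA T is read as U)."""
    x, y = x.upper().replace("T", "U"), y.upper().replace("T", "U")
    if (x, y) in WATSON_CRICK:
        return "canonical"
    if (x, y) in WOBBLE:
        return "wobble"
    return "mismatch"


def pairing_retained(seq, stem_pairs):
    """True if every (i, j) stem pair is still canonical or wobble in seq."""
    return all(pair_type(seq[i], seq[j]) != "mismatch" for i, j in stem_pairs)


# Hypothetical hairpin: positions 0-2 pair with 8-6 around an AAA loop.
stem = [(0, 8), (1, 7), (2, 6)]
for variant in ["GGGAAACCC", "GGAAAACCC", "GGAAAAUCC", "GGGAAAUCC"]:
    print(variant, pairing_retained(variant, stem))
# GGGAAACCC True   (original G:C pairs)
# GGAAAACCC False  (G->A on one strand breaks the pair)
# GGAAAAUCC True   (compensatory C->U restores an A:U pair)
# GGGAAAUCC True   (C->U alone leaves a G:U wobble pair)
```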
Sequence analysis of the trnL intron also revealed repeated sequences in all cultivars, such as (CT)4, (GA)4, (A)4T, (A)3 and (C)3. These repeat patterns have also been reported in the same chloroplast intron in bryophytes (Quandt and Stech 2005).
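Simple tandem repeats such as (CT)4 or (A)3 can be located with a regular expression that requires a minimum number of consecutive copies of a repeat unit. A minimal sketch using Python's standard `re` module; the function name and toy sequence are hypothetical, and compound patterns such as (A)4T would need their own expression.

```python
import re


def find_tandem_repeats(seq, unit, min_copies):
    """Return (start, copies) for each maximal tandem run of `unit` with at least min_copies copies."""
    # Builds e.g. "(?:CT){4,}" for unit "CT" and min_copies 4.
    pattern = re.compile(f"(?:{re.escape(unit)}){{{min_copies},}}")
    return [(m.start(), len(m.group()) // len(unit)) for m in pattern.finditer(seq)]


toy = "AACTCTCTCTGGAAAT"  # hypothetical sequence
print(find_tandem_repeats(toy, "CT", 4))  # [(2, 4)]
print(find_tandem_repeats(toy, "A", 3))   # [(12, 3)]
```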
Genetic relationships of trnL intron sequences
Maximum Composite Likelihood (MCL) genetic distances based on comparison of trnL intron sequences range from 0.00 to 0.065, with an average of 0.019. A distance of 0.00 was observed between ['Khalt'-'Sawoudi 2', 'Makhbech', 'Hamri', 'Grichy', 'Tounsi', 'Dhokkar Zarzis', 'Zaghoubi', 'Zidi 4', 'Dhokkar 4' and 'Chetoui 1']; ['Kahli 2'-'Dhokkar 2' and 'Zidi 3']; ['Sawoudi 2'-'Hamri', 'Grichy', 'Tounsi', 'Dhokkar Zarzis', 'Zaghoubi', 'Zidi 4', 'Dhokkar 4' and 'Chetoui 1']; ['Makhbech'-'Hamri', 'Grichy', 'Tounsi', 'Dhokkar Zarzis', 'Zaghoubi', 'Zidi 4', 'Dhokkar 4' and 'Chetoui 1']; ['Hamri'-'Grichy', 'Tounsi', 'Dhokkar Zarzis', 'Zaghoubi', 'Zidi 4', 'Dhokkar 4' and 'Chetoui 1']; ['Dhokkar 2'-'Zidi 3']; ['Grichy'-'Tounsi', 'Dhokkar Zarzis', 'Zaghoubi', 'Zidi 4', 'Dhokkar 4' and 'Chetoui 1']; ['Tounsi'-'Zaghoubi', 'Zidi 4', 'Dhokkar 4' and 'Chetoui 1']; ['Dhokkar Zarzis'-'Zaghoubi', 'Zidi 4', 'Dhokkar 4' and 'Chetoui 1']; ['Zaghoubi'-'Zidi 4', 'Dhokkar 4' and 'Chetoui 1']; ['Zidi 4'-'Dhokkar 4' and 'Chetoui 1']; ['Dhokkar 4'-'Chetoui 1']. In contrast, 'Bither Abiadh 1' and 'Khadhri' are the most divergent accessions at the cpDNA level, showing the highest genetic distance (0.065).
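Because a distance of 0.00 after Complete Deletion means the retained sites are identical, a pairwise list like the one above collapses into groups of identical sequences, which is how haplotypes are defined. A minimal sketch, assuming aligned sequences keyed by accession name; the function name, accession names and sequences are hypothetical.

```python
def group_identical(named_seqs):
    """Group accession names whose aligned sequences are identical (distance 0).

    Groups are returned largest first; names keep their input order within a group.
    """
    groups = {}
    for name, seq in named_seqs.items():
        groups.setdefault(seq, []).append(name)
    return sorted(groups.values(), key=len, reverse=True)


# Hypothetical accessions and sequences (after Complete Deletion).
toy = {"acc1": "ACGTA", "acc2": "ACCTA", "acc3": "ACGTA", "acc4": "ACGTA"}
print(group_identical(toy))  # [['acc1', 'acc3', 'acc4'], ['acc2']]
```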
Parsimony analysis identified 329 trees; the most parsimonious tree has a length of 153 steps, with a consistency index (CI) of 0.484 and a retention index (RI) of 0.507. The trnL intron sequences of Tunisian fig are thus characterized by little homoplasy.
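The two indices compare the observed tree length s with the minimum conceivable number of steps m and the maximum number of steps g (the length on a star tree): CI = m/s and RI = (g - s)/(g - m). A minimal sketch, assuming unordered characters and including every column (many programs exclude uninformative characters from the ensemble CI). The function names, toy alignment and tree length are hypothetical.

```python
from collections import Counter


def min_and_max_steps(seqs):
    """Sum over columns of the minimum steps m (states - 1) and the maximum steps g
    on a star tree (n minus the count of the commonest state), for unordered characters."""
    n = len(seqs)
    m = g = 0
    for column in zip(*seqs):
        counts = Counter(column)
        m += len(counts) - 1
        g += n - max(counts.values())
    return m, g


def consistency_index(m, s):
    """CI = m / s, where s is the number of steps on the tree."""
    return m / s


def retention_index(m, g, s):
    """RI = (g - s) / (g - m)."""
    return (g - s) / (g - m)


m, g = min_and_max_steps(["AAC", "AAC", "GAT", "GAT"])
s = 4  # steps on the hypothetical tree ((1,3),(2,4)): each variable column changes twice
print(m, g, consistency_index(m, s), retention_index(m, g, s))  # 2 4 0.5 0.0
```

On the alternative tree ((1,2),(3,4)) each variable column changes only once (s = 2), giving CI = RI = 1, i.e. no homoplasy.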
The NJ dendrogram illustrating the genetic relationships among the figs studied shows two groups (Fig. 3). The first cluster (I) comprises the varieties 'Bither Abiadh 1', 'Kahli 1', 'Chetoui 2' and 'Soltani 3' and the pollinator 'Jrani'. The second cluster (II), containing all other cultivars, is divided into two subgroups: the first (II1) includes the varieties 'Baghli', 'Besbessi', 'Gaa Zir', 'Sawoudi 1' and 'Zidi 2' and the pollinator 'Assafri', and the second (II2) contains the remaining cultivars. The eight caprifigs are scattered between the two groups.
Tajima's and Fu and Li's tests
Tajima's D (Tajima 1989) and Fu and Li's (Fu and Li 1993) tests of neutrality were performed on the aligned sequences. Both tests gave significantly negative values (Fig. 4). The deviation from neutrality revealed by the Tajima and Fu and Li tests is explained by an excess of rare mutations, identified as singletons in the sequences studied. These results may reflect positive selection and/or population expansion, as indicated by Fu and Li's statistics for the cpDNA trnL (UAA) intron: D* = -2.15* (0.10 > P > 0.05) and F* = -2.54* (P < 0.05) (Table 4; Fig. 4). To clarify the cause of the deviation from neutrality, Fu's Fs statistic was also estimated. This statistic is known to be particularly powerful for detecting departures from neutrality caused by population growth. Fu's Fs for the trnL intron is strongly and significantly negative (Fs = -23.48; Table 4).
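Tajima's D contrasts two estimates of diversity: k (mean pairwise differences) and S/a1 (Watterson's estimate from the number of segregating sites). An excess of rare variants, such as singletons, inflates S relative to k and drives D negative. The sketch below follows the standard formulas from Tajima (1989) for gap-free aligned sequences; it does not compute a P value, and the function name and toy data are hypothetical.

```python
import math
from itertools import combinations


def tajimas_d(seqs):
    """Tajima's (1989) D for gap-free aligned sequences."""
    n = len(seqs)
    # S: number of segregating (variable) sites
    s = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    if s == 0:
        return 0.0
    # k: mean number of pairwise differences
    pairs = list(combinations(seqs, 2))
    k = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n ** 2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (k - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


# Hypothetical toy alignment with two singleton sites, not the fig data.
d = tajimas_d(["ACGTA", "ACGTA", "ACCTA", "TCGTA"])
print(round(d, 3))
```

Here both variable sites are singletons, so k (1.0) falls below S/a1 (about 1.09) and D is negative, the same direction as the values reported for the trnL intron.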
Fu and Li's statistics and Fu's Fs both reject neutrality for the trnL intron and suggest a recent demographic expansion of Tunisian fig. This interpretation is also supported by the low value of the R2 index (Ramos-Onsins and Rozas 2002) calculated for the cpDNA trnL (UAA) intron (R2 = 0.0603).
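The R2 statistic of Ramos-Onsins and Rozas (2002) measures how far the number of singleton mutations carried by each sequence (U_i) departs from k/2; low values are expected after population growth. A minimal sketch of R2 = sqrt(Σ(U_i - k/2)² / n) / S for gap-free aligned sequences with at least one segregating site; it does not include the coalescent simulations used to assess significance, and the function name and toy data are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def ramos_onsins_rozas_r2(seqs):
    """R2 = sqrt(sum_i (U_i - k/2)^2 / n) / S (Ramos-Onsins and Rozas 2002).

    U_i: singleton mutations carried by sequence i; k: mean pairwise differences;
    S: number of segregating sites (assumed > 0).
    """
    n = len(seqs)
    singletons = [0] * n
    s = 0
    for column in zip(*seqs):
        counts = Counter(column)
        if len(counts) > 1:
            s += 1
            # A base seen only once at a variable site is a singleton on that sequence.
            for i, base in enumerate(column):
                if counts[base] == 1:
                    singletons[i] += 1
    pairs = list(combinations(seqs, 2))
    k = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)
    return math.sqrt(sum((u - k / 2) ** 2 for u in singletons) / n) / s


print(ramos_onsins_rozas_r2(["ACGTA", "ACGTA", "ACCTA", "TCGTA"]))  # 0.25
```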
Profiles of nucleotide diversity (π) and the number of segregating sites (S) along the sequences studied were used to locate the sites affected by selection and to clarify its extent (Fig. 5). Selection appears to operate around positions 200 bp and 400 bp of the chloroplast trnL intron. The empirical distribution of the pairwise number of nucleotide differences and segregating sites among pairs of individuals deviates from selective neutrality and appears multimodal for this intron (Fig. 6).
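Positional profiles of π and S are commonly produced with a sliding window along the alignment; whether Fig. 5 was generated this way is an assumption here. The sketch below computes, for each window, the per-site mean pairwise differences and the number of segregating sites in gap-free aligned sequences. The function name, window settings and toy data are hypothetical.

```python
from itertools import combinations


def sliding_window_pi(seqs, window, step):
    """Return (midpoint, pi, S) for each window along a gap-free alignment.

    midpoint: 0-based window start + window / 2;
    pi: mean pairwise differences per site within the window; S: segregating sites.
    """
    n = len(seqs)
    n_pairs = n * (n - 1) // 2
    length = len(seqs[0])
    results = []
    for start in range(0, length - window + 1, step):
        sub = [s[start:start + window] for s in seqs]
        diffs = sum(sum(a != b for a, b in zip(x, y)) for x, y in combinations(sub, 2))
        pi = diffs / n_pairs / window
        segregating = sum(1 for column in zip(*sub) if len(set(column)) > 1)
        results.append((start + window / 2, pi, segregating))
    return results


# Hypothetical toy alignment, window of 2 sites moved 1 site at a time.
for row in sliding_window_pi(["ACGTA", "ACGTA", "ACCTA", "TCGTA"], window=2, step=1):
    print(row)
```

Peaks in such a profile only flag regions of elevated diversity; attributing them to selection requires comparison with a neutral expectation.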
The haplotype network based on the 49 trnL intron sequences of cpDNA (Fig. 7) shows 37 haplotypes and indicates a clear demographic expansion and local evolution of figs from an ancestral haplotype. This is highlighted by the star-like structure of the network around the founder haplotype H20, represented by 'Chetoui 1', 'Dhokkar 4', 'Zidi 4', 'Zaghoubi', 'Dhokkar 2', 'Tounsi', 'Grichy', 'Hamri', 'Makhbech', 'Sawoudi 2' and 'Khalt'. H20 seems to be the ancestor of most chlorotypes that arose during evolution (Fig. 7). The network structure is further complicated by many haplotypes that differ from the ancestral haplotype H20 by one or two mutational steps and by strongly divergent branches (many mutational steps). This appears consistent with the significantly negative values of Tajima's D and with the long history and ancient origin of fig in Tunisia.
Network analyses of noncoding regions of ribosomal DNA (ITS: ITS1, 5.8S, ITS2) (Baraket et al. 2013), of the cpDNA trnL-trnF intergenic spacer (Baraket et al. 2009b) and of the trnL intron showed that the cultivar 'Hamri' belongs to each of the ancestral haplotypes detected. This cultivar therefore seems to be the ancestor of all nuclear and cytoplasmic haplotypes of the Tunisian figs studied. Part of this work involved identifying a noncoding region subject to cytoplasmic selection as an evolutionary force, by searching for the molecular signature of selection in trnL intron sequences of cpDNA. The positive natural selection detected in the target intron was used to better understand the evolutionary past of the species and to identify important functional genetic variants. Selection leaves footprints in DNA sequences, which are detected when the genetic variability of a region differs from that expected under the hypothesis of selective neutrality. Our results show that analysis of the cpDNA trnL intron reveals intraspecific variation. A deviation from neutrality was detected and explained by positive selection and/or demographic expansion of the Tunisian figs studied. In addition, cytoplasmic markers are powerful tools for building evolutionary scenarios. The results indicate that positive selection and demographic expansion together have shaped the patterns of nucleotide diversity and haplotype structure. According to Gillespie (2000) and Lagercrantz et al. (2002), every site in the genome is affected by selection, although variation in certain genes is more likely than in others to be affected by natural selection.
Our study shows that the fig tree (Ficus carica L.), one of the oldest domesticated fruit crops, is a good model for identifying positive selection or selective sweeps in the chloroplast genome. High chlorotype diversity and differentiation are observed in the local germplasm. The accessions studied can be identified as a significant development unit (SDU), providing a rational basis for identifying candidate units for conservation, as reported by Moritz (1994), Newton et al. (1999) and Andrianoelina et al. (2006).