DNA Barcoding and Species Identification of Some Saudi Arabian Seaweeds Using the rbcL Gene

Among biological resources, seaweeds have many biotechnological applications. Saudi Arabia is bounded by three bodies of water, with a coastline of almost 1,800 km. The high species richness of this area, a consequence of its complex geological history, has been the subject of genetic and morphological diversity studies for decades. DNA barcoding with the rbcL gene has proved useful for studying seaweed phylogenetic diversity, multiple cryptic introductions, environmental modulation, geographical distribution and species identification in different seaweed species. Eight algal samples were collected from different locations in Saudi Arabia, and the rbcL gene was amplified by PCR for species identification. A total of 8 sequences were obtained, with a combined length of 5263 bp; individual sequences ranged from 610 to 753 bp, with an average length of 658 bp. Species identification assigned samples 1, 2, 3, 4, 5, 6, 7 and 8 to Padina pavonica, Turbinaria gracilis, Carpomitra costata, Pterocladiella capillacea, Cladostephus spongiosus, Ulva lactuca, Sporochnus comosus and Sargassum muticum, respectively. rbcL-based DNA barcoding identified most seaweed specimens to species and genus. For some specimens, however, rbcL was not adequate for identification at the genus level and failed to differentiate between highly similar species. We suggest using additional DNA barcoding markers alongside rbcL to add resolution to species identification.


INTRODUCTION
Among biological resources, seaweeds have many biotechnological applications. Seaweeds reduce the environmental effects of inorganic nutrient wastes by absorbing undesirable compounds from mariculture (Al-Hafedh et al. 2012). Potential medical and biological applications of seaweeds have been reported, including anti-inflammatory, antibacterial, antifungal, antiviral and antitumor effects (Men'shova et al. 2012; Zheng et al. 2016). Additionally, biological invasions by non-native seaweed species have been shown to have high environmental and economic impacts and are considered one of the main threats to global biodiversity (Nunes et al. 2014). Saudi Arabia is bounded by three bodies of water, with a coastline of almost 1,800 km. The high species richness of this area, a consequence of its complex geological history, has been the subject of genetic and morphological diversity studies for decades (Basson, 1979). Complex current circulation is an influential factor in shaping and homogenizing the population structure of seaweeds (Buchanan and Zuccarello, 2012). Shipping is the most important pathway for the introduction of non-indigenous marine species, and more than a thousand alien marine species have been reported in different European countries (Katsanevakis et al. 2013). Moreover, aquaculture activities can cause a variety of environmental effects, including habitat modification and coastal eutrophication (Read and Fernandes, 2003). Seaweeds are distributed worldwide, inhabiting areas with divergent environmental conditions, from tropical to polar regions (Harley et al. 2012; Guiry and Guiry, 2016). This wide distribution can give rise to structured populations through environmental differences, the existence of cryptic species and restricted gene flow (Calegario et al. 2019). We are still far from unraveling the true biogeography and genetic diversity of seaweeds in this region.
Studying the population structure of algae is an important component of developing basic knowledge for seaweed ecology research. Such studies can inform on population subdivision, population variation and gene flow.
Identification of seaweeds based on morphology is the classical method, but DNA-based molecular techniques have increasingly been used for taxonomic identification. These studies have focused mainly on the use of DNA sequences to elucidate taxonomic and systematic issues (Kim et al. 2014). Among these techniques is DNA barcoding, in which a short piece of DNA is used to distinguish between species. Progress in DNA barcoding systematics has led to the development of different approaches in algal phylogenetics and increased the use of newly developed molecular markers. Ribulose-1,5-bisphosphate carboxylase/oxygenase (RuBisCO) is responsible for the primary fixation of CO2 in the Calvin cycle. The quaternary structure of this enzyme is composed of 8 small and 8 large subunits. In chlorophytes the small subunit (rbcS) is encoded in the nuclear genome and the large subunit (rbcL) is plastid-encoded (Palmer, 1985). The plastid RuBisCO large subunit (rbcL) marker has been widely used to determine the taxonomic position of unknown species and to clarify phylogenetic relationships between different species (Tan et al. 2015). The stable exon structure of the rbcL gene and its high amino acid sequence similarity support it as a reliable marker for these studies (Kazi et al. 2013). Additionally, the Consortium for the Barcode of Life (CBOL) has recommended the rbcL and matK genes as DNA barcodes for plant species (Group et al. 2009). The rbcL gene has proved useful for studying seaweed phylogenetic diversity, multiple cryptic introductions, environmental modulation and geographical distribution in a Japanese red alga. rbcL data have high resolution at a variety of taxonomic levels, which improves their utility for the study of interspecific relationships (Yang et al. 2008). Therefore, the aims of this study were to assess the genetic diversity and to uncover the phylogeographic structure of seaweed specimens through molecular genetic identification using rbcL genes as barcodes.
The rbcL sequences retrieved for these specimens were used as key markers for studying the relationships between seaweed species on the basis of genetic divergence and geographic distribution. Additionally, the effectiveness of rbcL-based DNA barcoding for species identification and differentiation was evaluated by searching the new sequences against GenBank databases with BLAST.

DNA extraction, polymerase chain reaction amplification and sequencing
For most seaweed samples, DNA was extracted from 5 g of fresh algal tissue using the DNeasy Plant Mini Kit (Qiagen, Santa Clarita, CA) according to the manufacturer's instructions. The DNA concentration was estimated by loading 2 µl of each DNA sample on a 1% agarose gel alongside 10 µl of a DNA size marker (lambda DNA HindIII digest, Phi X 174/HaeIII digest) and comparing the fluorescence of the sample with that of the bands in the size marker. DNA barcoding analysis was performed with the plastid rbcL region. For PCR amplification and sequencing of rbcL, the reaction mixture consisted of 1x buffer (Promega), 15 mM MgCl2, 0.2 mM dNTPs, 20 pmol of each primer, 1 U of Taq DNA polymerase (GoTaq, Promega), 40 ng of DNA and ultra-pure water to a final volume of 50 µl. The forward and reverse primers for rbcL PCR were rbcL-F (5'-ATGTCACCACAAACAGAGACTAAAGC-3') and rbcL-R (5'-TCGCATGTACCTGCAGTAGC-3'), respectively. The predicted PCR product size was 600 bp.
PCR amplification was performed in a Perkin-Elmer/GeneAmp® PCR System 9700 (PE Applied Biosystems) programmed for 40 cycles after an initial denaturation step of 5 min at 94°C. Each cycle consisted of a denaturation step at 94°C for 30 s, an annealing step at 50°C for 30 s and an elongation step at 72°C for 30 s. The primer extension step was extended to 7 min at 72°C in the final cycle. The amplification products were resolved by electrophoresis in a 1.5% agarose gel containing ethidium bromide (0.5 µg/ml) in 1X TBE buffer at 95 volts. A 100 bp DNA ladder was used as a molecular size standard. PCR products were visualized under UV light and photographed using a Gel Documentation System (BIO-RAD 2000). Amplified products from all PCRs were purified using the GeneJET PCR Purification Kit (Catalog number: K0701) according to the manufacturer's instructions. The purified PCR products were incubated at room temperature for 2 minutes and then stored at -20 °C. DNA sequencing of the rbcL PCR products was carried out by Macrogen Inc., Korea, using Sanger DNA sequencing technology.
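The cycling program described above can be written down as data, which makes the programmed hold time easy to check. The following Python sketch is illustrative only: the names `Step` and `total_hold_seconds` are hypothetical, ramping time between temperatures is ignored, and it assumes that the 7 min extension replaces the 30 s elongation step of the 40th cycle, which is one reading of the protocol.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Step:
    """One programmed hold of the thermal cycler (hypothetical helper type)."""
    name: str
    temp_c: float
    seconds: int


def total_hold_seconds(initial: Step, cycle: Sequence[Step],
                       n_cycles: int, final_extension_seconds: int) -> int:
    """Sum the programmed hold times; ramping between temperatures is ignored."""
    per_cycle = sum(step.seconds for step in cycle)
    # Assumption: in the last cycle the longer final extension replaces
    # the regular elongation step (the last step of the cycle).
    last_cycle = per_cycle - cycle[-1].seconds + final_extension_seconds
    return initial.seconds + (n_cycles - 1) * per_cycle + last_cycle


# Values taken from the protocol text.
INITIAL = Step("initial denaturation", 94.0, 5 * 60)
CYCLE = (
    Step("denaturation", 94.0, 30),
    Step("annealing", 50.0, 30),
    Step("elongation", 72.0, 30),
)

TOTAL_SECONDS = total_hold_seconds(INITIAL, CYCLE, n_cycles=40,
                                   final_extension_seconds=7 * 60)
print(f"Programmed hold time: {TOTAL_SECONDS / 60:.1f} min")
```

Under these assumptions the program holds for 300 s + 39 × 90 s + 480 s = 4290 s, or about 71.5 min, before ramping time is added.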

Computational analysis
The rbcL PCR product sequences were used for species identification, which was carried out using NCBI-BLAST (Altschul et al. 1997) to search GenBank databases. Species identifications were accepted if they showed more than 98% similarity to the reference sequences available in the databases. The 100 most similar rbcL sequences returned by NCBI-BLAST were used for further phylogenetic analysis. Sequence alignments were made using CLUSTALW (Thompson et al. 1994). The phylogenetic trees were constructed using the iTOL online tool (Letunic and Bork, 2006). Visualization and clustering of the multivariate phylogenetic similarity data were conducted using the ClustVis web tool (Metsalu and Vilo, 2015). Additional seaweed sequence information obtained from GenBank was used in the geographical distribution analyses, in which geographical distributions were mapped in R using the "rworldmap" package (South, 2011). The R package "ape" (Paradis et al. 2004) was used for statistical analysis of the phylogenetic trees. The global algal database AlgaeBase (Guiry and Guiry, 2016) was used as a reference for algal distribution.
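The acceptance rule described above (keep an identification only when identity exceeds 98%) can be sketched in a few lines of Python. This is not the authors' pipeline: it assumes the hits were exported in the standard 12-column BLAST tabular format (query, subject, percent identity, alignment length, mismatches, gap opens, query start/end, subject start/end, e-value, bit score), it picks the best hit per query by bit score, and the rows in `EXAMPLE_ROWS` are made-up placeholders rather than results from this study.

```python
from typing import Dict, Iterable, Iterator, NamedTuple, Optional


class Hit(NamedTuple):
    query: str
    subject: str
    pident: float    # percent identity, column 3
    bitscore: float  # bit score, column 12


def parse_tabular(lines: Iterable[str]) -> Iterator[Hit]:
    """Parse tab-separated BLAST hits, skipping blank and comment lines."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        yield Hit(fields[0], fields[1], float(fields[2]), float(fields[11]))


def accepted_identifications(lines: Iterable[str],
                             min_identity: float = 98.0) -> Dict[str, Optional[Hit]]:
    """Return the best hit per query, or None if it does not exceed the threshold."""
    best: Dict[str, Hit] = {}
    for hit in parse_tabular(lines):
        current = best.get(hit.query)
        if current is None or hit.bitscore > current.bitscore:
            best[hit.query] = hit
    # "More than 98%" is read as a strict inequality.
    return {q: (h if h.pident > min_identity else None) for q, h in best.items()}


# Hypothetical rows for illustration only (not data from this study).
EXAMPLE_ROWS = [
    "query_A\tref_1\t99.10\t650\t6\t0\t1\t650\t10\t659\t0.0\t1150",
    "query_A\tref_2\t97.00\t650\t19\t0\t1\t650\t10\t659\t0.0\t1080",
    "query_B\tref_3\t91.30\t600\t52\t0\t1\t600\t5\t604\t0.0\t800",
]
RESULT = accepted_identifications(EXAMPLE_ROWS)
for query, hit in RESULT.items():
    print(query, "->", hit.subject if hit else "no hit above threshold")
```

Queries whose best hit stays at or below the threshold are kept in the result with `None`, so a sample that fails identification remains visible instead of silently disappearing.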

PCR amplification
The rbcL PCR primers successfully amplified a PCR product of the expected band size (≈ 600 bp) (Fig. 1).

Species identification
Journal of Pure and Applied Microbiology
A total of 8 sequences were obtained, with a combined length of 5263 bp; individual sequences ranged from 610 to 753 bp, with an average length of 658 bp (Supplementary 1). The NCBI-BLAST tool identified 7 of the rbcL specimen sequences to species and genus (Tables 1 & 2). All sequences were assigned a species, and most had an identity percentage higher than 90%. On the other hand, the query coverage, which is the proportion of the query (our PCR product sequence) aligned to the subject (rbcL genes in the NCBI database), ranged from 73% (sample 2) to 96% (sample 7). This indicates a similarity issue with sample 2, which could be due to the low number of sequences for this species in the database or to the low efficiency of the rbcL gene in identifying this species. Species identification assigned samples 1, 2, 3, 4, 5, 6, 7 and 8 to Padina pavonica, Turbinaria gracilis, Carpomitra costata, Pterocladiella capillacea, Cladostephus spongiosus, Ulva lactuca, Sporochnus comosus and Sargassum muticum, respectively.
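Two of the quantities in this paragraph are simple to verify or compute. The Python sketch below checks that the reported average length follows from the reported total, and shows one common way to compute query coverage as the share of query positions covered by at least one aligned segment. The function name `query_coverage` and the interval values in the usage lines are illustrative assumptions, not values from this study.

```python
from typing import Iterable, Tuple

# Reported in the text: 8 sequences, 5263 bp in total.
TOTAL_BP = 5263
N_SEQUENCES = 8
MEAN_LENGTH = TOTAL_BP / N_SEQUENCES  # 657.875 bp, reported as 658 bp


def query_coverage(segments: Iterable[Tuple[int, int]], query_length: int) -> float:
    """Percent of query positions covered by at least one aligned segment.

    Segments are (start, end) pairs on the query, 1-based and inclusive.
    Overlapping segments are counted once.
    """
    covered = 0
    last_end = 0
    for start, end in sorted(segments):
        start = max(start, last_end + 1)  # skip positions already counted
        if end >= start:
            covered += end - start + 1
            last_end = end
    return 100.0 * covered / query_length


print(f"Mean sequence length: {MEAN_LENGTH:.3f} bp")
# Hypothetical example: two overlapping segments on a 600 bp query.
print(f"Coverage: {query_coverage([(1, 300), (250, 500)], 600):.1f}%")
```

A coverage of 73%, as reported for sample 2, means that roughly a quarter of the query sequence did not align to the best database match at all, independently of how similar the aligned part is.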

Seaweeds species geographical distribution
To strengthen the results, the sequence information submitted to NCBI GenBank was used to clarify the geographical locations of the identified species and to compare them with the AlgaeBase database (Guiry and Guiry, 2016) at the species and genus levels (Fig. 2). Although AlgaeBase places the habitats of some seaweeds in specific locations in the world, the collection sites of specimens with submitted rbcL sequences extend the known ranges of these species (Fig. 2), for example Carpomitra sp., Padina sp. and Sargassum. Additionally, the maps obtained fill information gaps in the geographical distribution of Turbinaria sp., which is missing from the database.

Multiple sequence alignment
The sequences of the rbcL gene, along with highly similar GenBank sequences, were used to generate the dataset of nucleotide positions for sequence alignment. The intron was removed from the rbcL gene sequences before analysis and the sequences were translated into amino acid form. In total, 8 clusters were recovered in the multiple sequence alignment analysis (Supplementary 1). A heat map was constructed depicting the similarity matrix between the rbcL sequence of every specimen and similar sequences in the GenBank database. According to Fig. 3A, the sequences most similar to sample 3 are accessions ABU51019, ABU51017, ABU51021, ABU51015 and ABU51019, which belong to Carpomitra costata, with a similarity score of 98.51%. On the other hand, the sequence belonging to Sargassum echinocarpum (ABW81412) has the lowest similarity score (84.13%). In general, the genus Sargassum has the lowest similarity scores with our Carpomitra costata rbcL sequence. Notably, ABW81412 has the lowest similarity score with the entire sequence set (continuous red boxes in Fig. 3A). Sample 5 has 92.46% similarity with CBJ55508, which belongs to Cladostephus spongiosus, and a low similarity (86.93%) with BAK64367, which belongs to Cutleria multifida (Fig. 3B). The generally reddish color of the similarity matrix indicates that, although these species are genetically similar, they ... (Fig. 3B). Sample 1 has a similarity score higher than 91% with sequences belonging to species such as Padina pavonica, Padina antillarum and Padina tetrastromatica (Fig. 3C) (Fig. 3D). Sequences belonging to Sargassum muticum, Sargassum glaucescens, Sargassum fusiforme, Sargassum emarginatum, Sargassum zhangii and Sargassum confusum have the highest similarity to sample 8 (99.5%), suggesting that rbcL identified the genus of the sample, although it could not differentiate between species with the same similarity level.
On the other hand, Haplospora globosa shows the highest dissimilarity with the other species (Fig. 3E). The sequence of sample 7 has high similarity (97.03%) with sequences ABU51046 and ABU51072, which belong to Sporochnus comosus and Sporochnus radiciformis. The similarity matrix is divided into 3 distinct subpopulations (blue areas) (Fig. 3F), suggesting that rbcL has low power to differentiate between species within a population but high power to differentiate between species belonging to different populations.
The rbcL gene failed to adequately identify the species of sample 2. The highest similarity score (91.3%) was with sequences ASJ80801, ABE02294 and AAP78675, which belong to Turbinaria ornata, Turbinaria decurrens and Myriodesma integrefolium, respectively. The sequence CAC67466, which belongs to Desmarestia viridis, has the highest dissimilarity with the other sequences (Fig. 3G). DNA barcoding identified the species of sample 6, with 98.4% similarity to rbcL sequence ABP82141, which belongs to Ulva lactuca. The lowest similarity (95.90%) was with Ulva lobata (Fig. 3H). According to the similarity matrix, the rbcL gene has low power to differentiate between most species of the genus Ulva.
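The similarity scores discussed in this section come from pairwise comparisons of aligned sequences. The text does not state how the scores behind the heat maps were calculated, so the Python sketch below uses one common definition as an assumption: percent identity over alignment columns where neither sequence has a gap. The function names and the three toy sequences are hypothetical and serve only to show the calculation.

```python
from itertools import combinations
from typing import Dict, Tuple


def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Percent of identical residues over columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == gap or y == gap:
            continue  # gapped columns are excluded (an assumption of this sketch)
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def identity_matrix(aligned: Dict[str, str]) -> Dict[Tuple[str, str], float]:
    """Symmetric pairwise identity scores keyed by (name, name)."""
    scores: Dict[Tuple[str, str], float] = {(name, name): 100.0 for name in aligned}
    for (n1, s1), (n2, s2) in combinations(aligned.items(), 2):
        score = percent_identity(s1, s2)
        scores[(n1, n2)] = score
        scores[(n2, n1)] = score
    return scores


# Toy aligned sequences for illustration only (not data from this study).
TOY_ALIGNMENT = {
    "toy_1": "ATGTCACCAC",
    "toy_2": "ATGTCACCAT",
    "toy_3": "ATG-CACCAC",
}
MATRIX = identity_matrix(TOY_ALIGNMENT)
for (n1, n2), score in sorted(MATRIX.items()):
    print(f"{n1}\t{n2}\t{score:.2f}")
```

The same functions work on amino acid alignments, since they only compare characters column by column. Excluding gapped columns tends to give higher scores than counting gaps as mismatches, which is one reason scores from different tools are not directly comparable.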

Phylogenetic analysis
Fig. 2. The geographical distribution of the studied seaweed species as retrieved from AlgaeBase (above) and GenBank sequence information (below).
Fig. 3. The multiple sequence alignment matrices showing the relation between different seaweed specimens according to country and genus. The similarity scale ranges from highly similar sequences (blue) to highly divergent sequences (red).
The sequences of the rbcL gene, along with highly similar GenBank sequences, were used to generate the nucleotide dataset for phylogenetic analysis (Fig. 4). The phylogenetic tree constructed for sample 3 (Carpomitra costata) divided the specimens into six groups. The mean branch length was 0.0018 with a variance of 2.59e-05, and branch lengths ranged from -0.027 to 0.038. Sample 3 formed a monophyletic clade together with other species belonging to the genus Carpomitra, while the genus Sargassum formed distinct clades (Fig. 4A). For sample 5 (Cladostephus spongiosus), the mean branch length of the phylogenetic tree was 0.0042 with a variance of 3.9e-05, and branch lengths ranged from -0.01 to 0.05. The tree consists of 5 clades, in which sample 5 formed one clade with another C. spongiosus. The genera Lobophora and Dictyota formed separate clades, which suggests that the rbcL gene can successfully differentiate between these genera (Fig. 4B). Additionally, the phylogenetic tree obtained for sample 1 and other GenBank sequences was divided into six clades, with a mean branch length of 0.002 and a variance of 1.36e-05. Branch lengths ranged from -0.0094 to 0.029. Sample 1 (Padina pavonica) formed a clade with other Padina rbcL sequences (Fig. 4C). The mean branch length and variance for the sample 4 tree were 0.004 and 5.5e-05, respectively, and branch lengths ranged from -0.01 to 0.06. Sample 4 (Pterocladiella capillacea) formed a separate clade with other Pterocladiella sequences (Fig. 4D). The phylogenetic tree for sample 8 consists of 7 clades (Fig. 4E). The mean branch length was 0.003 with a variance of 4.5e-05, and branch lengths ranged from -0.024 to 0.05. The rbcL sequence of sample 8 (Sargassum muticum) fell in one clade with other Sargassum sequences. The mean branch lengths of the phylogenetic trees for samples 7 (Sporochnus comosus), 2 (Turbinaria gracilis) and 6 (Ulva lactuca) were 0.003, 0.003 and 0.001, with variances of 4.5e-05, 6.3e-05 and 1.76e-05, respectively. Branch lengths in the trees for samples 7, 2 and 6 ranged from -0.014 to 0.029, from -0.024 to 0.063 and from -0.017 to 0.019, respectively. Samples 7 and 2 each formed a clade with sequences belonging to the same genus, while sample 6 formed a clade with an rbcL sequence belonging to the same species (Fig. 4F-H). Similarly, rbcL sequences of U. lactuca have been reported to cluster clearly in one clade and those of C. taxifolia in another, indicating that, even though sequences vary within a species (evident from the branching patterns of the phylogram), the variation is small enough to cluster the same species in one clade, which supports the DNA barcoding ability of the rbcL gene (Mahendran and Saravanan, 2017).
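The branch length summaries reported for each tree (mean, variance and range) are straightforward descriptive statistics. The study used the R package "ape" for tree statistics; the Python sketch below is a language swap for illustration and assumes the branch lengths have already been extracted from a tree as a plain list. The function name `branch_length_summary` and the toy values are hypothetical, and the sample variance (n - 1 denominator) is assumed because that is what R's `var()` computes.

```python
import statistics
from typing import Dict, Sequence


def branch_length_summary(lengths: Sequence[float]) -> Dict[str, float]:
    """Mean, sample variance, range and count of negative branch lengths."""
    if len(lengths) < 2:
        raise ValueError("need at least two branch lengths for a variance")
    return {
        "mean": statistics.mean(lengths),
        "variance": statistics.variance(lengths),  # sample variance, n - 1
        "min": min(lengths),
        "max": max(lengths),
        "n_negative": sum(1 for x in lengths if x < 0),
    }


# Toy branch lengths for illustration only (not data from this study).
TOY_LENGTHS = [-0.01, 0.0, 0.01, 0.02]
SUMMARY = branch_length_summary(TOY_LENGTHS)
for key, value in SUMMARY.items():
    print(f"{key}: {value}")
```

The count of negative values is included because every tree in this section has a negative minimum branch length. Negative branch lengths can arise in distance-based tree building such as neighbor joining, and they are worth reporting explicitly since they affect how the mean and variance should be read.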

Characteristics and biological applications of the identified seaweed samples
The extract of Carpomitra costata has potential anti-inflammatory, antibacterial, antifungal and photoprotective effects, and the species is part of the traditional diet in Asia (Pesando and Caram, 1984; Yim et al. 2018). Cladostephus spongiosus (Hudson) is a brown alga with one to many repeatedly branched, subdichotomous axes arising from a crust-like holdfast (El Hattab et al. 2008). On the other hand, P. pavonica has been reported to be a potential source of alginic acids, which have a high capacity for gelation, and of fucoidans, which show anticancer effects. Moreover, Sargassum muticum is native to the northwest Pacific region and has become the dominant species at the low-tide level on the west coast of North America (Fletcher and Fletcher, 1975). Additionally, Sargassum muticum appeared in Europe in the early 1970s and is now found on shorelines from Norway to Portugal (Engelen and Santos, 2009). The growth rate of S. muticum is generally considerably higher than that of most UK seaweed species, and the species has been described as very invasive in terms of its rate of spread (Davison, 2009; Milledge and Harvey, 2016). Notably, Sporochnus comosus has been reported to contain cytotoxic compounds. Turbinaria is characterized by monopodially developed, radially branched axes bearing more or less turbinate leaves with a flat or concave outer face (Feitsch, 1945; Wynne, 2002). Species of this genus are eaten by herbivorous fishes in tropical areas and have low levels of tannins and phenolics (Steinberg, 1986), with an approximate regeneration time of 24 to 36 months (Ellis and Ellis, 2002). In the genus Ulva, for example, rbcL has been studied extensively to clarify taxonomic issues (Group et al. 2009). Enteromorpha and Ulva, two major genera in the Ulvaceae, are known for their global distributions, numerous species and morphological variation (Hoek et al. 1995).

CONCLUSION
Molecular analysis was by far the most effective method of species-level identification in this work, and rbcL sequences supported much more precise identifications, as previously reported (Gargiulo et Kim, 2015). The coasts of Saudi Arabia remain largely unexplored with respect to seaweed ecology, and this study reports some known species in this area. rbcL-based DNA barcoding identified most seaweed specimens to species and genus. For some specimens, however, rbcL was not adequate for identification at the genus level and failed to differentiate between highly similar species. We suggest using additional DNA barcoding markers alongside rbcL to add resolution to species identification. We hope that this study will encourage Arabian researchers to explore and study the unknown ecological areas of their countries.

CONFLICT OF INTEREST
The authors declare that there is no conflict of interest.