Cis Elements: Added Boost to the Directed Evolution of Plant Genes

To increase the expression of a native or foreign plant/bacterial gene, the complete network of its cis elements must be mapped so that its biosynthetic yield can be raised, especially under industrial stress conditions. To select the best set of cis elements for a foreign gene and to support researchers who are often untrained in bioinformatics methodologies, we developed a modular PERL script for the identification and localization of these elements. The script runs on any operating system and localizes the cis element network of a gene. It is easy to customize for the required analysis and provides a robust strategy, unlike the commonly used databases, where several applied calculations often become tricky. The script allows an uncomplicated analysis of the multiplicity of cis elements along with their relative distances, making it easier to design a more beneficial network of genes for directed evolution experiments. Through a batched scrutiny of several functionally similar genes, it would aid the extraction of their evolutionarily favored network of cis elements. It would be helpful in developing crop plants that are better adapted to stressful conditions.


INTRODUCTION
[2-6] An increased scale of bioproduction and bioaccumulation in the endoplasmic reticulum of plant cells leads to a substantially higher production of the products of industrially important foreign genes, unlike in microorganisms or mammalian cells [7,8]. Generally, when a stress condition arises in plants, the transcription factors play a key role at the molecular level and bind to their recognition sequences upstream of the stress-responsive genes. For the highly versatile biosynthetic production of a broad spectrum of biomolecules, their industrial application has boomed in the last two decades [9,10]. However, industrial stress conditions often lead to improper growth in plants and thus prove to be the major factor that decreases the overall yield [11]. Natural adaptation of plants is time-consuming and often fails to yield a gain-of-function mutation in terms of enzymatic yield or turnover. Although both bacterial and plant systems have been extensively used for the industrial production of several biomolecules, prokaryotic systems have been preferred over plants for their short doubling time and ease of batch scale-up, especially when the proteins are biologically active in their native forms and require no additional posttranslational modifications [4,6].
Although the transcriptional rate is primarily determined by the strength of the promoter, the growth stage of the cell, and cis- and trans-acting factors [12], the cis elements are found to be the key regulatory switches, acting as binding sites for one or more trans-acting factors [13,14]. A trans-acting factor usually binds several different genes and controls their temporal and spatial expression patterns [15,16]. The cis-acting regulatory segments are present in the introns, the 5′ and 3′ untranslated segments, or the coding regions of precursor RNAs and mature mRNAs, and are specifically recognized by at least one trans-acting factor for selectively regulating posttranscriptional gene expression [17]. The cis elements are also found to play a major role in the posttranscriptional and posttranslational regulation of prokaryotic systems [19,20].
Besides the variable copy number and sequence, the relative localization and mutual distance network of the cis elements cognate to a gene are highly variable, which leads to a substantial change in the yield of the encoded protein [21]. These segments are functionally diverse, have multiple repeats in a gene [22], and are usually analyzed through sequence information on the conserved transcription factor binding sites and their unique organization within a gene. Owing to differences in the set of cis elements, a tissue-specific gene expression profile has been observed at different developmental phases [23].
The directed evolution strategy has been extensively used for many applications, including the functional improvement of several plant/bacterial genes for producing vaccines or pharmaceuticals [24,25], strain improvement [26,27], and building variants with improved activity [28,29].
The cis elements often function as insulators, silencers, enhancers, and promoters in plants [17], gram-positive bacteria [30], thermophilic archaea [31], and photosynthetic bacteria [32]. Although synthetic plant/bacterial gene cassettes with a customized minimal set of the most productive cis elements can be constructed and experimentally modulated under varied stress conditions to increase their expression level [33], the unambiguous identification of these elements is difficult, expensive, and painstaking. Hence, researchers often produce functionally improved gene copies through directed evolution algorithms without screening and optimizing the best promoter sequence(s) applicable for a gene.
As the biologically favored cis element network would increase the expression level of cloned genes, promoter engineering should also be considered as an alternative strategy to map the best set of cis elements in a known set of functionally similar and evolutionarily close genes. To simplify the computational extraction of the naturally encoded set of cis elements from a large dataset of naturally available alternative gene copies, and to subsequently extract the most frequent set of cis elements and design alternative copies, a PERL script was developed to efficiently span the sequence space available in the genome databases. It would allow the easy construction of a functionally active protein variant with the most active set of cis elements and would be useful in reliably accelerating computational promoter design methodologies.

MATERIALS AND METHODS
Besides the location and mutual distances, the number of repeats, or multiplicity, of cis elements in a gene is of prime interest for predicting the expression level of the selected gene based on the strength of its promoter. Although cis elements are significantly conserved motif segments, the non-customizable and poorly programmed interfaces of the current servers usually do not allow batch scrutiny of the required scores for the input genes in a user-friendly manner. Moreover, researchers are unable to customize the size of the cis element database to restrict their search to only a few required entries. To resolve this issue, the most updated dataset of 469 cis elements (Supplementary Table 1) was retrieved from the PLACE database version 30.0 [34] to define the source set in the in-house PERL script (publicly available on GitHub; https://github.com/ashishr123/Cis-element-finding-script-anddataset). The strategy, represented as a flowchart (Fig. 1), is applied to an exemplary gene sequence, AB022891.1, encoding the glucose-1-phosphate adenylyltransferase protein in Arabidopsis thaliana, to map its network of cis elements.
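The core of such a scan can be illustrated with a minimal sketch. It is written in Python for brevity and is not the published PERL script. It assumes that each cis element is given as a consensus string in IUPAC nucleotide codes and that only the supplied strand is searched. The motif names and sequences in `MOTIFS`, the function names, and the toy sequence are hypothetical placeholders; they are not entries from Supplementary Table 1, and the toy sequence is not AB022891.1.

```python
import re

# IUPAC nucleotide codes expanded to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

# Hypothetical motif table for illustration only; a real run would load the
# source set of cis elements from a file instead.
MOTIFS = {"TATA_like": "TATAWA", "CAAT_like": "CAAT", "GATA_like": "GATA"}


def find_elements(sequence, motifs):
    """Map each motif name to the 0-based start positions of its matches."""
    sequence = sequence.upper()
    hits = {}
    for name, motif in motifs.items():
        # A lookahead pattern reports overlapping occurrences as well.
        pattern = "(?=" + "".join(IUPAC[base] for base in motif.upper()) + ")"
        positions = [m.start() for m in re.finditer(pattern, sequence)]
        if positions:
            hits[name] = positions
    return hits


def multiplicity(hits):
    """Number of occurrences of each element that was found."""
    return {name: len(positions) for name, positions in hits.items()}


def neighbour_distances(hits):
    """Start-to-start distances between consecutive elements along the sequence."""
    ordered = sorted(
        (pos, name) for name, positions in hits.items() for pos in positions
    )
    return [
        (a_name, b_name, b_pos - a_pos)
        for (a_pos, a_name), (b_pos, b_name) in zip(ordered, ordered[1:])
    ]


if __name__ == "__main__":
    toy = "GGCAATCCTATAAAGGGATACAAT"  # made-up sequence for demonstration
    found = find_elements(toy, MOTIFS)
    print(found)
    print(multiplicity(found))
    print(neighbour_distances(found))
```

Distances are measured here between the start positions of neighbouring hits; other conventions (for example, end-to-start gaps or distances relative to the TATA box) would need only a change in `neighbour_distances`.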

DISCUSSION
In comparison with commonly deployed servers such as PLACE [28] or PreCisIon [35], the programmed script allows the construction of a batched pipeline for several genes. The study would be useful for analyzing the evolutionary relationship among genes in terms of the type and multiplicity of their cis elements [15,16]. It will be useful for designing a reliable strategy for the construction of a more productive gene expression cassette, as has recently been done to generate a highly productive arrangement of cis elements for enhanced gene expression [36]. The strategy has already been deployed manually in several recent articles [37,38]; however, such a modular script has not been published so far, and its utility would be an added boost for improving the expression profile of a plant gene to attain improved productivity under natural/industrial stress environments. Subsequent engineering of the gene sequence across its active site would then further boost its stability, turnover number, and productivity, and ultimately the crop yield.
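A batched comparison of this kind can be sketched as follows. The example is in Python and is not the published PERL script. It assumes IUPAC consensus motifs, a search of the supplied strand only, and an optional user-defined search window given as `start` and `end` positions. All gene names, sequences, motif entries, and function names are hypothetical and serve only to show how a multiplicity table and the set of elements shared among genes might be derived.

```python
import re
from collections import Counter

# IUPAC nucleotide codes expanded to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

# Hypothetical motifs and toy sequences, for illustration only.
MOTIFS = {"TATA_like": "TATAWA", "CAAT_like": "CAAT", "GATA_like": "GATA"}
GENES = {
    "geneA": "GGCAATCCTATAAAGGGATACAAT",
    "geneB": "TTCAATGGGGTATATAGG",
    "geneC": "CAATCAATGATA",
}


def count_element(sequence, motif, start=0, end=None):
    """Count (overlapping) matches of one motif inside sequence[start:end]."""
    window = sequence.upper()[start:end]
    pattern = "(?=" + "".join(IUPAC[base] for base in motif.upper()) + ")"
    return len(re.findall(pattern, window))


def multiplicity_table(genes, motifs, start=0, end=None):
    """Return {gene: {element: count}} for every gene in the batch."""
    return {
        gene: {
            name: count_element(seq, motif, start, end)
            for name, motif in motifs.items()
        }
        for gene, seq in genes.items()
    }


def shared_elements(table, min_fraction=1.0):
    """Elements present in at least min_fraction of the genes, sorted by name."""
    n_genes = len(table)
    present = Counter(
        name
        for counts in table.values()
        for name, count in counts.items()
        if count > 0
    )
    return sorted(
        name for name, k in present.items() if k / n_genes >= min_fraction
    )


if __name__ == "__main__":
    table = multiplicity_table(GENES, MOTIFS)
    for gene, counts in table.items():
        print(gene, counts)
    print(shared_elements(table))
```

Lowering `min_fraction` relaxes the requirement from elements found in every gene to elements found in most genes, which may be the more practical setting when the batch contains divergent homologs.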

CONCLUSION
The discovery of the cis element network in promoter regions will aid the annotation of genes encoding putative transcription factors. The developed script locates the TATA box and every cis element within a customized boundary domain, which will help in constructing batched pipelines for exploring the naturally encoded elements in a homologous set of genes and in identifying the evolutionarily favored set of cis elements under a specific environmental constraint. The computation of multiplicity and the analysis of the evolutionary relationship among genes can thus be achieved through this simplified methodology. Because the script is based on previous experimental evidence, its predictions should be less prone to inaccuracies, although they still need to be validated in vitro using updated methodologies. It will be a handy tool for improving the expression levels of foreign genes under varied growth conditions or stress parameters and for modulating gene expression.

Fig. 1. Methodology layout of the programmed script.

Fig. 2. Cis elements for the exemplary gene AB022891.1. The cis element network and their mutual distances could thus be easily analyzed.