In silico Identification of Resistance and Defense Related Genes for Bacterial Leaf Blight (BLB) in Rice

Bacterial leaf blight (BLB) is a rice disease caused by Xanthomonas oryzae pv. oryzae (Xoo) that causes devastating losses in the rice industry. To date, research has been directed towards identifying QTLs and disease resistance genes that afford resistance against Xoo. Rice plantations in different parts of the world are attacked by various strains of Xoo, and hence the varieties that exhibit resistance to these strains may vary from one geographical location to another. In this paper we analysed a QTL that has been recurrent in various rice backgrounds, making it one of the three main QTLs responsible for BLB resistance in rice. qBBR11-1, a QTL found on chromosome eleven of rice, is a good candidate for defense and resistance analysis towards BLB because of the various reports on the presence of multiple resistance genes in this QTL. Using bioinformatics tools we (1) identified and classified defense genes into functional groups, (2) identified and classified resistance genes according to the relevant domains, and finally (3) mapped the relationship between defense-related and resistance genes to build a defense model against BLB. A total of 21 defense genes were classified according to functional groups such as chitinase activity, signal transduction and response to stress. A total of 37 putative resistance genes were found and classified according to related domains such as nucleotide binding sites (NBS), transmembrane domains (Trd), WRKY domains and Ser/Thr kinase without LRR. Collectively, this information is used to provide a graphical representation of the interaction between the defense and the resistance gene systems. The key genes identified may be developed into markers for breeding or used in plant transformation work.


INTRODUCTION
Rice plants play an important role in generating income for more than 200,000 farming families in Malaysia [1]. However, paddy production in Malaysia is affected by a series of biotic and abiotic stresses that reduce yield. Disease has been a factor that results in yield losses in all rice-growing states. The production of new cultivars that have overall or partial resistance to disease, give high yields and are able to survive in a non-conducive environment is of utmost importance. Since pathogens are constantly evolving, cultivars do not remain resistant to them for long. As such, farmers have to resort to chemical controls and to agricultural practices that reduce losses. However, prolonged use of chemical controls is not advisable, as the microbes become tolerant or resistant over time, not to mention the deleterious effects on the environment and the handlers. Consequently, the production of rice varieties with high levels of resistance to disease is important for reducing dependence on chemical controls and pesticides [1][2][3].
Plants have several defense mechanisms against biotic or abiotic stresses that may otherwise kill the plant or inhibit its growth. This ability is known as resistance. Every plant species is attacked by at least one hundred types of mycoplasmas, viruses, bacteria, fungi, nematodes and insects. Therefore, information on and understanding of the host-pathogen interaction [4] and the arsenal of interacting genes [5] involved in defense will make the selection of good candidate cultivars easier for breeders. This information will be used to develop molecular markers for screening varieties for resistance to bacterial leaf blight [1].
Xanthomonas oryzae pv. oryzae (Xoo), the causative agent of bacterial leaf blight (BLB), is becoming a major problem in Malaysia, China and Taiwan, where paddy fields are reporting large-scale losses from this disease [2,3,6]. Furthermore, chemical control of this disease is ineffective in this monsoon climate region. The cultivation of resistant rice varieties has been proposed as the most effective strategy to prevent BLB, as this disease affects the quality of rice. According to Niño-Liu et al. (2006), the production and breeding of resistant varieties that carry a resistance gene is the most effective and cost-effective approach to controlling the disease [3]. Many efforts have been directed at generating rice varieties with high resistance to Xoo. The Xa4 gene provides high resistance to Xoo in various commercial rice varieties and has been widely used in rice programmes in Asia for more than a decade [2]. However, extensive long-term planting of varieties that rely on a single resistance gene, Xa3 in Japonica rice varieties and Xa4 in Indica rice varieties, has been followed by the discovery of virulent and dominant strains capable of overcoming the disease resistance afforded by these genes [6,7].
A total of 40 antimicrobial genes have been identified, and most have been well mapped and cloned [8,9]. According to Yasmin et al. (2017), a cross between Teqing and Lemont identified a QTL, qBBR11-1, which exhibited resistance over two consecutive years to BLB caused by three types of Xoo: C2, C4 and C5 [10]. A study by Shanti et al. (2010) using the same rice varieties, Teqing and Lemont, showed that qBBR11-1 contains Xa4, which is the main gene against two types of Xoo: C2 and C4 [11]. Arunakumari et al. (2016) reported the presence of the Xa21 gene in crosses of Improved Samba Mahsuri (ISM) with MTU1010 [12]. Among all the resistance genes that have been studied so far, the dominant gene Xa21, derived from the wild rice Oryza longistaminata, provides broad-spectrum resistance against Xoo strains in India [10]. Wang et al. (2012) concluded that qBBR11-1 is among the three QTLs that confer resistance to BLB [13]. Gustave et al. (2011), in their work on crosses between IR4 and Azucena, showed the presence of qBBR11-1, which is effective against all types of Xoo strains from Africa [14]. Together, these reports indicate that this particular QTL is important for resistance to BLB.
In this study we conducted a systematic analysis of the genome sequence contained within this QTL. The genes were annotated and separated into resistance genes and defense-related genes, which were then used to predict how they interact and how they collectively contribute to BLB resistance in rice. The key genes identified may be utilised in rice transformation and in marker development for rice improvement. All analyses were conducted using the rice genome database and freely available online bioinformatics tools. The details from this study and from similar QTL analyses will be utilised by our research group to develop an integrated map that may be used by breeders and scientists.

METHODS
The identification of the physical position of qBBR11-1
The position of the qBBR11-1 QTL was determined using BLAST (Basic Local Alignment Search Tool) on the National Center for Biotechnology Information (NCBI) server with the flanking markers of qBBR11-1, i.e. ID111117 and RM6293. The left and right sequences of markers ID111117 and RM6293 were obtained from previous studies [6]. The BLAST results gave the range covered by the left and right sequences of the two markers. This range was entered in the search field of the Rice Genome Browser of the MSU Rice Genome Annotation Project Database and Resource, which provides annotated data for the rice genome (rice.plantbiology.msu.edu/). This site provides the genome sequence of the Nipponbare cultivar with annotated sequences for the 12 chromosomes of rice. The data are retrievable via the search page, and the Rice Genome Browser provides an integrated view of the annotated data. The physical region of qBBR11-1 was displayed in the browser, and from that view the locus IDs at the start and at the end of qBBR11-1 were obtained. The cDNA sequences in FASTA format for chromosome eleven were then obtained from the Rice Genome Browser, and all the sequence information between the markers was downloaded and stored in 'Text Document (*.txt)' format.
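The step of keeping only the sequences that lie between the first and last locus of the QTL can be sketched in a few lines of Python. This is a minimal illustration and not the procedure used in the study, where the sequences were retrieved through the Rice Genome Browser. It assumes that the FASTA file contains chromosome eleven entries only, that each header starts with an MSU locus ID, and that the numeric part of the ID increases along the chromosome. The function names, the toy sequences and the boundary loci used in the example are hypothetical and do not represent the actual boundaries of qBBR11-1.

```python
from typing import Iterator, List, Tuple


def parse_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def locus_number(header: str) -> int:
    """Return the numeric part of a locus ID such as LOC_Os11g46870."""
    locus = header.split()[0].split(".")[0]  # drop description and model suffix
    return int(locus.rsplit("g", 1)[1])


def loci_in_interval(text: str, first_locus: str,
                     last_locus: str) -> List[Tuple[str, str]]:
    """Keep the records whose locus number lies between the two boundary loci.

    Assumption: locus numbers follow the physical order on the chromosome.
    """
    lo, hi = locus_number(first_locus), locus_number(last_locus)
    return [(h.split()[0], seq) for h, seq in parse_fasta(text)
            if lo <= locus_number(h) <= hi]


# Toy input: the sequences and the interval boundaries are illustrative only.
EXAMPLE_FASTA = """\
>LOC_Os11g46000.1 toy entry outside the interval
ATGGCC
>LOC_Os11g46870.1 toy sequence
ATGAAACCC
GGGTTT
>LOC_Os11g46900.1 toy sequence
ATGTTT
>LOC_Os11g47150.1 toy sequence
ATGCCC
>LOC_Os11g48000.1 toy entry outside the interval
ATGGGG
"""

selected = loci_in_interval(EXAMPLE_FASTA, "LOC_Os11g46870", "LOC_Os11g47150")
for name, seq in selected:
    print(name, len(seq))
```

In practice the boundary loci would be the first and last locus IDs read from the browser view of the region between ID111117 and RM6293.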

Bioinformatic analysis using Blast2GO
The files containing the FASTA sequences of qBBR11-1 were uploaded into Blast2GO (https://www.blast2go.com/) for further analysis. BLAST was run to obtain descriptive annotations for the 214 genes in qBBR11-1 (Blast2GO > blast). We then proceeded with mapping and annotation, which provided gene ontologies that allowed the identification of genes responsible for functions associated with defense and resistance (Blast2GO > mapping > annotation). The enzyme coding function was used to retrieve the enzyme annotation for the genes (Blast2GO > Analysis > Enzyme Code and KEGG > Run GO-EnzymeCode Mapping). This was followed by domain analysis of each annotated gene to identify domains that are prevalent in R-genes (Blast2GO > Run InterProScan). Finally, a directed acyclic graph that explains the interconnecting pathways between the defense processes and resistance was mapped (Blast2GO > Graphs > Make GO Graph).
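Once the gene ontology annotation has been exported, selecting the genes whose terms point to defense can be sketched as a simple keyword filter. The sketch below is an assumption about how such post-processing might look and is not part of Blast2GO. The input is assumed to be a mapping from gene name to a list of GO term names; the keyword list is drawn from the functional groups named in the abstract plus the generic term "defense response"; the gene names in the example are hypothetical.

```python
from typing import Dict, List

# Keywords taken from the functional groups mentioned in the abstract,
# plus "defense response" (an assumed, illustrative choice).
DEFENSE_KEYWORDS = (
    "defense response",
    "response to stress",
    "chitinase activity",
    "signal transduction",
)


def defense_related(annotations: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Return the genes that carry at least one defense-related GO term."""
    hits: Dict[str, List[str]] = {}
    for gene, terms in annotations.items():
        matched = [t for t in terms
                   if any(key in t.lower() for key in DEFENSE_KEYWORDS)]
        if matched:
            hits[gene] = matched
    return hits


# Hypothetical annotation table; gene names are placeholders.
example_annotations = {
    "gene_A": ["chitinase activity", "chitin catabolic process"],
    "gene_B": ["DNA integration"],
    "gene_C": ["Defense response to bacterium", "response to stress"],
}

for gene, terms in defense_related(example_annotations).items():
    print(gene, "->", "; ".join(terms))
```

A keyword filter of this kind only proposes candidates; each hit would still need to be checked against its full annotation.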

RESULTS AND DISCUSSION
Overall gene distribution in qBBR11-1
qBBR11-1 spans a region of approximately 1.46 Mbp on chromosome 11. About 214 genes were localised along this region, giving an average distribution of 147 genes per Mbp. The BLAST analysis via Blast2GO annotated 209 genes with a description, and the remaining 5 genes had no description. The data for all the genes in qBBR11-1 are provided in the supplementary table (Table S1: https://drive.google.com/open?id=1gpBEm8XzcawHvdU0jYQAvyy7aBuYkXbZ).
Based on these annotations, the copy number of transposable elements (TE) (53 genes) is much higher than that of any other group of genes. Previous studies have confirmed the existence and abundance of transposable elements in plant genomes [13][14]. While TEs play an important role in altering and shaping the plant genome, the exact mechanism by which they affect genome plasticity and structure remains obscure. However, several studies have revealed possible conditions that may prompt TE insertion. A recent study suggested that the alteration of genome expression by TEs accompanies the imposition of stress on plants [9]. This suggests that the insertion of TEs may drive the expansion of R-genes [15,16].
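The summary figures in this section follow from simple arithmetic on the reported counts, which can be checked with a short Python sketch. Only numbers stated in the text are used (214 genes, 1.46 Mbp, 209 annotated and 5 unannotated genes, 53 TE genes); the variable and function names are illustrative.

```python
TOTAL_GENES = 214        # genes localised in qBBR11-1
QTL_LENGTH_MBP = 1.46    # approximate length of the QTL region
ANNOTATED = 209          # genes with a BLAST description
UNANNOTATED = 5          # genes without a description
TE_GENES = 53            # genes annotated as transposable elements


def percent_of_qtl(count: int, total: int = TOTAL_GENES) -> float:
    """Express a gene count as a percentage of all genes in the QTL."""
    return 100.0 * count / total


# The annotated and unannotated genes should add up to the total.
assert ANNOTATED + UNANNOTATED == TOTAL_GENES

density = TOTAL_GENES / QTL_LENGTH_MBP
print(f"Gene density: {density:.1f} genes per Mbp")             # 146.6, i.e. about 147
print(f"TE share: {percent_of_qtl(TE_GENES):.1f}% of all genes")  # 24.8
```

The same helper can be applied to any other gene group to express its size as a share of the 214 genes.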

Characterisation of R-genes and PRR in qBBR11-1
A total of 37 R-genes and pattern recognition receptors (PRR), about 17% of the 214 genes in qBBR11-1, are listed in Table 1. R-genes are an essential component of the plant armoury against pathogen invasion. These R-genes are characterised by the presence of domains such as the well-known NBS-LRR, protein kinase and other domain arrangements, which have been used to classify resistance genes into eight different classes [17].

Journal of Pure and Applied Microbiology
About 15 genes encoding the disease resistance proteins RPM1, RPP13 and RGA3 have NBS (nucleotide binding site) and LRR (leucine-rich repeat) domains and fall into Class 1 or Class 2 R-genes. The presence of a TIR/CC domain could not be traced using InterProScan; hence these genes could not be distinguished between Class 1 and Class 2. These genes may be involved in effector-triggered immunity (ETI). Thirteen NBS/LRR domain-containing genes related to bacterial blight resistance protein XA26, probable LRR receptor-like serine/threonine-protein kinase At3g47570 and LRR were also found in this QTL. These genes belong to the Class 4 R-genes, as they contain protein kinase domains and LRR repeats. Xa26 is an R-gene that provides resistance against Xanthomonas oryzae and encodes an LRR receptor-like kinase protein. Five annotated genes fall into Class 4 R-genes, and three genes in qBBR11-1 were associated with wall-associated kinase (WAK). WAK belongs to the receptor-like kinases (RLK), which play a pivotal role in PAMP-triggered immunity (PTI) for pathogen recognition. RLK is a PRR, which is categorised under Class 8 R-genes. In addition, six genes annotated as major disease resistance protein Xa4 were also included in this class.
Table 1 (partial). Entries with a protein kinase domain: LOC_Os11g46950, LOC_Os11g47140, LOC_Os11g47150, LOC_Os11g47110, LOC_Os11g46870 (wall-associated receptor kinase 3-like) and LOC_Os11g46900 (wall-associated protein kinase).
Host plants defend themselves in several ways, depending on the predator or pathogen. For microorganisms such as bacteria and fungi, the host plant first recognises the pathogen through a pathogen-associated molecular pattern (PAMP), which is perceived by a PRR such as an RLK. Upon pathogen recognition, PTI is activated to suppress pathogen infection. If PTI is unsuccessful, the pathogen releases effectors to penetrate the host plant. Recognition of these effectors by the host then triggers ETI, which reduces the detrimental effect on the host. This arms race between pathogen and host continues as both coevolve in nature [17].
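The domain-based grouping described in this section can be summarised as a small rule set. The Python sketch below is illustrative only: it encodes the three groupings stated in the text (NBS with LRR as Class 1 or 2, protein kinase with LRR as Class 4, and wall-associated kinases as Class 8) and is not the full eight-class scheme of reference 17. The domain labels ("NBS", "LRR", "KINASE", "WAK") and the function name are assumptions made for the example.

```python
from typing import Iterable


def classify_r_gene(domains: Iterable[str]) -> str:
    """Assign an R-gene class from a set of detected domain labels.

    Only the groupings described in the text are encoded. Class 1 and
    Class 2 are not separated because that requires a TIR or CC domain,
    which was not detected in this QTL.
    """
    found = {d.upper() for d in domains}
    if {"NBS", "LRR"} <= found:
        return "Class 1 or 2"
    if {"KINASE", "LRR"} <= found:
        return "Class 4"          # e.g. LRR receptor-like kinases such as Xa26
    if {"KINASE", "WAK"} <= found:
        return "Class 8"          # wall-associated kinases, a type of RLK/PRR
    return "unclassified"


# Hypothetical domain calls for four genes.
examples = {
    "gene_1": ["NBS", "LRR"],
    "gene_2": ["kinase", "LRR"],
    "gene_3": ["kinase", "WAK"],
    "gene_4": ["WRKY"],
}

for gene, domains in examples.items():
    print(gene, "->", classify_r_gene(domains))
```

The order of the rules matters: a gene carrying NBS, LRR and kinase domains would be reported as Class 1 or 2 under this sketch, which is a simplification.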

Characterisation of defense-associated genes and regulatory genes in qBBR11-1
To execute the downstream defense mechanism upon pathogen recognition, plants employ a myriad of defense-associated genes encoding pathogenesis-related proteins, enzymes, secondary metabolites, antimicrobial compounds, phytohormones and regulatory genes. We therefore investigated the genes in qBBR11-1 that are responsible for defense against bacterial blight.
Based on the Blast2GO analysis, 21 genes (about 10% of the genes in qBBR11-1) are linked to processes and functions relevant to defense; these are shown in Table 2 and Supplementary Table S1.
Eleven PR-8 protein-related genes, namely xylanase inhibitor 1 and 2, were arranged in tandem repeats in qBBR11-1. As the name suggests, xylanase inhibitor proteins are secreted by plants to inhibit the hydrolytic activity of fungal and bacterial enzymes such as xylanase and so prevent them from degrading the plant cell wall [18,19]. A few studies reported that this group of proteins is induced by signals related to the plant stress imposed by pathogens [20] and delays disease symptoms [21]. Two genes found in qBBR11-1 are related to osmotin-like protein (OLP). OLP, from the PR-5 protein family, plays a critical part in osmotic stress tolerance and in defense against fungal and bacterial pathogens. The overexpression of an OLP gene in transgenic sesame plants resulted in better tolerance against Macrophomina phaseolina, which is mediated through the SA and JA/ET signaling pathways [22].
Fig. 1. The breakdown of R-genes in qBBR11-1: a) the number of R-genes for each description; b) the distribution of R-genes according to the different R-gene classes.
In addition, a single gene encoding a plant cell wall component known as hydroxyproline-rich glycoprotein-like (HRGP) was observed in qBBR11-1. HRGPs are involved in plant cell wall fortification [23]. To penetrate the rigid cell wall of plant cells, pathogens secrete hydrolytic enzymes such as pectinases, xylanases and polygalacturonases that degrade the polymers of the cell wall [23]. To further strengthen the plant cell wall, HRGP initiates the polymerisation of lignin and reorganises the cell wall architecture in response to pathogen attack [24]. Two genes encoding a regulatory enzyme known as E3 ubiquitin ligase were also found in qBBR11-1. The ubiquitination activity of E3 ubiquitin ligases is reported to regulate the plant immune system [25]. These enzymes were also reported to modulate pathogen recognition by targeting PRR and R-genes. Specifically, in rice, the interaction of an E3 ubiquitin ligase with a receptor-like kinase, a PRR, conferred resistance against bacterial blight [19,21]. In addition, subtilase-like protein, a proteolytic enzyme found in qBBR11-1, was proposed to act as a membrane receptor that activates downstream signaling cascades pertaining to defense against pathogens. Vartapetian et al. (2011) also reviewed evidence that subtilase-like proteins are involved in programmed cell death (PCD), which occurs as a consequence of the hypersensitive response [27]. PCD is manifested as necrotic flecks that restrict pathogens from spreading the infection to other parts of the plant [27]. Three genes associated with Myb and WRKY transcription factors may be involved in the regulation of the defense response [28][29][30]. WRKY41, for example, was reported to be overexpressed in Arabidopsis thaliana upon recognition of flagellin, enhancing resistance against Pseudomonas syringae, which suggests the involvement of WRKY41 in ETI [22]. Likewise, Myb108 has a similar function in A. thaliana [30].
A gene related to riboflavin biosynthesis, which may be associated with plant defense, was also identified in this QTL. Riboflavin is generally known as vitamin B2 and is well known for its nutritional value to humans. Interestingly, riboflavin is also required in the plant defense response. In A. thaliana, PR genes are activated by riboflavin to induce systemic acquired resistance (SAR) against the pathogen [31].
To put it briefly, the signal from R-genes is perceived and transmitted by subtilase-like protein, followed by the accumulation of PR proteins at the site of infection. This results in anti-bacterial activities that either inhibit the bacterial enzymes that degrade the host cell (PR-8) or are directly involved in defense against the bacterium. The host cell wall is further reinforced by HRGP to prevent the pathogen from penetrating into the cell machinery. On top of that, SAR is activated following riboflavin biosynthesis. Towards the end of the defense process, transcription factors (WRKY and MYB) and regulatory enzymes such as E3 ubiquitin ligases regulate the expression of defense-associated genes [18][19][20][21][22][23][24][25][26][27][28][29][30][31].

Proposed mechanism for resistance against bacterial blight
As depicted in Figure 3, the plant defense system is composed of several biological processes. Primarily, the defense mechanism in qBBR11-1 is established once a PRR recognises the pathogen and sends the signal to downstream genes to execute PTI. However, if the pathogen is able to overcome this, the host plant executes ETI with the aid of R-genes once it recognises the pathogen. R-genes then send signalling cascades to defense-associated genes for downstream and specific defense responses.
One of the vital phytohormones required in plant stress signalling is salicylic acid (SA), which is activated upon pathogen recognition. SA signalling mediates the accumulation of PR proteins. Xylanase inhibitors 1 and 2, which are PR-8 proteins, carry out the chitin catabolic process and inhibit the xylanases that the pathogen produces to degrade the plant cell wall [18][19][20][21]. OLP, on the other hand, is directly involved in the defense response to the bacterium [22].
The riboflavin biosynthetic process supplements SA signalling to induce SAR [31]. In addition, the signals from R-genes induce cell wall organisation by HRGP, which adds mechanical barriers that prevent the pathogen from entering the cells of the host plant [23][24]. Regulation of defense-associated genes remains an integral part of defense. Protein ubiquitination by E3 ubiquitin ligase fine-tunes the defense-associated processes [25,26]. This is accompanied by the regulation of transcription by the WRKY and MYB transcription factors [28][29][30].
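The relationships described in this section can be written down as a small directed graph, which makes it easy to ask which processes lie downstream of a given step. The Python sketch below is a simplified reading of the text and does not reproduce Figure 3 or the GO graph produced by Blast2GO. The node names and the choice of edges are assumptions made for illustration, and the regulatory layer (E3 ubiquitin ligase, WRKY and MYB) is left out for brevity.

```python
from collections import deque
from typing import Dict, List, Set

# Edges point from a process to the processes it activates or feeds into.
DEFENSE_MODEL: Dict[str, List[str]] = {
    "pathogen recognition": ["SA signalling", "cell wall organisation (HRGP)"],
    "SA signalling": ["PR protein accumulation", "SAR"],
    "riboflavin biosynthesis": ["SAR"],
    "PR protein accumulation": [
        "xylanase inhibition (PR-8)",
        "defense response to bacterium (OLP)",
    ],
}


def downstream(graph: Dict[str, List[str]], start: str) -> Set[str]:
    """Return every process reachable from `start` (breadth-first search)."""
    seen: Set[str] = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


for process in sorted(downstream(DEFENSE_MODEL, "pathogen recognition")):
    print(process)
```

Representing the model as an adjacency list keeps it easy to extend, for example by adding the regulatory genes as further nodes once their targets are known.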

CONCLUSION
Overall, qBBR11-1 has a good combination of resistance and defense-related genes, which makes it a suitable target for use in breeding. Accordingly, it is proposed that qBBR11-1 should be fine-mapped to eliminate genes that are not of interest and so reduce unnecessary drag effects during breeding. A total of 37 R-genes and PRR (about 17% of the 214 genes) have been identified in qBBR11-1. A further 21 genes (about 10%) are linked to processes and functions relevant to defense in rice. The eleven PR-8 protein-related genes in qBBR11-1, namely xylanase inhibitor 1 and 2, may be utilised in various molecular techniques such as cloning to develop resistant cultivars. The characterisation of the genes provided a rough outline of the resistance mechanism in this QTL, which starts with pathogen recognition by R-genes, followed by signal transduction through several pathways such as SA and JA for the activation of downstream defense processes. It is therefore suggested that, following fine mapping of this QTL, other relevant disease-related QTLs should be identified and mapped to provide an integrated map that may be utilised by breeding programmes worldwide. In addition, this QTL can be used together with other well-characterised QTLs such as qShb 9-2 [32], qSBR11-1 [33], qBFR4 and qLBL5 [34] in breeding programmes that involve QTL pyramiding [35] to develop varieties with multiple disease resistance.