Proteome Organization of COVID-19: Illustrating Targets for Vaccine Development

COVID-19, the recent virulent viral infection, has affected the lives of millions globally, leading to loss of life as well as economic and financial crisis. Coronaviruses belong to the family Coronaviridae, which comprises four genera, viz. α-, β-, γ- and δ-coronavirus, infecting both birds and mammals. SARS-CoV-2 emerged in Wuhan, China, in December 2019 and has since spread to 213 countries. Its origin is debatable, with neither the natural-origin hypothesis nor the conspiracy hypothesis supported by conclusive evidence. Coronaviruses have a positive-sense RNA genome that encodes 29 proteins, which carry out the viral life cycle, including infection and disease progression. The study of its proteome organization could reveal the proteins that act as the key molecular players in the infection cycle of the virus. These proteins can also serve as important drug targets in combating COVID-19 infection. The majority of drugs have been formulated to act against the spike protein, inhibiting infection by blocking its binding to ACE2 receptors. Proteome analysis has also revealed the critically mutated proteins that are responsible for COVID-19 pathogenesis and virulence. mRNA-based vaccines (mRNA-1273, BNT162) also target the spike protein. Although DNA vaccines have also been attempted using recombinant DNA technology (RDT), the high mutation rate associated with COVID-19 has made such vaccines ineffective even before use. Evolutionarily conserved proteins are therefore the best candidates for vaccine development. Similarly, phylogenetic analysis of its proteins could help us to understand the evolutionary pattern of COVID-19. It could be used to develop a predictive model for such pathogenic infections, allowing preventive action to be taken against their recurrence.

In 2003-2004, about 8000 people were infected with severe acute respiratory syndrome coronavirus (SARS-CoV), with a mortality rate of about 10% [7][8][9][10]. Then in 2012, more than 1700 people became infected with Middle East respiratory syndrome coronavirus (MERS-CoV), with a mortality rate of about 36% [11][12]. In 2013, the United States was severely affected by porcine epidemic diarrhea coronavirus (PEDV), which causes 100% mortality in piglets; as a result, their population decreased by more than 10% within a year [13][14][15]. From all these findings, it has become evident that coronaviruses pose a serious threat to human health by causing infections of the respiratory, central nervous and gastrointestinal systems, along with economic losses [1][2]. Recent research has also shown that coronaviruses can undergo mutation and recombination, which makes them capable of surviving in diverse environmental conditions and gives them a wide host range and broad tissue tropism [16][17][18]. Owing to these characteristics, coronaviruses cause persistent, long-term infections in humans and animals worldwide, crippling global healthcare systems and economies.
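As a rough arithmetic check on the outbreak figures quoted above, the number of deaths implied by each case count and mortality rate can be computed directly. This is a minimal sketch: the function name `implied_deaths` is hypothetical, and the inputs are the rounded figures from the text ("about 8000" and "more than 1700" cases), so the results are order-of-magnitude estimates rather than reported death counts.

```python
def implied_deaths(cases: int, mortality_rate: float) -> int:
    """Approximate deaths implied by a case count and a mortality rate in [0, 1]."""
    if cases < 0 or not 0.0 <= mortality_rate <= 1.0:
        raise ValueError("cases must be >= 0 and mortality_rate within [0, 1]")
    return round(cases * mortality_rate)


# Rounded figures taken from the text; they are approximations, not exact counts.
outbreaks = {
    "SARS-CoV (2003-2004)": (8000, 0.10),
    "MERS-CoV (2012)": (1700, 0.36),
}

for name, (cases, rate) in outbreaks.items():
    print(f"{name}: roughly {implied_deaths(cases, rate)} deaths implied")
```

The comparison shows why MERS-CoV is considered severe despite its smaller case count: its far higher mortality rate brings the implied death toll close to that of SARS-CoV.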

Structural Organization
Coronaviruses are enveloped viruses with positive-stranded RNA as their genetic material (Fig. 1). They possess the largest genomes among all RNA viruses (ranging from 27 to 32 kb). The genome is packaged inside a helical capsid made up of nucleocapsid protein (N), which is in turn surrounded by the outermost envelope. Three structural proteins are associated with the viral envelope, viz. membrane protein (M), envelope protein (E) and spike protein (S) [19]. Among these, the envelope protein (E) and membrane protein (M) are responsible for assembly of the virus, while the spike protein (S) mediates virus entry into the host cell. In addition to these three structural proteins, some coronaviruses have an additional envelope-associated hemagglutinin-esterase protein (HE) [20]. The spikes form large protrusions from the viral surface that give the virus a crown-like appearance, hence the name 'coronavirus' (from the Latin corona, meaning crown) (Fig. 1 (a) & (b)). The spike is not only associated with entry of the virus into the host cell but also acts as a vital determinant of tissue tropism and viral host range and as a key stimulator of host immune responses.
The spike of coronavirus possesses three segments, viz. a large ectodomain, a single-pass transmembrane anchor and a short intracellular tail (Fig. 1b). The ectodomain contains an S1 subunit that binds to receptors and an S2 subunit that mediates membrane fusion. Electron microscopy has shown that the spike is a clove-shaped trimer consisting of three S1 subunits as heads and a trimeric S2 subunit as the stalk [21][22][23][24] (Fig. 1 (a) & (b)). During entry of the virus into the host cell, the S1 subunit binds to receptors on the surface of the host cell and the S2 subunit fuses the host cell membrane with the viral membrane, allowing the viral genome to enter the host cell. Receptor-mediated binding and membrane fusion are the important steps in the infection cycle of coronavirus and act as key targets for drug development. In this review article, we focus on the structural and functional aspects of the coronavirus proteome along with the identification of key targets for vaccine development.

Prevalence and COVID-19
Molecular phylogenetic analysis across different geographical isolates has been carried out to identify viral proteins that are under positive selection pressure, i.e. accumulating mutations that enhance the pathoadaptivity of the virus over time. These proteins have the potential to act as vaccine candidates, as they also act as B cell and T cell epitopes against which the host mounts an immune response.
Such analysis also helps to track the global outbreak of the coronavirus pandemic. The first case of COVID-19 was observed in the city of Wuhan, China, in December 2019, and the disease has since become a pandemic, spreading to 213 countries and 26 cruise ships across the globe [10].
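The comparison of a protein across geographical isolates described above can be illustrated with a simple per-position variability measure. The sketch below computes the Shannon entropy of each column in a set of pre-aligned sequences and reports the positions that vary. It is only an illustration under stated assumptions: the function names and the toy sequences are hypothetical (not real isolate data), and column entropy is a crude proxy for variability, not a formal test of positive selection.

```python
import math
from collections import Counter


def column_entropy(column: str) -> float:
    """Shannon entropy (in bits) of the residues observed in one alignment column."""
    counts = Counter(column)
    total = len(column)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def variable_positions(aligned_sequences):
    """Return (index, entropy) for every column in which the isolates differ."""
    if len({len(seq) for seq in aligned_sequences}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    result = []
    for index, residues in enumerate(zip(*aligned_sequences)):
        entropy = column_entropy("".join(residues))
        if entropy > 0.0:  # entropy is zero when every isolate has the same residue
            result.append((index, entropy))
    return result


# Hypothetical toy fragments standing in for one protein from four isolates.
toy_isolates = [
    "MFVFLVLLPL",
    "MFVFLVLLPL",
    "MFVSLVLLPL",
    "MFVFLVLQPL",
]

for index, entropy in variable_positions(toy_isolates):
    print(f"position {index}: entropy {entropy:.3f} bits")
```

Positions with zero entropy are fully conserved in the sample, while positions with higher entropy are those where isolates have diverged.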
Two different hypotheses have been put forward for the origin and evolution of COVID-19: a natural origin and a conspiracy hypothesis. The natural-origin hypothesis holds that the virus is the result of natural evolution, in line with the conclusion published in Nature Medicine by scientists from Australia, the United Kingdom and the United States [25]. Their analysis of public genome sequence data from SARS-CoV-2 and related viruses did not yield any substantial evidence that the virus was constructed in a laboratory or otherwise engineered. Although the origin of the virus is still debatable, recent mutational studies of its evolution suggest that it is of natural origin rather than the product of genetic engineering. Scientists in China have successfully sequenced the genome of SARS-CoV-2 and submitted the data to NCBI, USA [26].
Several independent sequencing studies on SARS-CoV-2, using phylogenetic sequence similarity plots, clearly revealed major similarity in nucleotide regions with members of the subgenus Sarbecovirus [27][28][29] (Fig. 2). According to the results of the phylogenetic analysis, 2019-nCoV clustered with SARS-CoV, bat-SL-CoVZC45 and bat-SL-CoVZXC21, indicating that it could be a recombinant between the two [28][29][30] (Fig. 2).
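Sequence similarity comparisons of the kind referred to above rest on a basic quantity, the percent identity between two aligned sequences. The following sketch shows how it can be computed and used to pick the closest relative of a query. It assumes the sequences are already aligned to equal length; the function names and the toy RNA fragments are hypothetical and are not taken from the genomes discussed here.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity of two pre-aligned sequences, skipping gapped columns."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # a gap in either sequence makes the column uninformative
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


def closest_relative(query: str, references: dict) -> str:
    """Name of the reference sequence with the highest identity to the query."""
    return max(references, key=lambda name: percent_identity(query, references[name]))


# Hypothetical toy fragments; real analyses use whole-genome alignments.
toy_query = "AUGGCUAGCUAA"
toy_references = {
    "relative_1": "AUGGCUAGUUAA",
    "relative_2": "AUGACU-GCUGA",
}

for name, sequence in toy_references.items():
    print(f"{name}: {percent_identity(toy_query, sequence):.1f}% identity")
print("closest:", closest_relative(toy_query, toy_references))
```

A similarity plot applies the same calculation within a sliding window along the genome, which is how regions of differing ancestry, and hence possible recombination, become visible.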
Although recently identified strains of COVID-19 make it clear that the virus is capable of evolving on its own and crossing host ranges, some researchers still believe that the recombination could have been carried out in a BSL-4 facility using the tools and techniques of recombinant DNA technology (RDT) [31][32][33][34]. Scientists who support the conspiracy hypothesis have provided circumstantial evidence indicating that the virus could be a potential bioweapon [31][32][33][34][35]. Firstly, the virus originated and was first reported in an area just 30 km from the Wuhan Institute of Virology, Chinese Academy of Sciences, located in Jiangxia District, Wuhan [31][32][33][34][36].
Secondly, since 2005 a coronavirus research group has been actively working at the Wuhan Institute of Virology and has reported that horseshoe bats are the natural reservoir of these viruses [37]. In 2015 the research group also published findings on the infection of human HeLa cell lines with coronavirus [38][39]. In February 2020, the New York Times reported that the Chinese authorities had filed a patent claim for the drug remdesivir, which was found to contain coronavirus infection, raising suspicion about the role of the Chinese authorities in containing the viral infection [40]. Although conclusive evidence that the virus was genetically engineered may never be obtained, the role of the Chinese agencies remains in doubt according to these reports [31][32][33][34][35][36].
The spike protein, which the virus uses to capture and pierce the plasma membrane of animal cells, is another key target for illustrating the evolution of COVID-19. The spike protein has two important features: the receptor-binding domain (RBD), which helps viral particles attach to host cells, and the cleavage site, which helps the virus enter the host cell.
Recent research has identified that the RBD portion of the SARS-CoV-2 spike protein has evolved to bind effectively to a specific target protein on human cells called ACE2, a receptor involved in blood pressure regulation [41]. The efficiency with which the virus binds to human cells has also been taken to indicate that the virus arose through natural evolution.
The spike protein present in SARS-CoV-2 helps it bind effectively to human cells. However, the scientists have concluded that it was the product of natural evolution.
It is now confirmed that the genome of the new coronavirus is less than 30,000 nucleotides long. Scientists have identified genes coding for 29 proteins, which carry out its life cycle and are responsible for all aspects of infection, disease progression and ultimately death. The coronavirus genome and the proteins it encodes are depicted in Table 1. Inside the host cell, the first viral product is a chain of 16 proteins joined together. Two of them work like scissors, cutting the links between the different proteins and releasing them so that they, along with the structural proteins, can carry out their activities.
[Table 1. Coronavirus genome and the proteins it encodes. Recoverable labels: 1, 2, 3, 4, 5, 6, 7, 7a, 7b, 12, 9, 10, 8, 11, 13, 14, 15, 16; accessory proteins: ORF3a, ORF6, ORF7a, ORF8, ORF9b, ORF9c.]
Research on other coronaviruses has indicated that some of the proteins are structural components, others serve as accessory proteins, and the functions and properties of still others remain unknown. The key features and functions of each protein are summarized in Table 1.
The study of the structural proteins S, E, M and N has revealed that the spike protein (S) forms a trimeric structure that is responsible for binding to ACE2 receptors on pharyngeal cells, leading to viral invasion. The genes for these proteins are responsible for the evolution of a zoonotic coronavirus that crossed its host range to become a disease-causing virus of Homo sapiens.
Proteome analysis of the COVID-19 proteins has revealed some important drug targets. Because the NSP12 protein, in conjunction with NSP7 and NSP8, initiates replication and transcription of the COVID-19 coronavirus, it is the primary target of the majority of drugs designed to halt replication and transcription. Recently remdesivir, the first FDA-approved drug, has shown promising effects in combating COVID-19 infection [65]. Similarly, SARS-NSP13 has been identified as an ideal target for the development of antiviral drugs because of its sequence conservation and indispensability across all CoV species [56][57].
The study of the COVID-19 proteome is of special significance because it could help to identify the critically mutated proteins responsible for its pathogenesis and virulence in comparison with non-virulent coronavirus strains [66]. The structural proteins of COVID-19 (S, E, M and N), and in particular the spike protein, which binds to ACE2 receptors on pharyngeal cells, are critical for the invasion of human cells and have been targeted as vaccine candidates [46][47][66]. The vaccine mRNA-1273, which codes for the spike protein, is in human clinical trials [67], while the Pfizer vaccine BNT162 uses similar mRNA technology to combat COVID-19. The Oxford University vaccine project ChAdOx1 nCoV-19 uses genetic engineering to ligate segments of the COVID-19 genome into a non-pathogenic modified viral host, with the aim of provoking an immune response in humans [68].
Although several attempts have also been made to synthesize DNA vaccines using recombinant DNA technology, the high rate of mutation in COVID-19 could render such vaccines ineffective even before actual treatment begins [69]. The scientific community is thus left with the choice of developing a protein-based vaccine to ensure long-term protection against COVID-19.
Several accessory proteins have also been reported to influence the pathogenesis and infection of COVID-19. ORF3a is an accessory protein that alters the internal environment of the infected cell, making it more conducive to viral replication. It also causes lesions in the membrane of the infected cell, making it easier for new viruses to escape. Another protein-encoding gene, ORF3b, overlaps the same RNA, although its function is not well defined [70]. ORF6, another accessory protein, blocks the signals that the infected cell would otherwise send to the immune system to elicit a response. It inhibits the cell's antiviral proteins, making the cell more prone to viral infection.
The ORF7a accessory protein increases the viral load by enhancing the movement of viral particles out of the cell. It also triggers suicide of the infected cell, causing damage to alveolar tissue that could lead to death. The function of the accessory protein ORF7b is unknown. ORF8, a poorly understood protein, varies drastically between coronavirus strains and influences viral pathogenesis. The accessory proteins ORF9b and ORF9c overlap the same stretch of RNA in the COVID-19 genome. ORF9b blocks interferon and other cytokines, which are effective molecules of the immune system in combating the virus. The majority of closely related viruses, such as SARS and MERS, do not encode the gene for this protein, which may therefore facilitate COVID-19 infection.
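The proteome organization described in this review can be summarized as a small lookup structure. The sketch below groups the proteins named in the text into structural, non-structural and accessory classes, with notes paraphrased from the text. It is illustrative only: the names `PROTEOME`, `find_protein` and `drug_targets` are hypothetical, the labelling of the 16 polyprotein products as NSP1 to NSP16 is an assumption, and the catalogue lists only proteins mentioned here, so it does not attempt to reproduce the full count of 29.

```python
# Proteins named in this review, grouped as in the text.
# An empty note means the text does not describe a function for that protein.
PROTEOME = {
    "structural": {
        "S": "spike; receptor binding (S1) and membrane fusion (S2)",
        "E": "envelope; virus assembly",
        "M": "membrane; virus assembly",
        "N": "nucleocapsid; forms the helical capsid",
    },
    # Assumption: the 16 proteins of the initial chain are labelled NSP1..NSP16.
    "non_structural": {f"NSP{i}": "" for i in range(1, 17)},
    "accessory": {
        "ORF3a": "alters the cell's internal environment; membrane lesions aid escape",
        "ORF3b": "function not well defined",
        "ORF6": "inhibits signalling to the immune system",
        "ORF7a": "enhances release of viral particles; triggers cell death",
        "ORF7b": "function unknown",
        "ORF8": "varies between strains; influences pathogenesis",
        "ORF9b": "blocks interferon and other cytokines",
        "ORF9c": "",
    },
}

PROTEOME["non_structural"].update({
    "NSP7": "acts with NSP12 in replication and transcription",
    "NSP8": "acts with NSP12 in replication and transcription",
    "NSP12": "initiates replication and transcription; primary drug target",
    "NSP13": "conserved across CoV species; antiviral drug target",
})


def find_protein(name: str):
    """Return (group, note) for a protein name, or raise KeyError if not listed."""
    for group, members in PROTEOME.items():
        if name in members:
            return group, members[name]
    raise KeyError(name)


def drug_targets():
    """Names of proteins whose note marks them as a drug target."""
    return [
        name
        for members in PROTEOME.values()
        for name, note in members.items()
        if "drug target" in note
    ]


print(find_protein("ORF9b"))
print(drug_targets())
```

Keeping the notes as plain strings makes it easy to extend the catalogue once the functions of the remaining proteins are characterized.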
The coronavirus genome ends with a snippet of RNA that stops the cell's protein-making machinery. The comparative analysis of