Molecular Characterization and Amino Acid Homology of Nucleocapsid (N) Protein in SARS-CoV-1, SARS-CoV-2, MERS-CoV, and Bat Coronavirus

The coronavirus disease 2019 (COVID-19) pandemic, caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), poses a severe biothreat to the entire world. We studied the nucleocapsids of SARS-CoV-2 and related viruses for gene and amino acid sequence homology. In this study, we established the similarities and differences between the nucleocapsids of SARS-CoV-2, severe acute respiratory syndrome coronavirus 1 (SARS-CoV-1), bat coronavirus (bat-CoV), and Middle East respiratory syndrome coronavirus (MERS-CoV). We analyzed in detail the amino acid sequences of the nucleocapsid protein, and the gene sequences encoding it, in various coronavirus strains. After screening the different nucleocapsids, we observed close molecular homology between SARS-CoV-1 and SARS-CoV-2, with more than 95% sequence similarity between the two SARS-CoV strains. Bat-CoV and SARS-CoV-2 showed 92% sequence similarity. In contrast, the MERS-CoV and SARS-CoV-2 nucleocapsids shared only 65% identity. Molecular characterization of nucleocapsids from various coronaviruses therefore showed that SARS-CoV-2 is more closely related to SARS-CoV-1 and bat-CoV and less similar to MERS-CoV. Thus, either SARS-CoV-1 or bat-CoV may be the source of SARS-CoV-2 evolution. Moreover, the differences in nucleocapsid molecular structure in SARS-CoV-2 may make this virus more virulent and highly infectious. This suggests that the non-identical SARS-CoV-2 genes (which are absent in SARS-CoV-1 and bat-CoV) contribute to COVID-19 severity. We also observed that SARS-CoV-2 nucleocapsids from different locations varied in amino acid sequence, indicating that many SARS-CoV-2 subtypes/subsets are currently circulating globally.
This study will help in developing antiviral vaccines and drugs and in studying viral replication and immunopathogenesis. It will also support the synthesis of monoclonal antibodies that could make COVID-19 diagnosis more precise, with fewer false-positive and false-negative results.


INTRODUCTION
Coronaviruses are a diverse group of relatively large (approximately 125 nm) RNA viruses that infect many mammals, amphibians, reptiles, and birds 1,2. The coronavirus genome is a positive-sense RNA (+ RNA) that acts directly as messenger RNA (mRNA) during viral replication inside the host cell 3. Coronaviruses are spherical particles with a viral envelope acquired from host cell membranes 4. Because coronaviruses are enveloped, they are readily inactivated: the envelope is very sensitive to temperature, chemicals, disinfectants, soaps, and sanitizers 5. The viral envelope is crucial for attachment to the host cell, which initiates the subsequent replication steps 6. If the viral envelope is disrupted, coronaviruses cannot replicate 7.
Spike glycoproteins on the surface of the viral envelope are essential for attachment to host cell receptors 8. The spike glycoproteins of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) bind to angiotensin-converting enzyme 2 (ACE-2) receptors on pneumocytes 9. ACE-2 receptors are located in the lungs, heart, arteries, kidneys, and intestines 10. Recent case studies show that many patients with coronavirus disease 2019 (COVID-19) have also died of myocardial infarction, stroke, and renal failure [11][12][13]. This suggests that, after viremia, SARS-CoV-2 can infect other body sites in addition to causing severe lung infection.
Inside the viral envelope, coronaviruses possess a nucleocapsid (N) protein, which is very important for viral replication, antigenicity, pathogenesis, virulence, infection, and dissemination 14. The N protein binds the viral RNA, which coils around it to form a helically symmetrical nucleocapsid 15. The viral RNA genome packaged by the N protein encodes all hereditary information for spike protein synthesis, the replication mechanism, synthesis of other virus-encoded proteins, host cell docking and damage, and disease progression 16,17. The coronavirus N protein is therefore involved in many aspects of viral pathogenesis. In this study, we characterized the N protein of SARS-CoV-2 and compared it with the N proteins of other coronaviruses. We analyzed the molecular composition and amino acid sequence of the N protein in SARS-CoV-2 and in related viruses such as SARS-CoV-1, bat-CoV, and Middle East respiratory syndrome coronavirus (MERS-CoV). This analysis should help virologists study viral replication, viral-host immunopathogenesis, and vaccine development to combat the current COVID-19 pandemic. As of WHO Situation Report 108, the COVID-19 pandemic had resulted in 3,672,238 confirmed cases and 254,045 deaths worldwide. These figures are rising rapidly, indicating that SARS-CoV-2 is spreading quickly through communities.
The rapid spread may be attributed to the low infectious dose required to establish infection in an individual (0.01 PFU/ml) 18. Reports suggest that temperature differences between countries also play an important role in the spread of COVID-19 19,20. Other studies speculate that individuals already immunized with the BCG vaccine may be protected against this disease 21,22. However, no effective vaccine is yet available for COVID-19, and there are no specific antiviral drugs for it either. Some clinicians are empirically using antimalarial drugs such as hydroxychloroquine and antibiotics such as azithromycin to treat COVID-19 [23][24][25]. The US Food and Drug Administration (FDA) has authorized the emergency use of the antiviral drug remdesivir for COVID-19 patients. However, the actual antiviral effects of these drugs against COVID-19 have not been thoroughly proven 26. Under these circumstances, a suitable killed or attenuated SARS-CoV-2 vaccine is needed for immunoprophylaxis. An effective COVID-19 vaccine should elicit an immune response strong enough to help contain SARS-CoV-2 infection in the community. Molecular characterization of the N protein is crucial for developing a vaccine and antiviral drugs against COVID-19. Therefore, in this study, we used various bioinformatics tools, GenBank, and the European Nucleotide Archive (ENA) to compare the SARS-CoV-2 N protein with those of other related coronaviruses. We also discuss how these findings may help contain and combat COVID-19.

MATERIALS AND METHODS
Data Sources and Research Strategies
We used bioinformatics tools together with data from various gene banks, and we analyzed amino acid sequences using the corresponding protein sequence databases. This research concentrated on GenBank (NLM-NIH) and the European Nucleotide Archive (ENA). We also used BLAST to analyze the N protein. The N proteins of various coronaviruses, including SARS-CoV-1, bat-CoV, and MERS-CoV, were compared with that of SARS-CoV-2. After analyzing the N proteins of these coronaviruses, we constructed a phylogenetic tree. We also carried out a thorough literature review on the molecular aspects of coronaviruses, using databases such as MEDLINE/PubMed, SCOPUS, Web of Science, ScienceDirect, and Google Scholar. We searched these databases using combinations of terms, including MeSH (Medical Subject Headings) terms, such as 'COVID-19', 'Coronavirus', 'SARS-CoV-2', 'SARS-CoV-1', 'bat-CoV', 'MERS-CoV', 'Nucleocapsid protein', and 'N-Protein'.
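The methods do not state which algorithm was used to build the phylogenetic tree. The Python example below is a minimal, hypothetical sketch of one common distance-based approach. It computes pairwise p-distances between aligned N protein sequences (the proportion of differing, non-gap columns), clusters them with UPGMA, and returns a Newick string, the plain-text format read by most tree viewers. The function names and the four toy sequences are illustrative assumptions, not data from this study. UPGMA also assumes a roughly constant evolutionary rate, so neighbor-joining or maximum-likelihood methods are usually preferred for real coronavirus data.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Columns where either sequence has a gap ('-') are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue
        compared += 1
        differing += x != y
    return differing / compared if compared else 0.0


def upgma_newick(aligned: dict[str, str]) -> str:
    """Cluster aligned sequences with UPGMA and return a Newick tree string."""
    # Each cluster is keyed by its Newick subtree; value = (leaf count, node height).
    clusters = {name: (1, 0.0) for name in aligned}
    dist = {
        frozenset((a, b)): p_distance(aligned[a], aligned[b])
        for a, b in combinations(aligned, 2)
    }
    while len(clusters) > 1:
        # Merge the closest pair of clusters; the new node sits at half their distance.
        pair = min(dist, key=dist.get)
        a, b = sorted(pair)
        height = dist.pop(pair) / 2
        n_a, h_a = clusters.pop(a)
        n_b, h_b = clusters.pop(b)
        merged = f"({a}:{height - h_a:.4f},{b}:{height - h_b:.4f})"
        # UPGMA update: size-weighted average of the merged clusters' distances.
        for other in clusters:
            d_a = dist.pop(frozenset((a, other)))
            d_b = dist.pop(frozenset((b, other)))
            dist[frozenset((merged, other))] = (d_a * n_a + d_b * n_b) / (n_a + n_b)
        clusters[merged] = (n_a + n_b, height)
    (tree,) = clusters
    return tree + ";"
```

For example, with four toy ten-residue sequences in which A and B differ at only one site, A and B are joined first, at a node height of 0.05.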

Eligibility Criteria
In this study, SARS-CoV-2 was molecularly characterized, and its molecular composition was compared with that of other coronaviruses such as SARS-CoV-1, bat-CoV, and MERS-CoV. Articles were excluded if they (a) provided insufficient or no data or (b) lacked a proper study design or approach.

Tools Employed
NCBI Genome Workbench was used for sequence processing and Clustal X (version 2.1) for sequence alignment. NCBI Tree Viewer was used to visualize the phylogenetic tree. Python tools were also run in JupyterLab (version 1.0) to explore the N-protein structures of SARS-CoV-2, SARS-CoV-1, bat-CoV, and MERS-CoV. These sequences came from different hosts, such as bats (Chiroptera), humans (Homo sapiens) worldwide, rabbits (Oryctolagus cuniculus), brown rats (Rattus norvegicus), and other potential SARS-CoV-2 hosts deposited in NCBI to date. Biopython (version 1.76) modules were imported for this purpose. PyMOL was used for molecular visualization of the SARS-CoV-2 N-protein structure. GenBank (NLM-NIH) and ENA were used for gene and protein sequence analyses.
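Because the study compares both N gene (nucleotide) and N protein (amino acid) sequences, a coding sequence has to be translated with the genetic code before any amino acid comparison. Biopython provides this step through `Seq.translate()`. The dependency-free sketch below rebuilds the standard genetic code (NCBI translation table 1) so that the step is visible. The function name `translate_cds` and the short open reading frame in the example are illustrative assumptions, not sequences taken from this study.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1). Codons are enumerated in
# TCAG order with the third base varying fastest; '*' marks stop codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_cds(cds: str, to_stop: bool = True) -> str:
    """Translate an in-frame coding sequence (DNA or RNA) into a protein string.

    Codons containing ambiguous bases (e.g. 'N') become 'X'. With to_stop=True,
    translation ends at the first stop codon, which is not included.
    """
    seq = cds.upper().replace("U", "T")
    if len(seq) % 3:
        raise ValueError("coding sequence length is not a multiple of 3")
    protein = []
    for i in range(0, len(seq), 3):
        aa = CODON_TABLE.get(seq[i : i + 3], "X")
        if aa == "*" and to_stop:
            break
        protein.append(aa)
    return "".join(protein)


print(translate_cds("ATGTCTGATAATGGACCCCAATAA"))  # MSDNGPQ
```

With `to_stop=False`, the same toy input returns `MSDNGPQ*`, which keeps the stop codon visible.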

RESULTS AND DISCUSSION
When the nucleocapsid amino acid sequences of SARS-CoV-2, SARS-CoV-1, bat coronavirus, and MERS-CoV were analyzed, we found 95% homology between SARS-CoV-2 and SARS-CoV-1 (Fig. 1, Fig. 2). The SARS-CoV-2 and bat coronavirus nucleocapsids showed 92% amino acid sequence similarity (Fig. 1, Fig. 2). When the N proteins of SARS-CoV-2 and MERS-CoV were compared, only 65% amino acid sequence similarity was observed. We also compared SARS-CoV-2 with other coronaviruses of different animal origins, using the nucleocapsid sequences of whale, rat, rabbit, and fowl coronaviruses. These animal coronaviruses showed 45 to 55% N protein gene sequence homology with SARS-CoV-2. In addition, we generated three-dimensional (3D) structures of both the SARS-CoV-2 (Fig. 3) and SARS-CoV-1 N proteins. The tertiary N protein structures of these two different viruses were more than 95% similar. When the 3D structures of the N proteins were compared, SARS-CoV-1 and bat coronavirus showed close similarity (96%) to SARS-CoV-2. We therefore speculate that SARS-CoV-2 might have originated from SARS-CoV-1 or bat coronavirus. Alternatively, it might have evolved from another, as yet unidentified, SARS-related animal coronavirus. In contrast, the nucleocapsids of the other animal coronaviruses examined were not closely related to SARS-CoV-2, and the 3D structure comparison supported this finding. Surprisingly, we also observed 1-2% variation between the SARS-CoV-2 nucleocapsids of Chinese and Italian isolates. Several studies have reported SARS-CoV-2 reinfection in previously treated patients [27][28][29]. In other cases and parts of the world, COVID-19 patients exhibited renal failure, myocardial infarction, diarrhea, and stroke [30][31][32][33][34].
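The percentages above depend on how the sequences were aligned and on which denominator was used, and the paper specifies neither. As a hedged illustration of how such a value can be computed, the sketch below performs a Needleman-Wunsch global alignment and reports identity as identical columns divided by alignment length. It assumes simple match/mismatch/gap scores; tools such as BLAST and Clustal instead use substitution matrices such as BLOSUM62 and affine gap penalties. Identity and similarity are different measures: similarity also counts conservative substitutions, so it is usually the higher of the two. Other denominators, such as the length of the shorter sequence, give different percentages, so values from different tools are not directly comparable. The function names and toy fragments are illustrative assumptions.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -2) -> tuple[str, str]:
    """Globally align two sequences and return the gapped alignment strings."""
    n, m = len(a), len(b)
    # score[i][j] is the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def pair_score(i: int, j: int) -> int:
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair_score(i, j),
                              score[i - 1][j] + gap,  # gap in b
                              score[i][j - 1] + gap)  # gap in a
    # Trace back from the bottom-right cell to rebuild one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(a: str, b: str) -> float:
    """Identical aligned positions as a percentage of the alignment length."""
    aln_a, aln_b = needleman_wunsch(a, b)
    if not aln_a:
        raise ValueError("cannot align two empty sequences")
    identical = sum(x == y != "-" for x, y in zip(aln_a, aln_b))
    return 100.0 * identical / len(aln_a)
```

For example, aligning the toy fragments MSDNGPQ and MSDGPQ places one gap opposite N. This gives 6 identical columns out of 7, or about 85.7% identity.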
COVID-19 severity has varied between countries [35][36][37]. Many patients are asymptomatic yet capable of transmitting the infection to others [38][39][40][41]. The clinical features of COVID-19 also vary across different parts of the world [42][43][44][45]. We suggest that these variations in severity, clinical features, and reinfection may be attributable to the ability of SARS-CoV-2 to mutate. In our study, SARS-CoV-2 isolates identified in different geographical locations showed slight variations. We speculate that this variation may reflect strain/clade/subset differences and that many SARS-CoV-2 subsets may be circulating worldwide. These subsets may differ in virulence, pathogenicity, and disease pattern.
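Variation between isolates from different locations can be expressed as a list of residue changes. The SARS-CoV-2 N protein is 419 amino acids long, so a 1-2% difference corresponds to roughly 4 to 8 differing residues. The hypothetical helpers below take two protein sequences that have already been aligned (for example, by Clustal X). They list the differences, numbered by position in the ungapped reference, and report the percentage of differing alignment columns. Substitutions use the common X123Y style. The insertion and deletion labels follow a simplified HGVS-like convention chosen for illustration. The sequences in the example are toy fragments, not isolate data from this study.

```python
def list_differences(ref_aln: str, query_aln: str) -> list[str]:
    """List differences between two aligned protein sequences.

    Positions are numbered along the ungapped reference. Substitutions use
    X123Y notation, deletions X123del, and insertions 123_124insY.
    """
    if len(ref_aln) != len(query_aln):
        raise ValueError("sequences must be aligned to the same length")
    changes = []
    ref_pos = 0  # 1-based position of the last reference residue seen
    for r, q in zip(ref_aln, query_aln):
        if r != "-":
            ref_pos += 1
        if r == q:
            continue
        if r == "-":
            changes.append(f"{ref_pos}_{ref_pos + 1}ins{q}")
        elif q == "-":
            changes.append(f"{r}{ref_pos}del")
        else:
            changes.append(f"{r}{ref_pos}{q}")
    return changes


def percent_variation(ref_aln: str, query_aln: str) -> float:
    """Differing alignment columns as a percentage of the alignment length."""
    if len(ref_aln) != len(query_aln) or not ref_aln:
        raise ValueError("need two non-empty sequences of equal aligned length")
    differing = sum(r != q for r, q in zip(ref_aln, query_aln))
    return 100.0 * differing / len(ref_aln)
```

For example, `list_differences("MSDNGPQ", "MSDNGRQ")` returns `["P6R"]`, and `percent_variation` reports one differing column out of seven.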

CONCLUSIONS
SARS-CoV-2 and SARS-CoV-1 are closely related coronaviruses, whereas MERS-CoV is less similar to SARS-CoV-2. In our analysis, the bat coronavirus N protein was also closely related to that of SARS-CoV-2, whereas the other animal coronaviruses examined showed lower N homology with SARS-CoV-2. The N protein is very important for viral virulence, antigenicity, pathogenicity, and clinical severity. SARS-CoV-2 may have evolved from SARS-CoV-1 or bat coronavirus. We also observed variation between SARS-CoV-2 isolates identified in different parts of the world, which suggests that many SARS-CoV-2 subsets are circulating worldwide.