Mutations and Epidemiology of SARS-CoV-2 Compared to Selected Corona Viruses during the First Six Months of the COVID-19 Pandemic: A Review

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is the causative agent of coronavirus (CoV) disease 2019 (COVID-19). This study compared the genome, mutations, and infectivity/transmissibility of SARS-CoV-2 with those of selected betacoronaviruses (beta-CoVs). It further examined the origin, risk factors, and outbreaks caused by beta-CoVs. We searched the following databases for relevant studies: PubMed, Google Scholar, and the World Health Organization COVID-19 database. A close relationship between SARS-CoV-2 and SARS bat-like CoV RaTG13 (98.9%) was found at the amino acid level, followed by pangolin CoVs. Non-synonymous mutations occur at high frequencies in the open reading frame (ORF) 1ab, spike (S) protein, and nucleocapsid. Mutations P323L and D614G in the RNA-dependent RNA polymerase (RdRp) and S protein, respectively, occur at a high frequency globally. Mutations at positions 3037 in the nonstructural protein (Nsp) 3, 14408 (RdRp), and 23403 (S) confer transmissibility to SARS-CoV-2. SARS-CoV-2 has higher infectivity and transmissibility than SARS-CoV, which shares the same receptor. Although bats are confirmed reservoirs, intermediate hosts are currently unknown. Smoking, old age, diabetes, cardiovascular diseases, and hypertension have all been associated with COVID-19. Within six months of its outbreak, COVID-19 was reported in all countries worldwide, whereas SARS was reported in 28 countries and Middle East respiratory syndrome (MERS) in 5 countries. However, the fatality rate of MERS (65%) was higher than that of COVID-19 (4.9%) and SARS (6.6%). Identifying the SARS-CoV-2 intermediate hosts will help prevent future outbreaks. Attention should be given to the pangolin CoVs. Variations in the S gene may confer transmissibility and infectivity.


METHODS
Online databases, including PubMed, Google Scholar, and the WHO COVID-19 database, were searched for articles containing the phrase "SARS-CoV-2" and any of the following terms: "genome", "mutations", "epidemiology", "infectivity", "Spike protein", or "host". Additional search terms identified in the first round of screening were used to conduct subsequent searches in the databases. These included "nCoV-2019" combined with "genome", "risk factors", "intermediate host", or "Spike protein". The articles obtained in each search were screened for relevance based on their titles and abstracts. The SARS-CoV-2 articles were published in January 2020. The WHO situation reports on SARS and MERS outbreaks and a systematic review of CoVs were also included.
CoVs use S glycoproteins to gain entry into potential host cells 8,11. The S1 region of the S protein allows binding, whereas the S2 region enhances fusion. Thus, the S protein determines host tropism 8,12. A peptide insertion (PRRA) at the S1/S2 junction of the S protein (positions 681-684) creates a furin cleavage motif (RRAR) and is unique to SARS-CoV-2 13,14. The functions of the M protein include nutrient transport, bud release, and the formation of the envelope (E). In addition, together with the E protein, it participates in the assembly of viral particles. The N protein catalyzes RNA synthesis 5,15. The SARS-CoV-2 genome also has eight accessory proteins: ORF 3a, 3b, 6, 7a, 7b, 8a, 9b, and 14, which are common to other CoVs 5,16.

Journal of Pure and Applied Microbiology
The RdRp gene of SARS-CoV-2 is extremely similar (96%) to that of SARS-CoV.
We calculated the average percentage of amino acid (aa) similarity of the whole genome, ORF 1ab, S, M, E, and N proteins of several CoVs to SARS-CoV-2. Where a range was reported, we used the highest value. Based on the amino acid similarities of the entire genome and several regions (Table 1), a remarkably close relationship was observed between SARS-CoV-2 and bat-CoV RaTG13 (98.9%). This was followed by pangolin CoVs (96.2%) and SARS-CoV (85.2%). We also observed a distant relationship between the SARS-CoV-2 and MERS-CoV genomes based on aa similarities (40.4%) of the whole genome, ORF 1ab, S, E, M, and N.
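The averaging rule described above (take the highest value when a range is reported, then average across regions) can be sketched as follows. This is a minimal illustration only; the function names and the example values are hypothetical and are not taken from Table 1.

```python
def highest_value(entry):
    """Return the upper bound of a 'low-high' range, or the single value itself."""
    # Accept both a hyphen and an en dash as the range separator.
    parts = str(entry).replace("\u2013", "-").split("-")
    return max(float(p) for p in parts)


def mean_similarity(entries):
    """Average the per-region similarities, using the highest value of any range."""
    values = [highest_value(e) for e in entries]
    return sum(values) / len(values)


# Hypothetical per-region similarities (%); NOT the values reported in Table 1.
example = ["96.0", "97.5-98.7", "99.0"]
print(round(mean_similarity(example), 2))
```

Using the upper bound of each range makes the resulting average an optimistic estimate of similarity, which is worth keeping in mind when comparing the reported percentages.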

Mutations present on the SARS-CoV-2 genome
Similar to other RNA viruses, mutations in CoVs arise during every replication cycle 4. Mutations in viruses can modulate virulence, transmission, receptor affinity, and host tropism, among others 6,25. Interestingly, in the order Nidovirales (to which Coronaviridae belongs), the RNA polymerase has a proofreading mechanism. Nsp 14 in SARS-CoV has a proofreading function that increases the fidelity of RNA synthesis through its exonuclease activity. It is expected to carry out the same function in SARS-CoV-2, correcting errors made by RdRp 26. Thousands of mutations, including synonymous and non-synonymous substitutions, insertions, and deletions, have been reported in more than 50,000 genome sequences uploaded to the Global Initiative on Sharing All Influenza Data (GISAID) and NCBI databases. Approximately 80% of the reported mutations were found to be non-synonymous 27.
We reviewed studies that compared 10 Chinese isolate sequences to more than 1,000 sequences uploaded from around the world. Mutations have been found to occur at high frequencies in the ORF 1ab, S, and N genes, as well as in ORF 3a 6,28,29. No mutations were detected in the Chinese isolates in December 2019 4. The first non-synonymous mutation to be reported was at position 28151 in ORF 8, resulting in the serine and leucine (S and L) clades. Position 28151 within ORF 8 is characterized by a cytosine or uracil that codes for a serine/leucine (S/L) variation at aa 84. Amino acid 84 is not conserved among the other CoVs. The S-clade was the original virus, but it became less prevalent over time (approximately 30%). The prevalence of the L-clade was approximately 70% early in the outbreak, but its frequency decreased in late January 2020. It was more aggressive than the S-clade 5,18,30. The SARS-CoV-2 S-clade is characterized by a mutation at position 28151. The strains isolated in North America and Europe were of L-variant descent 26,29.
The other significant mutation was located at position 14408 of the RdRp gene. It is thought to have first appeared in Italy on February 20, 2020, resulting in the aa change P323L. At this time, the number of individuals contracting the virus increased substantially in Europe 26,31. Viruses with the mutation at position 14408 have a higher mutation rate (median of three point mutations) than those without it (median of one point mutation) 26. Clade G of SARS-CoV-2 is based on a mutation in the S gene at position 23403, resulting in D614G, and became the dominant circulating variant as of June 2020. Clade V is based on a variation at position 26143, which resulted in the aa change G251V in ORF 3a 6,29. Subclades G.1 and G.2 are the result of the following mutations: G204R and R203K in the N gene, P214L in ORF1b, and Q57H 6,29,33. Two deletions (3 nt and 24 nt) in ORF 1ab and one deletion of 10 nt at the 3′ end of the genome were observed in sequences of SARS-CoV-2 obtained from Aichi, Japan; Wisconsin, USA; and Victoria, Australia 34. Table 2 summarizes the mutations that occur in different regions of the SARS-CoV-2 genome. The majority of mutations have been reported in ORF 1ab (n=59), S (n=32), and N (n=14). Figure 1 shows the logarithm (log10) of the number of mutations occurring in regions with more than five mutations.
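To illustrate how a single cytosine-to-uracil change can switch a serine codon to a leucine codon, the sketch below enumerates every serine codon in the standard genetic code and checks which single C-to-U substitutions produce leucine. The review does not state which codon ORF 8 uses at aa 84, so the example makes no claim about the actual viral sequence; the helper names are hypothetical.

```python
BASES = "UCAG"
# Standard genetic code in UCAG order (first, second, third base); '*' marks stop codons.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def serine_to_leucine_by_c_to_u():
    """List (serine codon, leucine codon) pairs that differ by one C-to-U change."""
    pairs = []
    for codon, aa in CODON_TABLE.items():
        if aa != "S":
            continue
        for pos, base in enumerate(codon):
            if base == "C":
                mutated = codon[:pos] + "U" + codon[pos + 1:]
                if CODON_TABLE[mutated] == "L":
                    pairs.append((codon, mutated))
    return pairs


print(serine_to_leucine_by_c_to_u())  # [('UCA', 'UUA'), ('UCG', 'UUG')]
```

Only a change at the second codon position of UCA or UCG yields leucine; the same substitution in UCU or UCC yields phenylalanine instead.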
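The link between a genome position and an amino acid change can be checked with simple arithmetic. The sketch below maps nucleotide 23403 to its codon in the S gene, using the spike coordinates (21563-25384) listed with Table 2 and assuming that the first coordinate is the first base of the start codon. The function name is hypothetical.

```python
def codon_of(genome_pos, gene_start):
    """Map a 1-based genome position to (codon number, position within the codon)."""
    offset = genome_pos - gene_start  # 0-based offset into the gene
    if offset < 0:
        raise ValueError("position lies upstream of the gene start")
    return offset // 3 + 1, offset % 3 + 1


SPIKE_START = 21563  # start of the S gene as listed with Table 2

print(codon_of(23403, SPIKE_START))  # (614, 2)
```

Under this assumption, position 23403 falls in the second base of codon 614, which agrees with the D614G change described above.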

SARS-CoV-2 versus SARS-CoV infectivity
The S glycoprotein of CoVs is crucial for infection and stability, and determines host tropism, as well as the transmission capacity of the virus 4,8. The S protein attaches to receptors on host cells, allowing entry of the virus 12,35. The S1 subunit of the S protein has a receptor-binding domain (RBD) that binds to angiotensin-converting enzyme 2 (ACE2), allowing the S2 domain to fuse with the host cell membrane. More RBD residues are involved in binding to the receptor, and the capping loop is longer, in SARS-CoV-2 than in SARS-CoV. These structural characteristics are associated with an increased binding affinity 8. The SARS-CoV-2 S protein binds to ACE2 with significantly higher affinity than that of SARS-CoV. Therefore, it is more efficient at infecting human cells and hence spreads rapidly in the human population 8,36,38,39. The binding free energy of the RBD-ACE2 complex is lower for SARS-CoV-2 than for SARS-CoV, which helps explain why the former is more infectious. This indicates increased stability and the ability to withstand high temperatures 8,35. The SARS-CoV-2 RBD has higher solubility than the SARS-CoV RBD, enhancing binding to the ACE2 receptor and contributing to its high infectivity 8. In addition, the rate of protein synthesis in SARS-CoV-2 is higher than that of other HCoVs owing to the overall ratios of human slow codons/di-codons present, thereby increasing the transmissibility rate. Fast-replicating viruses have a high chance of surviving in the environment and of establishing a successful infection 31.
Table 2 (fragment). Amino acid changes recovered from the extracted table; the rows are incomplete.
First row (region label not recovered): T266I, H286Y, P287T, P308S, P309S, P314L, P323L, G392D, S428N, P504L, C541Y, T609I, T708I, I739V, P765S, H819Y, A876T, A1043V, A1176V, V1397I, L1599F, A1606T, I1607V, M2194T, L2235I, I2244T, G2251S, A2345V, G2534V, D2579A, N2708S, N2894D, F2908I, T3058I, F3071Y, S3099L, G3334S, L3606F, E3764D, N3833K, L3691F, S4396L, W5308C, T5579I, I6074V, I6075T, P6083L, F6309Y, E6565D, K6958R, D7018N
Spike (genome positions 21563-25384; n=32): Y28N, T29I, F32I, H48Y, H49, L54F, N74K
The mutations that occur in the SARS-CoV-2 residues responsible for cross-species transmission could be more favorable than those in SARS-CoV 31,36. The peptide insertion at the S1/S2 boundary has been linked to SARS-CoV-2 infectivity 14,16. A previous study 40 reported that the S protein of SARS-CoV-2 binds weakly to its receptor, unlike that of SARS-CoV, which would account for severe disease only in immunocompromised individuals. However, this does not explain the extensive spread of SARS-CoV-2 infection within a short duration. High infectivity is also supported by the trends of SARS-CoV-2 transmission, with more than 10 million people infected within six months of its outbreak.

Intermediate host
COVID-19 was first reported in nine patients with pneumonia in China in December 2019. Of the nine patients, eight had visited the Huanan wet market in Wuhan City, and the other patient lived close to the market before the outbreak 4,41. The Huanan wet market sells different wild animal species, including marmots, pangolins, snakes, leopard cats, bamboo rats, badgers, and hedgehogs. All of these species are susceptible to CoVs. SARS-CoV-2 is suspected to have originated from bats, as it shares 87%-96% whole genome sequence identity with bat CoVs, such as RaTG13, SL-CoVZC45, and SL-CoVZXC21 11,20,22. Based on the analysis of complete genomic sequences, bats are reservoirs of approximately 30 CoVs. However, during the outbreak period, no bats were being sold, as most species had hibernated. Therefore, another organism was likely to be an intermediate host 4.
Ji et al. 42 reported that snakes of the species Bungarus multicinctus and Naja atra are potential intermediate hosts. However, this was refuted by another study, which showed that the relative synonymous codon usage distance between SARS-CoV-2 and a frog species was smaller than that between SARS-CoV-2 and these snake species 23. The authors did not suggest frogs as the intermediate host, but argued that the intermediate host would most likely be a mammal. Turtles were also suspected to be a possible intermediate host because of the observed interaction between key RBD aa and their ACE2. In addition, they were reported to be more common in the market 12. However, more research is required to evaluate this possibility.
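The relative synonymous codon usage (RSCU) comparison mentioned above can be sketched as follows. RSCU is the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. The distance metric used in the cited studies is not stated here, so Euclidean distance is an assumption, and the function names are hypothetical.

```python
from collections import Counter
from math import sqrt

BASES = "TCAG"
# Standard genetic code in TCAG order; '*' marks stop codons.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def rscu(seq):
    """Compute RSCU for every codon of amino acids with more than one codon."""
    seq = seq.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    counts = Counter(seq[i:i + 3] for i in range(0, usable, 3))
    families = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)
    values = {}
    for aa, codons in families.items():
        if aa == "*" or len(codons) == 1:  # skip stops, Met, and Trp
            continue
        total = sum(counts[c] for c in codons)
        for c in codons:
            values[c] = counts[c] * len(codons) / total if total else 0.0
    return values


def rscu_distance(seq_a, seq_b):
    """Euclidean distance between two RSCU profiles (assumed metric)."""
    a, b = rscu(seq_a), rscu(seq_b)
    return sqrt(sum((a[c] - b[c]) ** 2 for c in a))
```

A smaller distance means the two coding sequences favor similar synonymous codons. As the text notes, a small distance alone does not establish a host relationship.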

Risk factors
Several sociodemographic characteristics were associated with COVID-19. SARS-CoV-2 infects all age groups, with a median age of 45-56 years. Patients aged above 45 years respond poorly to treatment; hence, they have increased fatalities 5,46. COVID-19 is also more prevalent in men than in women, with a mortality rate 2.4 times higher among men than among women 47-49. The high incidence observed among men could be a result of a higher prevalence of smoking and of movement, including international travel, compared to women in many cultures.
Behavioral and environmental factors, such as human-animal contact, poor hygiene and sanitation, and wet markets, increase the risk of zoonotic diseases 50. Smoking increases susceptibility to SARS-CoV-2 infection and the associated death rate 5,51. ACE2 expression levels are upregulated in long-term smokers 51. Normally, ACE2 is upregulated to protect the host against acute lung injury. The levels of ACE2 do not vary by sex or age. Chronic smoking triggers ACE2 secretory cell expansion, which could explain the vulnerability, as SARS-CoV-2 uses the ACE2 receptor 51,52.
A temperature range of 13-24°C, precipitation of less than 30 mm/month, and humidity of 50%-80% favor SARS-CoV-2 infection. Lower rainfall reduces relative humidity and provides favorable conditions for the spread of pathogens causing respiratory infections 53. Several studies support this hypothesis; one 54, after observing weather conditions in relation to the death rate for one month, reported that an increase in temperature and absolute humidity is associated with decreased mortality. The RBD-ACE2 complex incurs a high entropy penalty owing to the flexibility of the RBD near the binding site. This means that binding, and hence infection in humans, is temperature-sensitive. Therefore, an increase in temperature decreases the rate of infection, which may help control it 8. The increased fatality rate of SARS-CoV-2 is correlated with the levels of particulate matter (PM) pollution (PM2.5, PM10). PM affects the outcome of respiratory diseases 55.
Underlying conditions that worsen COVID-19 outcomes include diabetes mellitus, hypertension, and cardiovascular diseases 35,39,46. Among patients with diabetes and cardiovascular diseases, ACE2 plays a protective role. Unfortunately, SARS-CoV-2 downregulates the ACE2 protein, resulting in severe clinical outcomes in these patients. Both patients with respiratory diseases and healthy individuals are susceptible to SARS-CoV-2 infection 51. Healthy individuals and those with chronic respiratory diseases showed similar ACE2 expression levels 51. Predictors of recurrent SARS-CoV-2 RNA positivity include high levels of IL-6, elevated lymphocyte counts, and lung consolidation features upon hospital admission 11,39,51. The transmission of SARS-CoV-2 in countries with malaria incidence is limited 56. This could be a result of the wide use of antimalarial drugs, especially chloroquine. Chloroquine is a broad-spectrum antiviral agent that increases endosomal pH, thereby interfering with the pH-dependent fusion of the virus and host cells. It inhibits the uncoating and glycosylation of many viruses. Chloroquine in combination with other antiviral agents was reported to be effective in controlling COVID-19 56.
... fatalities. This is more than 1,000 times as many people as contracted SARS and MERS and died during the six months after their outbreaks (Table 3). Of the three outbreaks, COVID-19 is the most extensively spread and was reported in over 200 countries within the first six months of its emergence; in contrast, MERS and SARS were reported in 5 and 28 countries, respectively. However, the COVID-19 mortality rate within the first six months of the outbreak was 4.9%, indicating that it is less aggressive than SARS (6.6%) and MERS (65%). The etiological agents for these outbreaks have been linked to bats with different intermediate hosts.
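The temperature argument above can be made explicit with the standard Gibbs relation. This is a generic thermodynamic sketch, not a result from the cited study, and it assumes that the binding enthalpy and entropy are approximately constant over the temperature range of interest.

```latex
\begin{align}
  \Delta G_{\mathrm{bind}} &= \Delta H_{\mathrm{bind}} - T\,\Delta S_{\mathrm{bind}}, \\
  \frac{\partial\,\Delta G_{\mathrm{bind}}}{\partial T} &= -\Delta S_{\mathrm{bind}} > 0
  \quad \text{when } \Delta S_{\mathrm{bind}} < 0 .
\end{align}
```

An entropy penalty corresponds to a negative binding entropy, so the binding free energy becomes less negative as the temperature rises and the RBD-ACE2 complex becomes less favorable. This is consistent with the temperature sensitivity described in the text.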
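The mortality percentages compared above are case fatality rates, that is, deaths divided by reported cases. A minimal sketch of the calculation is given below. The counts are illustrative round numbers chosen to reproduce the 4.9% figure and are not the values from Table 3; the function name is hypothetical.

```python
def case_fatality_rate(deaths, cases):
    """Return the case fatality rate as a percentage of reported cases."""
    if cases <= 0:
        raise ValueError("cases must be positive")
    return 100 * deaths / cases


# Illustrative counts only (NOT from Table 3): 490,000 deaths among 10,000,000 cases.
print(round(case_fatality_rate(490_000, 10_000_000), 1))  # 4.9
```

Because the denominator counts only reported cases, the rate depends on testing coverage and can shift as an outbreak progresses.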

CONCLUSION
We reviewed studies that reported the genomes, mutations, and epidemiology of the pathogen responsible for COVID-19 and of a number of selected CoVs. The SARS-CoV-2 genome is highly similar to that of the strain RaTG13 of bat origin, followed by pangolin CoVs, based on amino acid similarities of the whole genome, S, M, E, and N. Mutations occurred at a high frequency in important proteins, namely the S, N, and replicase. ORFs 6, 7, and 10 were highly conserved. Insights into the sequence variations, especially in important genes such as that encoding the S glycoprotein, are important for understanding the biology of SARS-CoV-2 infection. This knowledge could be useful for antiviral treatment and vaccine development. Although SARS-CoV-2 and SARS-CoV use the same receptor in humans, the former has a higher binding affinity for the receptor. This results in high infectivity and transmissibility rates. Despite evidence of a close relationship between pangolin CoVs and SARS-CoV-2 in most studies, there remains uncertainty regarding the identification of pangolins as intermediate hosts. The risk factors for COVID-19 have been defined and are similar to those of other respiratory diseases; smoking and air pollution have been associated with disease severity. However, it has not been associated with cold weather, and this could partly explain the mortality variations in different regions. SARS-CoV-2 is highly transmissible and infectious, with a low mortality rate. We recommend further research to identify SARS-CoV-2 intermediate hosts to avoid future spillover, and to establish the significance of different mutations.