Molecular Analysis and Genome Sequencing of SARS-CoV-2 during the Second Wave of 2021 Revealed Variant Diversity in India

The rapid emergence of SARS-CoV-2 variants has posed the critical challenge of higher transmission and immune escape, seriously threatening control of the pandemic. The present study was carried out in confirmed cases of SARS-CoV-2 infection to elucidate the prevalence of SARS-CoV-2 variant strains. We performed RT-PCR on RNA extracted from the nasopharyngeal swabs of suspected Covid-19 patients. Confirmed positive cases with Ct < 25 were subjected to whole-genome sequencing to track the prevalence of the virus in the Malwa region of Punjab. B.1, B.1.1.7, B.1.351, B.1.617.1, B.1.617.2, AY.1 and other unidentified variants of SARS-CoV-2 were found in the studied population. Among all the variants, B.1.1.7 (UK variant) and B.1.617.2 (delta-Indian variant) were the most dominant in the population and were found mainly in Patiala, followed by Ludhiana, SBS Nagar, Mansa and Sangrur. In addition, the sequencing results showed that the dominant variants were more prevalent in the male population and in the age group 21-40 years. The B.1.1.7 and B.1.617.2 variants of SARS-CoV-2 are replacing the wild type (Wuhan strain) and emerging as the dominant variants in Punjab.


INTRODUCTION
Severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2), earlier known as novel CoV/nCoV, led to the coronavirus pandemic that began in 2019. SARS-CoV-2 belongs to the Coronaviridae family and has a single-stranded, positive-sense ('+') RNA genome; it infects mainly humans and animals, resulting in respiratory congestion. The first reported case of this virus was identified in Wuhan, Hubei, China in December 2019, and the virus spread rapidly around the world. The outbreak in India began in January 2020, when a medical student carrying the Wuhan strain, who had returned from Wuhan University, tested positive in Kerala on 30th January 2020. 1 In Punjab it appeared in March 2020 with 42 active cases. 2 Later, the Indian health ministry initiated surveillance of SARS-CoV-2 with ICMR (Indian Council of Medical Research) through VRDL (Viral Research Diagnostics Laboratories).
Seven species of coronavirus have been reported, namely HCoV-229E, HKU1, NL63, OC43, SARS-CoV, MERS-CoV and SARS-CoV-2, which cause mild to severe respiratory problems. Among these, HCoV-229E, HKU1, NL63 and OC43 cause mild symptoms, while SARS-CoV, MERS-CoV and SARS-CoV-2 are reported to cause fatal respiratory problems. 3 Viral RNA can mutate repeatedly, so it is crucial to understand the nature of the mutations in different strains of SARS-CoV-2. 4 Scientists have observed various variants of SARS-CoV-2 from different geographical regions of the world. [5][6][7][8] These variants have been reported to show insertions, deletions or substitutions, which might be beneficial or detrimental to the organism because they change the structural and pathogenic properties of the virus. 8 Understanding these mutations will help to estimate viral transmission, immune escape, efficiency of replication and virulence. 9 As India has the second largest number of SARS-CoV-2 infections, the current study focused on Indian isolates (Punjab) to explore the variations occurring in the population. 9 Taking this into consideration, we analyzed the current trend of SARS-CoV-2 emergence in India.

METHODOLOGY
Study Area
Punjab state is located in North-West India. Post-partition Punjab is divided into three main regions: Majha, Malwa and Doaba. 10 The study area of our research was the Malwa region, which lies south of the river Sutlej. The present work was carried out by studying 10,22,820 symptomatic and asymptomatic individuals during the second wave of the Covid-19 surge, from January to May 2021. The research covered 5 districts of the Malwa region in Punjab: Sangrur, Mansa, Ludhiana, Patiala and SBS Nagar. Personal demographic data such as name, age, gender, geographical location, identity proof and contact details were gathered from individuals infected with Covid-19 as per the guidelines given by ICMR.

Sample Handling
Respiratory specimens, i.e., nasopharyngeal swabs in universal viral transport media, were received from 5 different districts of the Malwa region at the Viral Research and Diagnostic Laboratory (VRDL), tertiary care hospital, Punjab. The received samples were further processed for RNA extraction and RT-PCR. The remaining samples were then stored at -80°C.

RNA Extraction
RNA was extracted from the nasopharyngeal swabs using 200 µL of the sample. Nucleic acid extraction was conducted in 96-well plates using the MagMAX™ viral pathogen nucleic acid extraction kit. RNA was eluted in 50 µL of elution buffer. The extracted genetic material was used downstream for molecular detection.

RT-PCR and Sequencing
Real-time RT-PCR was used for the qualitative detection of SARS-CoV-2 nucleic acid in samples collected from January 2021 to May 2021. The real-time RT-PCR test incorporates reverse transcription of viral RNA into DNA so that the virus can be detected. A total of 9 µL of the extracted RNA was subjected to RT-PCR for the qualitative detection of SARS-CoV-2 RNA using the Genes2Me (VIRALDTECT-II) Multiplex Real Time PCR kit. The PCR was run under the following thermal conditions: 55°C for 10 minutes for reverse transcription, denaturation at 95°C for 3 minutes, and then 40 cycles of amplification at 95°C for 15 seconds and 60°C for 60 seconds, using the reporter dyes Cy5 for the N gene, ROX for the RdRp gene and FAM for the E gene. The results were analyzed, and a subset of positive samples with a cycle threshold of <25 were packed in dry ice with triple-layer packaging and sent to the National Centre for Disease Control (NCDC), New Delhi, for whole genome sequencing. We analyzed lineages according to age, gender and geographic spread in the five districts using data provided by NCDC, New Delhi.
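The thermal profile and the Ct-based selection rule described above can be written out as a short sketch. The temperatures, hold times, cycle count and the cutoff of 25 are taken from the text; the function names, the sample identifiers and the Ct values are hypothetical, and using a single Ct value per sample is a simplifying assumption, since the text does not state which of the three gene targets determined eligibility.

```python
# Thermal profile as stated in the text: (temperature in deg C, hold in seconds).
REVERSE_TRANSCRIPTION = (55, 10 * 60)
INITIAL_DENATURATION = (95, 3 * 60)
CYCLE_STEPS = [(95, 15), (60, 60)]
N_CYCLES = 40
CT_CUTOFF = 25  # samples with Ct below this were eligible for sequencing


def total_hold_minutes():
    """Sum of programmed hold times, ignoring ramping between temperatures."""
    per_cycle = sum(seconds for _, seconds in CYCLE_STEPS)
    total = REVERSE_TRANSCRIPTION[1] + INITIAL_DENATURATION[1] + N_CYCLES * per_cycle
    return total / 60


def eligible_for_sequencing(ct_by_sample, cutoff=CT_CUTOFF):
    """Return IDs of samples whose Ct is strictly below the cutoff.

    Assumption: one Ct value per sample; None stands for no amplification.
    """
    return [sid for sid, ct in ct_by_sample.items() if ct is not None and ct < cutoff]


if __name__ == "__main__":
    # Made-up Ct values for illustration; not data from the study.
    example = {"S1": 18.4, "S2": 25.0, "S3": None, "S4": 24.9}
    print(total_hold_minutes())
    print(eligible_for_sequencing(example))
```

The hold times alone add up to just over an hour per run; actual instrument time would be longer because ramping between temperatures is not counted here.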

RESULTS
A serial cross-sectional study was conducted by the Tertiary Care Hospital, Malwa region. In the present study, 10,22,820 suspected individuals were tested for SARS-CoV-2 from January 2021 to May 2021. Of these, 63,293 individuals were confirmed positive by RT-PCR. Trends in the distribution of positivity relative to the total number of samples in the second wave, from January 2021 to May 2021, are depicted in Fig. 1. Of the positive samples, 1800 samples with a cycle threshold <25 were randomly selected for whole genome sequencing as per the directions given by NCDC. Sequencing revealed the variants in 1762 of these samples; 38 samples were rejected due to poor sample quantity.
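As a quick consistency check on the counts reported above, the overall test positivity and the sequencing yield can be derived with a few lines of arithmetic. This is only a sketch: the counts come from the text, while the variable and function names are illustrative and the derived percentages are not figures stated by the authors.

```python
# Counts reported in the text (10,22,820 is Indian digit grouping for 1,022,820).
TESTED = 1_022_820
POSITIVE = 63_293
SELECTED_FOR_WGS = 1_800
SEQUENCED_OK = 1_762
REJECTED = 38


def percent(part, whole):
    """Return part as a percentage of whole."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return 100.0 * part / whole


positivity = percent(POSITIVE, TESTED)                  # share of tests that were positive
sequencing_yield = percent(SEQUENCED_OK, SELECTED_FOR_WGS)  # share of selected samples typed
sampled_fraction = percent(SELECTED_FOR_WGS, POSITIVE)  # share of positives sent for WGS

if __name__ == "__main__":
    print(f"Test positivity:   {positivity:.2f}%")
    print(f"Sequencing yield:  {sequencing_yield:.2f}%")
    print(f"Positives sampled: {sampled_fraction:.2f}%")
```

On these counts the positivity works out to roughly 6.2%, and about 97.9% of the samples selected for sequencing returned a variant call.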
The average age of the Covid-19 patients was 37.4 ± 0.5 years, with an age range of 2-80 years, a median age of 24 years and a mode age of 32 [...] in amino acids has been found in the studied population groups; many variants share similar mutations, as depicted in Table 1. A total of 11 amino acid replacements/SNPs (single nucleotide polymorphisms) and 3 deletions/substitutions were recognized. The frequency of the amino acid replacements is shown in Fig. 2.
Prevalence of SARS-CoV-2 variant: In the present study it was observed that until December 2020 the wild type (Wuhan strain) was predominant, as no variant was detected at that time, but from January 2021 to May 2021 B.1.1.7 (UK variant) [...] (Table 2). In addition, the sequencing results showed that the dominant variant was more prevalent in the male population and in the age group 21-40 years.
Fig. 2. Frequency of the amino acid replacements. These amino acid replacements were shared by many variants in the current study. Among all of them, the frequency of the P681H replacement was observed to be the highest, followed by N501Y, L452R, P681R and T478K. The frequency of the rest of the replacements was negligible.
Gender and age-wise distribution of SARS-CoV-2 variant: Whole genome sequencing of SARS-CoV-2 from both genders revealed that B.1.1.7 (UK variant) dominates in both gender categories (Fig. 4a). Similarly, for the other variants, B.1, B.1.351, B.1.617.1, wild type (Wuhan strain) and unidentified variants, there is no major difference in distribution between the male and female populations. The whole genome sequencing results were also analyzed across age groups: 0-20 years, 21-40 years, 41-60 years, 61-80 years and more than 80 years. The highest number of confirmed positive cases was observed in the younger age group of 21-40 years, followed by the age group 41-60 years (Fig. 4b).
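The stratification by age band, gender and lineage described above amounts to a simple binning and counting step, sketched below. The age bands are those named in the text; the function names, the record layout and the demonstration records are hypothetical and are not data or code from the study.

```python
from collections import Counter

# Inclusive upper bounds of the age bands named in the text.
AGE_BANDS = [(20, "0-20"), (40, "21-40"), (60, "41-60"), (80, "61-80")]


def age_group(age):
    """Map an age in whole years to one of the study's age bands."""
    if age < 0:
        raise ValueError("age must be non-negative")
    for upper, label in AGE_BANDS:
        if age <= upper:
            return label
    return ">80"


def tally(records):
    """Count (age group, gender, lineage) combinations.

    records: iterable of (age, gender, lineage) tuples.
    """
    return Counter(
        (age_group(age), gender, lineage) for age, gender, lineage in records
    )


if __name__ == "__main__":
    # Made-up records for illustration; not data from the study.
    demo = [
        (32, "M", "B.1.1.7"),
        (24, "F", "B.1.617.2"),
        (67, "M", "B.1.1.7"),
        (35, "M", "B.1.1.7"),
    ]
    for key, count in tally(demo).most_common():
        print(key, count)
```

Keeping the band boundaries in one list makes it easy to check that every age falls into exactly one group, which matters when counts are compared across bands as in Fig. 4b.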

DISCUSSION
Covid-19 cases continued to surge in India during the second wave, with the current outbreak of SARS-CoV-2 setting new records. Prior to the second wave less than 0.7% of the population was infected, but by the end of April 2021 new cases had reached 1.5 million as per an Indian government report. 11 The abrupt rise in SARS-CoV-2 cases in India corresponds with a high prevalence of more-transmissible variants, associated with diagnostic test failures and antibody escape.
Variant B.1.1.7 was identified in September 2020 in England and quickly spread to other countries, including India. 13 This variant has been classified as a variant of concern (VOC 202012/01). The current study showed the highest prevalence of this variant from January to May. In the studied population, the B.1.1.7 variant carried the N501Y mutation in the spike protein, an SNP (single nucleotide polymorphism) in which asparagine (N) changes to tyrosine (Y) at position 501. This mutation leads to lower sensitivity to immune responses compared with the wild Wuhan variant. Vaccine trials suggest that infection with other variants may offer only limited protection from reinfection with the 501Y.V2 variant. 14 The B.1.1.7 variant can also impair diagnostic detection, as it has a deletion in the spike protein. 15 It has been linked to an upsurge in the mortality rate, with a 1.35-fold higher probability, and is estimated to be 40-80% more infectious than the wild type and other variants due to increased viral load. [16][17][18] Reports suggest that in Delhi the share of B.1.1.7 (VOC) was initially minimal (5%) and then gradually grew to 60% by the end of April 2021, 19 while in Karnataka the lineage accounted for 44.4% of both imported and circulating cases. 20
The B.1.351 variant reported in the present study has also been found in multiple countries; it was first identified in South Africa in October 2020 and has increased transmissibility because of mutations in the spike gene. 21,22 B.1.351 shares mutations with B.1.1.7 and also seems to show reduced sensitivity to immune responses acquired against the 'wild-type' Wuhan virus. 15 Its prevalence in West Bengal has been reported as 16.9%. 23
Another major variant of concern in the current study was B. [...] to as triple mutant and first recognized in India but now mounting in prevalence across the country, as shown by genome sequencing data. 24 In the present study, the B.1.617.2 sub-lineage was likewise the second most dominant variant from January to April. In fact, all three lineages of B.1.617 were reported to have the L452R mutation. The L452R mutation was observed in the California variant, which has greater transmissibility and viral load. 25 The mutations L452R, E484Q and P681R found in the current study are shared by lineage B.1.617.1, while B.1.617.2 lacks the E484Q mutation. [26][27][28] Additionally, the Delta plus (AY.1) variant acquired the K417N mutation in the receptor binding domain of the spike protein, which was first found in the Beta variant. The first two cases of the Delta plus variant in the study group were found in Patiala and Ludhiana districts in April 2021. Scientists suggest that this variant poses various issues, including rapid transmissibility, antibody neutralization and reduced vaccine effectiveness. 29 The Q1071H mutation was also observed in the present study and is shared by the B.1.617.1 lineage 30 and the triple mutant Bengal strain (B.1.618). Interestingly, the mutations P681R and Q1071H observed in the current study were present in the B.1.618 variant along with the E144K mutation. 31 The E484Q mutation is similar to E484K, a mutation found in the UK and South Africa variants.
These mutations are suggested to cause reduction in body weight, histopathological changes in the lung and lung lesions. 31 Studies have reported that the highest numbers of this variant were from India, compared with other countries such as Brazil, Argentina and the United States. 32 The B.1.617.2 sublineage increased from 10% to about 80% in Delhi from January to April. 19 The variant has also been recognized in several nations, with higher transmissibility, pathogenicity and immune escape. 24
Interestingly, other unidentified amino acid replacements were also found in the present study. Among them, N440K and P681H are considered potent, with greater virulence, infectivity and immune escape than the wild-type strain. 20,33 The N440K mutation was reported as the reason behind the major damage caused by SARS-CoV-2 in Visakhapatnam, Karnataka, Telangana and other southern parts of India. 20 The N501Y mutation found in this study was also detected in Telangana, in southern India. 33 The occurrence of re-infections and rapid transmission, whether after natural infection or after vaccination, is mainly due to the emergence of all these new variants. 24 A recent study confirmed that re-infections are already happening in India. 12
The mutations N501Y and E484K found in the present study are carried by the B.1.1.7 lineage, whereas the N501Y mutation is also shared by the B.1.351 and P.1 lineages along with the K417N/T and E484K mutations. The E484Q and L452R mutations observed in the current study were also present in the Indian variant B.1.617. 34,35 These variants have been associated with increased transmissibility and are more prone to immune escape. 36,7 Other unidentified mutations found in the study, such as L452R, E484Q and P681R in the spike protein, are possessed by the B.1.617 lineage. Additionally, these mutations have also been reported in other globally circulating lineages. 31 Most variants in the study have been identified as having increased transmissibility and antibody neutralization.

Limitations of the present study
Most of the individuals in the studied population did not give information about their international travel history during the second wave of the Covid-19 surge. Hence, the primary source of SARS-CoV-2 spread could not be determined. The primary source is assumed to be the locality, neighbourhood or workplace, or the virus may have come from other countries of the world. A larger number of whole genome sequences and a geographically more diverse population will help to better characterize the associations between disease severity and variations.

CONCLUSIONS
Altogether, we can conclude that our data highlight the increased frequency of the B.1.1.7 (UK variant) and B.1.617.2 (delta Indian variant) lineages in Patiala and Ludhiana districts of Punjab. This study also revealed several unidentified deletions/substitutions in untranslated and translated regions of the SARS-CoV-2 genomes. In this wave, the younger population of the studied area was more affected than other age groups. The mutations found in the study are globally circulating and are associated with variants more virulent and dangerous than the original wild strain. Further research should emphasize the structural conformations and phenotypic consequences of these variations. Moreover, identifying the structural conformational changes will help to elucidate the virulence, metabolic pathway, pathogenicity and transmission of SARS-CoV-2.

DATA AVAILABILITY
All datasets generated or analyzed during this study are included in the manuscript.

ETHICS STATEMENT
Not applicable.