The Perspectives of Individuals with Comorbidities Towards COVID-19 Booster Vaccine Shots in Twitter: A Social Media Analysis Using Natural Language Processing, Sentiment Analysis and Topic Modeling

Individuals with comorbidities (i.e., diabetes mellitus, hypertension, and heart diseases) are more likely to develop a severe form of coronavirus disease 2019 (COVID-19). They should therefore take the necessary precautions to avoid infection with severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) and its emerging variants and subvariants by getting COVID-19 vaccination and booster doses. In this regard, we used text analytics techniques, specifically Natural Language Processing (NLP), to understand the perception of Twitter users with comorbidities (diabetes, hypertension, and heart diseases) towards COVID-19 vaccine booster doses. Understanding and identifying Twitter users’ perceptions and perspectives will help members of the medical fraternity, governments, and policymakers to frame and implement a suitable public health policy for promoting the uptake of booster shots by such vulnerable people. A total of 176,540 tweets were identified through the scraping process. Sentiment analysis revealed that 57.6% of the 176,540 tweets expressed negative sentiments about COVID-19 vaccine booster doses. The reasons for the negative sentiments were identified using topic modeling (i.e., risk factors; fear of myocardial fibrosis, stroke, or death; and the use of vaccines as bio-weapons). Of note, enhancing the COVID-19 vaccination drive by administering booster doses to more people is of paramount importance for achieving higher protective immunity under the current threat of recently emerging Omicron subvariants, which are presently causing a rise in cases in a few countries, such as China, and might lead to a new wave of the pandemic with a surge in cases at the global level.


INTRODUCTION
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is a novel coronavirus that first appeared in December 2019 in Wuhan, China, causing coronavirus disease 2019 (COVID-19), which was discovered in a cluster of pneumonia cases of unknown origin. 1,24][5][6] As of January 2023, COVID-19 has resulted in more than 670 million confirmed cases with over 6.8 million deaths. 3 Furthermore, the World Health Organization (WHO) has classified the recent SARS-CoV-2 variants into three categories to monitor and assess the evolution of SARS-CoV-2: variants under monitoring (VUMs), variants of interest (VOIs), and variants of concern (VOCs). Up to October 2021, Alpha (B.1.1.7), Beta (B.1.351), Gamma (P.1), and Delta (B.1.617.2) were identified as VOCs. 3 Furthermore, as of November 26, 2021, the Omicron variant (B.1.1.529), which led to a global surge in the number of COVID-19 cases, was classified as the fifth VOC by WHO. 7 Presently, the Omicron variant and its descendant lineages, including BA.1, BA.2, BA.3, BA.4, and BA.5, are designated as the only circulating VOC, while Alpha, Beta, Gamma, and Delta have been designated as previously circulating VOCs. 8 The genomic surveillance of the different strains of SARS-CoV-2 has revealed that its fast-mutating nature has played a significant role in the rapid spread of its different strains worldwide. 9 More recently, newer Omicron subvariants and sub-lineages viz., BQ.1, BQ.1.1,1][12][13][14][15][16] To tackle the spread of COVID-19, vaccine development research was initiated and carried out by various organizations and researchers globally in a fast-track, emergency mode, which has helped protect the world from this devastating pandemic over the past three years. 17
The first COVID-19 vaccine was administered outside of a clinical trial on December 8, 2020, by May Parsons at University Hospital Coventry. By October 5, 2022, about 12.7 billion doses of the COVID-19 vaccines had been administered, 18 and more than 5.35 billion people had received at least one dose. 19 As per the World Health Organization, as of January 2023, a total of 13,156,047,747 vaccine doses have been administered. According to a recent Lancet study, the vaccines reduced deaths by about two-thirds in their first year and reportedly saved an estimated 19.8 million lives. 20 However, the global vaccination drive was impacted by vaccine inequity (in terms of availability, accessibility, and affordability due to socio-economic reasons) and vaccine hesitancy (reluctance to get inoculated). 21,22 The initial two doses of COVID-19 vaccines can immunize people against severe COVID-19 and death. However, the immunity tends to wane over time, necessitating booster shots to sustain it. 230][31][32][33] Furthermore, when older patients with comorbidities, especially those aged 65 years and above, get infected, they are more likely to have a higher intensive care unit (ICU) admission rate and, subsequently, an increased mortality rate. 34 Therefore, this category of patients should follow the necessary measures and precautions to avoid infection with SARS-CoV-2 and its variants because they have the worst prognosis in case of infection. Thus, in this study, we aimed to understand and determine the perceptions and perspectives of Twitter users with comorbidities (heart diseases, diabetes, and hypertension) regarding COVID-19 vaccine booster doses, and to glean insights into their understanding of and attitude towards booster doses and subsequent vaccination.
Understanding and identifying Twitter users' perceptions and perspectives will help the members of medical fraternities, governments, and policymakers not only to frame and implement a suitable public health policy and promote the COVID-19 vaccine drive with booster shots but also to predict, control, and subsequently prevent any health crisis or infectious disease outbreak. 35,36

RESEARCH METHODOLOGY
In this study, the perception of individuals with comorbidities worldwide about COVID-19 vaccine booster doses was analyzed through posts on the Twitter platform (Twitter, Inc., San Francisco, California, USA) using sentiment analysis and topic modeling. Twitter was chosen because it has one of the highest numbers of users among social media platforms. We used keywords to collect posts from Twitter users who have any of three comorbidities (heart diseases, diabetes, and hypertension) and who expressed views about COVID-19 booster doses. Tweets containing one of the phrases 'I'm a heart disease patient,' 'I have heart disease,' 'I'm having high BP,' 'I'm having hypertension,' 'I'm having diabetes,' or 'I have diabetes' together with the words 'covid booster dose' were scraped for this study. The entire research methodology is shown in Figure 1.
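The keyword rule described above can be illustrated with a minimal Python sketch. It assumes case-insensitive substring matching on the quoted phrases; the function names (`normalize`, `matches_query`, `filter_tweets`) are hypothetical and are not part of the study's pipeline, which may have applied the keywords differently at the scraping stage.

```python
import re

# Phrases quoted in the methodology; matching here is case-insensitive.
COMORBIDITY_PHRASES = [
    "i'm a heart disease patient",
    "i have heart disease",
    "i'm having high bp",
    "i'm having hypertension",
    "i'm having diabetes",
    "i have diabetes",
]
BOOSTER_PHRASE = "covid booster dose"


def normalize(text):
    """Lowercase, straighten curly apostrophes, and collapse whitespace."""
    text = text.lower().replace("\u2019", "'")
    return re.sub(r"\s+", " ", text).strip()


def matches_query(tweet):
    """True if the tweet has a comorbidity phrase and the booster phrase."""
    text = normalize(tweet)
    has_condition = any(phrase in text for phrase in COMORBIDITY_PHRASES)
    return has_condition and BOOSTER_PHRASE in text


def filter_tweets(tweets):
    """Keep only the tweets that satisfy the keyword rule."""
    return [tweet for tweet in tweets if matches_query(tweet)]
```

Substring matching of this kind is simple but strict: a tweet that paraphrases the condition ("my diabetes") would be missed, which is one reason a keyword-based sample should be read as a subset of the relevant users.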

Data Collection
8][39][40] In this study, we used the Python library Twint to scrape the necessary tweets. 2][43] A total of 176,540 English tweets were collected and used for this study.

Data Cleaning
The tweet corpus thus collected cannot be used directly; it must first be pre-processed and cleaned. In the data pre-processing stage, stop words (words like 'a', 'an', 'if', etc., which do not contribute much to the meaning of the text), punctuation, and hyperlinks are removed. 44 Further, the tweets are tokenized, stemmed, and lemmatized. Stemming is the process of removing suffixes from words. For example, the word "victims" becomes "victim," and "affecting"/"affected" are reduced to "affect." Lemmatization (stemming done with proper contextual understanding) is then performed. An example of lemmatization is mapping 'sing,' 'singing,' and 'sang' to 'sing.'
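The cleaning steps above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the study's code: the stop-word list is a tiny hypothetical sample, `naive_stem` is a toy suffix stripper standing in for a real stemmer, and lemmatization is omitted because it requires a dictionary of word forms.

```python
import re
import string
from typing import List

# Tiny illustrative stop-word list (an assumption, not the study's list).
STOP_WORDS = {"a", "an", "the", "if", "is", "are", "and", "or", "of", "to", "by"}

URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def naive_stem(word: str) -> str:
    """Strip one common suffix; a toy stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(tweet: str) -> List[str]:
    """Remove hyperlinks, punctuation, and stop words, then stem the tokens."""
    text = URL_PATTERN.sub(" ", tweet.lower())  # drop hyperlinks
    text = text.translate(PUNCT_TABLE)          # drop ASCII punctuation
    tokens = text.split()                       # whitespace tokenization
    tokens = [t for t in tokens if t not in STOP_WORDS]
    return [naive_stem(t) for t in tokens]
```

Hyperlinks are removed before punctuation so that a URL is dropped as a whole rather than being broken into word-like fragments.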

Sentiment Analysis
Sentiment analysis is the automatic process of identifying and attributing sentiments to any textual entity. In the context of our work, sentiment analysis is performed on each tweet in the corpus. Understanding the sentiments and emotions in people's opinions helps frame public policy and check the public perception of government schemes and their implementation. 45 In our study, we used sentiment analysis to study how the general public, especially people with diabetes, hypertension, and heart diseases, view the COVID-19 booster dose. TextBlob is the Python library used for sentiment analysis. The TextBlob library is based on computational linguistics, where each word in the Oxford dictionary is attributed a sentiment score in the range [-1, 1]. If the score lies in the range [-1, -0.5], the word is classified as negative; if the score lies in the range [-0.5, 0.5], the word is classified as neutral; and if the score lies between 0.5 and 1 (both inclusive), the word is classified as positive. Based on the average of the scores of the words in a document, the entire document is classified as positive, negative, or neutral. 44
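The thresholding and averaging described above can be made concrete with a short sketch. It does not call TextBlob; the word scores in `TOY_LEXICON` are hypothetical values chosen for illustration. Because the stated ranges overlap at -0.5 and 0.5, the sketch assumes that -0.5 counts as negative and 0.5 as positive.

```python
from statistics import mean

# Hypothetical word scores in [-1, 1]; not taken from any real lexicon.
TOY_LEXICON = {"terrible": -1.0, "afraid": -0.6, "fine": 0.2, "great": 0.8}


def label(score):
    """Map a polarity score to a class using the cut-offs in the text.

    Assumption: the boundary values -0.5 and 0.5 are assigned to the
    negative and positive classes respectively.
    """
    if score <= -0.5:
        return "negative"
    if score >= 0.5:
        return "positive"
    return "neutral"


def document_label(tokens, lexicon=TOY_LEXICON):
    """Average the scores of known words; unknown words are skipped."""
    scores = [lexicon[t] for t in tokens if t in lexicon]
    # A document with no scored words is treated as neutral (an assumption).
    return label(mean(scores)) if scores else "neutral"
```

Averaging means that one strongly negative word can be offset by mildly positive ones, so a tweet with mixed wording may fall into the neutral band even when it contains a clear complaint.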

Topic Modeling
The study then moves on to topic modeling, which is performed using Latent Dirichlet Allocation (LDA). 38 It is a popular method for classifying text into topics by assigning words to topics and computing the probability of each word within a topic and the proportion of each topic in the document (the text being analyzed). LDA is a bag-of-words model and is used to identify hidden topics in our collection of tweets, which is a highly unstructured textual data source. 37 The probability distribution of the topics is built on the probability distribution of the words, hence the term 'Dirichlet', which implies the presence of a function built over another function that takes on a range of values.
LDA identifies latent (hidden) topics in the tweet corpus and is controlled by the hyperparameters alpha (α) and eta (η). If the value of α is too high, each tweet is spread across many topics of varying probability. We therefore set a low value for α so that the model internally ranks the topics and assigns each tweet to a small number of high-probability topics. The model was run many times with different parameters to achieve the desired results.
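For reference, the roles of α and η can be seen in the standard generative formulation of LDA. The notation below is an assumption introduced for illustration (K topics, D documents, θ_d the topic proportions of document d, β_k the word distribution of topic k, z_{d,n} and w_{d,n} the topic and word at position n of document d); it is the textbook model rather than a description of the exact configuration used in this study.

```latex
\begin{align*}
\beta_k &\sim \mathrm{Dirichlet}(\eta), & k &= 1,\dots,K \\
\theta_d &\sim \mathrm{Dirichlet}(\alpha), & d &= 1,\dots,D \\
z_{d,n} \mid \theta_d &\sim \mathrm{Categorical}(\theta_d) \\
w_{d,n} \mid z_{d,n}, \beta &\sim \mathrm{Categorical}(\beta_{z_{d,n}})
\end{align*}
```

Under this formulation, a small α concentrates each θ_d on a few topics, and a small η concentrates each β_k on a few words.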
Topic modeling methods used previously for text analytics include manual content analysis and word frequency methods. The biggest drawback of manual content analysis is that it is time-consuming and relies heavily on the expertise of the reader. The word frequency method involves counting the number of times each word appears in the corpus. Since this gives only a word-wise count without any usage context, the results were deemed ambiguous. 44 Over the years, the Latent Dirichlet Allocation method gained traction and emerged as the generally preferred and widely used modeling method in research, especially for unstructured corpora. 44 Thus, in this study, we used LDA to perform topic modeling on the large tweet corpus generated by users with comorbidities tweeting their opinions on the COVID-19 vaccine. Since negative sentiments about booster doses among Twitter users with diabetes are high, they must be analyzed further to find the reasons behind them. Topic modeling was therefore applied to the same data (negative-sentiment tweets only), and it yielded relevant and interesting topics. LDA works on the Bayesian principle and clusters the topics based on similarity. The topics found through the analysis describe the problems associated with booster doses. The topic modeling results are listed in Table 2.

DISCUSSION AND CONCLUSION
To the best of our knowledge, this is the first study of its type to assess and identify the perceptions and perspectives of Twitter users with the three main comorbidities (diabetes, hypertension, and heart diseases) about the COVID-19 vaccine booster dose during the ongoing pandemic. Only individuals with these three comorbidities were included in this study, and 176,540 tweets were taken from Twitter users all over the world suffering from these conditions. Since they are the subjects most vulnerable to COVID-19, it is informative to understand and portray their views on COVID-19 booster dose adoption and its protective effects. With this motive, we gathered the opinions of people with comorbidities on booster doses from social media. The results showed that negative sentiments about booster dose adoption were very high. We used five months of data for our study (September 2022 to January 2023), and our results show that compared to the early days of our data frame, later months, December 2022 and January 2023, show ... The reason behind the negative perception of the booster dose was revealed through topic modeling. The topic modeling results showed that people who faced problems or sickness after taking the booster dose made others think negatively. Other than the health risks, some people believe that the government uses them like lab rats or experimental subjects. Lastly, mandatory vaccine policies in some countries made them feel pessimistic about the vaccine.
Awareness can be created by exploring the studies on each topic identified from the topic modeling output. This study paves the way to identify the gaps where governments should improve their promotion strategy and awareness campaigns for COVID-19 vaccine booster doses. The study also explored the pain points of people with diabetes regarding booster doses when these are imposed on them forcefully. One limitation of the research is the use of only English tweets. Future research can focus on analyzing tweets in other languages to understand whether cultural aspects play a role in the perception of people with comorbidities about booster doses. Notably, it is high time to promote COVID-19 vaccination campaigns and booster doses to provide adequate protective immunity under the threat of recently emerging Omicron subvariants, the rise in cases being seen again in a few countries, and the growing fears of a new wave of the COVID-19 pandemic.
Future research can focus on the geographical and sociodemographic base of Twitter users with the three mentioned comorbidities to provide more insights into their views about COVID-19 booster dose adoption and its protective effects. Moreover, our study was limited to the three main comorbidities (diabetes, hypertension, and heart diseases); therefore, future studies on other comorbidities relevant to the COVID-19 booster dose should be considered through broader search filters. Finally, due to the cross-sectional nature of our study, future longitudinal data across different time periods should be considered to provide more insightful views about the COVID-19 booster dose according to public perspectives and responses on Twitter over time and the subsequent implications. To conclude, it is high time to promote COVID-19 vaccination campaigns and booster doses to provide adequate protective immunity under the threat of recently emerging Omicron subvariants, the rise in cases being seen again in a few countries, and the growing fears of a new wave of the COVID-19 pandemic.

Figure 1. Data Collection and Data Pre-processing

Table 2. Topic labels and words found through topic modeling