Correlation Between Temperature and COVID-19 (Suspected, Confirmed and Death) Cases based on Machine Learning Analysis

Currently, the whole world is struggling with its biggest health problem, COVID-19, a name coined by the World Health Organization (WHO). The disease emerged in China in December 2019. This pandemic is going to change the world. Owing to its communicable nature, it is contagious both medically and economically. However, its contributing factors are not yet known. Herein, an effort has been made to find the correlation between temperature and the different case situations (suspected, confirmed, and death cases). For this purpose, a k-means clustering-based machine learning method has been employed on a data set from different regions of China, obtained from the WHO. The novelty of this work is that we have added temperature fields to the original WHO data set and explored the resulting trends. The trends show the effect of temperature in each region from three different perspectives of COVID-19: suspected, confirmed, and death cases.


INTRODUCTION
Studies reveal that dozens of viruses exist in the coronavirus family, but only seven types are known to affect humans [1]. Some cause mild colds, while the others can be deadly, and it is believed that they are transmitted from animals such as bats [1,2]. The WHO was notified of a cluster of unusual pneumonia cases in the last week of December 2019. The following week, the cause of this condition was identified as COVID-19. It causes an acute respiratory disease in humans and has emerged as the latest worldwide epidemic, having already claimed a considerable number of lives, especially in China.
In December 2019, COVID-19 emerged in Wuhan, China, and thousands of people were affected [3].
In response to the sudden outbreak of nCoV-2019 (Fig. 1) [4], the research and development wing of the WHO is actively working to find appropriate diagnostics and vaccines [5]. Medical experts are working rigorously to understand its severe consequences, such as respiratory symptoms, which contribute to mortality particularly among elderly people [6].
Various non-pharmaceutical interventions have been taken and suggested to manage COVID-19 effectively, because no licensed vaccines or coronavirus antivirals are available yet; one intervention seen to be effective is lockdown [7]. The downside of lockdown is that it affects the whole world economy, particularly transportation, because people are in quarantine. Richard Baldwin, a professor of international economics at the Graduate Institute in Geneva, said, "This virus is as economically contagious as it is medically contagious" [8]. It is therefore necessary to establish health clinics with trained Artificial Intelligence (AI) based systems in order to respond quickly to, and help prevent, such natural epidemics [9].
It is quite difficult to estimate the fatality ratio of COVID-19. As of 07 March 2020, there were approximately 80,813 confirmed cases of COVID-19 in China [10]. Its subsequent spread worldwide has challenged the global public health community to confront a novel infectious disease (coronavirus disease 2019, COVID-19). The rapid and accurate detection of the coronavirus is therefore becoming increasingly important. Machine learning applications have been widely used in medical sciences for purposes such as disease diagnosis, prognosis, and various kinds of analytics, including disease death rates. A deep learning method applied to a COVID-19 data set has been reported to achieve a detection accuracy of 86.7% [6].
We aim to explore the correlation between temperature and the different case situations (suspected, confirmed, and death cases). For this purpose, the k-means machine learning method has been employed on a data set from different regions of China, obtained from the WHO. We hope our findings will inform the global community about the emergence of this novel COVID-19 and its impact on the economy.

Fig. 1. Reproduced from [4] with permission under the terms of the Creative Commons Attribution 4.0 International License.

METHODOLOGY
This section presents the methodology applied to analyze the COVID-19 data set, which was taken from the World Health Organization (WHO) [11]. The data analysis was performed using the WEKA machine learning tool [12] to obtain the different trends. The methodology comprises the following steps: data set collection, data set design and description, and clustering to obtain the trends.

Data Set Collection
A data set is a prerequisite for data analysis. In the present work, we used the data set of the 'Coronavirus disease (COVID-2019) situation reports', collected from the World Health Organization (WHO) [11]. This data set covers coronavirus infections in different regions of China, which we analyze with respect to temperature.

Data Set Design and Description
The WHO data set contains all the attributes shown in Table 1 except two, lowest temperature and highest temperature, which we added separately. These two attributes were added because temperature is considered one of the possible factors in the spread of the coronavirus. The highest and lowest temperatures for each city/region in the WHO data set were recorded from the AccuWeather website. With the temperature fields added, the data set is better suited to exploring trends and patterns. The design of the data set is presented in Table 1, with an explanation of each attribute and its data type and constraint. A sample record from the data set is shown in Table 2.
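The augmentation described above amounts to joining temperature readings to the case records by region. The following is a minimal sketch in Python, not the procedure actually used in this study; the function name `add_temperature`, the field names (`region`, `lowest_temp`, `highest_temp`, and so on), and all values are hypothetical placeholders rather than entries from Table 1 or the WHO reports.

```python
# Minimal sketch: attach lowest/highest temperature to WHO-style records.
# All field names and values here are hypothetical placeholders.

def add_temperature(records, temperatures):
    """Return copies of `records` extended with two temperature fields.

    records      -- list of dicts, each with at least a "region" key
    temperatures -- dict mapping region -> (lowest_temp, highest_temp) in deg C
    Regions without a temperature reading get None for both fields.
    """
    augmented = []
    for rec in records:
        low, high = temperatures.get(rec["region"], (None, None))
        # Build a new dict so the original record is left unchanged.
        augmented.append({**rec, "lowest_temp": low, "highest_temp": high})
    return augmented


# Placeholder inputs for illustration only (not real data).
who_records = [
    {"region": "RegionA", "suspected": 10, "confirmed": 5, "deaths": 1},
    {"region": "RegionB", "suspected": 3, "confirmed": 0, "deaths": 0},
]
temperature_readings = {"RegionA": (4.0, 14.0)}

augmented_records = add_temperature(who_records, temperature_readings)
```

Keeping missing readings as `None` rather than dropping the record is one possible design choice; it leaves the decision about how to treat incomplete regions to the later analysis step.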

Applied Classifier-Clustering
Clustering is a well-known unsupervised machine learning technique that is widely used to discover patterns in dispersed data sets. A good clustering method produces high-quality clusters in which inter-cluster similarity is low and intra-cluster similarity is high. In other words, members of a cluster are more similar to each other than they are to members of other clusters. A cluster is a collection of data objects that are similar to one another in some sense; clustering identifies such groups in a set of data and builds a typology from them. In the present analysis, the clustering technique is applied to the COVID-19 data set. It is useful here because there are many cases and no obvious natural grouping, so a clustering algorithm can find whatever grouping may exist.
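One simple way to make the notion of cluster quality concrete is to compare the average distance between points in the same cluster with the average distance between points in different clusters. The sketch below is only an illustration under assumed inputs: the function name `mean_pairwise_distances` and the toy points are hypothetical, and Euclidean distance is an assumption, not a measure reported in this study.

```python
# Minimal sketch: compare average intra-cluster and inter-cluster distances.
import math
from itertools import combinations


def _mean(values):
    """Arithmetic mean, or None for an empty list."""
    return sum(values) / len(values) if values else None


def mean_pairwise_distances(points, labels):
    """Return (mean_intra, mean_inter) Euclidean distance over all point pairs.

    points -- list of equal-length numeric tuples
    labels -- list of cluster labels, one per point
    """
    intra, inter = [], []
    for (p, label_p), (q, label_q) in combinations(zip(points, labels), 2):
        distance = math.dist(p, q)
        # Same label: pair lies within one cluster; otherwise it spans two.
        if label_p == label_q:
            intra.append(distance)
        else:
            inter.append(distance)
    return _mean(intra), _mean(inter)


# Hypothetical toy data: two well-separated groups.
toy_points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
toy_labels = [0, 0, 1, 1]
mean_intra, mean_inter = mean_pairwise_distances(toy_points, toy_labels)
```

For a good clustering, `mean_intra` should be clearly smaller than `mean_inter`, which corresponds to high intra-cluster similarity and low inter-cluster similarity.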

K-means Clustering for COVID-19
Clustering can be done by various algorithms such as k-means, Density-Based Spatial Clustering of Applications with Noise (DBSCAN), and Hierarchical Clustering.
In the present investigation, k-means clustering has been applied because it is distance based, fast, and has linear complexity O(n) [13]. The concept of the k-means clustering algorithm is given in [14]. The steps are as follows:
1. Select the number of clusters, k, to be identified, and place k initial centroids in the space represented by the objects being clustered.
2. Calculate the distance between each data point and each group centroid, and assign the point to the group whose centroid is closest.
3. Recompute each group centroid based on the points assigned to it.
4. Repeat Steps 2 and 3 until the centroids no longer change.
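The steps above can be sketched in a few lines of Python. This is a minimal illustration only and not the WEKA implementation used in this study: the function name `kmeans`, the parameters `max_iter` and `seed`, the random choice of initial centroids, the use of Euclidean distance, and the toy input are all assumptions made for the example.

```python
# Minimal k-means sketch following the four steps above (pure Python).
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Cluster `points` (list of equal-length numeric tuples) into k groups.

    Returns (centroids, labels). `max_iter` and `seed` are illustrative
    parameters, not settings reported in the study.
    """
    rng = random.Random(seed)
    # Step 1: choose k of the data points as the initial centroids.
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Step 2: assign each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Step 3: recompute each centroid as the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(c) / len(members) for c in zip(*members))
                )
            else:
                # Keep the old centroid if a cluster lost all its points.
                new_centroids.append(centroids[j])
        # Step 4: stop once the centroids no longer change.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels


# Hypothetical toy input: (temperature in deg C, case value) pairs.
toy = [(4.0, 1.0), (5.0, 2.0), (14.0, 30.0), (15.0, 33.0)]
centroids, labels = kmeans(toy, k=2)
```

Each pass over the data costs time proportional to the number of points for a fixed k and dimension, which is consistent with the linear-complexity argument given for choosing k-means.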

RESULTS AND DISCUSSION
The objective of the experiment is to analyze the effect of temperature on COVID-19 cases, including suspected, confirmed, and death cases. The results are given in Tables 4 to 7 and are graphically represented in Figs. 2 to 4, where the X-axis represents the cluster results of temperature with death, confirmed, and suspected cases and the Y-axis represents the different regions.
From the k-means clustering technique, 34 clusters are found, as shown in Table 3. Tables 4, 5, and 7 are filtered subsets of Table 3:
Table 4, Table 5, Table 7 ⊂ Table 3    (1)
Three possible patterns are revealed, as discussed in the subsequent subsections.

Trend 1: effect of temperature on Death cases
From the analysis of the clustering results shown in Table 4, we see that Hubei has the highest number of death cases, i.e., 33.14%; a clear jump for Hubei can be seen in Fig. 2. After that, three regions, Hainan, Henan, and Inner Mongolia, have the same death rate, i.e., 0.14%, at different temperatures. Out of the four regions, three have death cases when the recorded temperature is ≥ 14°C. Hainan has the highest jump in temperature, but its death rate is the same as that of Henan and Inner Mongolia. Interestingly, Inner Mongolia has death cases at a lower temperature, 4.85°C, than the others.

Trend 2: effect of temperature on Confirmed cases
The filtered cluster results are presented in Table 5, and the same trend is detected as in Table 4. From Fig. 3, there appears to be no direct relationship between temperature and confirmed COVID-19 cases. There are some exceptions: when the cluster value for the Hubei temperature reaches ≥ 14°C, the number of confirmed cases escalates and reaches 189.95. However, Table 5 shows that the maximum temperature is 21.71°C, for Guangdong, where confirmed cases are only around 0.42. It is also noted from Table 5 that Guangdong is a highly populous region, which suggests that the spread of the COVID-19 pandemic is controlled there by certain precautionary measures.

Trend 3: effect of temperature on Suspected cases
The clustered suspected cases are presented in Table 7. For Hubei, the common trend is seen again: as depicted in Fig. 4, this region has the highest number of suspected cases compared with the other regions. Another trend is that Hubei is afflicted at the same temperature of 14°C that was seen in the trends for death and confirmed cases (refer to Tables 4 and 5). A further trend is that Hainan also appears in the death-case clusters shown in Table 4, at a temperature of 26.71°C. The similarity in these trends is that each of these two regions is affected at the same temperature across case types. These analytical results lead us to argue that the suspected cases in Hainan pass directly into death cases without going through the confirmed phase, as evidenced in Table 6.

CONCLUSIONS
In summary, from the data analysis, we found that temperature is not the only significant factor in the spread of the COVID-19 pandemic. While exploring the effect of temperature on suspected, confirmed, and death cases, we saw diverse trends for each region except Hubei. This suggests that other attributes also play a role in the spread of COVID-19. So far, no effective precaution has been seen except lockdown, which slowed the spread in China. We need to focus on developing AI-based systems in hospitals that can assist medical doctors in monitoring the patterns of communicable diseases such as COVID-19, serving as part of an early detection mechanism that alerts the world to potential outbreaks. Future work in this area may investigate (1) analytics on COVID-19 data sets covering the whole world, and (2) adding more attributes to the primary data set, such as age and type of intervention, for better knowledge discovery.