---
title: "COVID-19 Dashboard"
author: "Julia Weber"
output:
flexdashboard::flex_dashboard:
theme: cosmo
orientation: columns
social: ["linkedin"]
source_code: embed
---
```{r setup, include=FALSE}
# load necessary packages
library(plyr) # load plyr before tidyverse so dplyr's functions are not masked
library(tidyverse)
library(DataExplorer)
library(plotly)
library(COVID19)
library(flexdashboard) ## you need this package to create dashboard
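# read the OWID data set (live CSV; it may contain dates beyond the study window)
df <- read.csv('https://raw.githubusercontent.com/owid/covid-19-data/master/public/data/owid-covid-data.csv')
# country-level data from the COVID19 R package
df2 <- covid19()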
```
Introduction
=======================================================================
Column {data-width=600}
-----------------------------------------------------------------------
### Motivation
**A Global Pandemic: Which Countries Are Stopping the Spread?**
Living in the midst of a global pandemic, I found it important to look at how the virus is affecting different countries around the world.
Some research questions this project focuses on:
1. Are neighboring countries affected differently by the virus (United States, Canada, and Mexico)? Why might this be?
2. What does the overall world COVID-19 situation look like? Which countries have been most affected?
3. What factors are important in predicting/understanding the spread of the virus?
**Note:** The values used in this study are for reference only. Some countries may not be reporting completely accurate information.
Column {.tabset data-width=400}
-----------------------------------------------------------------------
### Data Sources
I obtained my data for this project from two different sources.
The first dataset, obtained from Our World in Data and referred to here as the OWID dataset, has 40 variables, but some of the information overlaps. For example, the new cases and smoothed new cases variables overlap with the total cases variable.
The second dataset, obtained from the COVID19 R package, includes 35 variables.
Both datasets include similar variables, so much of the data overlaps, leaving a narrower set of variables for my analysis.
Additionally, both datasets have many missing values; we will look more into this in the Data Exploration tab.
We will be looking at data up to November 29, 2020 for both datasets.
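A minimal sketch of how such a cutoff could be applied, written in base R on a tiny made-up data frame. The `toy` data and the `cutoff` name are hypothetical; this is an illustration of the idea, not necessarily how the dashboard's own data were trimmed.

```r
# Hypothetical toy data standing in for the OWID data frame (values are made up)
toy <- data.frame(
  location = c("A", "A", "B", "B"),
  date = c("2020-11-28", "2020-11-30", "2020-11-29", "2020-12-01"),
  total_cases = c(10, 12, 5, 7),
  stringsAsFactors = FALSE
)

# Assumed cutoff, matching the date stated above
cutoff <- as.Date("2020-11-29")

# Convert the character dates, then keep rows on or before the cutoff
toy$date <- as.Date(toy$date)
toy_cut <- toy[toy$date <= cutoff, ]
print(toy_cut)
```

Converting to `Date` first makes the comparison independent of how the date strings happen to be formatted.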
### Variables in OWID Dataset
Some important variables from the OWID dataset are:
- Location
- Date
- Total Cases
- Total Deaths
- Total Tests
- Population
- Population Density
- GDP Per Capita
- Life Expectancy
- Stringency Index
- Human Development Index
### Variables in R Dataset
Some variables of interest from the R dataset that are distinct from the first dataset include:
- Testing Policy
- Gatherings Restrictions
- Internal Movement Restrictions
- International Movement Restrictions
- Stay Home Restrictions
- Extreme Poverty
Data Exploration
=======================================================================
Column {.tabset data-width=500}
-----------------------------------------------------------------------
### Correlation Plot
```{r}
# snapshot of the OWID data on a single date
latest_df <- df %>% filter(date == "2020-11-27")
# development and economic indicators vs. testing and cases
f1 <- latest_df %>%
  select(total_tests_per_thousand, human_development_index, extreme_poverty,
         gdp_per_capita, life_expectancy, total_cases_per_million)
plot_correlation(f1, cor_args = list("use" = "complete.obs"))
# population, positive rate, and policy strictness vs. testing and cases
f2 <- latest_df %>%
  select(total_tests_per_thousand, total_cases_per_million,
         population, population_density, positive_rate,
         gdp_per_capita, stringency_index)
plot_correlation(f2, cor_args = list("use" = "complete.obs"))
```
### Scatterplots
```{r}
plot_scatterplot(f1 %>% drop_na(), by = "total_tests_per_thousand", ncol = 2)
plot_scatterplot(f1 %>% drop_na() %>% select(-c("total_tests_per_thousand")),
                 by = "life_expectancy", ncol = 2)
plot_scatterplot(f2 %>% drop_na(), by = "total_tests_per_thousand", ncol = 2)
plot_scatterplot(f2 %>% drop_na() %>% select(-c("total_tests_per_thousand")),
                 by = "stringency_index", ncol = 2)
```
### Missing Values
```{r}
plot_missing(df)
plot_missing(df2)
```
Column {.tabset data-width=500}
-----------------------------------------------------------------------
### Correlation Plot Explanations
These correlation plots show the correlations between some notable quantitative variables in the OWID dataset.
Conclusions from these plots:
- There is a relatively strong positive correlation between GDP per capita and total tests per thousand.
- The Human Development Index (HDI) takes into account life expectancy, education, and GNI per capita, so its high correlation with life expectancy is expected.
- Extreme poverty has a fairly strong negative correlation with HDI and life expectancy. GDP per capita has a strong positive correlation with HDI and life expectancy.
- Economic factors may play a role in whether a country is able to provide sufficient testing to its population.
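The correlation plots are drawn with `"use" = "complete.obs"` passed through `cor_args`. As a small illustration of what that option does in base R's `cor()`, the sketch below uses made-up numbers (the `toy` data frame is hypothetical and not taken from either dataset) and checks that it matches dropping incomplete rows by hand.

```r
# Hypothetical toy values (made up, not from either dataset)
toy <- data.frame(
  gdp_per_capita = c(1000, 5000, 20000, 40000, NA),
  total_tests_per_thousand = c(5, 20, 150, NA, 300)
)

# With use = "complete.obs", rows containing any NA are dropped before computing
r <- cor(toy$gdp_per_capita, toy$total_tests_per_thousand, use = "complete.obs")

# Same calculation after removing incomplete rows by hand
complete <- toy[complete.cases(toy), ]
r_manual <- cor(complete$gdp_per_capita, complete$total_tests_per_thousand)
print(c(r, r_manual))
```

Because incomplete rows are removed, each coefficient may rest on fewer countries than the full snapshot contains, which is worth keeping in mind when reading the plots.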
### Lack of Correlation
- I was interested to find that both population and population density had relatively low correlations with both total cases per million and total tests per thousand.
- I was also interested to see that the stringency index had a low correlation with total cases per million. The stringency index is a value from 0 to 100 representing the strictness of a country's government response to COVID-19, taking into account many factors.
```{r , echo=FALSE, fig.cap="Stringency Index", fig.align="center", out.width = '75%'}
# absolute path on the author's drive; adjust it to where the image is stored locally
knitr::include_graphics("H:/My Drive/UD Student Research/Capstone Weber/final-capstone/stringencyIndex.png")
```
### Missing Values Plot Explanations
These plots show the percentage of missing values for each variable in the OWID dataset and the COVID19 R dataset.
- Variables such as hospital patients and ICU patients have a large percentage of missing values, so they are not usable.
- Given the nature of COVID-19 data, it is not surprising that the datasets have so many missing values. Data for certain countries is harder to come by because of limited reporting or limited public availability.
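The percentages shown in the missing-value plots can also be computed directly. This is a minimal base R sketch on a made-up data frame (the `toy` columns and values are hypothetical), not the code behind the plots themselves.

```r
# Hypothetical toy data with gaps (values are made up)
toy <- data.frame(
  total_cases = c(10, 20, 30, 40),
  icu_patients = c(NA, NA, NA, 3),
  hosp_patients = c(NA, 5, NA, NA)
)

# is.na() gives a logical matrix; column means are the share of missing values
pct_missing <- colMeans(is.na(toy)) * 100

# Sort so the sparsest variables come first
pct_missing <- sort(pct_missing, decreasing = TRUE)
print(pct_missing)
```

A threshold on `pct_missing` is one simple way to decide which variables to set aside before analysis.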
USA/CAN/MEX
=======================================================================
Column {.tabset data-width=650}
-----------------------------------------------------------------------
### New Cases Per Million
### Total Tests Per Thousand
### Positive Rates by Country
### Testing Policy
Column {data-width=650}
-----------------------------------------------------------------------
### Explanation
The United States has consistently reported more cases per million than either Canada or Mexico throughout the pandemic. However, it has also provided the most tests per thousand.
Mexico, on the other hand, has reported fewer cases per million. It has also provided fewer tests per thousand than its two neighbors, yet its positive test rate is much higher than that of either the US or Canada. This is likely due to Mexico's testing policy: it tests only people who both have symptoms and meet certain criteria, while Canada and the US now have open public testing.
Testing Policies:
- 0: No testing policy
- 1: Testing of people who both have symptoms and meet specific requirements (e.g., came from overseas, in contact with an affected person)
- 2: Testing of anyone with symptoms
- 3: Open public testing
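To make the link between testing policy and positive rate concrete, here is a small base R sketch on made-up numbers. The `toy` data frame is hypothetical, and the positive rate is computed simply as cases divided by tests, which is an assumption; the `positive_rate` column in the OWID data may be defined differently.

```r
# Hypothetical toy data (made up); testing_policy uses the 0-3 codes listed above
toy <- data.frame(
  country = c("X", "Y", "Z"),
  testing_policy = c(1, 3, 3),
  new_tests = c(1000, 20000, 15000),
  new_cases = c(400, 1600, 600)
)

# Decode the numeric policy codes into readable labels
toy$policy_label <- factor(
  toy$testing_policy,
  levels = 0:3,
  labels = c("No testing policy",
             "Symptoms and specific requirements",
             "Anyone with symptoms",
             "Open public testing")
)

# Share of tests that came back positive (assumed definition)
toy$positive_rate <- toy$new_cases / toy$new_tests
print(toy[, c("country", "policy_label", "positive_rate")])
```

In this toy example the country that tests only a narrow, symptomatic group has the highest positive rate even though it reports the fewest cases, which is the pattern described above.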
Europe
=======================================================================
Column {.tabset data-width=650}
-----------------------------------------------------------------------
### New Cases Per Million
### Total Tests Per Thousand
### Stringency Indices
Column {data-width=650}
-----------------------------------------------------------------------
### Explanations
- As you can see, Finland and Norway have done a good job of keeping COVID-19 at bay.
- I was surprised to see that the stringency indices for both Finland and Norway have also been lower than those of their fellow European countries. This leads me to believe that, rather than implementing restrictions as a preventative measure, these countries are implementing restrictions in response to elevated case numbers.
- All four countries have open public testing.
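One way to probe whether restrictions follow case numbers rather than precede them is to compare same-day and lagged correlations. The sketch below is only an illustration in base R: the two series are made up, and `lagged_cor` is a hypothetical helper, not part of the dashboard's analysis.

```r
# Hypothetical toy series for one country (values are made up)
new_cases_per_million <- c(1, 2, 4, 8, 16, 12, 8, 5, 3, 2)
stringency_index      <- c(20, 20, 20, 25, 35, 50, 60, 55, 45, 40)

# Correlate cases on each day with the stringency index `lag` days later
lagged_cor <- function(cases, stringency, lag) {
  n <- length(cases)
  cor(cases[1:(n - lag)], stringency[(1 + lag):n])
}

same_day <- lagged_cor(new_cases_per_million, stringency_index, 0)
two_later <- lagged_cor(new_cases_per_million, stringency_index, 2)
print(c(same_day = same_day, two_days_later = two_later))
```

If the lagged correlation is clearly higher than the same-day one, as it is in this toy series, that is consistent with a reactive pattern, though it does not establish one.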
Global Data Exploration
=======================================================================
Column {.tabset data-width=650}
-----------------------------------------------------------------------
### Total Cases Per Million
### Total Cases
About the Author
=======================================================================
Column {data-width=650}
-----------------------------------------------------------------------
### Personal Background
Hello, my name is Julia Weber and I'm an undergraduate student at the University of Dayton with an expected graduation date of May 2021. This dashboard is for my mathematics capstone project, under the supervision of Dr. Ying-Ju Tessa Chen.
DEGREES IN PROGRESS:
- B.S. in Applied Mathematical Economics with a Computer Science Minor
- B.S.B.A. in Finance with an Investment Management Concentration
I am interested in pursuing a career in data analytics/business analytics. I have experience coding in Python, R, and Java.
As for professional work experience, I have worked as a Research and Development Intern at Tenet3, LLC since June 2020. Throughout this internship, I have applied my knowledge from computer science and mathematics courses by exploring datasets, creating visualizations and dashboards, and implementing metrics from research papers in Python.
Feel free to connect with me on LinkedIn!
Column {data-width=150}
-----------------------------------------------------------------------
### Photo
```{r , echo=FALSE, fig.cap="Julia Weber", out.width = '100%'}
# absolute path on the author's drive; adjust it to where the image is stored locally
knitr::include_graphics("H:/My Drive/UD Student Research/Capstone Weber/final-capstone/headshot.jpg")
```