---
title: "COVID-19 Dashboard"
author: "Julia Weber"
output: 
  flexdashboard::flex_dashboard:
    theme: cosmo
    orientation: columns
    social: ["linkedin"]
    source_code: embed
---


```{r setup, include=FALSE}
# load necessary packages
library(plyr)
library(tidyverse)
library(DataExplorer)
library(plotly)
library(COVID19)
library(flexdashboard)  ## you need this package to create dashboard

# read the data set here
df <- read.csv('https://raw.githubusercontent.com/owid/covid-19-data/master/public/data/owid-covid-data.csv')
df2 <- covid19()
```

Introduction
=======================================================================

Column {data-width=600}
-----------------------------------------------------------------------

### Motivation


**A Global Pandemic: Which Countries Are Stopping the Spread?**



Living in the midst of a global pandemic, I found it important to look at how the virus is affecting different countries around the world.

Some research questions this project focuses on:

1. Are neighboring countries (the United States, Canada, and Mexico) affected differently by the virus? Why might this be?

2. What does the overall world COVID-19 situation look like?  Which countries have been most affected?

3. What factors are important in predicting/understanding the spread of the virus?

**Note:** The values used in this study are for reference only; some countries may not be reporting completely accurate information.



Column {.tabset data-width=400} 
-----------------------------------------------------------------------

### Data Sources


I obtained my data for this project from two different sources.

The first dataset, obtained from Our World in Data and referred to here as the OWID dataset, has 40 variables, some of which carry overlapping information.  For example, the new cases and smoothed new cases variables overlap with the total cases variable.

The second dataset, obtained from the COVID19 R package, includes 35 variables.

The two datasets share many similar variables, so I narrowed them down to a smaller set of variables for my analysis.

Additionally, both datasets had many missing values; we will look more into this in the data exploration tab.

We will be looking at data up to 11-29-2020 for both datasets.
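
As a minimal sketch of the date cutoff described above, the snippet below keeps only rows dated on or before 2020-11-29. The toy data frame and the helper name `keep_through` are hypothetical and used only for illustration; the sketch assumes the date column holds `YYYY-MM-DD` text or `Date` values.

```r
# Hypothetical toy data standing in for either dataset.
toy <- data.frame(
  location = c("A", "A", "B", "B"),
  date = c("2020-11-28", "2020-11-30", "2020-11-29", "2020-12-01"),
  total_cases = c(10, 12, 5, 7),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep rows dated on or before the cutoff.
keep_through <- function(data, cutoff) {
  data[as.Date(data$date) <= as.Date(cutoff), , drop = FALSE]
}

toy_cut <- keep_through(toy, "2020-11-29")
print(toy_cut)
```

The same helper could be applied to each dataset in turn so that both end on the same date.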


### Variables in OWID Dataset

Some important variables from the OWID dataset are:

- Location

- Date

- Total Cases

- Total Deaths

- Total Tests

- Population

- Population Density

- GDP Per Capita

- Life Expectancy

- Stringency Index

- Human Development Index

### Variables in R Dataset

Some variables of interest from the R dataset that are distinct from the first dataset include:

- Testing Policy

- Gatherings Restrictions

- Internal Movement Restrictions

- International Movement Restrictions

- Stay Home Restrictions

- Extreme Poverty


Data Exploration
=======================================================================

Column {.tabset data-width=500}
-----------------------------------------------------------------------

### Correlation Plot

```{r}
# Snapshot of the OWID data on a single date
latest_df <- df %>% filter(date == "2020-11-27")

# Development and economic indicators vs. testing and cases
f1 <- latest_df %>%
  select(total_tests_per_thousand, human_development_index, extreme_poverty,
         gdp_per_capita, life_expectancy, total_cases_per_million)
plot_correlation(f1, cor_args = list(use = "complete.obs"))

# Population, positive rate, and stringency vs. testing and cases
f2 <- latest_df %>%
  select(total_tests_per_thousand, total_cases_per_million,
         population, population_density, positive_rate,
         gdp_per_capita, stringency_index)
plot_correlation(f2, cor_args = list(use = "complete.obs"))
```

### Scatterplots

```{r}
# Scatterplots of each variable set against a reference variable,
# using only rows with no missing values
plot_scatterplot(f1 %>% drop_na(), by = "total_tests_per_thousand", ncol = 2)
plot_scatterplot(f1 %>% drop_na() %>% select(-total_tests_per_thousand),
                 by = "life_expectancy", ncol = 2)
plot_scatterplot(f2 %>% drop_na(), by = "total_tests_per_thousand", ncol = 2)
plot_scatterplot(f2 %>% drop_na() %>% select(-total_tests_per_thousand),
                 by = "stringency_index", ncol = 2)
```

### Missing Values
```{r}
plot_missing(df)

plot_missing(df2)
```

Column {.tabset data-width=500}
-----------------------------------------------------------------------
### Correlation Plot Explanations

These correlation plots show the correlations between some notable quantitative variables in the OWID dataset.

Conclusions from these plots:

- There is a relatively strong positive correlation between GDP per capita and total tests per thousand.

- The Human Development Index (HDI) takes into account life expectancy, education, and GNI per capita, so the high correlation between HDI and life expectancy is easily explained.

- Extreme poverty has a fairly strong negative correlation with HDI and life expectancy.  GDP per capita has a strong positive correlation with HDI and life expectancy.

- Economic factors may play a role in whether a country is able to provide sufficient testing to its population.
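
To make the `use = "complete.obs"` setting behind these plots concrete, here is a minimal sketch with base R's `cor()`. The toy values are hypothetical and are not taken from the OWID data; the point is only that any row containing a missing value is dropped before the correlations are computed.

```r
# Hypothetical toy values for illustration only.
toy <- data.frame(
  gdp_per_capita = c(1000, 5000, 20000, 40000, NA),
  total_tests_per_thousand = c(2, 10, 80, 150, 60),
  life_expectancy = c(60, 68, 78, 82, 75)
)

# use = "complete.obs" drops every row that has an NA in any column
# before computing the pairwise Pearson correlations.
cor_mat <- cor(toy, use = "complete.obs")
print(round(cor_mat, 2))

# Number of rows that actually entered the calculation.
n_used <- sum(complete.cases(toy))
print(n_used)
```

Because rows with any missing value are excluded, the correlations can rest on far fewer countries than the dataset contains, which is worth keeping in mind when reading the plots.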


### Lack of Correlation



- I was interested to find that both population and population density had relatively low correlations with both total cases per million and total tests per thousand.

- I was also interested to see that the stringency index had a low correlation with total cases per million. The stringency index is a value from 0 to 100 representing the strictness of a country's government response to COVID-19, taking into account many factors.


```{r , echo=FALSE, fig.cap="Stringency Index", fig.align="center", out.width = '75%'}
knitr::include_graphics("H:/My Drive/UD Student Research/Capstone Weber/final-capstone/stringencyIndex.png")
```




### Missing Values Plot Explanations

These plots show the percentage of missing values for each variable in the OWID dataset and the COVID19 R dataset.

- Variables like hospital patients and ICU patients have a large percentage of missing values, so these variables are not usable.

- Due to the nature of COVID-19 data, it is not surprising that the datasets have so many missing values.  Data for certain countries is harder to come by due to a lack of reporting/availability to the public.
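
The percentages shown in the missing-values plots can also be checked by hand. The sketch below computes the share of missing values per column in base R; the toy data frame and its pattern of `NA` values are hypothetical.

```r
# Hypothetical toy data; the NA pattern is made up for illustration.
toy <- data.frame(
  total_cases = c(10, 20, 30, 40),
  icu_patients = c(NA, NA, NA, 5),
  total_tests = c(100, NA, 300, 400)
)

# Share of missing values per column, as a percentage, highest first.
pct_missing <- sort(colMeans(is.na(toy)) * 100, decreasing = TRUE)
print(pct_missing)
```

Sorting in decreasing order puts the least usable variables at the top, which mirrors how one might decide which columns to set aside.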



USA/CAN/MEX
=======================================================================
Column {.tabset data-width=650}
-----------------------------------------------------------------------

### New Cases Per Million

### Total Tests Per Thousand

### Positive Rates by Country

### Testing Policy

Column {data-width=650}
-----------------------------------------------------------------------

### Explanation

The United States has consistently shown a greater number of reported cases per million than both Canada and Mexico throughout the pandemic. However, the United States has also been able to provide the largest quantity of tests per thousand.

Mexico, on the other hand, has reported a lower number of cases per million and has provided fewer tests per thousand than its two neighbors, but its positive test rate is much higher than that of both the United States and Canada. This is likely due to Mexico's testing policy: it tests only people who both have symptoms and meet certain criteria, while Canada and the United States now have open public testing.

Testing Policies:

0- No testing policy

1- Testing of people who both have symptoms and meet specific requirements (e.g., came from overseas, in contact with an affected person)

2- Testing of anyone with symptoms

3- Open public testing

Europe
=======================================================================

Column {.tabset data-width=650}
-----------------------------------------------------------------------

### New Cases Per Million

### Total Tests Per Thousand

### Stringency Indices

Column {data-width=650}
-----------------------------------------------------------------------

### Explanations

- As you can see, Finland and Norway have done a good job of keeping COVID-19 at bay.

- I was surprised to see that the stringency indices for both Finland and Norway have also been lower than those of their fellow European countries. This leads me to believe that rather than implementing restrictions as a preventative measure, these countries are implementing restrictions in response to elevated case numbers.

- All four countries have open public testing.

Global Data Exploration
=======================================================================

Column {.tabset data-width=650}
-----------------------------------------------------------------------

### Total Cases Per Million

### Total Cases

About the Author
=======================================================================

Column {data-width=650}
-----------------------------------------------------------------------

### Personal Background

Hello, my name is Julia Weber and I'm an undergraduate student at the University of Dayton with an expected graduation date of May 2021. This dashboard is for my mathematics capstone project, under the supervision of Dr. Ying-Ju Tessa Chen.

DEGREES IN PROGRESS:

- B.S. in Applied Mathematical Economics with a Computer Science Minor

- B.S.B.A. in Finance with an Investment Management Concentration

I am interested in pursuing a career in data analytics/business analytics. I have experience coding in Python, R, and Java.

As for professional work experience, I have worked as a Research and Development Intern at Tenet3, LLC since June 2020. Throughout this internship, I have applied my knowledge from computer science and mathematics courses by exploring datasets, creating visualizations and dashboards, and implementing metrics from research papers in Python.

Feel free to connect with me on LinkedIn!

Column {data-width=150}
-----------------------------------------------------------------------

### Photo

```{r , echo=FALSE, fig.cap="Julia Weber", out.width = '100%'}
knitr::include_graphics("H:/My Drive/UD Student Research/Capstone Weber/final-capstone/headshot.jpg")
```