Background and Data

Although by this point some of you may be avoiding the information overload surrounding COVID-19, understanding what is currently happening by getting up-to-date data directly from the source and drawing your own conclusions can be empowering.

By this point in the term you already have an R toolbox broad enough to tackle the massive amount of data in the Johns Hopkins COVID-19 repository. These data have been updated daily since the COVID-19 pandemic started. I have also included links to supplementary data sets that you could use to explore the effectiveness of measures taken throughout the pandemic, including mask use and vaccination. The data sets included are:

  1. covid.usa.ts.confirmed: daily time series of the number of confirmed COVID-19 cases at the county level for the US.

  2. covid.usa.ts.deaths: daily time series of the number of confirmed COVID-19 deaths at the county level for the US.

  3. covid.usa.daily: COVID-19 daily report for US states with the number of confirmed cases by state (the code below loads the report for 05-12-2021).

  4. vacc.people: state-level time series of daily COVID-19 vaccination numbers from the Johns Hopkins University repository.

  5. maskuse: county-level mask use data from the NY Times repository.

  6. State policy data: one file per state with the dates and descriptions of policies going into or out of effect. To load data for a particular state, find the name of that state's file in the repository folder used in the URLs below.

The following code loads the data:

library(tidyverse)  # read_csv(), select() and the %>% pipe

# Confirmed COVID-19 cases, US time series (trimmed to 11/01/2020 through 05/12/2021)
# Date columns are selected by name: the deaths file has an extra Population column,
# so the same column positions would select different dates in the two files
covid.usa.ts.confirmed <- read_csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_US.csv') %>% select(UID:Combined_Key, `11/1/20`:`5/12/21`)

# Confirmed COVID-19 deaths, US time series (trimmed to 11/01/2020 through 05/12/2021)
covid.usa.ts.deaths <- read_csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_deaths_US.csv') %>% select(UID:Combined_Key, `11/1/20`:`5/12/21`)

# Daily data summary by state for 05-12-2021
covid.usa.daily <- read_csv("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_daily_reports_us/05-12-2021.csv") 

# US vaccinated people data
vacc.people <- read_csv("https://raw.githubusercontent.com/govex/COVID-19/master/data_tables/vaccine_data/us_data/time_series/people_vaccinated_us_timeline.csv")

# mask use data
maskuse <- read_csv("https://raw.githubusercontent.com/nytimes/covid-19-data/master/mask-use/mask-use-by-county.csv")

# Texas
policytrackerTX <- read_csv("https://raw.githubusercontent.com/govex/COVID-19/govex_data/data_tables/policy_data/table_data/Current/Texas_policy.csv")

# Florida
policytrackerFL <- read_csv("https://raw.githubusercontent.com/govex/COVID-19/govex_data/data_tables/policy_data/table_data/Current/Florida_policy.csv")

# Hawaii
policytrackerHI <- read_csv("https://raw.githubusercontent.com/govex/COVID-19/govex_data/data_tables/policy_data/table_data/Current/Hawaii_policy.csv")

# Maine
policytrackerME <- read_csv("https://raw.githubusercontent.com/govex/COVID-19/govex_data/data_tables/policy_data/table_data/Current/Maine_policy.csv")

# Montana
policytrackerMT <- read_csv("https://raw.githubusercontent.com/govex/COVID-19/govex_data/data_tables/policy_data/table_data/Current/Montana_policy.csv")

# California 
policytrackerCA <- read_csv("https://raw.githubusercontent.com/govex/COVID-19/govex_data/data_tables/policy_data/table_data/Current/California_policy.csv")
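The six policy files above differ only in the state name, so a small helper can build each URL and read several states at once. This is a minimal sketch assuming every file follows the "<State>_policy.csv" pattern seen above; multi-word state names may be spelled differently in the folder, so check its file list. The names policy_url() and read_policies() are hypothetical.

```r
library(readr)

# Folder holding one policy file per state (taken from the URLs above)
policy_base <- paste0("https://raw.githubusercontent.com/govex/COVID-19/govex_data/",
                      "data_tables/policy_data/table_data/Current/")

# Hypothetical helper: URL of one state's file, assuming the "<State>_policy.csv"
# naming used by the six files above
policy_url <- function(state) paste0(policy_base, state, "_policy.csv")

# Hypothetical helper: read several states into a list named by state
# (requires network access when called)
read_policies <- function(states) {
  setNames(lapply(states, function(s) read_csv(policy_url(s))), states)
}

# Usage:
# policies <- read_policies(c("Texas", "Florida", "Hawaii", "Maine", "Montana", "California"))
# policies$Maine
```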

Part 1: Wrangling the COVID-19 time series data

  1. Using the time series data sets covid.usa.ts.confirmed and covid.usa.ts.deaths, which are both at the county level and in wide format, reshape them into long format (using either gather() or pivot_longer()) to generate a single new data frame with the daily time series BY STATE, including both the number of confirmed cases and the number of deaths, so that there is one row for each combination of state and date. Call this new data frame covid.usa.states.ts. Note: after reshaping into long format, your new date column (or whatever you decide to call it) needs to be converted into a Date variable. For example, if your variable is called my.date.variable, you can make this conversion using lubridate::mdy(my.date.variable).
## # A tibble: 5 x 4
## # Groups:   Province_State [1]
##   Dates      Deaths Province_State Confirmed_Cases
##   <date>      <dbl> <chr>                    <dbl>
## 1 2021-01-01   4872 Alabama                 365747
## 2 2021-01-10   5334 Alabama                 401900
## 3 2021-01-11   5347 Alabama                 404000
## 4 2021-01-12   5573 Alabama                 407848
## 5 2021-01-13   5760 Alabama                 410995
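A minimal sketch of this reshaping on toy data, assuming the JHU naming of the date columns ("1/1/21", i.e. m/d/yy) and a Province_State column. pivot_longer() is used here because gather() is superseded by it in tidyr; the helper name to_state_long() is hypothetical. Selecting the date columns with a regular expression means the same helper also works on the deaths file, whose extra Population column is simply dropped when counties are summed.

```r
library(dplyr)
library(tidyr)
library(lubridate)

# Toy stand-ins for the wide county-level files (two counties, two dates)
cases_wide <- tibble(
  Admin2 = c("Autauga", "Baldwin"),
  Province_State = c("Alabama", "Alabama"),
  `1/1/21` = c(10, 20),
  `1/2/21` = c(12, 25)
)
deaths_wide <- tibble(
  Admin2 = c("Autauga", "Baldwin"),
  Province_State = c("Alabama", "Alabama"),
  `1/1/21` = c(1, 2),
  `1/2/21` = c(1, 3)
)

# Hypothetical helper: reshape one wide file to long format, convert the
# dates, and sum the counties of each state on each date
to_state_long <- function(wide, value_name) {
  wide %>%
    pivot_longer(
      cols = matches("^\\d+/\\d+/\\d+$"),  # only the date columns, e.g. "1/1/21"
      names_to = "Dates",
      values_to = value_name
    ) %>%
    mutate(Dates = mdy(Dates)) %>%       # "1/1/21" -> 2021-01-01 (a Date)
    group_by(Province_State, Dates) %>%
    summarise(across(all_of(value_name), function(x) sum(x, na.rm = TRUE)),
              .groups = "drop")
}

# One row per state and date, with both cases and deaths
covid.usa.states.ts <- inner_join(
  to_state_long(cases_wide, "Confirmed_Cases"),
  to_state_long(deaths_wide, "Deaths"),
  by = c("Province_State", "Dates")
)
```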
  2. Append to covid.usa.states.ts (created in the previous problem) all of the information from matching rows in vacc.people, without repeating columns that hold the same information in both data sets.
## # A tibble: 5 x 6
## # Groups:   Province_State [1]
##   Dates      Province_State Deaths Confirmed_Cases People_Fully_Vaccinated
##   <date>     <chr>           <dbl>           <dbl>                   <dbl>
## 1 2021-01-01 Alabama          4872          365747                      NA
## 2 2021-01-10 Alabama          5334          401900                      NA
## 3 2021-01-11 Alabama          5347          404000                      NA
## 4 2021-01-12 Alabama          5573          407848                      NA
## 5 2021-01-13 Alabama          5760          410995                      NA
## # … with 1 more variable: People_Partially_Vaccinated <dbl>

Note: People_Fully_Vaccinated and People_Partially_Vaccinated show as NA because vacc.people has no vaccination values for the early dates displayed.
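A minimal sketch of the join on toy data. It assumes vacc.people identifies rows by Province_State and a Date column already parsed as a Date (convert it first if it was read as text); the vaccination column names are taken from the output above. Joining on the shared keys keeps only one copy of the state and date columns, and rows with no match in the vaccination table get NA, which is what produces the NAs in the output above. If vacc.people has more than one row per state and date, reduce it to one row per pair first, otherwise the join duplicates rows.

```r
library(dplyr)

# Toy stand-in for covid.usa.states.ts
states_ts <- tibble(
  Province_State = "Alabama",
  Dates = as.Date(c("2021-01-01", "2021-02-01")),
  Confirmed_Cases = c(100, 150),
  Deaths = c(2, 3)
)

# Toy stand-in for vacc.people (the Date column name is an assumption)
vacc_toy <- tibble(
  Province_State = "Alabama",
  Date = as.Date(c("2021-02-01", "2021-03-01")),
  People_Fully_Vaccinated = c(5, 9),
  People_Partially_Vaccinated = c(8, 12)
)

# Keep every row of the time series; match on state and date so the key
# columns appear only once in the result
joined <- left_join(states_ts, vacc_toy,
                    by = c("Province_State", "Dates" = "Date"))
```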

Part 2: Let’s use the data

  1. Using covid.usa.daily, select 3 states highly impacted by COVID-19 and 3 mildly impacted states, where by ‘highly impacted’ I mean states with high numbers of confirmed cases.

Highly Impacted States: California, Florida, Texas

Mildly Impacted States: Hawaii, Maine, Montana
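One way to make this selection is to rank the states by confirmed cases. The daily report also contains rows that are not states (territories such as Guam, for instance), so this sketch first keeps only the 50 state names in base R's state.name vector. The toy data below only mimics the Province_State and Confirmed columns of covid.usa.daily; its values are made up.

```r
library(dplyr)

# Toy stand-in for covid.usa.daily (made-up values, only the columns used here)
daily_toy <- tibble(
  Province_State = c("California", "Texas", "Florida", "New York",
                     "Montana", "Maine", "Hawaii", "Guam"),
  Confirmed = c(900, 800, 700, 600, 50, 40, 30, 5)
)

# Keep the 50 states (state.name ships with base R), then rank by confirmed cases
ranked <- daily_toy %>%
  filter(Province_State %in% state.name) %>%
  arrange(desc(Confirmed))

highly_impacted <- head(ranked$Province_State, 3)  # largest case counts
mildly_impacted <- tail(ranked$Province_State, 3)  # smallest case counts
```

Ranking by raw counts favors populous states; a per-capita rate would be an alternative definition of "impacted", but the assignment asks for raw confirmed cases.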

  2. Create a visualization of the evolution of confirmed cases, deaths, and people vaccinated for each of the 6 states identified as highly and mildly impacted (use the covid.usa.states.ts data frame to create the figure).
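Cases, deaths, and vaccinations differ by orders of magnitude, and so do highly and mildly impacted states, so one panel per state and measure with its own y scale keeps every curve readable. A minimal sketch on toy data, assuming the column names shown in the joined output above:

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Toy stand-in for covid.usa.states.ts after the vaccination join
ts_toy <- tibble(
  Province_State = rep(c("Texas", "Maine"), each = 3),
  Dates = rep(as.Date("2021-01-01") + 0:2, times = 2),
  Confirmed_Cases = c(100, 110, 125, 10, 11, 13),
  Deaths = c(5, 6, 6, 1, 1, 1),
  People_Fully_Vaccinated = c(NA, 2, 4, NA, 1, 2)
)

# Long format: one row per state, date, and measure
plot_data <- ts_toy %>%
  pivot_longer(c(Confirmed_Cases, Deaths, People_Fully_Vaccinated),
               names_to = "Measure", values_to = "Count")

# One panel per state and measure, each with its own y scale
p <- ggplot(plot_data, aes(Dates, Count, colour = Measure)) +
  geom_line(na.rm = TRUE) +
  facet_wrap(~ Province_State + Measure, scales = "free_y", ncol = 3) +
  labs(x = NULL, y = "Cumulative count", colour = NULL)

# print(p)  # draws the figure
```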

  3. Do you see any notable change in the trajectories of the case and death time series once vaccinations begin? Produce meaningful summaries that quantify this change (e.g., the average number of new cases in the 90 days before vs. the 90 days after vaccination started). One or two summary measures are enough. Make either a table or a figure to display your findings and comment on them.
Average Cumulative COVID-19 Confirmed Cases and Deaths, 90 Days Before vs. 90 Days After Vaccination Began

State       Measure               90 Days Before Vaccine  90 Days After Vaccine
Hawaii      Avg. confirmed cases               21,738.58              30,698.38
            Avg. deaths                           297.30                 458.05
Maine       Avg. confirmed cases               14,705.47              42,874.47
            Avg. deaths                           236.53                 647.44
Montana     Avg. confirmed cases               67,038.40             100,255.80
            Avg. deaths                           769.41               1,370.69
California  Avg. confirmed cases            2,121,173.24           3,639,255.15
            Avg. deaths                        26,717.37              56,556.90
Florida     Avg. confirmed cases            1,049,231.80           1,818,799.32
            Avg. deaths                        19,097.03              28,963.04
Texas       Avg. confirmed cases            1,371,918.26           2,515,123.77
            Avg. deaths                        23,334.68              41,117.35
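Because covid.usa.states.ts holds cumulative totals, averaging it directly gives the mean cumulative count over a window rather than the average number of new cases the question asks for. New daily counts are the day-to-day differences of the cumulative series. The sketch below computes them per state before filtering, so the first day of a window still gets its difference from the previous day, and then averages them in a "Before" window [start - days, start) and an "After" window [start, start + days). The function name window_means() and the toy data are hypothetical.

```r
library(dplyr)

# Hypothetical helper: average daily NEW cases and deaths per state in the
# `days` days before and the `days` days after `start`
window_means <- function(ts, start, days = 90) {
  ts %>%
    arrange(Province_State, Dates) %>%
    group_by(Province_State) %>%
    mutate(New_Cases = Confirmed_Cases - lag(Confirmed_Cases),  # daily increments
           New_Deaths = Deaths - lag(Deaths)) %>%
    filter(Dates >= start - days, Dates < start + days) %>%
    mutate(Window = if_else(Dates < start, "Before", "After")) %>%
    group_by(Province_State, Window) %>%
    summarise(Avg_New_Cases = mean(New_Cases, na.rm = TRUE),
              Avg_New_Deaths = mean(New_Deaths, na.rm = TRUE),
              .groups = "drop")
}

# Toy cumulative series for one state
ts_toy <- tibble(
  Province_State = "Maine",
  Dates = as.Date("2021-01-01") + 0:5,
  Confirmed_Cases = c(10, 12, 15, 19, 20, 21),
  Deaths = c(1, 1, 2, 2, 2, 3)
)
res <- window_means(ts_toy, start = as.Date("2021-01-04"), days = 3)
```

For the real data, a call such as window_means(covid.usa.states.ts, as.Date("2020-12-14")) uses December 14, 2020, when US vaccinations began, as an assumed start date. Since the series loaded above begins on 11/01/2020, the "Before" window then covers only about six weeks instead of 90 days.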
  4. Using the maskuse data, explore whether mask use seems to correlate with your selected states being mildly or highly impacted.
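The mask use data are by county, so they need a county-to-state lookup before they can be compared across states. This sketch assumes maskuse has a COUNTYFP column and an ALWAYS column (share of respondents who always wear a mask), and takes the lookup from the FIPS and Province_State columns of covid.usa.ts.confirmed. Converting COUNTYFP with as.numeric() turns a code like "01001" into 1001, which matches the numeric FIPS codes in the JHU files. The state value here is an unweighted mean over counties; weighting by the Population column of the deaths file would be an alternative.

```r
library(dplyr)

# Toy stand-ins: county mask use shares and the county -> state lookup
mask_toy <- tibble(
  COUNTYFP = c("01001", "01003", "15001"),
  NEVER = c(0.1, 0.2, 0.0),
  ALWAYS = c(0.5, 0.4, 0.8)
)
counties_toy <- tibble(
  FIPS = c(1001, 1003, 15001),
  Province_State = c("Alabama", "Alabama", "Hawaii")
)

# Unweighted state average of the "always wears a mask" share
state_mask <- mask_toy %>%
  mutate(FIPS = as.numeric(COUNTYFP)) %>%   # "01001" -> 1001
  inner_join(counties_toy, by = "FIPS") %>%
  group_by(Province_State) %>%
  summarise(Mean_Always = mean(ALWAYS), .groups = "drop")
```

With only six states, comparing Mean_Always between the highly and mildly impacted groups (or plotting it against confirmed cases) is more informative than a single correlation coefficient.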

  5. Formulate and explore ONE question about the 3 highly and 3 mildly affected states, using any of the data sets I have provided or the policy tracker data sets corresponding to your states.

How do opening and closure dates relate to the numbers of deaths, confirmed cases, and people fully vaccinated in the 3 highly affected states (California, Florida, Texas) and the 3 mildly affected states (Hawaii, Maine, Montana)?

Note: in the figures, openings are shown as gold vertical lines and closures as red vertical lines.
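Policy events can be overlaid on a time series with geom_vline(), using a manual colour scale to draw openings in gold and closures in red. This is a sketch on toy data: the events table, with a Dates column and an Event column holding "Opening" or "Closure", is hypothetical and would have to be derived from a state's policy file, whose actual column names are not assumed here.

```r
library(ggplot2)

# Toy stand-in: one state's cumulative confirmed cases
cases_toy <- data.frame(
  Dates = as.Date("2021-01-01") + 0:59,
  Confirmed_Cases = cumsum(rep(10, 60))
)

# Hypothetical event table derived from a state's policy file
events <- data.frame(
  Dates = as.Date(c("2021-01-15", "2021-02-10")),
  Event = c("Closure", "Opening")
)

# Case curve with one vertical line per policy event
p <- ggplot(cases_toy, aes(Dates, Confirmed_Cases)) +
  geom_line() +
  geom_vline(data = events, aes(xintercept = Dates, colour = Event)) +
  scale_colour_manual(values = c(Opening = "gold", Closure = "red")) +
  labs(x = NULL, y = "Confirmed cases", colour = NULL)

# print(p)  # draws the figure
```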